The `drain` method cancels all running compactions and moves the
compaction manager into the disabled state. To move it back to
the enabled state, the `enable` method must be called.
This, however, throws an assertion error, as the submission timer is
not cancelled and re-enabling the manager tries to arm the already-armed timer.
Thus, cancel the timer when calling the `drain` method to disable
the compaction manager.
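A minimal sketch of the fix; the member names (`_submission_timer`, `_state`) and the stop helper are illustrative, only `drain` and `enable` come from the text above:
```cpp
// Sketch only: cancel the periodic submission timer when draining, so that a
// later enable() can arm it without hitting the "already armed" assertion.
future<> compaction_manager::drain() {
    _state = state::disabled;
    _submission_timer.cancel();                 // the missing step
    co_await stop_ongoing_compactions("drain"); // cancel all running compactions
}

void compaction_manager::enable() {
    _state = state::enabled;
    _submission_timer.arm_periodic(_submission_period); // safe: timer is no longer armed
}
```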
Fixes https://github.com/scylladb/scylladb/issues/24504
All versions are affected, so it's a good candidate for a backport.
Closes scylladb/scylladb#24505
(cherry picked from commit a9a53d9178)
Closes scylladb/scylladb#24585
test_repair_task_progress checks the progress of the children of a root
repair task. However, nothing ensures that the children have already
been created.
Wait until at least one child of a root repair task is created.
Fixes: #24556.
Closes scylladb/scylladb#24560
(cherry picked from commit 0deb9209a0)
Closes scylladb/scylladb#24652
The purpose of this test is to verify that the cluster is able to boot up again
after a full cluster shutdown, exhibiting no issues when connecting to
a raft group 0 that is larger than one node.
(cherry picked from commit 900a6706b8)
This change implements the ability to await superuser creation in the
function ensure_superuser_is_created(). This means that Scylla will not
serve CQL connections until the superuser is created.
Fixes #10481
(cherry picked from commit 7008b71acc)
This change is a preparation for the next change. Moving to coroutines
makes the code more readable and easier to process.
(cherry picked from commit 04fc82620b)
This change reorganizes the way standard_role_manager startup is
handled: now the future returned by its start() function can be used to
determine when startup has finished. We use this future to ensure the
startup is finished prior to starting the CQL server.
Some clusters are created without auth, and auth is added later. The
first node to recognize that auth is needed must create the superuser.
Currently this is always on restart, but if we were to ever make it
LiveUpdate then it would not be on restart.
This suggests that we don't really need to wait during restart.
This is a preparatory commit, laying the groundwork for the implementation of a
start() function that waits for the superuser to be created. The default
implementation returns a ready future, which does not change the code's
behavior.
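A minimal sketch of the idea; the class layout and member names are illustrative, only `standard_role_manager` and `start()` come from the commit text:
```cpp
// Sketch only: a default start() that is already resolved, which concrete role
// managers can override to report when their startup has actually finished.
class role_manager {
public:
    virtual ~role_manager() = default;
    // Default: startup is considered finished immediately, so existing behavior
    // is unchanged for managers that don't override this.
    virtual future<> start() { return make_ready_future<>(); }
};

class standard_role_manager : public role_manager {
    shared_promise<> _startup_done; // hypothetical: resolved once the superuser exists
public:
    future<> start() override { return _startup_done.get_shared_future(); }
};
```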
(cherry picked from commit f525d4b0c1)
`dirty_memory_manager` tracks two quantities about memtable memory usage:
"real" and "unspooled" memory usage.
"real" is the total memory usage (sum of `occupancy().total_space()`)
by all memtable LSA regions, plus an upper-bound estimate of the size of
memtable data which has already moved to the cache region but isn't
evictable (merged into the cache) yet.
"unspooled" is the difference between total memory usage by all memtable
LSA regions, and the total flushed memory (sum of `_flushed_memory`)
of memtables.
`dirty_memory_manager` controls the shares of compaction and/or blocks
writes when these quantities cross various thresholds.
"Total flushed memory" isn't a well defined notion,
since the actual consumption of memory by the same data can vary over
time due to LSA compactions, and even the data present in memtable can
change over the course of the flush due to removals of outdated MVCC versions.
So `_flushed_memory` is merely an approximation computed by `flush_reader`
based on the data passing through it.
This approximation is supposed to be a conservative lower bound.
In particular, `_flushed_memory` should not be greater than
`occupancy().total_space()`. Otherwise, for example, "unspooled" memory
could become negative (and/or wrap around) and weird things could happen.
There is an assertion in `~flush_memory_accounter` which checks that
`_flushed_memory < occupancy().total_space()` at the end of flush.
But it can fail. Without additional treatment, the memtable reader sometimes emits
data which is already deleted. (In particular, it emits rows covered by
a partition tombstone in a newer MVCC version.)
This data is seen by `flush_reader` and accounted in `_flushed_memory`.
But this data can be garbage-collected by the `mutation_cleaner` later during the
flush and decrease `total_memory` below `_flushed_memory`.
There is a piece of code in `mutation_cleaner` intended to prevent that.
If `total_memory` decreases during a `mutation_cleaner` run,
`_flushed_memory` is lowered by the same amount, just to preserve the
asserted property. (This could also make `_flushed_memory` quite inaccurate,
but that's considered acceptable).
But that only works if `total_memory` is decreased during that run. It doesn't
work if the `total_memory` decrease (enabled by the new allocator holes made
by `mutation_cleaner`'s garbage collection work) happens asynchronously
(due to memory reclaim for whatever reason) after the run.
This patch fixes that by tracking the decreases of `total_memory` closer to the
source. Instead of relying on `mutation_cleaner` to notify the memtable if it
lowers `total_memory`, the memtable itself listens for notifications about
LSA segment deallocations. It keeps `_flushed_memory` equal to the reader's
estimate of flushed memory decreased by the change in `total_memory` since the
beginning of flush (if it was positive), and it keeps the amount of "spooled"
memory reported to the `dirty_memory_manager` at `max(0, _flushed_memory)`.
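A small numeric sketch of that rule; the numbers and variable names are purely illustrative:
```cpp
// Illustrative only: the corrected accounting keeps reported "spooled" memory
// from ever exceeding the current total memory usage.
#include <algorithm>
#include <cstdint>
#include <iostream>

int main() {
    int64_t reader_estimate = 120;       // flushed memory as estimated by the flush reader
    int64_t total_at_flush_start = 200;  // total_space() when the flush began
    int64_t total_now = 100;             // total_space() after asynchronous LSA frees

    // If total memory dropped since the flush started, lower the flushed-memory
    // estimate by the same amount.
    int64_t decrease = std::max<int64_t>(0, total_at_flush_start - total_now);
    int64_t flushed = reader_estimate - decrease;       // may become negative
    int64_t reported = std::max<int64_t>(0, flushed);   // what dirty_memory_manager sees

    std::cout << reported << " <= " << total_now << "\n";  // prints "20 <= 100"
    return 0;
}
```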
Fixes scylladb/scylladb#21413
Backport candidate because it fixes a crash that can happen in existing stable branches.
- (cherry picked from commit 7d551f99be)
- (cherry picked from commit 975e7e405a)
Parent PR: #21638
Closes scylladb/scylladb#24601
* github.com:scylladb/scylladb:
memtable: ensure _flushed_memory doesn't grow above total memory usage
replica/memtable: move region_listener handlers from dirty_memory_manager to memtable
dirty_memory_manager tracks two quantities about memtable memory usage:
"real" and "unspooled" memory usage.
"real" is the total memory usage (sum of `occupancy().total_space()`)
by all memtable LSA regions, plus an upper-bound estimate of the size of
memtable data which has already moved to the cache region but isn't
evictable (merged into the cache) yet.
"unspooled" is the difference between total memory usage by all memtable
LSA regions, and the total flushed memory (sum of `_flushed_memory`)
of memtables.
dirty_memory_manager controls the shares of compaction and/or blocks
writes when these quantities cross various thresholds.
"Total flushed memory" isn't a well defined notion,
since the actual consumption of memory by the same data can vary over
time due to LSA compactions, and even the data present in memtable can
change over the course of the flush due to removals of outdated MVCC versions.
So `_flushed_memory` is merely an approximation computed by `flush_reader`
based on the data passing through it.
This approximation is supposed to be a conservative lower bound.
In particular, `_flushed_memory` should not be greater than
`occupancy().total_space()`. Otherwise, for example, "unspooled" memory
could become negative (and/or wrap around) and weird things could happen.
There is an assertion in ~flush_memory_accounter which checks that
`_flushed_memory < occupancy().total_space()` at the end of flush.
But it can fail. Without additional treatment, the memtable reader sometimes emits
data which is already deleted. (In particular, it emits rows covered by
a partition tombstone in a newer MVCC version.)
This data is seen by `flush_reader` and accounted in `_flushed_memory`.
But this data can be garbage-collected by the mutation_cleaner later during the
flush and decrease `total_memory` below `_flushed_memory`.
There is a piece of code in mutation_cleaner intended to prevent that.
If `total_memory` decreases during a `mutation_cleaner` run,
`_flushed_memory` is lowered by the same amount, just to preserve the
asserted property. (This could also make `_flushed_memory` quite inaccurate,
but that's considered acceptable).
But that only works if `total_memory` is decreased during that run. It doesn't
work if the `total_memory` decrease (enabled by the new allocator holes made
by `mutation_cleaner`'s garbage collection work) happens asynchronously
(due to memory reclaim for whatever reason) after the run.
This patch fixes that by tracking the decreases of `total_memory` closer to the
source. Instead of relying on `mutation_cleaner` to notify the memtable if it
lowers `total_memory`, the memtable itself listens for notifications about
LSA segment deallocations. It keeps `_flushed_memory` equal to the reader's
estimate of flushed memory decreased by the change in `total_memory` since the
beginning of flush (if it was positive), and it keeps the amount of "spooled"
memory reported to the `dirty_memory_manager` at `max(0, _flushed_memory)`.
(cherry picked from commit 975e7e405a)
The memtable wants to listen for changes in its `total_memory` in order
to decrease its `_flushed_memory` in case some of the freed memory has already
been accounted as flushed. (This can happen because the flush reader sees
and accounts for even outdated MVCC versions, which can be deleted and freed
during the flush).
Today, the memtable doesn't listen to those changes directly. Instead,
some calls which can affect `total_memory` (in particular, the mutation cleaner)
manually check the value of `total_memory` before and after they run, and they
pass the difference to the memtable.
But that's not good enough, because `total_memory` can also change outside
of those manually-checked calls -- for example, during LSA compaction, which
can occur anytime. This makes memtable's accounting inaccurate and can lead
to unexpected states.
But we already have an interface for listening to `total_memory` changes
actively, and `dirty_memory_manager`, which also needs to know it,
does just that. So what happens e.g. when `mutation_cleaner` runs
is that `mutation_cleaner` checks the value of `total_memory` before it runs,
then it runs, causing several changes to `total_memory` which are picked up
by `dirty_memory_manager`, then `mutation_cleaner` checks the end value of
`total_memory` and passes the difference to `memtable`, which corrects
whatever was observed by `dirty_memory_manager`.
To allow the memtable to modify its `_flushed_memory` correctly, we need
to make `memtable` itself a `region_listener`. Also, instead of
the situation where `dirty_memory_manager` receives `total_memory`
change notifications from `logalloc` directly and `memtable` fixes
the manager's state later, we want only the memtable to listen
for the notifications and pass them on, already adjusted accordingly,
to the manager, so there are no intermediate wrong states.
This patch moves the `region_listener` callbacks from the
`dirty_memory_manager` to the `memtable`. It's not intended to be
a functional change, just a source code refactoring.
The next patch will be a functional change enabled by this.
(cherry picked from commit 7d551f99be)
When an sstable is identified by sstable_directory as remote-unshared,
it will at some point be moved to the target shard. When this happens, a
log message appears:
sstable_directory - Moving 1 unshared SSTables to shard 1
Processing of tables by sstable_directory often happens in parallel, and
messages from sstable_directory are intermixed. Having a message like the
one above is not very informative, as it tells nothing about the sstables
that are being moved.
Equip the message with the ks:cf pair to make it more informative.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#23912
(cherry picked from commit d40d6801b0)
Closes scylladb/scylladb#24014
`read_checksum()` loads the checksum component from disk and stores a
non-owning reference in the shareable components. To avoid loading the
same component twice, the function has an early return statement.
However, this does not guarantee atomicity - two fibers or threads may
load the component and update the shareable components concurrently.
This can lead to use-after-free situations when accessing the component
through the shareable components, since the reference stored there is
non-owning. This can happen when multiple compaction tasks run on the
same SSTable (e.g., regular compaction and scrub-validate).
Fix this by not updating the reference in the shareable components if a
reference is already in place. Instead, create an owning reference to
the existing component for the current fiber. This is less efficient
than using a mutex, since the component may be loaded multiple times
from disk before noticing the race, but no locks are used for any other
SSTable component either. Also, this affects uncompressed SSTables,
which are not that common.
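A rough sketch of that fallback; `checksum_ptr`, `load_checksum_from_disk()` and `make_owning_ref()` are placeholder names, not the actual API:
```cpp
// Sketch only: never overwrite the shared non-owning reference once it is set;
// a racing fiber keeps an owning reference to the component that is already there.
future<checksum_ptr> sstable::read_checksum() {
    if (_components->checksum) {
        co_return make_owning_ref(*_components->checksum);  // early return, already loaded
    }
    auto loaded = co_await load_checksum_from_disk();        // two fibers may both get here
    if (!_components->checksum) {
        _components->checksum = loaded.get();                // first fiber publishes the non-owning ref
        co_return loaded;
    }
    // Lost the race: keep the published component, hand this fiber an owning
    // reference to it, and drop the redundant copy we just loaded.
    co_return make_owning_ref(*_components->checksum);
}
```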
Fixes #23728.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Closes scylladb/scylladb#23872
(cherry picked from commit eaa2ce1bb5)
Closes scylladb/scylladb#24267
In parallelized aggregation functions, the super-coordinator (the node performing the final merging step) receives and merges each partial result in parallel coroutines (`parallel_for_each`).
Usually responses are spread over time and the actual merging is atomic.
However, sometimes partial results are received at around the same time, and if an aggregate function (e.g. a Lua script) yields, two coroutines can try to overwrite the same accumulator one after another,
which leads to losing some of the results.
To prevent this, in this patch each coroutine stores its merging result in its own context and overwrites the accumulator atomically, only after it has been fully merged.
Compared to the previous implementation, the order of operands in the merging function is swapped, but the order of aggregation is not guaranteed anyway.
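A schematic sketch of the fix in seastar-coroutine style; `merge_fn`, `combine`, `accumulator` and `query_result` are illustrative names, not the real ones:
```cpp
// Sketch only: merge each partial result into a coroutine-local value first,
// then fold it into the shared accumulator in one step that cannot yield.
co_await parallel_for_each(partial_results, [&] (query_result partial) -> future<> {
    // The aggregate's merge function (e.g. a Lua UDA) may suspend here, so it
    // must not read-modify-write the shared accumulator directly.
    query_result local = co_await merge_fn(std::move(partial));
    // Non-suspending update: no other coroutine can run between the read and
    // the write of `accumulator`, so no partial result is lost.
    accumulator = combine(std::move(accumulator), std::move(local));
});
```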
Fixes #20662
Closes scylladb/scylladb#24106
(cherry picked from commit 5969809607)
Closes scylladb/scylladb#24387
When a tablet is migrated and cleaned up, deallocate the tablet storage
group state on `end_migration` stage, instead of `cleanup` stage:
* When the stage is updated from `cleanup` to `end_migration`, the
storage group is removed on the leaving replica.
* When the table is initialized, if the tablet stage is `end_migration`
then we don't allocate a storage group for it. This happens for
example if the leaving replica is restarted during tablet migration.
If it's initialized in `cleanup` stage then we allocate a storage
group, and it will be deallocated when transitioning to
`end_migration`.
This guarantees that the storage group is always deallocated on the
leaving replica by `end_migration`, and that it is always allocated if
the tablet wasn't cleaned up fully yet.
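A condensed sketch of that rule; the helper function below is invented for illustration, though the stage names match those mentioned above:
```cpp
// Sketch only: on the leaving replica, whether a storage group exists for a
// migrating tablet is decided by the transition stage, both at runtime and
// when the table is (re)initialized after a restart.
bool leaving_replica_has_storage_group(locator::tablet_transition_stage stage) {
    switch (stage) {
    case locator::tablet_transition_stage::cleanup:
        return true;   // not cleaned up yet: keep it, drop it on end_migration
    case locator::tablet_transition_stage::end_migration:
        return false;  // already cleaned up: never (re)allocate, e.g. after a restart
    default:
        return true;   // earlier stages: the replica still serves the tablet
    }
}
```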
A similar case applies to the pending replica when the migration is
aborted. We deallocate the state on `revert_migration`, which is the
stage following `cleanup_target`.
Previously the storage group would be allocated when the tablet is
initialized on any of the tablet replicas - also on the leaving replica,
and when the tablet stage is `cleanup` or `end_migration`, and
deallocated during `cleanup`.
This fixes the following issue:
1. A migrating tablet enters cleanup stage
2. the tablet is cleaned up successfully
3. The leaving replica is restarted, and allocates storage group
4. tablet cleanup is not called because it's already cleaned up
5. the storage group remains allocated on the leaving replica after the
migration is completed - it's not cleaned up properly.
Fixes https://github.com/scylladb/scylladb/issues/23481
backport to all relevant releases since it's a bug that results in a crash
- (cherry picked from commit 34f15ca871)
- (cherry picked from commit fb18fc0505)
- (cherry picked from commit bd88ca92c8)
Parent PR: #24393
Closes scylladb/scylladb#24486
* github.com:scylladb/scylladb:
test/cluster/test_tablets: test restart during tablet cleanup
test: tablets: add get_tablet_info helper
tablets: deallocate storage state on end_migration
Add a test that reproduces issue scylladb/scylladb#23481.
The test migrates a tablet from one node to another, and while the
tablet is in some stage of cleanup - either before or right after,
depending on the parameter - the leaving replica, on which the tablet is
cleaned, is restarted.
This is interesting because when the leaving replica starts and loads
its state, the tablet could be in different stages of cleanup - the
SSTables may still exist or they may have been cleaned up already, and
we want to make sure the state is loaded correctly.
(cherry picked from commit bd88ca92c8)
When a tablet is migrated and cleaned up, deallocate the tablet storage
group state on `end_migration` stage, instead of `cleanup` stage:
* When the stage is updated from `cleanup` to `end_migration`, the
storage group is removed on the leaving replica.
* When the table is initialized, if the tablet stage is `end_migration`
then we don't allocate a storage group for it. This happens for
example if the leaving replica is restarted during tablet migration.
If it's initialized in `cleanup` stage then we allocate a storage
group, and it will be deallocated when transitioning to
`end_migration`.
This guarantees that the storage group is always deallocated on the
leaving replica by `end_migration`, and that it is always allocated if
the tablet wasn't cleaned up fully yet.
A similar case applies to the pending replica when the migration is
aborted. We deallocate the state on `revert_migration`, which is the
stage following `cleanup_target`.
Previously the storage group would be allocated when the tablet is
initialized on any of the tablet replicas - also on the leaving replica,
and when the tablet stage is `cleanup` or `end_migration`, and
deallocated during `cleanup`.
This fixes the following issue:
1. A migrating tablet enters cleanup stage
2. the tablet is cleaned up successfully
3. The leaving replica is restarted, and allocates storage group
4. tablet cleanup is not called because it was already cleaned up
5. the storage group remains allocated on the leaving replica after the
migration is completed - it's not cleaned up properly.
Fixes scylladb/scylladb#23481
(cherry picked from commit 34f15ca871)
`chunked_managed_vector` is a vector-like container which splits
its contents into multiple contiguous allocations if necessary,
in order to fit within LSA's max preferred contiguous allocation
limits.
Each limited-size chunk is stored in a `managed_vector`.
`managed_vector` is unaware of LSA's size limits.
It's up to the user of `managed_vector` to pick a size which
is small enough.
This happens in `chunked_managed_vector::max_chunk_capacity()`.
But the calculation is wrong, because it doesn't account for
the fact that `managed_vector` has to place some metadata
(the backreference pointer) inside the allocation.
In effect, the chunks allocated by `chunked_managed_vector`
are just a tiny bit larger than the limit, and the limit is violated.
Fix this by accounting for the metadata.
Also, before the patch, `chunked_managed_vector::max_contiguous_allocation`
repeats the definition of `logalloc::max_managed_object_size`.
This is begging for a bug if `logalloc::max_managed_object_size`
changes one day. Adjust it so that `chunked_managed_vector` looks
directly at `logalloc::max_managed_object_size`, as it means to.
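A sketch of the corrected capacity calculation; the metadata size shown is an approximation, not the exact accounting:
```cpp
// Sketch only: a chunk lives inside a managed_vector allocation which also
// stores a backreference pointer, so that overhead must come out of the budget
// before dividing by the element size.
template <typename T>
static constexpr size_t max_chunk_capacity() {
    constexpr size_t metadata_overhead = sizeof(void*);  // approximate backref size
    return (logalloc::max_managed_object_size - metadata_overhead) / sizeof(T);
}
```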
Fixes scylladb/scylladb#23854
(cherry picked from commit 7f9152babc)
Closes scylladb/scylladb#24369
When the topology coordinator is shut down while doing a long-running
operation, the current operation might throw a raft::request_aborted
exception. This is not a critical issue and should not be logged with
ERROR verbosity level.
Make sure that all the try..catch blocks in the topology coordinator
which:
- May try to acquire a new group0 guard in the `try` part
- Have a `catch (...)` block that prints an ERROR-level message
...have a pass-through `catch (raft::request_aborted&)` block which does
not log the exception.
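Schematically, the added pass-through handler looks like this (the guard-taking call and log line are illustrative):
```cpp
// Sketch only: let request_aborted propagate quietly during shutdown instead of
// falling into the generic ERROR-logging handler.
try {
    auto guard = co_await start_operation();   // may be aborted on shutdown
    co_await run_transition(std::move(guard));
} catch (raft::request_aborted&) {
    throw;  // pass through: expected while shutting down, not worth an ERROR
} catch (...) {
    logger.error("topology coordinator: unexpected error: {}", std::current_exception());
    throw;
}
```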
Fixes: scylladb/scylladb#22649
Closes scylladb/scylladb#23962
(cherry picked from commit 156ff8798b)
Closes scylladb/scylladb#24074
Currently, flush throws no_such_column_family if a table is dropped. Skip the flush of the dropped table instead.
Fixes: #16095.
Needs backport to 2025.1 and 6.2 as they contain the bug
- (cherry picked from commit 91b57e79f3)
- (cherry picked from commit c1618c7de5)
Parent PR: #23876
Closes scylladb/scylladb#23904
* github.com:scylladb/scylladb:
test: test table drop during flush
replica: skip flush of dropped table
Currently, stream_session::prepare throws when a table in requests
or summaries is dropped. However, we do not want to fail streaming
if the table is dropped.
Delete the table checks from stream_session::prepare. Further streaming
steps can handle the dropped table and finish the streaming successfully.
Fixes: #15257.
Closes scylladb/scylladb#23915
(cherry picked from commit 20c2d6210e)
Closes scylladb/scylladb#24050
The loading_cache has a periodic timer which acquires the
_timer_reads_gate. The stop() method first closes the gate and then
cancels the timer - this order is necessary because the timer is
re-armed under the gate. However, the timer callback does not check
whether the gate was closed but tries to acquire it, which might result
in an unhandled exception that is logged with ERROR severity.
Fix the timer callback by acquiring access to the gate at the beginning
and gracefully returning if the gate is closed. Even though the gate
used to be entered in the middle of the callback, it does not make sense
to execute the timer's logic at all if the cache is being stopped.
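A sketch of the resulting callback shape; the member names are illustrative and the real callback is asynchronous:
```cpp
// Sketch only: check the gate before doing anything, so a closed gate means a
// quiet no-op instead of an unhandled gate_closed_exception.
void loading_cache::on_timer() {
    if (_timer_reads_gate.is_closed()) {
        return;                             // stop() is in progress; skip this run
    }
    auto holder = _timer_reads_gate.hold(); // keep stop() waiting until we finish
    // ... periodic reload logic, re-arming the timer while the holder is alive ...
}
```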
Fixes: scylladb/scylladb#23951
Closes scylladb/scylladb#23952
(cherry picked from commit 8ffe4b0308)
Closes scylladb/scylladb#23980
In case dht::boot_strapper::get_bootstrap_tokens fails to parse the
tokens, the topology coordinator handles the exception and schedules a
rollback. However, the current code tries to continue with the topology
coordinator logic even if an exception occurs, leaving bootstrap_tokens
empty. This does not make sense and can actually cause issues,
specifically in prepare_and_broadcast_cdc_generation_data, which
implicitly expects that the bootstrap_tokens of the first node in the
cluster will not be empty.
Fix this by adding the missing break.
Fixes: scylladb/scylladb#23897
From the code inspection alone it looks like 2025.1 and 6.2 have this problem, so marking for backport to both of them.
- (cherry picked from commit 66acaa1bf8)
- (cherry picked from commit 845cedea7f)
- (cherry picked from commit 670a69007e)
Parent PR: #23914
Closes scylladb/scylladb#23948
* github.com:scylladb/scylladb:
test: cluster: add test_bad_initial_token
topology coordinator: do not proceed further on invalid boostrap tokens
cdc: add sanity check for generating an empty generation
Check whether a node is alive before making an rpc that gathers children
info from the whole cluster in virtual_task::impl::get_children.
Fixes: https://github.com/scylladb/scylladb/issues/22514.
Needs backport to 2025.1 and 6.2 as they contain the bug.
- (cherry picked from commit 53e0f79947)
- (cherry picked from commit e178bd7847)
Parent PR: #23787
Closes scylladb/scylladb#23942
* github.com:scylladb/scylladb:
test: add test for getting tasks children
tasks: check whether a node is alive before rpc
Scylla operations use concurrency semaphores to limit the number of concurrent operations and prevent resource exhaustion. The semaphore is selected based on the current scheduling group.
For RAFT group operations, it is essential to use a system semaphore to avoid queuing behind user operations. This patch ensures that RAFT operations use the `gossip` scheduling group to leverage the system semaphore.
Fixes scylladb/scylladb#21637
Backport: 6.2 and 6.1
- (cherry picked from commit 60f1053087)
- (cherry picked from commit e05c082002)
Parent PR: #22779
Closes scylladb/scylladb#23769
* github.com:scylladb/scylladb:
ensure raft group0 RPCs use the gossip scheduling group
Move RAFT operations verbs to GOSSIP group.
Check whether a node is alive before making an rpc that gathers children
info from the whole cluster in virtual_task::impl::get_children.
(cherry picked from commit 53e0f79947)
Scylla operations use concurrency semaphores to limit the number
of concurrent operations and prevent resource exhaustion. The
semaphore is selected based on the current scheduling group.
For Raft group operations, it is essential to use a system semaphore to
avoid queuing behind user operations.
This commit adds a check to ensure that the raft group0 RPCs are
executed with the `gossiper` scheduling group.
(cherry picked from commit e05c082002)
In order for RAFT operations to use the gossip system semaphore, move RAFT
verbs to the gossip group in `do_get_rpc_client_idx` in messaging_service.
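A schematic sketch of that mapping; the verb names shown are just examples and the index values are made up:
```cpp
// Sketch only: route group0/raft verbs through the same rpc client index as the
// gossip verbs, so they run under the gossip scheduling group and its system
// concurrency semaphore instead of queuing behind user traffic.
unsigned messaging_service::do_get_rpc_client_idx(messaging_verb verb) {
    switch (verb) {
    case messaging_verb::GOSSIP_DIGEST_SYN:   // existing gossip verbs
    case messaging_verb::RAFT_APPEND_ENTRIES: // example raft/group0 verbs; the real
    case messaging_verb::RAFT_VOTE_REQUEST:   // change covers the whole set
        return 0;  // index served by the gossip scheduling group (illustrative)
    default:
        return 1;  // everything else (illustrative)
    }
}
```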
Fixes scylladb/scylladb#21637
(cherry picked from commit 60f1053087)
Adds a test which checks that rollback works properly when a bad
initial_token value is provided.
(cherry picked from commit 670a69007e)
In case dht::boot_strapper::get_bootstrap_tokens fails to parse the
tokens, the topology coordinator handles the exception and schedules a
rollback. However, the current code tries to continue with the topology
coordinator logic even if an exception occurs, leaving bootstrap_tokens
empty. This does not make sense and can actually cause issues,
specifically in prepare_and_broadcast_cdc_generation_data, which
implicitly expects that the bootstrap_tokens of the first node in the
cluster will not be empty.
Fix this by adding the missing break.
Fixes: scylladb/scylladb#23897
(cherry picked from commit 845cedea7f)
It doesn't make sense to create an empty CDC generation because it does
not make sense to have a cluster with no tokens. Add a sanity check to
cdc::make_new_generation_description which fails if somebody attempts to
do that (i.e. when the set of current tokens + optionally bootstrapping
node's tokens is empty).
The function does not work correctly if it is misused, as we saw in
scylladb/scylladb#23897. While the function should not be misused in the
first place, it's better to throw an exception rather than crash -
especially since this crash could happen on the topology coordinator.
(cherry picked from commit 66acaa1bf8)
Currently, when we load a frozen schema into the registry, we lose
the base info if the schema was of a view. Because of that, in various
places we need to set the base info again, and in some codepaths we
may miss it completely, which may make us unable to process some
requests (for example, when executing reverse queries on views).
Even after setting the base info, we may still lose it if the schema
entry gets deactivated due to all `schema_ptr`s temporarily dying.
To fix this, this patch adds the base schema to the registry, alongside
the view schema. We store just the frozen base schema, so that we can
transfer it across shards. With the base schema, we can now set the base
info when returning the schema from the registry. As a result, we can now
assume that all view schemas returned by the registry have base_info set.
In this series we also make sure that the view schemas in the registry are
kept up to date with regard to base schema changes.
Fixes https://github.com/scylladb/scylladb/issues/21354
This issue is a bug, so adding backport labels 6.1 and 6.2
- (cherry picked from commit 6f11edbf3f)
- (cherry picked from commit dfe3810f64)
- (cherry picked from commit 82f2e1b44c)
- (cherry picked from commit 3094ff7cbe)
- (cherry picked from commit 74cbc77f50)
Parent PR: #21862
Closes scylladb/scylladb#23046
* github.com:scylladb/scylladb:
test: add test for schema registry maintaining base info for views
schema_registry: avoid setting base info when getting the schema from registry
schema_registry: update cached base schemas when updating a view
schema_registry: cache base schemas for views
db: set base info before adding schema to registry
Commit 876478b84f ("storage_service: allow concurrent tablet migration in tablets/move API", 2024-02-08) introduced a code path on which the topology state machine would be busy -- in "tablet_draining" or "tablet_migration" state -- at the time of starting tablet migration. The pre-commit code would unconditionally transition the topology to "tablet_migration" state, assuming the topology had been idle previously. On the new code path, this state change would be idempotent if the topology state machine had been busy in "tablet_migration", but the state change would incorrectly overwrite the "tablet_draining" state otherwise.
Restrict the state change to when the topology state machine is idle.
In addition, add the topology update to the "updates" vector with plain push_back(). emplace_back() is not helpful here, as topology_mutation_builder::build() cannot construct in-place, and so we invoke the "canonical_mutation" move constructor once, either way.
Unit test:
Start a two node cluster. Create a single tablet on one of the nodes. Start decommissioning that node, but block decommissioning at once. In that state (i.e., in "tablet_draining"), move the tablet manually to the other node. Check that transit_tablet() leaves the topology transition state alone.
Fixes https://github.com/scylladb/scylladb/issues/20073.
Commit 876478b84f was first released in scylla-6.0.0, so we might want to backport this patch accordingly.
- (cherry picked from commit e1186f0ae6)
- (cherry picked from commit 841ca652a0)
Parent PR: #23751
Closes scylladb/scylladb#23768
* github.com:scylladb/scylladb:
storage_service: add unit test for mid-decommission transit_tablet()
storage_service: preserve state of busy topology when transiting tablet
Commit 14bf09f447 added a single-chunk layout to `managed_bytes`, which makes the overhead of `managed_bytes` smaller in the common case of a small buffer.
But there was a bug in it. In the copy constructor of `managed_bytes`, a copy of a single-chunk `managed_bytes` is made single-chunk too.
But this is wrong, because the source of the copy and the target of the copy might have different preferred max contiguous allocation sizes.
In particular, if a `managed_bytes` of size between 13 kiB and 128 kiB is copied from the standard allocator into LSA, the resulting `managed_bytes` is a single chunk which violates LSA's preferred allocation size. (And therefore is placed by LSA in the standard allocator).
In other words, since Scylla 6.0, cache and memtable cells between 13 kiB and 128 kiB are getting allocated in the standard allocator rather than inside LSA segments.
Consequences of the bug:
1. Effective memory consumption of an affected cell is rounded up to the nearest power of 2.
2. With a pathological-enough allocation pattern (for example, one which somehow ends up placing a single 16 kiB memtable-owned allocation in every aligned 128 kiB span), memtable flushing could theoretically deadlock, because the allocator might be too fragmented to let the memtable grow by another 128 kiB segment, while keeping the sum of all allocations small enough to avoid triggering a flush. (Such an allocation pattern probably wouldn't happen in practice though).
3. It triggers a bug in reclaim which results in spurious allocation failures despite ample evictable memory.
There is a path in the reclaimer procedure where we check whether reclamation succeeded by checking that the number of free LSA segments grew.
But in the presence of evictable non-LSA allocations, this is wrong because the reclaim might have met its target by evicting the non-LSA allocations, in which case memory is returned directly to the standard allocator, rather than to the pool of free segments.
If that happens, the reclaimer wrongly returns `reclaimed_nothing` to Seastar, which fails the allocation.
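A simplified sketch of the corrected decision; the helper names (`max_seg_size`, `make_single_chunk`, ...) are invented for illustration:
```cpp
// Sketch only: when copying, choose single-chunk vs. multi-chunk based on the
// preferred maximum contiguous allocation of the allocator we are copying
// *into*, not on the layout of the source object.
managed_bytes::managed_bytes(const managed_bytes& o) {
    const size_t n = o.size();
    if (n <= max_seg_size()) {      // limit of the current (target) allocator
        make_single_chunk(n);       // small enough for one contiguous allocation
    } else {
        make_multi_chunk(n);        // split into LSA-friendly fragments
    }
    copy_contents_from(o);
}
```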
Refs (possibly fixes) https://github.com/scylladb/scylladb/issues/21072
Fixes https://github.com/scylladb/scylladb/issues/22941
Fixes https://github.com/scylladb/scylladb/issues/22389
Fixes https://github.com/scylladb/scylladb/issues/23781
This is a regression fix, should be backported to all affected releases.
- (cherry picked from commit 4e2f62143b)
- (cherry picked from commit 6c1889f65c)
Parent PR: #23782
Closes scylladb/scylladb#23809
* github.com:scylladb/scylladb:
managed_bytes_test: add a reproducer for #23781
managed_bytes: in the copy constructor, respect the target preferred allocation size
Fixes #22688
If we set a dc rf to zero, the options map will still retain a dc=0 entry.
If this dc is decommissioned, any further alters of the keyspace will fail,
because the union of new/old options will now contain an unknown keyword.
Change alter ks options processing to simply remove any dc with rf=0 on
alter, and treat this as an implicit dc=0 in nw-topo strategy.
This means we change the reallocate_tablets routine to not rely on
the strategy object's dc mapping, but on the full replica topology info
for dc:s to consider for reallocation. Since we verify the input
on attribute processing, the amount of rf/tablets moved should still
be legal.
v2:
* Update docs as well.
v3:
* Simplify dc processing
* Reintroduce options empty check, but do early in ks_prop_defs
* Clean up unit test some
Closes scylladb/scylladb#22693
(cherry picked from commit 342df0b1a8)
(Update: workaround python test objects not having dc info)
Closes scylladb/scylladb#22876
Commit 14bf09f447 added a single-chunk
layout to `managed_bytes`, which makes the overhead of `managed_bytes`
smaller in the common case of a small buffer.
But there was a bug in it. In the copy constructor of `managed_bytes`,
a copy of a single-chunk `managed_bytes` is made single-chunk too.
But this is wrong, because the source of the copy and the target
of the copy might have different preferred max contiguous allocation
sizes.
In particular, if a `managed_bytes` of size between 13 kiB and 128 kiB
is copied from the standard allocator into LSA, the resulting
`managed_bytes` is a single chunk which violates LSA's preferred
allocation size. (And therefore is placed by LSA in the standard
allocator).
In other words, since Scylla 6.0, cache and memtable cells
between 13 kiB and 128 kiB are getting allocated in the standard allocator
rather than inside LSA segments.
Consequences of the bug:
1. Effective memory consumption of an affected cell is rounded up to the nearest
power of 2.
2. With a pathological-enough allocation pattern
(for example, one which somehow ends up placing a single 16 kiB
memtable-owned allocation in every aligned 128 kiB span),
memtable flushing could theoretically deadlock,
because the allocator might be too fragmented to let the memtable
grow by another 128 kiB segment, while keeping the sum of all
allocations small enough to avoid triggering a flush.
(Such an allocation pattern probably wouldn't happen in practice though).
3. It triggers a bug in reclaim which results in spurious
allocation failures despite ample evictable memory.
There is a path in the reclaimer procedure where we check whether
reclamation succeeded by checking that the number of free LSA
segments grew.
But in the presence of evictable non-LSA allocations, this is wrong
because the reclaim might have met its target by evicting the non-LSA
allocations, in which case memory is returned directly to the
standard allocator, rather than to the pool of free segments.
If that happens, the reclaimer wrongly returns `reclaimed_nothing`
to Seastar, which fails the allocation.
Refs (possibly fixes) https://github.com/scylladb/scylladb/issues/21072
Fixes https://github.com/scylladb/scylladb/issues/22941
Fixes https://github.com/scylladb/scylladb/issues/22389
Fixes https://github.com/scylladb/scylladb/issues/23781
(cherry picked from commit 4e2f62143b)
This test enables trace-level logging for the mutation_data logger,
which seems to be too much in debug mode and the test read times out.
Increase the timeout to 1 minute to avoid this.
Fixes: #23513
Fixes: #23512
Closes scylladb/scylladb#23558
(cherry picked from commit 7bbfa5293f)
Closes scylladb/scylladb#23793
Start a two node cluster. Create a single tablet on one of the nodes.
Start decommissioning that node, but block decommissioning at once. In
that state (i.e., in "tablet_draining"), move the tablet manually to the
other node. Check that transit_tablet() leaves the topology transition
state alone.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
(cherry picked from commit 841ca652a0)
Commit 876478b84f ("storage_service: allow concurrent tablet migration
in tablets/move API", 2024-02-08) introduced a code path on which the
topology state machine would be busy -- in "tablet_draining" or
"tablet_migration" state -- at the time of starting tablet migration. The
pre-commit code would unconditionally transition the topology to
"tablet_migration" state, assuming the topology had been idle previously.
On the new code path, this state change would be idempotent if the
topology state machine had been busy in "tablet_migration", but the state
change would incorrectly overwrite the "tablet_draining" state otherwise.
Restrict the state change to when the topology state machine is idle.
In addition, add the topology update to the "updates" vector with plain
push_back(). emplace_back() is not helpful here, as
topology_mutation_builder::build() cannot construct in-place, and so we
invoke the "canonical_mutation" move constructor once, either way.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
(cherry picked from commit e1186f0ae6)
Because of rounding and alignment, there are multiple pools for small
sizes (e.g. 4 for size 32). Because the pool selection algorithm
ignores alignment, different pools can be chosen for different object
sizes. For example, an object size of 29 will choose the first pool
of size 32, while an object size of 32 will choose the fourth pool of
size 32.
The small-objects command doesn't know about this and always considers
just the first pool for a given size. This causes it to miss out on
sister pools.
While it's possible to adjust pool selection to always choose one of the
pools, it may eat a precious cycle. So instead let's compensate in the
small-objects command. Instead of finding one pool for a given size,
find all of them, and iterate over all those pools.
Fixes #23603
Closes scylladb/scylladb#23604
(cherry picked from commit b4d4e48381)
Closes scylladb/scylladb#23748
A default timestamp (not to confuse with the timestamp passed via 'USING TIMESTAMP' query clause) can be set using 0x20 flag and the <timestamp> field in the binary CQL frame payload of QUERY, EXECUTE and BATCH ops. It also happens to be a default of a Java CQL Driver.
However, we were only setting the corresponding info in the CQL Tracing context of a QUERY operation. For an unknown reason we were not setting this for EXECUTE and BATCH traces (I guess I simply forgot to set it back then).
This patch fixes this.
Fixes #23173
The issue fixed by this PR is not critical but the fix is simple and safe enough so we should backport it to all live releases.
- (cherry picked from commit ca6bddef35)
- (cherry picked from commit f7e1695068)
Parent PR: #23174
Closes scylladb/scylladb#23523
* github.com:scylladb/scylladb:
CQL Tracing: set common query parameters in a single function
transport/server.cc: set default timestamp info in EXECUTE and BATCH tracing
It is possible that the permit handed in to register_inactive_read() is already aborted (currently only possible if the permit timed out). If the permit also happens to be waiting for memory, the current code will attempt to call promise<>::set_exception() on the permit's promise to abort its waiters. But if the permit was already aborted via timeout, this promise will already have an exception and this will trigger an assert. Add a separate case for checking if the permit is aborted already. If so, treat it as immediate eviction: close the reader and clean up.
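A simplified sketch of the added branch; the signature and helper names are illustrative:
```cpp
// Sketch only: an already-aborted permit must not have its memory-wait promise
// completed again, so treat the registration as an immediate eviction.
inactive_read_handle
reader_concurrency_semaphore::register_inactive_read(mutation_reader reader) {
    auto permit = reader.permit();
    if (permit.aborted()) {               // e.g. the permit already timed out
        // Its promise already holds an exception; setting another would assert.
        close_reader(std::move(reader));  // immediate eviction: close and clean up
        return inactive_read_handle{};
    }
    return do_register(std::move(reader));
}
```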
Fixes: scylladb/scylladb#22919
Bug is present in all live versions, backports are required.
- (cherry picked from commit 4d8eb02b8d)
- (cherry picked from commit 7ba29ec46c)
Parent PR: #23044
Closes scylladb/scylladb#23144
* github.com:scylladb/scylladb:
reader_concurrency_semaphore: register_inactive_read(): handle aborted permit
test/boost/reader_concurrency_semaphore_test: move away from db::timeout_clock::now()
During streaming, the receiving node gets and processes mutation fragments.
If this operation fails, the receiver responds with a -1 status code, unless
it failed due to no_such_column_family, in which case streaming of this
table should be skipped.
However, when the table was dropped, an exception handler on the receiver
side may get not only data_dictionary::no_such_column_family, but also a
seastar::nested_exception wrapping two no_such_column_family exceptions.
Encountered example:
```
ERROR 2025-02-12 15:20:51,508 [shard 0:strm] stream_session - [Stream #f1cd6830-e954-11ef-afd9-b022e40bf72d] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=ks, cf=cf, peer=756dd3fe-2bf0-4dcd-afbc-cfd5202669a0: seastar::nested_exception: data_dictionary::no_such_column_family (Can't find a column family with UUID ef9b1ee0-e954-11ef-ba4a-faf17acf4e14) (while cleaning up after data_dictionary::no_such_column_family (Can't find a column family with UUID ef9b1ee0-e954-11ef-ba4a-faf17acf4e14))
```
In this case, the exception does not match the try_catch<data_dictionary::no_such_column_family>
clause and gets handled the same as any other exception type.
Replace the try_catch clause with table_sync_and_check, which synchronizes
the schema and checks if the table exists.
Fixes: https://github.com/scylladb/scylladb/issues/22834.
Needs backport to all live versions, as they all contain the bug
- (cherry picked from commit 876cf32e9d)
- (cherry picked from commit faf3aa13db)
- (cherry picked from commit 44748d624d)
- (cherry picked from commit 35bc1fe276)
Parent PR: #22868
Closes scylladb/scylladb#23289
* github.com:scylladb/scylladb:
streaming: fix the way a reason of streaming failure is determined
streaming: save a continuation lambda
streaming: use streaming namespace in table_check.{cc,hh}
repair: streaming: move table_check.{cc,hh} to streaming
GetInt() was observed to fail when the integer JSON value overflows the
int32_t type, which `GetInt()` uses for storage. When this happens,
rapidjson will assign a distinct 64 bit integer type to the value, and
attempting to access it as 32 bit integer triggers the wrong-type error,
resulting in an assert failure. This was hit in the field, where invoking
nodetool netstats resulted in nodetool crashing when the streamed byte
amounts were higher than maxint.
To avoid such bugs in the future, replace all usage of GetInt() in
nodetool with GetInt64(), just to be sure.
A reproducer is added for the nodetool netstats crash.
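A standalone illustration of the difference, assuming rapidjson is available (the JSON field name is made up):
```cpp
// Values larger than INT32_MAX are stored by rapidjson as 64-bit integers;
// reading them with GetInt() trips the library's type assertion, while
// GetInt64() handles them fine.
#include <rapidjson/document.h>
#include <cstdint>
#include <iostream>

int main() {
    rapidjson::Document doc;
    doc.Parse(R"({"total_outgoing_bytes": 3000000000})");  // > INT32_MAX
    const auto& v = doc["total_outgoing_bytes"];
    // v.GetInt() would assert here: the value is not representable as int32_t.
    int64_t bytes = v.GetInt64();
    std::cout << bytes << "\n";
    return 0;
}
```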
Fixes: scylladb/scylladb#23394
Closes scylladb/scylladb#23395
(cherry picked from commit bd8973a025)
Closes scylladb/scylladb#23475
During streaming, the receiving node gets and processes mutation fragments.
If this operation fails, the receiver responds with a -1 status code, unless
it failed due to no_such_column_family, in which case streaming of this
table should be skipped.
However, when the table was dropped, an exception handler on the receiver
side may get not only data_dictionary::no_such_column_family, but also a
seastar::nested_exception wrapping two no_such_column_family exceptions.
Encountered example:
```
ERROR 2025-02-12 15:20:51,508 [shard 0:strm] stream_session - [Stream #f1cd6830-e954-11ef-afd9-b022e40bf72d] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=ks, cf=cf, peer=756dd3fe-2bf0-4dcd-afbc-cfd5202669a0: seastar::nested_exception: data_dictionary::no_such_column_family (Can't find a column family with UUID ef9b1ee0-e954-11ef-ba4a-faf17acf4e14) (while cleaning up after data_dictionary::no_such_column_family (Can't find a column family with UUID ef9b1ee0-e954-11ef-ba4a-faf17acf4e14))
```
In this case, the exception does not match the try_catch<data_dictionary::no_such_column_family>
clause and gets handled the same as any other exception type.
Replace the try_catch clause with table_sync_and_check, which synchronizes
the schema and checks if the table exists.
Fixes: https://github.com/scylladb/scylladb/issues/22834.
(cherry picked from commit 35bc1fe276)
In the following patches, an additional preemption point will be
added to the coroutine lambda in register_stream_mutation_fragments.
Assign the lambda to a variable to prolong the captures' lifetime.
(cherry picked from commit 44748d624d)
It is possible that the permit handed in to register_inactive_read() is
already aborted (currently only possible if the permit timed out).
If the permit also happens to be waiting for memory, the current code
will attempt to call promise<>::set_exception() on the permit's promise
to abort its waiters. But if the permit was already aborted via timeout,
this promise will already have an exception and this will trigger an
assert. Add a separate case for checking if the permit is aborted
already. If so, treat it as immediate eviction: close the reader and
clean up.
Fixes: scylladb/scylladb#22919
(cherry picked from commit 7ba29ec46c)
Unless the test in question actually wants to test timeouts. Timeouts
will have more pronounced consequences soon and thus using
db::timeout_clock::now() becomes a sure way to make tests flaky.
To avoid this, use db::no_timeout in the tests that don't care about
timeouts.
(cherry picked from commit 4d8eb02b8d)
Fixes #22314
Adds expected schema extensions to the tools extension set (if used). Also uses the source config extensions in the schema loader instead of a temporary one, to ensure we can, for example, load a schema.cql with things like `tombstone_gc` or encryption attributes in it.
Bundles together the setup of "always on" schema extensions into a single call, and uses this from the three (3) init points.
Could have opted for static reg via `configurables`, but since we are moving to a single code base, the need for this is going away, hence explicit init seems more in line.
- (cherry picked from commit e6aa09e319)
- (cherry picked from commit 4aaf3df45e)
- (cherry picked from commit 00b40eada3)
- (cherry picked from commit 48fda00f12)
Parent PR: #22327
Closes scylladb/scylladb#23089
* github.com:scylladb/scylladb:
tools: Add standard extensions and propagate to schema load
cql_test_env: Use add all extensions instead of inidividually
main: Move extensions adding to function
tomstone_gc: Make validate work for tools
The row cache can garbage-collect tombstones in two places:
1) When populating the cache - the underlying reader pipeline has a `compacting_reader` in it;
2) During reads - reads now compact data including garbage collection;
In both cases, garbage collection has to do overlap checks against memtables, to avoid collecting tombstones which cover data in the memtables.
This PR includes fixes for (2), which was not handled at all until now.
(1) was already supposed to be fixed, see https://github.com/scylladb/scylladb/issues/20916. But the test added in this PR showed that the fix is incomplete: https://github.com/scylladb/scylladb/issues/23291. A fix for this issue is also included.
Fixes: https://github.com/scylladb/scylladb/issues/23291
Fixes: https://github.com/scylladb/scylladb/issues/23252
The fix will need backport to all live releases.
- (cherry picked from commit c2518cdf1a)
- (cherry picked from commit 6b5b563ef7)
- (cherry picked from commit 7e600a0747)
- (cherry picked from commit d126ea09ba)
- (cherry picked from commit cb76cafb60)
- (cherry picked from commit df09b3f970)
- (cherry picked from commit e5afd9b5fb)
- (cherry picked from commit 34b18d7ef4)
- (cherry picked from commit f7938e3f8b)
- (cherry picked from commit 6c1f6427b3)
- (cherry picked from commit 0d39091df2)
Parent PR: #23255
Closes scylladb/scylladb#23671
* github.com:scylladb/scylladb:
test/boost/row_cache_test: add memtable overlap check tests
replica/table: add error injection to memtable post-flush phase
utils/error_injection: add a way to set parameters from error injection points
test/cluster: add test_data_resurrection_in_memtable.py
test/pylib/utils: wait_for_cql_and_get_hosts(): sort hosts
replica/mutation_dump: don't assume cells are live
replica/database: do_apply() add error injection point
replica: improve memtable overlap checks for the cache
replica/memtable: add is_merging_to_cache()
db/row_cache: add overlap-check for cache tombstone garbage collection
mutation/mutation_compactor: copy key passed-in to consume_new_partition()
This adaptor adapts a mutation reader pausable consumer to the frozen
mutation visitor interface. The pausable consumer protocol allows the
consumer to skip the remaining parts of the partition and resume the
consumption with the next one. To do this, the consumer just has to
return stop_iteration::yes from one of the consume() overloads for
clustering elements, then return stop_iteration::no from
consume_end_of_partition(). Due to a bug in the adaptor, this sequence
leads to terminating the consumption completely -- so any remaining
partitions are also skipped.
This protocol implementation bug has user-visible effects when the
only user of the adaptor -- read repair -- happens during a query which
has limitations on the amount of content in each partition.
There are two such queries: select distinct ... and select ... with
partition limit. When converting the repaired mutation to a query
result, these queries will trigger the skip sequence in the consumer and,
due to the above-described bug, will skip the remaining partitions in
the results, omitting these from the final query result.
This patch fixes the protocol bug: the return value of the underlying
consumer's consume_end_of_partition() is now respected.
A unit test is also added which reproduces the problem both with select
distinct ... and select ... per partition limit.
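Schematically, the consume-pausable loop now honours the end-of-partition result (a simplified outline, not the adaptor's actual code):
```cpp
// Sketch only: stop_iteration::yes from a clustering-element consume() pauses
// the current partition; whether the whole stream stops is decided by
// consume_end_of_partition().
for (auto& partition : partitions) {
    for (auto& element : partition.clustering_elements()) {
        if (consumer.consume(element) == stop_iteration::yes) {
            break;  // skip the rest of this partition only
        }
    }
    // Before the fix, consumption terminated here unconditionally whenever the
    // inner loop paused; now the consumer's answer is respected.
    if (consumer.consume_end_of_partition() == stop_iteration::yes) {
        break;      // the consumer really wants to stop the stream
    }
}
```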
Follow-up work:
* frozen_mutation_consumer_adaptor::on_end_of_partition() calls the
underlying consumer's on_end_of_stream(), so when consuming multiple
frozen mutations, the underlying's on_end_of_stream() is called for
each partition. This is incorrect but benign.
* Improve documentation of mutation_reader::consume_pausable().
Fixes: #20084
Closes scylladb/scylladb#23657
(cherry picked from commit d67202972a)
Closes scylladb/scylladb#23693
Similar to test/cluster/test_data_resurrection_in_memtable.py but works
on a single node and uses a more low-level mechanism. These tests can also
reproduce more advanced scenarios, like concurrent reads, with some
reading from flushed memtables.
(cherry picked from commit 0d39091df2)
After the memtable has been flushed to disk, but before it is merged into the
cache. The injection point will only be active for the table specified in
the "table_name" injection parameter.
(cherry picked from commit 6c1f6427b3)
With this, now it is possible to have two-way communication between
the error injection point and its enabler. The test can enable the error
injection point, then wait until it is hit, before proceeding.
(cherry picked from commit f7938e3f8b)
Such that a given index in the returned hosts list refers to the same
underlying Scylla instance as the same index in the passed-in nodes
list. This is what users of this method intuitively expect, but
currently the returned hosts list is unordered (has random order).
(cherry picked from commit e5afd9b5fb)
Currently the dumper unconditionally extracts the value of atomic cells,
assuming they are live. This doesn't always hold of course and
attempting to get the value of a dead cell will lead to marshalling
errors. Fix by checking is_live() before attempting to get the cell
value. Fix for both regular and collection cells.
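A minimal sketch of the guard, with illustrative names (the real code deserializes into the dump's output columns):
```cpp
// Sketch only: dead cells carry no value, so guard the access before
// attempting to deserialize anything.
if (!cell.is_live()) {
    out = "(dead cell)";             // emit a placeholder instead of marshalling garbage
} else {
    out = deserialize_value(cell);   // safe: only live cells have a value to read
}
```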
(cherry picked from commit df09b3f970)
So writes (to user tables) can be failed on a replica, via error
injection. Should simplify tests which want to create differences in
what writes different replicas receive.
(cherry picked from commit cb76cafb60)
The current memtable overlap check that is used by the cache
-- table::get_max_purgeable_fn_for_cache_underlying_reader() -- only
checks the active memtable, so memtables which are either being flushed
or are already flushed and also have active reads against them do not
participate in the overlap check.
This can result in temporary data resurrection, where a cache read can
garbage-collect a tombstone which still covers data in a flushing or
flushed memtable, which still has an active read against it.
To prevent this, extend the overlap check to also consider all of the
memtable list. Furthermore, memtable_list::erase() now places the removed
(flushed) memtable in an intrusive list. These entries are alive only as
long as there are readers still keeping an `lw_shared_ptr<memtable>`
alive. This list is now also consulted on overlap checks.
(cherry picked from commit d126ea09ba)
The cache should not garbage-collect tombstones which cover data in the
memtable. Add overlap checks (get_max_purgeable) to garbage collection
to detect tombstones which cover data in the memtable and to prevent
their garbage collection.
(cherry picked from commit 6b5b563ef7)
This doesn't introduce additional work for single-partition queries: the
key is copied anyway on consume_end_of_stream().
Multi-partition reads and compaction are not that sensitive to
the additional copy added.
This change fixes a bug in the compacting_reader: currently the reader
passes _last_uncompacted_partition_start.key() to the compactor's
consume_new_partition(). When the compactor emits enough content for this
partition, _last_uncompacted_partition_start is moved from to emit the
partition start; this makes the key reference passed to the compactor
corrupt (it refers to a moved-from value). This in turn means that subsequent
GC checks done by the compactor will be done with a corrupt key and
therefore can result in tombstones being garbage-collected while they
still cover data elsewhere (data resurrection).
The compacting reader is violating the API contract and normally the bug
should be fixed there. We make an exception here because doing the fix
in the mutation compactor better aligns with our future plans:
* The fix simplifies the compactor (gets rid of _last_dk).
* Prepares the way to get rid of the consume API used by the compactor.
(cherry picked from commit c2518cdf1a)
`safe_foreach_sstable` doesn't do its job correctly.
It iterates over an sstable set under the sstable deletion
lock in an attempt to ensure that SSTables aren't deleted during the iteration.
The thing is, it takes the deletion lock after the SSTable set is
already obtained, so SSTables might get unlinked *before* we take the lock.
Remove this function and fix its usages to obtain the set and iterate
over it under the lock.
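A schematic sketch of the corrected ordering; the lock, accessor and callback names are illustrative:
```cpp
// Sketch only: take the deletion lock *first*, then obtain the sstable set, so
// nothing in the set can be unlinked while we iterate over it.
auto units = co_await get_units(_sstable_deletion_sem, 1);  // block deletions
auto set = get_sstable_set();                               // snapshot taken under the lock
co_await set.for_each_sstable_gently([] (const sstables::shared_sstable& sst) {
    return process(sst);                                    // safe for the whole iteration
});
```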
Closes scylladb/scylladb#23397
(cherry picked from commit e23fdc0799)
Closes scylladb/scylladb#23627
The `table::do_apply()` method verifies if the compaction group's async
gate is open to determine if the compaction group is active. Closing
this async gate prevents any new operations but waits for existing
holders to exit, allowing their operations to complete. When holding a
gate, holders will observe the gate as closed when it is being closed,
but this is irrelevant as they are already inside the gate and are
allowed to complete. All the callers of `table::do_apply()` already
enter the gate before calling the method. So, the async gate check
inside `table::do_apply()` will erroneously throw an exception when the
compaction group is closing despite holding the gate. This commit
removes the check to prevent this from happening.
Fixes #23348
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#23579
(cherry picked from commit 750f4baf44)
Closes scylladb/scylladb#23644
The "make-pr-ready-for-review" workflow was failing with an "Input
required and not supplied: token" error. This was due to GitHub Actions
security restrictions preventing access to the token when the workflow
is triggered in a fork:
```
Error: Input required and not supplied: token
```
This commit addresses the issue by:
- Running the workflow in the base repository instead of the fork. This
grants the workflow access to the required token with write permissions.
- Simplifying the workflow by using a job-level `if` condition to
control execution, as recommended in the GitHub Actions documentation
(https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/using-conditions-to-control-job-execution).
This is cleaner than conditional steps.
- Removing the repository checkout step, as the source code is not required for this workflow.
This change resolves the token error and ensures the
"make-pr-ready-for-review" workflow functions correctly.
Fixes scylladb/scylladb#22765
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#22766
(cherry picked from commit ca832dc4fb)
Closes scylladb/scylladb#23617
Before this change, we specified the KillMode of the scylla server's
service unit explicitly as "process". According to
https://www.freedesktop.org/software/systemd/man/latest/systemd.kill.html,
> If set to process, only the main process itself is killed (not recommended!).
and the document suggests using "control-group" over "process".
But the scylla server is not a multi-process server, it is a multi-threaded
server, so it should not make any difference even if we switch to
the recommended "control-group".
In light of the "defunct" scylla processes we have been seeing after
stopping the scylla service using systemd, we are wondering if we should
try to change the `KillMode` to "control-group", which is the default
value of this setting.
In this change, we just drop the setting, so that systemd stops the
service by stopping all processes in the control group of this unit.
Fixesscylladb/scylladb#21507
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 961a53f716)
Closesscylladb/scylladb#23176
Each query-type (QUERY, EXECUTE, BATCH) CQL opcode has a number of parameters
in its payload which we always want to record in the Tracing object.
Today these are the Consistency Level, the Serial Consistency Level and the
Default Timestamp. Setting each of them individually can lead to a human
error where one (or more) of them is not set. Let's eliminate such a
possibility by defining a single function that sets them all.
This also allows new parameters to be easily added to this function in
the future.
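A minimal sketch of the idea (invented types and names, not the actual tracing API): one helper that sets all three parameters, so no opcode handler can forget one of them:
```cpp
#include <cstdint>
#include <optional>

enum class consistency_level { any, one, quorum, all, serial, local_serial };

// Stand-in for the tracing object attached to a request.
struct trace_state {
    void set_consistency_level(consistency_level) {}
    void set_serial_consistency_level(std::optional<consistency_level>) {}
    void set_default_timestamp(std::optional<int64_t>) {}
};

// Stand-in for the options decoded from the QUERY/EXECUTE/BATCH frame.
struct query_options {
    consistency_level cl = consistency_level::quorum;
    std::optional<consistency_level> serial_cl;
    std::optional<int64_t> default_timestamp;   // present when the 0x20 flag is set
};

// The single entry point used by the QUERY, EXECUTE and BATCH handlers alike.
void set_common_trace_parameters(trace_state& ts, const query_options& opts) {
    ts.set_consistency_level(opts.cl);
    ts.set_serial_consistency_level(opts.serial_cl);
    ts.set_default_timestamp(opts.default_timestamp);
}

int main() {
    trace_state ts;
    set_common_trace_parameters(ts, query_options{});
}
```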
A default timestamp (not to be confused with the timestamp passed via the 'USING TIMESTAMP' query clause)
can be set using the 0x20 flag and the <timestamp> field in the binary CQL frame payload of
QUERY, EXECUTE and BATCH ops. It also happens to be the default of the Java CQL driver.
However, we were only setting the corresponding info in the CQL Tracing context of a QUERY operation.
For an unknown reason we were not setting it for EXECUTE and BATCH traces (I guess I simply forgot to
set it back then).
This patch fixes this.
Fixes#23173
(cherry picked from commit ca6bddef35)
Moving a PR out of draft is only allowed for users with write access.
Add a GitHub action to switch a PR to `ready for review` once the
`conflicts` label is removed.
Closesscylladb/scylladb#22446
(cherry picked from commit ed4bfad5c3)
Closesscylladb/scylladb#23006
In commit 4812a57f, the fmt-based formatter for gossip_digest_syn had
formatting code for cluster_id, partitioner, and group0_id
accidentally commented out, preventing these fields from being included
in the output. This commit restores the formatting by uncommenting the
code, ensuring full visibility of all fields in the gossip_digest_syn
message when logging permits.
This fixes a regression introduced in 4812a57f, which obscured these
fields and reduced debugging insight. Backporting is recommended for
improved observability.
Fixes#23142
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#23155
(cherry picked from commit 2a9966a20e)
Closesscylladb/scylladb#23198
Fix UBSan abort caused by integer overflow when calculating time difference
between read and write operations. The issue occurs when:
1. The queried partition on replicas is not purgeable (has no recorded
modified time)
2. Digests don't match across replicas
3. The system attempts to calculate timespan using missing/negative
last_modified timestamps
This change skips the cross-DC repair optimization when the write timestamp is
negative or missing, as this optimization is only relevant for reads
occurring within write_timeout of a write.
Error details:
```
service/storage_proxy.cc:5532:80: runtime error: signed integer overflow: -9223372036854775808 - 1741940132787203 cannot be represented in type 'int64_t' (aka 'long')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior service/storage_proxy.cc:5532:80
Aborting on shard 1, in scheduling group sl:default
```
Related to previous fix 39325cf which handled negative read_timestamp cases.
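A simplified sketch of the guard (invented function and parameter names): bail out of the optimization before the subtraction is ever performed on a missing or negative timestamp, so the overflow above cannot occur:
```cpp
#include <cstdint>
#include <iostream>
#include <limits>

bool within_write_timeout(int64_t read_timestamp_us, int64_t last_write_timestamp_us,
                          int64_t write_timeout_us) {
    // A missing last-modified time is represented by a negative sentinel
    // (e.g. std::numeric_limits<int64_t>::min()); subtracting from it overflows.
    if (last_write_timestamp_us < 0 || read_timestamp_us < 0) {
        return false;   // skip the optimization and take the regular repair path
    }
    return read_timestamp_us - last_write_timestamp_us <= write_timeout_us;
}

int main() {
    // Prints 0: the sentinel disables the optimization instead of overflowing.
    std::cout << within_write_timeout(1741940132787203,
                                      std::numeric_limits<int64_t>::min(),
                                      1'000'000) << "\n";
}
```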
Fixes#23314
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#23359
(cherry picked from commit ebf9125728)
Closesscylladb/scylladb#23386
The test fails sporadically with:
cassandra.ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed for test3.test2 - received 1 responses and 1 failures from 2 CL=QUORUM." info={'consistency': 'QUORUM', 'required_responses': 2, 'received_responses': 1, 'failures': 1}
That's because a server is stopped in the middle of the workload.
The server is stopped ungracefully, which will cause some requests to
time out. We should stop it gracefully to allow in-flight requests to
finish.
Fixes #20492
Closes scylladb/scylladb#23451
(cherry picked from commit 8e506c5a8f)
Closesscylladb/scylladb#23468
This commit adds documentation for zero-token nodes and an explanation
of how to use them to set up an arbiter DC to prevent a quorum loss
in multi-DC deployments.
The commit adds two documents:
- The one in Architecture describes zero-token nodes.
- The other in Cluster Management explains how to use them.
We need separate documents because zero-token nodes may be used
for other purposes in the future.
In addition, the documents are cross-linked, and the link is added
to the Create a ScyllaDB Cluster - Multi Data Centers (DC) document.
Refs https://github.com/scylladb/scylladb/pull/19684
Fixes https://github.com/scylladb/scylladb/issues/20294
Closes scylladb/scylladb#21348
(cherry picked from commit 9ac0aa7bba)
Closesscylladb/scylladb#23200
The test `test_mv_topology_change` is a regression test for
scylladb/scylladb#19529. The problem was that CL=ANY writes issued when
all replicas were down would be kept in memory until the timeout. In
particular, MV updates are CL=ANY writes and have a 5 minute timeout.
When doing topology operations for vnodes or when migrating tablet
replicas, the cluster goes through stages where the replica sets for
writes undergo changes, and the writes started with the old replica set
need to be drained first.
Because of the aforementioned MV updates, the removenode operation could
be delayed by 5 minutes or more. Therefore, the
`test_mv_topology_change` test uses a short timeout for the removenode
operation, i.e. 30s. Apparently, this is too low for the debug mode and
the test has been observed to time out even though the removenode
operation is progressing fine.
Increase the timeout to 60s. This is the lowest timeout for the
removenode operation that we currently use among the in-repo tests, and
is lower than 5 minutes so the test will still serve its purpose.
Fixes: scylladb/scylladb#22953
Closes scylladb/scylladb#22958
(cherry picked from commit 43ae3ab703)
Closesscylladb/scylladb#23052
In this patch we test the behavior of schema registry in a few
scenarios where it was identified it could misbehave.
The first one is reverse schemas for views. Previously, SELECT
queries with reverse order on views could fail because we didn't
have base info in the registry for such schemas.
The second one is schemas that temporarily died in the registry.
This can happen when, while processing a query for a given schema
version, all related schema_ptrs were destroyed, but this schema
was requested before schema_registry::grace_period() has passed.
In this scenario, the base info would not be recovered, causing
errors.
(cherry picked from commit 74cbc77f50)
After the previous patches, the view schemas returned by schema registry
always have their base info set. As such, we no longer need to set it after
getting the view schema from the registry. This patch removes these
unnecessary updates.
(cherry picked from commit 3094ff7cbe)
Fixes#22314
Adds expected schema extensions to the tools extension set (if used). Also uses the source config extensions in the schema loader instead of a temporary one, to ensure we can, for example, load a schema.cql with things like `tombstone_gc` or encryption attributes in it.
(cherry picked from commit 48fda00f12)
The schema registry now holds base schemas for view schemas.
The base schema may change without changing the view schema, so to
preserve the change in the schema registry, we also update the
base schema in the registry when updating the base info in the
view schema.
(cherry picked from commit 82f2e1b44c)
Currently, when we load a frozen schema into the registry, we lose
the base info if the schema was of a view. Because of that, in various
places we need to set the base info again, and in some codepaths we
may miss it completely, which may make us unable to process some
requests (for example, when executing reverse queries on views).
Even after setting the base info, we may still lose it if the schema
entry gets deactivated.
To fix this, this patch adds the base schema to the registry, alongside
the view schema. With the base schema, we can now set the base
info when returning the schema from the registry. As a result, we can now
assume that all view schemas returned by the registry have base_info set.
To store the base schema, the loader methods now have to return the base
schema alongside the view schema. At the same time, when loading into
the registry, we need to check whether we're loading a view schema, and if
so, we need to also provide the base schema. When inserting a regular table
schema, the base schema should be a disengaged optional.
(cherry picked from commit dfe3810f64)
In the following patches, we'll ensure that view schemas returned by the
schema registry always have base info set. To prepare for that, make sure
that the base info is always set before inserting it into the schema registry.
(cherry picked from commit 6f11edbf3f)
Currently, maybe_switch_to_new_writer resets _current_writer
only in a continuation after closing the current writer.
This leaves a window of vulnerability if close() yields,
and token_group_based_splitting_mutation_writer::close()
is called. Seeing the engaged _current_writer, close()
will call _current_writer->close() - which must be called
exactly once.
Solve this when switching to a new writer by resetting
_current_writer before closing it and potentially yielding.
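Sketched with plain standard-library types (no Seastar, no coroutines), the reordering looks like this; the point is that the member is disengaged before anything that could yield runs:
```cpp
#include <optional>

struct writer {
    void close() { /* must be called exactly once */ }
};

struct splitting_writer {
    std::optional<writer> _current_writer;

    void switch_to_new_writer() {
        if (_current_writer) {
            // Take ownership out of the member first...
            writer old = std::move(*_current_writer);
            _current_writer.reset();
            // ...then close. Even if this step yielded and close() below ran
            // concurrently, it would find _current_writer disengaged and skip it.
            old.close();
        }
        _current_writer.emplace();
    }

    void close() {
        if (_current_writer) {
            _current_writer->close();
            _current_writer.reset();
        }
    }
};

int main() {
    splitting_writer w;
    w.switch_to_new_writer();
    w.switch_to_new_writer();   // closes the previous writer exactly once
    w.close();
}
```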
Fixes#22715
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#22922
(cherry picked from commit 29b795709b)
Closesscylladb/scylladb#22964
`set_notify_handler()` is called after a querier was inserted into the querier cache. It has two purposes: set a callback for eviction and set a TTL for the cache entry. The latter was not disabling the pre-existing timeout of the permit (if any), which would lead to premature eviction of the cache entry if the timeout was shorter than the TTL (which is typical).
Disable the timeout before setting the TTL to prevent premature eviction.
Fixes: https://github.com/scylladb/scylladb/issues/22629
Backport required to all active releases, they are all affected.
- (cherry picked from commit a3ae0c7cee)
- (cherry picked from commit 9174f27cc8)
Parent PR: #22701
Closes scylladb/scylladb#22751
* github.com:scylladb/scylladb:
reader_concurrency_semaphore: set_notify_handler(): disable timeout
reader_permit: mark check_abort() as const
set_notify_handler() is called after a querier was inserted into the
querier cache. It has two purposes: set a callback for eviction and set
a TTL for the cache entry. The latter was not disabling the
pre-existing timeout of the permit (if any) and this would lead to
premature eviction of the cache entry if the timeout was shorter than
the TTL (which is typical).
Disable the timeout before setting the TTL to prevent premature
eviction.
Fixes: scylladb/scylladb#22629
(cherry picked from commit 9174f27cc8)
The code currently assumes that a session has both sender and receiver
streams, but it is possible to have just one or the other.
Change the test to include this scenario and remove this assumption from
the code.
Fixes: #22770
Closes scylladb/scylladb#22771
(cherry picked from commit 87e8e00de6)
Closesscylladb/scylladb#22873
Said method passes down its `diff` input to `mutate_internal()`, after
some std::ranges massaging. Said massaging is destructive -- it moves
items from the diff. If the output range is iterated-over multiple
times, only the first time will see the actual output, further
iterations will get an empty range.
When trace-level logging is enabled, this is exactly what happens:
`mutate_internal()` iterates over the range multiple times, first to log
its content, then to pass it down the stack. This ends up resulting in
a range with moved-from elements being passed down and consequently write
handlers being created with nullopt mutations.
Make the range re-entrant by materializing it into a vector before
passing it to `mutate_internal()`.
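The failure mode boils down to C++ ranges views being lazy and, when the transformation moves from its input, effectively single-pass. A small self-contained illustration (plain std::string elements standing in for mutations):
```cpp
#include <iostream>
#include <ranges>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> diff = {"mutation-1", "mutation-2"};

    // A transform view that moves from its input yields real values only once.
    auto destructive = diff | std::views::transform([](std::string& s) { return std::move(s); });

    std::cout << "1st pass:";
    for (const auto& m : destructive) { std::cout << " " << m; }       // real values
    std::cout << "\n2nd pass:";
    for (const auto& m : destructive) { std::cout << " '" << m << "'"; } // moved-from (empty)
    std::cout << "\n";

    // The fix: materialize the massaged range into a vector once, then hand the
    // vector down; it can safely be iterated for trace logging and again for the write path.
    std::vector<std::string> rebuilt = {"mutation-1", "mutation-2"};
    auto view = rebuilt | std::views::transform([](std::string& s) { return std::move(s); });
    std::vector<std::string> materialized(view.begin(), view.end());
    std::cout << "materialized size: " << materialized.size() << "\n";
}
```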
Fixes: scylladb/scylladb#21907
Fixes: scylladb/scylladb#21714
Closes scylladb/scylladb#21910
(cherry picked from commit 7150442f6a)
Closesscylladb/scylladb#22853
On short-pages, cut short because of a tombstone prefix.
When page-results are filtered and the filter drops some rows, the
last-position is taken from the page visitor, which does the filtering.
This means that last partition and row position will be that of the last
row the filter saw. This will not match the last position of the
replica, when the replica cut the page due to tombstones.
When fetching the next page, this means that the entire tombstone suffix of
the last page will be re-fetched. Worse still: the last position of the
next page will not match that of the saved reader left on the replica, so
the saved reader will be dropped and a new one created from scratch.
This wasted work will show up as elevated tail latencies.
Fix by always taking the last position from raw query results.
Fixes: #22620
Closes scylladb/scylladb#22622
(cherry picked from commit 7ce932ce01)
Closesscylladb/scylladb#22718
with_permit() creates a permit, with a self-reference, to avoid
attaching a continuation to the permit's run function. This
self-reference is used to keep the permit alive, until the execution
loop processes it. This self reference has to be carefully cleared on
error-paths, otherwise the permit will become a zombie, effectively
leaking memory.
Instead of trying to handle all loose ends, get rid of this
self-reference altogether: ask the caller to provide a place to save the
permit, where it will survive until the end of the call. This makes the
call-site a little bit less nice, but it gets rid of a whole class of
possible bugs.
Fixes: #22588
Closes scylladb/scylladb#22624
(cherry picked from commit f2d5819645)
Closesscylladb/scylladb#22703
When a replica gets a write request it performs get_schema_for_write,
which waits until the schema is synced. However, database::add_column_family
marks a schema as synced before the table is added. Hence, the write may
see the schema as synced, but hit no_such_column_family as the table
hasn't been added yet.
Mark schema as synced after the table is added to database::_tables_metadata.
Fixes: #22347.
Closesscylladb/scylladb#22348
(cherry picked from commit 328818a50f)
Closesscylladb/scylladb#22603
If start_time/end_time is unspecified for a task, the task_manager API
returns the epoch, and nodetool prints that value in the task status.
Fix nodetool tasks commands to print an empty string for start_time/end_time
if it isn't specified.
Modify nodetool tasks status docs to show empty end_time.
Fixes: #22373.
Closesscylladb/scylladb#22370
(cherry picked from commit 477ad98b72)
Closesscylladb/scylladb#22600
`tablet_storage_group_manager::all_storage_groups_split()` calls `set_split_mode()` for each of its storage groups to create split-ready compaction groups. It does this by iterating through the storage groups using `std::ranges::all_of()`, which is not guaranteed to iterate through the entire range and stops iterating at the first occurrence of the predicate (`set_split_mode()`) returning false. `set_split_mode()` creates the split compaction groups and returns false if the storage group's main compaction group or merging groups are not empty. This means that in cases where the tablet storage group manager has non-empty storage groups, we could have a situation where split compaction groups are not created for all storage groups.
The missing split compaction groups are later created in `tablet_storage_group_manager::split_all_storage_groups()`, which also calls `set_split_mode()`, and that is the reason why the split completes successfully. The problem is that
`tablet_storage_group_manager::all_storage_groups_split()` runs under a group0 guard, but
`tablet_storage_group_manager::split_all_storage_groups()` does not. This can cause problems with operations which should be mutually exclusive with compaction group creation, e.g. DROP TABLE/DROP KEYSPACE.
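A compilable toy (invented member names) demonstrating the iteration difference; the `all_of()` shape mirrors the old code, and the plain loop shows one possible shape of the fix, where every storage group is visited:
```cpp
#include <algorithm>
#include <iostream>
#include <vector>

struct storage_group {
    bool main_group_empty;
    bool split_groups_created = false;

    // Creates the split compaction groups; reports whether splitting is already possible.
    bool set_split_mode() {
        split_groups_created = true;
        return main_group_empty;
    }
};

int main() {
    std::vector<storage_group> groups = {{false}, {true}, {true}};
    auto visited = [&] {
        return std::ranges::count_if(groups, [](const storage_group& g) { return g.split_groups_created; });
    };

    // Buggy shape: all_of() short-circuits on the first `false`, so the
    // remaining groups never get their split compaction groups created here.
    bool all_split = std::ranges::all_of(groups, [](storage_group& g) { return g.set_split_mode(); });
    std::cout << "all_of: all_split=" << all_split << ", groups visited=" << visited() << "\n";

    // Fixed shape: visit every group unconditionally, then report the aggregate.
    for (auto& g : groups) { g.split_groups_created = false; }
    bool all_split_fixed = true;
    for (auto& g : groups) { all_split_fixed = g.set_split_mode() && all_split_fixed; }
    std::cout << "loop:   all_split=" << all_split_fixed << ", groups visited=" << visited() << "\n";
}
```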
Fixes#22431
This is a bugfix and should be backported to versions with tablets: 6.1, 6.2 and 2025.1
- (cherry picked from commit 24e8d2a55c)
- (cherry picked from commit 8bff7786a8)
Parent PR: #22330
Closes scylladb/scylladb#22559
* github.com:scylladb/scylladb:
test: add reproducer and test for fix to split ready CG creation
table: run set_split_mode() on all storage groups during all_storage_groups_split()
Currently, when the status of a task is queried and the task is already finished,
it gets unregistered. Getting the status shouldn't be a one-time operation.
Stop removing the task after its status is queried. Adjust tests not to rely
on this behavior. Add task_manager/drain API and nodetool tasks drain
command to remove finished tasks in the module.
Fixes: https://github.com/scylladb/scylladb/issues/21388.
It's a fix to the task_manager API and should be backported to all branches
- (cherry picked from commit e37d1bcb98)
- (cherry picked from commit 18cc79176a)
Parent PR: #22310
Closes scylladb/scylladb#22597
* github.com:scylladb/scylladb:
api: task_manager: do not unregister tasks on get_status
api: task_manager: add /task_manager/drain
This series exposes a Clock template parameter for loading_cache so that the test could use
the manual_clock rather than the lowres_clock, since relying on the latter is flaky.
In addition, the test load function is simplified to sleep some small random time and co_return the expected string,
rather than reading it from a real file, since the latter's timing might also be flaky, and it is out of scope for this test.
Fixes#20322
* The test was flaky forever, so backport is required for all live versions.
- (cherry picked from commit b509644972)
- (cherry picked from commit 934a9d3fd6)
- (cherry picked from commit d68829243f)
- (cherry picked from commit b258f8cc69)
- (cherry picked from commit 0841483d68)
- (cherry picked from commit 32b7cab917)
Parent PR: #22064
Closes scylladb/scylladb#22640
* github.com:scylladb/scylladb:
tests: loading_cache_test: use manual_clock
utils: loading_cache: make clock_type a template parameter
test: loading_cache_test: use function-scope loader
test: loading_cache_test: simulate loader using sleep
test: lib: eventually: add sleep function param
test: lib: eventually: make *EVENTUALLY_EQUAL inline functions
Relying on a real-time clock like lowres_clock
can be flaky (in particular in debug mode).
Use manual_clock instead to harden the test against
timing issues.
Fixes#20322
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 32b7cab917)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
So the unit test can use manual_clock rather than lowres_clock
which can be flaky (in particular in debug mode).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 0841483d68)
Rather than a global function, accessing a thread-local `load_count`.
The thread-local load_count cannot be used when multiple test
cases run in parallel.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit b258f8cc69)
This test isn't about reading values from file,
but rather it's about the loading_cache.
Reading from the file can sometimes take longer than
the expected refresh times, causing flakiness (see #20322).
Rather than reading a string from a real file, just
sleep a random, short time, and co_return the string.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit d68829243f)
rather than macros.
This is a first cleanup step before adding a sleep function
parameter to support also manual_clock.
Also, add a call to BOOST_REQUIRE_EQUAL/BOOST_CHECK_EQUAL,
respectively, to make an error more visible in the test log
since those entry points print the offending values
when not equal.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit b509644972)
The view builder builds a view by going over the entire token ring,
consuming the base table partitions, and generating view updates for
each partition.
A view is considered as built when we complete a full cycle of the
token ring. Suppose we start to build a view at a token F. We will
consume all partitions with tokens starting at F until the maximum
token, then go back to the minimum token and consume all partitions
until F, and then we detect that we pass F and complete building the
view. This happens in the view builder consumer in
`check_for_built_views`.
The problem is that we check if we pass the first token F with the
condition `_step.current_token() >= it->first_token` whenever we consume
a new partition or the current_token goes back to the minimum token.
But suppose that we don't have any partitions with a token greater than
or equal to the first token (this could happen if the partition with
token F was moved to another node for example), then this condition will never be
satisfied, and we don't detect correctly when we pass F. Instead, we
go back to the minimum token, building the same token ranges again,
in a possibly infinite loop.
To fix this we add another step when reaching the end of the reader's
stream. When this happens it means we don't have any more fragments to
consume until the end of the range, so we advance the current_token to
the end of the range, simulating a partition, and check for built views
in that range.
Fixes scylladb/scylladb#21829
Closes scylladb/scylladb#22493
(cherry picked from commit 6d34125eb7)
Closesscylladb/scylladb#22606
Currently, /task_manager/task_status_recursive/{task_id} and
/task_manager/task_status/{task_id} unregister the queried task if it
has already finished.
The status should not disappear after being queried. Do not unregister
finished task when its status or recursive status is queried.
(cherry picked from commit 18cc79176a)
In the following patches, get_status won't be unregistering finished
tasks. However, tests need a way to drop a task, so that
they can work only with the tasks for operations that were
invoked by these tests.
Add /task_manager/drain/{module} to unregister all finished tasks
from the module. Add respective nodetool command.
(cherry picked from commit e37d1bcb98)
In main.cc storage_service is started before and stopped after
repair_service. storage_service keeps a reference to sharded
repair_service and calls its methods, but nothing ensures that
repair_service's local instance would be alive for the whole
execution of the method.
Add a gate to repair_service and enter it in storage_service
before executing methods on local instances of repair_service.
Fixes: #21964.
Closesscylladb/scylladb#22145
(cherry picked from commit 32ab58cdea)
Closesscylladb/scylladb#22318
Said fields in statistics are of type
`disk_array<uint32_t, disk_string<uint16_t>>` and currently are handled
as array of regular strings. However these fields store exploded
clustering keys, so the elements store binary data and converting to
string can yield invalid UTF-8 characters that certain JSON parsers (jq,
or python's json) can choke on. Fix this by treating them as binary and
using `to_hex()` to convert them to string. This requires some massaging
of the json_dumper: passing field offset to all visit() methods and
using a caller-provided disk-string to sstring converter to convert disk
strings to sstring, so in the case of statistics, these fields can be
intercepted and properly handled.
While at it, the type of these fields is also fixed in the
documentation.
Before:
"min_column_names": [
"��Z���\u0011�\u0012ŷ4^��<",
"�2y\u0000�}\u007f"
],
"max_column_names": [
"��Z���\u0011�\u0012ŷ4^��<",
"}��B\u0019l%^"
],
After:
"min_column_names": [
"9dd55a92bc8811ef12c5b7345eadf73c",
"80327900e2827d7f"
],
"max_column_names": [
"9dd55a92bc8811ef12c5b7345eadf73c",
"7df79242196c255e"
],
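For reference, a minimal stand-alone version of the kind of conversion applied to these fields (not the actual `to_hex()` from the tree): dump the raw bytes as hex rather than interpreting them as UTF-8 text:
```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

std::string to_hex(const std::vector<uint8_t>& bytes) {
    static const char* digits = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (uint8_t b : bytes) {
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0f]);
    }
    return out;
}

int main() {
    // An exploded clustering key component is arbitrary binary data.
    std::vector<uint8_t> exploded_ck = {0x80, 0x32, 0x79, 0x00, 0xe2, 0x82, 0x7d, 0x7f};
    std::cout << to_hex(exploded_ck) << "\n";   // prints 80327900e2827d7f
}
```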
Fixes: #22078
Closes scylladb/scylladb#22225
(cherry picked from commit f899f0e411)
Closesscylladb/scylladb#22296
The sstable loader relied on the generation id to provide an efficient
hint about the shard that owns an sstable. But, this hint was rendered
ineffective with the introduction of UUID generation, as the shard id
was no longer embedded in the generation id. This also became suboptimal
with the introduction of tablets. Commit 0c77f77 addressed this issue by
reading the minimum from disk to determine sstable ownership but this
improvement was lost with commit 63f1969, which optimistically assumed
that hints would work most of the time, which isn't true.
This commit restores that change - the shard id of an sstable is deduced by
reading minimally from disk, and then the sstable is fully loaded only if
it belongs to the local shard. This patch also adds a testcase to verify
that sstables are loaded only in their respective shards.
Fixes#21015
This fixes a regression and should be backported.
- (cherry picked from commit d2ba45a01f)
- (cherry picked from commit 6e3ecc70a6)
- (cherry picked from commit 63100b34da)
Parent PR: #22263
Closes scylladb/scylladb#22376
* github.com:scylladb/scylladb:
sstable_directory: do not load remote sstables in process_descriptor
sstable_directory: reintroduce `get_shards_for_this_sstable()`
The methods to resolve a key/token/range to a table are all noexcept.
Yet the method below all of these, `storage_group_for_id()`, can throw.
This means that if, due to any mistake, a tablet without a local replica is
looked up, it will result in a crash, as the exception
bubbles up into the noexcept methods.
There is no value in pretending that looking up the tablet replica is
noexcept, remove the noexcept specifiers so that any bad lookup only
fails the operation at hand and doesn't crash the node. This is
especially relevant to replace, which still has a window where writes
can arrive for tablets that don't (yet) have a local replica. Currently,
this results in a crash. After this patch, this will only fail the
writes and the replace can move on.
Fixes: #21480
Closes scylladb/scylladb#22251
(cherry picked from commit 55963f8f79)
Closesscylladb/scylladb#22379
Currently, data sync repair handles most no_such_keyspace exceptions,
but it omits the preparation phase, where the exception could be thrown
during make_global_effective_replication_map.
Skip the keyspace repair if no_such_keyspace is thrown during preparations.
Fixes: #22073.
Requires backport to 6.1 and 6.2 as they contain the bug
- (cherry picked from commit bfb1704afa)
- (cherry picked from commit 54e7f2819c)
Parent PR: #22473
Closes scylladb/scylladb#22541
* github.com:scylladb/scylladb:
test: add test to check if repair handles no_such_keyspace
repair: handle keyspace dropped
During raft upgrade, a node may gossip about a new CDC generation that
was propagated through raft. The node that receives the generation by
gossip may have not applied the raft update yet, and it will not find
the generation in the system tables. We should consider this error
non-fatal and retry the read until it succeeds or becomes obsolete.
Another issue is that when we fail with a "fatal" exception and do not retry
the read, the cdc metadata is left in an inconsistent state that causes
further attempts to insert this CDC generation to fail.
What happens is we complete preparing the new generation by calling `prepare`,
we insert an empty entry for the generation's timestamp, and then we fail. The
next time we try to insert the generation, we skip inserting it because we see
that it already has an entry in the metadata and we determine that
there's nothing to do. But this is wrong, because the entry is empty,
and we should continue to insert the generation.
To fix it, we change `prepare` to return `true` when the entry already
exists but it's empty, indicating we should continue to insert the
generation.
Fixes scylladb/scylladb#21227
Closes scylladb/scylladb#22093
(cherry picked from commit 4f5550d7f2)
Closesscylladb/scylladb#22545
This adds a reproducer for #22431
In cases where a tablet storage group manager had more than one storage
group, it was possible to create compaction groups outside the group0
guard, which could create problems with operations that should be mutually
exclusive with compaction group creation.
(cherry picked from commit 8bff7786a8)
tablet_storage_group_manager::all_storage_groups_split() calls set_split_mode()
for each of its storage groups to create split ready compaction groups. It does
this by iterating through storage groups using std::ranges::all_of() which is
not guaranteed to iterate through the entire range, and will stop iterating on
the first occurrence of the predicate (set_split_mode()) returning false.
set_split_mode() creates the split compaction groups and returns false if the
storage group's main compaction group or merging groups are not empty. This
means that in cases where the tablet storage group manager has non-empty
storage groups, we could have a situation where split compaction groups are not
created for all storage groups.
The missing split compaction groups are later created in
tablet_storage_group_manager::split_all_storage_groups() which also calls
set_split_mode(), and that is the reason why split completes successfully. The
problem is that tablet_storage_group_manager::all_storage_groups_split() runs
under a group0 guard, and tablet_storage_group_manager::split_all_storage_groups()
does not. This can cause problems with operations which should be mutually
exclusive with compaction group creation, e.g. DROP TABLE/DROP KEYSPACE.
(cherry picked from commit 24e8d2a55c)
Currently, data sync repair handles most no_such_keyspace exceptions,
but it omits the preparation phase, where the exception could be thrown
during make_global_effective_replication_map.
Skip the keyspace repair if no_such_keyspace is thrown during preparations.
(cherry picked from commit bfb1704afa)
Fixes a race condition where COMPRESSOR_NAME in zstd.cc could be
initialized before compressor::namespace_prefix due to undefined
global variable initialization order across translation units. This
was causing ZstdCompressor to be unregistered in release builds,
making it impossible to create tables with Zstd compression.
Replace the global namespace_prefix variable with a function that
returns the fully qualified compressor name. This ensures proper
initialization order and fixes the registration of the ZstdCompressor.
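A sketch of the construct-on-first-use idiom behind the fix (the names and the prefix value shown are illustrative, not the actual identifiers in the tree): a function-local static is initialized on first use, so it cannot be observed before the prefix it depends on:
```cpp
#include <iostream>
#include <string>

// Before: two globals in different translation units; their relative
// initialization order is unspecified, so the compressor name could be built
// from a still-empty prefix.
//
//   extern const std::string namespace_prefix;                              // compressor.cc
//   const std::string COMPRESSOR_NAME = namespace_prefix + "ZstdCompressor"; // zstd.cc

// After: the prefix and the fully qualified name are obtained through functions.
const std::string& compressor_namespace_prefix() {
    static const std::string prefix = "org.apache.cassandra.io.compress.";   // illustrative value
    return prefix;
}

const std::string& zstd_compressor_name() {
    static const std::string name = compressor_namespace_prefix() + "ZstdCompressor";
    return name;
}

int main() {
    std::cout << zstd_compressor_name() << "\n";
}
```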
Fixesscylladb/scylladb#22444
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22451
(cherry picked from commit 4a268362b9)
Closesscylladb/scylladb#22510
Fixes#20633
Cannot assert on the actual request_controller value when releasing a permit, as the
release, if we have waiters in the queue, will subtract some units to hand to them.
Instead assert on the permit size + waiter status (and if zero, also the controller value)
* v2 - use SCYLLA_ASSERT
(cherry picked from commit 58087ef427)
Closesscylladb/scylladb#22455
When a node is bootstrapped, joins a cluster as a non-voter and then changes its role to a voter, errors can occur while committing a new Raft record, for instance, if the Raft leader changes during this time. These errors are not critical and should not cause a node crash, as the action can be retried.
Fixes scylladb/scylladb#20814
Backport: This issue occurs frequently and disrupts the CI workflow to some extent. Backports are needed for versions 6.1 and 6.2.
- (cherry picked from commit 775411ac56)
- (cherry picked from commit 16053a86f0)
- (cherry picked from commit 8c48f7ad62)
- (cherry picked from commit 3da4848810)
- (cherry picked from commit 228a66d030)
Parent PR: #22253
Closes scylladb/scylladb#22358
* github.com:scylladb/scylladb:
raft: refactor `remove_from_raft_config` to use a timed `modify_config` call.
raft: Refactor functions using `modify_config` to use a common wrapper for retrying.
raft: Handle non-critical config update errors when changing status to voter.
test: Add test to check that a node does not fail on unknown commit status error when starting up.
raft: Add run_op_with_retry in raft_group0.
To avoid potential hangs during the `remove_from_raft_config` operation, use a timed `modify_config` call.
This ensures the operation doesn't get stuck indefinitely.
(cherry picked from commit 228a66d030)
for retrying.
There are several places in `raft_group0` where almost identical code is
used for retrying `modify_config` in case of `commit_status_unknown`
error. To avoid code duplication all these places were changed to
use a new wrapper `run_op_with_retry`.
(cherry picked from commit 3da4848810)
The sstable loader relied on the generation id to provide an efficient
hint about the shard that owns an sstable. But, this hint was rendered
ineffective with the introduction of UUID generation, as the shard id
was no longer embedded in the generation id. This also became suboptimal
with the introduction of tablets. Commit 0c77f77 addressed this issue by
reading the minimum from disk to determine sstable ownership but this
improvement was lost with commit 63f1969, which optimistically assumed
that hints would work most of the time, which isn't true.
This commit restores that change - the shard id of an sstable is deduced by
reading minimally from disk, and then the sstable is fully loaded only if
it belongs to the local shard. This patch also adds a testcase to verify
that sstables are loaded only in their respective shards.
Fixes#21015
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
(cherry picked from commit 63100b34da)
Reintroduce `get_shards_for_this_sstable()` that was removed in commit
ad375fbb. This will be used in the following patch to ensure that an
sstable is loaded only once.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
(cherry picked from commit d2ba45a01f)
When adding a new view for building, first write the status to the
system tables and then add the view building step that will start
building it.
Otherwise, if we start building it before the status is written to the
table, it may happen that we complete building the view, write the
SUCCESS status, and then overwrite it with the STARTED status. The
view_build_status table will remain in an incorrect state, indicating that
the view building is not complete.
Fixes#20638
The PR contains a few additional small fixes in separate commits related to the view build status table.
It addresses flakiness issues in tests that use the view build status table to determine when view building is complete. Due to these issues the table may be in an incorrect state, having a row with status STARTED when the view actually finished building, which will cause us to wait in `wait_for_view` until it times out.
For testing I used a test similar to `test_view_build_status_with_replace_node`, but it only creates the views and calls `wait_for_view`. Without these commits it failed in 4/1024 runs, and with the commits it passed 2048/2048.
backport to fix the bugs that affect previous versions and improve CI stability
- (cherry picked from commit b1be2d3c41)
- (cherry picked from commit 1104411f83)
- (cherry picked from commit 7a6aec1a6c)
Parent PR: #22307
Closes scylladb/scylladb#22356
* github.com:scylladb/scylladb:
view_builder: hold semaphore during entire startup
view_builder: pass view name by value to write_view_build_status
view_builder: write status to tables before starting to build
Guard the whole view builder startup routine by holding the semaphore
until it's done instead of releasing it early, so that it's not
intercepted by migration notifications.
(cherry picked from commit 7a6aec1a6c)
The function write_view_build_status takes two lambda functions and
chooses which of them to run depending on the upgrade state. It might
run both of them.
The parameters ks_name and view_name should be passed by value instead
of by reference because they are moved inside each lambda function.
Otherwise, if both lambdas are run, the second call operates on invalid,
moved-from values.
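A small self-contained illustration (generic lambdas standing in for the two upgrade-state paths, invented names) of why by-value parameters matter when more than one lambda may consume the names:
```cpp
#include <iostream>
#include <string>

// Buggy shape: references to the caller's strings; once the first lambda moves
// from them, the second lambda observes moved-from (typically empty) values.
void buggy(std::string& ks_name, std::string& view_name) {
    auto first  = [ks = std::move(ks_name), v = std::move(view_name)] {
        std::cout << "first:  " << ks << "." << v << "\n";
    };
    auto second = [&ks_name, &view_name] {
        std::cout << "second: " << ks_name << "." << view_name << "\n";   // moved-from
    };
    first();
    second();
}

// Fixed shape: by-value parameters give each lambda its own data to consume.
void fixed(std::string ks_name, std::string view_name) {
    auto first  = [ks = ks_name, v = view_name] {
        std::cout << "first:  " << ks << "." << v << "\n";
    };
    auto second = [ks = std::move(ks_name), v = std::move(view_name)] {
        std::cout << "second: " << ks << "." << v << "\n";
    };
    first();
    second();
}

int main() {
    std::string ks = "ks", view = "mv";
    buggy(ks, view);
    ks = "ks"; view = "mv";
    fixed(ks, view);
}
```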
(cherry picked from commit 1104411f83)
In this PR, we pair draining the view builder with its start.
To better understand what was done and why, let's first look at the
situation before this commit and the context of it:
(a) The following things happened in order:
1. The view builder would be constructed.
2. Right after that, a deferred lambda would be created to stop the
view builder during shutdown.
3. group0_service would be started.
4. A deferred lambda stopping group0_service would be created right
after that.
5. The view builder would be started.
(b) Because the view builder depends on group0_client, it couldn't be
started before starting group0_service. On the other hand, other
services depend on the view builder, e.g. the stream manager. That
makes changing the order of initialization a difficult problem,
so we want to avoid doing that unless we're sure it's the right
choice.
(c) Since the view builder uses group0_client, there was a possibility
of running into a segmentation fault issue in the following
scenario:
1. A call to `view_builder::mark_view_build_success()` is issued.
2. We stop group0_service.
3. `view_builder::mark_view_build_success()` calls
`announce_with_raft()`, which leads to a use-after-free because
group0_service has already been destroyed.
This very scenario took place in scylladb/scylladb#20772.
Initially, we decided to solve the issue by initializing
group0_service a bit earlier (scylladb/scylladb@7bad8378c7).
Unfortunately, it led to other issues described in scylladb/scylladb#21534,
so we revert that patch. These changes are the second attempt
at solving the problem, this time in a safer manner.
The solution we came up with is to pair the start of the view builder
with a deferred lambda that deinitializes it by calling
`view_builder::drain()`. No other component of the system should be
able to use the view builder anymore, so it's safe to do that.
Furthermore, that pairing makes the analysis of
initialization/deinitialization order much easier. We also solve the
aforementioned use-after-free issue because the view builder itself
will no longer attempt to use group0_client.
Note that we still pair a deferred lambda calling `view_builder::stop()`
with the construction of the view builder; that function will also call
`view_builder::drain()`. Another notable thing is `view_builder::drain()`
may be called earlier by `storage_service::do_drain()`. In other words,
these changes cover the situation when Scylla runs into a problem when
starting up.
Backport: The patch I'm reverting made it to 6.2, so we want to backport this one there too.
Fixes scylladb/scylladb#20772
Fixes scylladb/scylladb#21534
- (cherry picked from commit a5715086a4)
- (cherry picked from commit 06ce976370)
- (cherry picked from commit d1f960eee2)
Parent PR: #21909
Closes scylladb/scylladb#22331
* github.com:scylladb/scylladb:
test/topology_custom: Add test for Scylla with disabled view building
main, view: Pair view builder drain with its start
Revert "main,cql_test_env: start group0_service before view_builder"
to voter.
When a node is bootstrapped and joins a cluster as a non-voter, errors can occur while committing
a new Raft record, for instance, if the Raft leader changes during this time. These errors are not
critical and should not cause a node crash, as the action can be retried.
Fixesscylladb/scylladb#20814
(cherry picked from commit 8c48f7ad62)
error when starting up.
Test that a node starts successfully if, while joining a cluster and becoming a voter, it
receives an unknown commit status error.
Test for scylladb/scylladb#20814
(cherry picked from commit 16053a86f0)
Since retries are quite often needed when calling `modify_config`,
a function wrapper that allows a function to be called with automatic
retries in case of failures was added, to avoid code duplication.
(cherry picked from commit 775411ac56)
When adding a new view for building, first write the status to the
system tables and then add the view building step that will start
building it.
Otherwise, if we start building it before the status is written to the
table, it may happen that we complete building the view, write the
SUCCESS status, and then overwrite it with the STARTED status. The
view_build_status table will remain in an incorrect state, indicating that
the view building is not complete.
Fixesscylladb/scylladb#20638
(cherry picked from commit b1be2d3c41)
This update addresses an issue in the mutation diff calculation algorithm used during read repair. Previously, the algorithm used `token` as the hashmap key. Since `token` is calculated based on the Murmur3 hash function, it could generate duplicate values for different partition keys, causing corruption in the affected rows' values.
Fixes scylladb/scylladb#19101
Since the issue affects all the relevant scylla versions, backport to: 6.1, 6.2
- (cherry picked from commit e577f1d141)
- (cherry picked from commit 39785c6f4e)
- (cherry picked from commit 155480595f)
Parent PR: #21996
Closes scylladb/scylladb#22298
* github.com:scylladb/scylladb:
storage_proxy/read_repair: Remove redundant 'schema' parameter from `data_read_resolver::resolve` function.
storage_proxy/read_repair: Use `partition_key` instead of `token` key for mutation diff calculation hashmap.
test: Add test case for checking read repair diff calculation when having conflicting keys.
Before this commit, there doesn't seem to have been a test verifying that
starting and shutting down Scylla behave correctly when the configuration
option `view_building` is set to false. In these changes, we add one.
(cherry picked from commit d1f960eee2)
In these changes, we pair draining the view builder with its start.
To better understand what was done and why, let's first look at the
situation before this commit and the context of it:
(a) The following things happened in order:
1. The view builder would be constructed.
2. Right after that, a deferred lambda would be created to stop the
view builder during shutdown.
3. group0_service would be started.
4. A deferred lambda stopping group0_service would be created right
after that.
5. The view builder would be started.
(b) Because the view builder depends on group0_client, it couldn't be
started before starting group0_service. On the other hand, other
services depend on the view builder, e.g. the stream manager. That
makes changing the order of initialization a difficult problem,
so we want to avoid doing that unless we're sure it's the right
choice.
(c) Since the view builder uses group0_client, there was a possibility
of running into a segmentation fault issue in the following
scenario:
1. A call to `view_builder::mark_view_build_success()` is issued.
2. We stop group0_service.
3. `view_builder::mark_view_build_success()` calls
`announce_with_raft()`, which leads to a use-after-free because
group0_service has already been destroyed.
This very scenario took place in scylladb/scylladb#20772.
Initially, we decided to solve the issue by initializing
group0_service a bit earlier (scylladb/scylladb@7bad8378c7).
Unfortunately, it led to other issues described in scylladb/scylladb#21534.
We reverted that change in the previous commit. These changes are the
second attempt at solving the problem, this time in a safer manner.
The solution we came up with is to pair the start of the view builder
with a deferred lambda that deinitializes it by calling
`view_builder::drain()`. No other component of the system should be
able to use the view builder anymore, so it's safe to do that.
Furthermore, that pairing makes the analysis of
initialization/deinitialization order much easier. We also solve the
aforementioned use-after-free issue because the view builder itself
will no longer attempt to use group0_client.
Note that we still pair a deferred lambda calling `view_builder::stop()`
with the construction of the view builder; that function will also call
`view_builder::drain()`. Another notable thing is `view_builder::drain()`
may be called earlier by `storage_service::do_drain()`. In other words,
these changes cover the situation when Scylla runs into a problem when
starting up.
Fixesscylladb/scylladb#20772
(cherry picked from commit 06ce976370)
The patch solved a problem related to an initialization order
(scylladb/scylladb#20772), but we ran into another one: scylladb/scylladb#21534.
After moving the initialization of group0_service, it ended up being destroyed
AFTER the CDC generation service would. Since CDC generations are accessed
in `storage_service::topology_state_load()`:
```
for (const auto& gen_id : _topology_state_machine._topology.committed_cdc_generations) {
rtlogger.trace("topology_state_load: process committed cdc generation {}", gen_id);
co_await _cdc_gens.local().handle_cdc_generation(gen_id);
```
we started getting the following failure:
```
Service &seastar::sharded<cdc::generation_service>::local() [Service = cdc::generation_service]: Assertion `local_is_initialized()' failed.
```
We're reverting the patch to go back to a more stable version of Scylla
and in the following commit, we'll solve the original issue in a more
systematic way.
This reverts commit 7bad8378c7.
(cherry picked from commit a5715086a4)
Add the test file name to `ScyllaClusterManager` log file names alongside the test function name.
This avoids race conditions when tests with the same function names are executed simultaneously.
Fixesscylladb/scylladb#21807
Backport: not needed since this is a fix in the testing scripts.
Closesscylladb/scylladb#22192
(cherry picked from commit 2f1731c551)
Closesscylladb/scylladb#22249
function.
The `data_read_resolver` class inherits from `abstract_read_resolver`, which already includes the
`schema_ptr _schema` member. Therefore, using a separate function parameter in `data_read_resolver::resolve`
initialized with the same variable in `abstract_read_executor` is redundant.
(cherry picked from commit 155480595f)
diff calculation hashmap.
This update addresses an issue in the mutation diff calculation algorithm used during read repair.
Previously, the algorithm used `token` as the hashmap key. Since `token` is calculated based on
the Murmur3 hash function, it could generate duplicate values for different partition keys, causing
corruption in the affected rows' values.
Fixesscylladb/scylladb#19101
(cherry picked from commit 39785c6f4e)
conflicting keys.
The test updates two rows with keys that result in a Murmur3 hash collision, which
is used to generate Scylla tokens. These tokens are involved in read repair diff
calculations. Due to the identical token values, a hash map key collision occurs.
Consequently, an incorrect value from the second row (with a different primary key)
is then sent for writing as 'repaired', causing data corruption.
(cherry picked from commit e577f1d141)
This series attempts to get rid of flakiness in cache_algorithm_test by solving two problems.
Problem 1:
The test needs to create some arbitrary partition keys of a given size. It intends to create keys of the form:
0x0000000000000000000000000000000000000000...
0x0100000000000000000000000000000000000000...
0x0200000000000000000000000000000000000000...
But instead, unintentionally, it creates partially initialized keys of the form:
0x0000000000000000garbagegarbagegarbagegar...
0x0100000000000000garbagegarbagegarbagegar...
0x0200000000000000garbagegarbagegarbagegar...
Each of these keys is created several times and -- for the test to pass -- the result must be the same each time.
By coincidence, this is usually the case, since the same allocator slots are used. But if some background task happens to overwrite the allocator slot during a preemption, the keys used during "SELECT" will be different than the keys used during "INSERT", and the test will fail due to extra cache misses.
Problem 2:
Cache stats are global, so there's no good way to reliably
verify that e.g. a given read causes 0 cache misses,
because something done by Scylla in a background can trigger a cache miss.
This can cause the test to fail spuriously.
With how the test framework and the cache are designed, there's probably
no good way to test this properly. It would require ensuring that cache
stats are per-read, or at least per-table, and that Scylla's background
activity doesn't cause enough memory pressure to evict the tested rows.
This patch tries to deal with the flakiness without deleting the test
altogether by letting it retry after a failure if it notices that it
can be explained by a read which wasn't done by the test.
(Though, if the test can't be written well, maybe it just shouldn't be written...)
Fixesscylladb/scylladb#21536
(cherry picked from commit 1fffd976a4)
(cherry picked from commit 6caaead4ac)
Parent PR: scylladb/scylladb#21948
Closes scylladb/scylladb#22228
* github.com:scylladb/scylladb:
cache_algorithm_test: harden against stats being confused by background activity
cache_algorithm_test: fix a use of an uninitialized variable
Currently, when asked about a given shard, task_manager_module::is_aborted
checks the task map local to the caller's shard.
Fix the method to check the task map local to the given shard.
Fixes: #22156.
Closesscylladb/scylladb#22161
(cherry picked from commit a91e03710a)
Closesscylladb/scylladb#22197
When we open a PR with conflicts, the PR owner gets a notification about the assignment but has no idea whether the PR has conflicts (in Scylla this is important since CI will not start on a draft PR).
Let's add a comment to notify the user that there are conflicts.
Closesscylladb/scylladb#21939
(cherry picked from commit 2e6755ecca)
Closesscylladb/scylladb#22190
When an sstable is unlinked, it remains in the _active list of the
sstable manager. Its memory might be reclaimed and later reloaded,
causing issues since the sstable is already unlinked. This patch updates
the on_unlink method to reclaim memory from the sstable upon unlinking,
remove it from memory tracking, and thereby prevent the issues described
above.
Added a testcase to verify the fix.
Fixes#21887
This is a bug fix in the bloom filter reload/reclaim mechanism and should be backported to older versions.
Closesscylladb/scylladb#21895
* github.com:scylladb/scylladb:
sstables_manager: reclaim memory from sstables on unlink
sstables_manager: introduce reclaim_memory_and_stop_tracking_sstable()
sstables: introduce disable_component_memory_reload()
sstables_manager: log sstable name when reclaiming components
(cherry picked from commit d4129ddaa6)
Closesscylladb/scylladb#21998
Cache stats are global, so there's no good way to reliably
verify that e.g. a given read causes 0 cache misses,
because something done by Scylla in a background can trigger a cache miss.
This can cause the test to fail spuriously.
With how the test framework and the cache are designed, there's probably
no good way to test this properly. It would require ensuring that cache
stats are per-read, or at least per-table, and that Scylla's background
activity doesn't cause enough memory pressure to evict the tested rows.
This patch tries to deal with the flakiness without deleting the test
altogether by letting it retry after a failure if it notices that it
can be explained by a read which wasn't done by the test.
(Though, if the test can't be written well, maybe it just shouldn't be written...)
(cherry picked from commit 6caaead4ac)
The test needs to create some arbitrary partition keys of a given size.
It intends to create keys of the form:
0x0000000000000000000000000000000000000000...
0x0100000000000000000000000000000000000000...
0x0200000000000000000000000000000000000000...
But instead, unintentionally, it creates partially initialized keys of the form:
0x0000000000000000garbagegarbagegarbagegar...
0x0100000000000000garbagegarbagegarbagegar...
0x0200000000000000garbagegarbagegarbagegar...
Each of these keys is created several times and -- for the test to pass --
the result must be the same each time.
By coincidence, this is usually the case, since the same allocator slots are used.
But if some background task happens to overwrite the allocator slot during a
preemption, the keys used during "SELECT" will be different than the keys used
during "INSERT", and the test will fail due to extra cache misses.
(cherry picked from commit 1fffd976a4)
New logs allow us to easily distinguish two cases in which
waiting for apply times out:
- the node didn't receive the entry it was waiting for,
- the node received the entry but didn't apply it in time.
Distinguishing these cases simplifies reasoning about failures.
The first case indicates that something went wrong on the leader.
The second case indicates that something went wrong on the node
on which waiting for apply timed out.
As it turns out, many different bugs result in the `read_barrier`
(which calls `wait_for_apply`) timeout. This change should help
us in debugging bugs like these.
We want to backport this change to all supported branches so that
it helps us in all tests.
Fixes scylladb/scylladb#22160
Closes scylladb/scylladb#22159
The series contains small fixes to the gossiper, one of which fixes #21930. I noticed the others while debugging the issue.
Fixes: #21930
(cherry picked from commit 91cddcc17f)
Parent PR: #21956
Closes scylladb/scylladb#21991
* github.com:scylladb/scylladb:
gossiper: do not reset _just_removed_endpoints in non raft mode
gossiper: do not call apply for the node's old state
In the current scenario, if during startup a node crashes after initiating gossip and before joining group0,
it keeps floating in the gossiper forever, because the raft-based gossiper purging logic is only effective
once a node joins group0. This orphan node hinders a successor node with the same IP from joining the cluster,
since it collides with it during the gossiper shadow round.
This commit fixes the issue by adding a background thread which periodically checks for such orphan entries in
the gossiper and removes them.
A test is also added to verify this logic. The test fails without this background thread enabled, hence
verifying the behavior.
Fixes: scylladb/scylladb#20082
Closes scylladb/scylladb#21600
(cherry picked from commit 6c90a25014)
Closesscylladb/scylladb#21822
The migration process does reads with consistency level ALL,
requiring all nodes to be alive.
Fixesscylladb/scylladb#20754
The PR should be backported to 6.2, as this version has the view builder on group0.
Closesscylladb/scylladb#21708
* github.com:scylladb/scylladb:
test/topology_custom/test_view_build_status: add reproducer
service/topology_coordinator: migrate view builder only if all nodes are up
(cherry picked from commit def51e252d)
Closesscylladb/scylladb#21850
This patch reverts 324b3c43c0 and adds synchronous versions of `service_level_controller::find_effective_service_level()` and `client_state::maybe_update_per_service_level_params()`.
It isn't safe to do asynchronous calls in `for_each_gently`, as the
connection may be disconnected while a call in callback preempts.
Fixes scylladb/scylladb#21801
Closes scylladb/scylladb#21761
* github.com:scylladb/scylladb:
Revert "generic_server: use async function in `for_each_gently()`"
transport/server: use synchronous calls in `for_each_gently` callback
service/client_state: add synchronous method to update service level params
qos/service_level_controller: add `find_cached_effective_service_level`
(cherry picked from commit c601f7a359)
Closesscylladb/scylladb#21849
Otherwise, the read will be considered as on-cpu during promoted index
search, which will severely underutilize the disk, because by default
on-cpu concurrency is 1.
I verified this patch in the worst-case scenario, where the workload
reads missing rows from a large partition. So the partition index is
cached (no IO) and there is no data file IO (relies on https://github.com/scylladb/scylladb/pull/20522).
But there is IO during promoted index search (via cached_file).
Before the patch this workload was doing 4k req/s, after the patch it does 30k req/s.
The problem is much less pronounced if there is data file or partition index IO involved
because that IO will signal read concurrency semaphore to invite more concurrency.
Fixes#21325
(cherry picked from commit 868f5b59c4)
(cherry picked from commit 0f2101b055)
Refs #21323
Closes scylladb/scylladb#21358
* github.com:scylladb/scylladb:
utils: cached_file: Mark permit as awaiting on page miss
utils: cached_file: Push resource_unit management down to cached_file
Update the service level cache in the node startup sequence, after the
service level and auth service are initialized.
The cache update depends on the service level data accessor being set
and the auth service being initialized. Before this commit, it could happen that no
cache update was triggered after the initialization. The commit adds
an explicit call to update the cache at a point where it is guaranteed to be ready.
Fixes scylladb/scylladb#21763
Closes scylladb/scylladb#21773
(cherry picked from commit 373855b493)
Closesscylladb/scylladb#21893
The function get_service_levels is used to retrieve all service levels
and it is called from multiple different contexts.
Importantly, it is called internally from the context of group0 state reload,
where it should be executed with a long timeout, similarly to other
internal queries, because a failure of this function affects the entire
group0 client, and a longer timeout can be tolerated.
The function is also called in the context of the user command LIST
SERVICE LEVELS, and perhaps other contexts, where a shorter timeout is
preferred.
The commit introduces a function parameter to indicate whether the
context is internal or not. For internal context, a long timeout is
chosen for the query. Otherwise, the timeout is shorter, the same as
before. When the distinction is not important, a default value is
chosen which maintains the same behavior.
The main purpose is to fix the case where the timeout is too short and causes
a failure that propagates and fails the group0 client.
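A minimal sketch of such a parameter, with hypothetical names and timeout values (the real patch may choose differently):
```
// Hypothetical sketch; enum, function and timeout values are illustrative.
enum class query_context { user, internal };

seastar::future<service_levels_info> get_service_levels(query_context ctx = query_context::user) {
    using namespace std::chrono;
    // Internal callers (e.g. group0 state reload) tolerate a long timeout, because a
    // failure there fails the whole group0 client. User-facing callers (LIST SERVICE
    // LEVELS) keep the previous, shorter timeout.
    auto timeout = (ctx == query_context::internal) ? duration_cast<milliseconds>(minutes(5))
                                                    : duration_cast<milliseconds>(seconds(10));
    co_return co_await query_service_levels_with_timeout(timeout);  // hypothetical helper
}
```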
Fixes scylladb/scylladb#20483
Closes scylladb/scylladb#21748
(cherry picked from commit 53224d90be)
Closesscylladb/scylladb#21890
The topology request table may change between the code reading it and
calling cv::when(), since reading is a preemption point. In this
case a cv::signal() can be missed. Detect that there was no signal between
reading and waiting by introducing reload_count, which is increased each
time the state is reloaded and signaled. If the counter differs
before and after reading, the state may have changed, so re-check it
instead of sleeping.
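A minimal sketch of the missed-signal guard, assuming hypothetical state and helper names (only the Seastar condition variable is real):
```
#include <seastar/core/condition-variable.hh>
#include <seastar/core/coroutine.hh>
#include <cstdint>

struct topology_state {
    uint64_t reload_count = 0;          // incremented on every state reload, before signaling
    seastar::condition_variable event;  // signaled on every state reload
};

seastar::future<> read_topology_requests();  // hypothetical: reads the table (preemption point)

seastar::future<> wait_for_topology_change(topology_state& st) {
    auto seen = st.reload_count;         // sample the counter before the preemption point
    co_await read_topology_requests();   // a reload (and its signal) can slip in here
    if (seen == st.reload_count) {
        // No reload happened while we were reading: safe to sleep until the next signal.
        co_await st.event.wait();
    }
    // Otherwise the state may have changed under us: re-check it instead of sleeping.
}
```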
Closesscylladb/scylladb#21713
* github.com:scylladb/scylladb:
topology_coordinator: introduce reload_count in topology state and use it to prevent race
storage_service: use conditional_variable::when in co-routines consistently
(cherry picked from commit 8f858325b6)
Closesscylladb/scylladb#21803
Otherwise, the read will be considered on-CPU during the promoted index
search, which severely underutilizes the disk because the default
on-CPU concurrency is 1.
I verified this patch on the worst case scenario, where the workload
reads missing rows from a large partition. So partition index is
cached (no IO) and there is no data file IO. But there is IO during
promoted index search (via cached_file). Before the patch this
workload was doing 4k req/s, after the patch it does 30k req/s.
The problem is much less pronounced if there is data file or index
file IO involved, because that IO will signal the read concurrency
semaphore to invite more concurrency.
(cherry picked from commit 0f2101b055)
This saves permit operations on the hot path when we hit in the cache.
It also lays the groundwork for marking the permit as awaiting later.
(cherry picked from commit 868f5b59c4)
In commit 2596d157, we added a condition to run auto-backport.py only
when the GitHub Action is triggered by a push to the default branch.
However, this introduced an unexpected error due to incorrect condition
handling.
Problem:
- `github.event.before` evaluates to an empty string
- GitHub Actions' single-pass expression evaluation system causes
the step to always execute, regardless of `github.event_name`
Although GitHub's documentation suggests that ${{ }} can be omitted,
it recommends using explicit ${{ }} expressions for compound conditions.
Changes:
- Use explicit ${{}} expression for compound conditions
- Avoid string interpolation in conditional statements
Root Cause:
The previous implementation failed because of how GitHub Actions
evaluates conditional expressions, leading to an unintended script
execution and a 404 error when attempting to compare commits.
Example Error:
```
python .github/scripts/auto-backport.py --repo scylladb/scylladb --base-branch refs/heads/master --commits ..2b07d93beac7bc83d955dadc20ccc307f13f20b6
shell: /usr/bin/bash -e {0}
env:
DEFAULT_BRANCH: master
GITHUB_TOKEN: ***
Traceback (most recent call last):
File "/home/runner/work/scylladb/scylladb/.github/scripts/auto-backport.py", line 201, in <module>
main()
File "/home/runner/work/scylladb/scylladb/.github/scripts/auto-backport.py", line 162, in main
commits = repo.compare(start_commit, end_commit).commits
File "/usr/lib/python3/dist-packages/github/Repository.py", line 888, in compare
headers, data = self._requester.requestJsonAndCheck(
File "/usr/lib/python3/dist-packages/github/Requester.py", line 353, in requestJsonAndCheck
return self.__check(
File "/usr/lib/python3/dist-packages/github/Requester.py", line 378, in __check
raise self.__createException(status, responseHeaders, output)
github.GithubException.UnknownObjectException: 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/commits/commits#compare-two-commits", "status": "404"}
```
Fixesscylladb/scylladb#21808
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21809
(cherry picked from commit e04aca7efe)
Closesscylladb/scylladb#21820
Scrub compaction can pick up input sstables from the maintenance sstable set,
but on compaction completion it doesn't update the maintenance set,
leaving the original sstable in the set after it has been scrubbed. To fix
this, compaction completion has to update the maintenance sstable set if
the input originated from there. This PR solves the issue by updating the
correct sstable_sets on compaction completion.
Fixes#20030
This issue has existed since the introduction of main and maintenance sstable sets into scrub compaction. It would be good to have the fix backported to versions 6.1 and 6.2.
Closesscylladb/scylladb#21582
* github.com:scylladb/scylladb:
compaction: remove unused `update_sstable_lists_on_off_strategy_completion`
compaction_group: replace `update_sstable_lists_on_off_strategy_completion`
compaction_group: rename `update_main_sstable_list_on_compaction_completion`
compaction_group: update maintenance sstable set on scrub compaction completion
compaction_group: store table::sstable_list_builder::result in replacement_desc
table::sstable_list_builder: remove old sstables only from current list
table::sstable_list_builder: return removed sstables from build_new_list
(cherry picked from commit 58baeac0ad)
Closesscylladb/scylladb#21790
schema_change_test currently fails due to failure to start a cql test
env in unit tests after the point where this is called (in one of the
test cases):
forward_jump_clocks(std::chrono::seconds(60*60*24*31));
The problem manifests with a failure to join the cluster due to
missing_column exception ("missing_column: done") being thrown from
system_keyspace::get_topology_request_state(). It's a symptom of
join request being missing in system.topology_requests. It's missing
because the row is expired.
When a request is created, we insert the
mutations with an intended TTL of 1 month. The actual TTL value is
computed like this:
```
ttl_opt topology_request_tracking_mutation_builder::ttl() const {
    return std::chrono::duration_cast<std::chrono::seconds>(std::chrono::microseconds(_ts)) + std::chrono::months(1)
        - std::chrono::duration_cast<std::chrono::seconds>(gc_clock::now().time_since_epoch());
}
```
_ts comes from the request_id, which is supposed to be a timeuuid set
from the current time when the request starts. It's set using
utils::UUID_gen::get_time_UUID(), which reads the system clock without
adding the clock offset, so after forward_jump_clocks(), _ts and
gc_clock::now() may be far apart. In some cases the accumulated offset
is larger than 1 month and the TTL becomes negative, causing the
request row to expire immediately and failing the boot sequence.
The fix is to use db_clock, which respects offsets and is consistent
with gc_clock.
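To make the failure mode concrete, here is a small self-contained illustration of the arithmetic in plain std::chrono (not Scylla code): the request timestamp comes from the raw system clock while the "now" side includes the 31-day jump, so the computed TTL goes negative.
```
#include <cassert>
#include <chrono>

int main() {
    using namespace std::chrono;
    const auto request_ts = system_clock::now();              // _ts: no offset applied
    const auto offset = hours(24 * 31);                       // forward_jump_clocks(31 days)
    const auto jumped_now = system_clock::now() + offset;     // what gc_clock::now() sees
    const auto ttl = duration_cast<seconds>(request_ts.time_since_epoch())
                   + duration_cast<seconds>(months(1))        // ~30.44 days on average
                   - duration_cast<seconds>(jumped_now.time_since_epoch());
    assert(ttl.count() < 0);  // negative TTL: the request row expires immediately
}
```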
The test doesn't fail in CI because there each test case runs in a
separate process, so there is no bootstrap attempt (by a new cql test
env) after forward_jump_clocks().
Closes scylladb/scylladb#21558
(cherry picked from commit 1d0c6aa26f)
Closes scylladb/scylladb#21584
Fixes #21581
Task status information from nodetool commands is not retained permanently:
- Status of completed tasks is only kept for `task_ttl_in_seconds`
- Status is removed after being queried, making it a one-time operation
This behavior is important for users to understand since subsequent
queries for the same completed task will not return any information.
Add documentation to make this clear to users.
Fixesscylladb/scylladb#21757
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21386
(cherry picked from commit afeff0a792)
Closesscylladb/scylladb#21759
Building upon commit 69b47694, this change addresses a subtle synchronization
weakness in node visibility checks during recovery mode testing.
Previous Approach:
- Waited only for the first node to see its peers
- Insufficient to guarantee full cluster consistency
Current Solution:
1. Implement comprehensive node visibility verification
2. Ensure all nodes mutually recognize each other
3. Prevent potential schema propagation race conditions
Key Improvements:
- Robust cluster state validation before keyspace creation
- Eliminate partial visibility scenarios
Fixesscylladb/scylladb#21724
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21726
(cherry picked from commit 65949ce607)
Closesscylladb/scylladb#21734
Before these changes, we didn't wait for the materialized views to
finish building before writing to the base table. That led to generating
an additional view update, which, in turn, led to test failures.
The scenario corresponding to the summary above looked like this:
1. The test creates an empty table and MVs on it.
2. The view builder starts, but it doesn't finish immediately.
3. The test performs mutations to the base table. Since the views
already exist, view updates are generated.
4. Finally, the view builder finishes. It notices that the base
table has a row, so it generates a view update for it because
it doesn't notice that we already have data in the view.
We solve it by explicitly waiting for both views to finish building
and only then start writing to the base table.
Additionally, we also fix a lifetime issue of the row the test revolves
around, further stabilizing CI.
Fixes https://github.com/scylladb/scylladb/issues/20889
Backport: These changes have no semantic effect on the codebase,
but they stabilize CI, so we want to backport them to the maintained
versions of Scylla.
Closesscylladb/scylladb#21632
* github.com:scylladb/scylladb:
test/boost/view_schema_test.cc: Increase TTL in test_view_update_generating_writetime
test/boost/view_schema_test.cc: Wait for views to build in test_view_update_generating_writetime
(cherry picked from commit 733a4f94c7)
Closesscylladb/scylladb#21640
tablet_repair_task_impl keeps a vector of tablet_repair_task_meta,
each of which keeps an effective_replication_map_ptr. So, after
the task completes, the token metadata version will not change for
task_ttl seconds.
Implement tablet_repair_task_impl::release_resources method that clears
tablet_repair_task_meta vector when the task finishes.
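A minimal sketch of such a method; the member name `_metas` and the exact signature are assumptions for illustration:
```
// Hypothetical sketch; not the actual implementation.
seastar::future<> tablet_repair_task_impl::release_resources() noexcept {
    // Dropping the metas also drops the effective_replication_map_ptr each entry holds,
    // so the token metadata version is no longer pinned for task_ttl seconds.
    _metas = {};
    return seastar::make_ready_future<>();
}
```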
Set task_ttl to 1h in test_tablet_repair to verify that the test
doesn't time out.
Fixes: #21503.
Closesscylladb/scylladb#21504
(cherry picked from commit 572b005774)
Closesscylladb/scylladb#21622
In the current scenario, 'test_replace_with_encryption' only confirms the replacement with inter-dc encryption
for normal nodes. This commit increases the coverage of the test by parametrizing it to confirm the behavior
for zero token node replacement as well. The test also implicitly provides
coverage for bootstrap with encryption of zero token nodes.
This PR increases coverage for existing code, hence we need to backport it. Since only version 6.2 has zero
token node support, we only backport it to 6.2.
Fixes: scylladb/scylladb#21096
Closes scylladb/scylladb#21609
(cherry picked from commit acd643bd75)
Closesscylladb/scylladb#21764
Currently, task_manager_module::abort_all_repairs marks top-level repairs as aborted (but does not abort them) and aborts all existing shard tasks.
A running repair checks whether its id isn't contained in _aborted_pending_repairs and then proceeds to create shard tasks. If abort_all_repairs is executed after _aborted_pending_repairs is checked but before shard tasks are created, then those new tasks won't be aborted. The issue is most severe for tablet_repair_task_impl, which checks the _aborted_pending_repairs content from different shards that do not see the top-level task. Hence the repair isn't stopped and it creates shard repair tasks on all shards except the one that initiated the repair.
Abort top-level tasks in abort_all_repairs. Fix the shard on which the task abort is checked.
Fixes: #21612.
Needs backport to 6.1 and 6.2 as they contain the bug.
Closesscylladb/scylladb#21616
* github.com:scylladb/scylladb:
test: add test to check if repair is properly aborted
repair: add shard param to task_manager_module::is_aborted
repair: use task abort source to abort repair
repair: drop _aborted_pending_repairs and utilize tasks abort mechanism
repair: fix task_manager_module::abort_all_repairs
(cherry picked from commit 5ccbd500e0)
Closesscylladb/scylladb#21642
Alternator's "/localnodes" HTTP requests is supposed to return the list
of nodes in the local DC to which the user can send requests.
Before commit bac7c33313 we used the
gossiper is_alive() method to determine if a node should be returned.
That commit changed the check to is_normal() - because a node can be
alive but in non-normal (e.g., joining) state and not ready for
requests.
However, it turns out that checking is_normal() is not enough, because
if a node is stopped abruptly, other nodes will still consider it "normal",
but down (the so-called "DN" state). So we need to check **both**
is_alive() and is_normal().
This patch also adds a test reproducing this case, where a node is
shut down abruptly. Before this patch, the test failed ("/localnodes"
continued to return the dead node), and after it, it passes.
Fixes#21538
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#21540
(cherry picked from commit 7607f5e33e)
Closesscylladb/scylladb#21634
The test only sends a subset of the running servers for the rolling
restart. The rolling restart checks the visibility of the restarted
node against the other nodes, but if that set is incomplete, some of the
running servers might not have seen the restarted node yet.
Improve the manager client rolling restart method to consider all the
running nodes when checking the restarted node's visibility.
Fixes: scylladb/scylladb#19959
Closes scylladb/scylladb#21477
(cherry picked from commit 92db2eca0b)
Closesscylladb/scylladb#21556
After merging 5a470b2bfb, we found that scylla_raid_setup fails on offline-mode
installation.
This is because pkg_install() just prints an error and exits the script in offline mode instead of installing packages, since offline mode is not supposed to be able to connect
to the internet.
It seems to occur because of the missing "policycoreutils-python-utils"
package, which provides the "semanage" command.
So we need to implement the relabeling patch without using that command.
Fixes https://github.com/scylladb/scylladb/issues/21441
Also, since Amazon Linux 2 has a different package name for semanage, we need to
adjust the package name.
Fixes https://github.com/scylladb/scylladb/issues/21351
Closes scylladb/scylladb#21474
* github.com:scylladb/scylladb:
scylla_raid_setup: support installing semanage on Amazon Linux 2
scylla_raid_setup: fix failure on SELinux package installation
(cherry picked from commit 1c212df62d)
Closesscylladb/scylladb#21547
stop() methods, like destructors, must always succeed,
and returning errors from them is futile as there is
nothing else we can do with them but continue with shutdown.
stop_ongoing_compactions, in particular, currently returns the status
of stopped compaction tasks from `stop_tasks`, but still all tasks
must be stopped after it, even if they failed, so assert that
and ignore the errors.
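A minimal sketch of that approach; member, helper and logger names below are illustrative assumptions, not the real compaction_manager code:
```
// Hypothetical sketch.
seastar::future<> compaction_manager::stop_ongoing_compactions(sstring reason) {
    // stop_tasks() is assumed to resolve once every task has finished stopping,
    // returning the per-task failures instead of throwing.
    auto failures = co_await stop_tasks(reason);
    if (!failures.empty()) {
        // Nothing useful can be done with these errors during shutdown: log and drop them.
        cmlog.warn("ignoring {} error(s) from stopped compaction tasks", failures.size());
    }
    // Every task must have been stopped by now, whether it succeeded or failed.
    assert(_tasks.empty());
}
```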
Fixes scylladb/scylladb#21159
* Needs backport to 6.2 and 6.1, as commit 8cc99973eb handles storage errors that might cause compaction tasks to fail and eventually terminate on shutdown when the exceptions are thrown in a noexcept context in the deferred stop destructor body
(cherry picked from commit e942c074f2)
(cherry picked from commit d8500472b3)
(cherry picked from commit c08ba8af68)
(cherry picked from commit a7a55298ea)
(cherry picked from commit 6cce67bec8)
Refs #21299
Closes scylladb/scylladb#21434
* github.com:scylladb/scylladb:
compaction_manager: stop: await _stop_future if engaged
compaction_manager: really_do_stop: assert that no tasks are left behind
compaction_manager: stop_tasks, stop_ongoing_compactions: ignore errors
compaction/compaction_manager: stop_tasks(): unlink stopped tasks
compaction/compaction_manager: make _tasks an intrusive list
The current condition that consults the compaction manager
state for awaiting `_stop_future` works since _stop_future
is assigned after the state is set to `stopped`, but it is
incidental. What matters is that `_stop_future` is engaged.
While at it, exchange _stop_future with a ready future
so that stop() can be safely called multiple times.
Also, drop the superfluous co_return.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 6cce67bec8)
stop_ongoing_compactions now ignores any errors returned
by tasks, and it should leave no task behind.
Assert that here, before the compaction_manager is destroyed.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit a7a55298ea)
stop() methods, like destructors, must always succeed,
and returning errors from them is futile as there is
nothing else we can do with them but continue with shutdown.
Leaked errors on the stop path may cause termination
on shutdown, when called in a deferred action destructor.
Fixesscylladb/scylladb#21298
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit c08ba8af68)
Stopped tasks currently linger in _tasks until the fiber that created
the task is scheduled again and unlinks the task. This window between
stop and remove prevents reliable checks for empty _tasks list after all
tasks are stopped.
Unlink the task early so really_do_stop() can safely check for an empty
_tasks list (next patch).
(cherry picked from commit d8500472b3)
_tasks is currently std::list<shared_ptr<compaction_task_executor>>, but
it has no role in keeping the instances alive; this is done by the
fibers which create the task (and pin a shared_ptr instance).
This lends itself to an intrusive list, avoiding that extra
allocation upon push_back().
Using an intrusive list also makes it simpler and much cheaper (O(1) vs.
O(N)) to remove tasks from the _tasks list. This will be made use of in
the next patch.
Code using _tasks has to be updated because the value_type changes from
shared_ptr<compaction_task_executor> to compaction_task_executor&.
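A minimal sketch of the intrusive-list shape using boost::intrusive (the member layout is an assumption, not the actual class):
```
#include <boost/intrusive/list.hpp>

namespace bi = boost::intrusive;

// Tasks hook themselves into the manager's list; lifetime is still owned by the
// fiber that created the task via its shared pointer.
class compaction_task_executor
    : public bi::list_base_hook<bi::link_mode<bi::auto_unlink>> {
    // ... task state ...
};

// auto_unlink hooks require constant_time_size<false>.
using task_list = bi::list<compaction_task_executor, bi::constant_time_size<false>>;

task_list _tasks;  // value_type is compaction_task_executor&; push_back() allocates
                   // nothing, and unlinking a stopped task is O(1) instead of O(N)
```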
(cherry picked from commit e942c074f2)
In scylladb/scylladb#19745, view_builder was migrated to group0 and since then it is dependent on group0_service.
Because of this, group0_service should be initialized/destroyed before/after view_builder.
This patch also adds error injection to `raft_server_with_timeouts::read_barrier`, which does 1s sleep before doing the read barrier. There is a new test which reproduces the use after free bug using the error injection.
Fixes scylladb/scylladb#20772
scylladb/scylladb#19745 is present in 6.2, so this fix should be backported to it.
Closesscylladb/scylladb#21471
* github.com:scylladb/scylladb:
test/boost/secondary_index_test: add test for use after free
api/raft: use `get_server_with_timeouts().read_barrier()` in coroutines
main,cql_test_env: start group0_service before view_builder
(cherry picked from commit 7021efd6b0)
Closesscylladb/scylladb#21506
For performance reasons, mutation_partition_v2::maybe_drop(), and by extension
also mutation_partition_v2::apply_monotonically(mutation_partition_v2&&)
can evict empty row entries, and hence change the continuity of the merged
entry.
For checking that apply_to_incomplete respects continuity,
test_apply_to_incomplete_respects_continuity obtains the continuity of
the partition entry before and after apply_to_incomplete by calling
e.squashed().get_continuity(). But squashed() uses apply_monotonically(),
so in some circumstances the result of squashed() can have smaller
continuity than the argument of squashed(), which interferes with what
the test is trying to check, and causes spurious failures.
This patch changes the method of calculating the continuity set,
so that it matches the entry exactly, fixing the test failures.
Fixes scylladb/scylladb#13757
Closes scylladb/scylladb#21459
(cherry picked from commit 35921eb67e)
Closesscylladb/scylladb#21497
Since Scylla is a public repo, when we create a fork, it doesn't fork the team and permissions (unlike private repos where it does).
When we have a backport PR with conflicts, the developers need to be able to update the branch to fix the conflicts. To do so, we modified the logic of the backport automation as follows:
- Every backport PR (with and without conflicts) will be opened directly on the `scylladbbot` fork repo
- When there are conflicts, an email will be sent to the original PR author with an invitation to become a contributor in the `scylladbbot` fork with `push` permissions. This will happen only once, if the author is not yet a contributor.
- Together with sending the invite, all backport labels will be removed and a comment will be added to the original PR with instructions
- The PR author must add the backport labels after the invitation is accepted
Fixes: https://github.com/scylladb/scylladb/issues/18973
Closes scylladb/scylladb#21401
(cherry picked from commit 77604b4ac7)
Closesscylladb/scylladb#21466
Adding an auto-backport.py script to handle backport automation instead of Mergify.
The rules of backport are as follows:
* Merged or Closed PRs with any backport/x.y label (one or more) and promoted-to-master label
* The backport PR will be automatically assigned to the original PR author
* In case of conflicts, the backport PR will be opened in the original author's fork in draft mode. This gives the PR owner the option to resolve the conflicts and push those changes to the PR branch (today in Scylla, when we have conflicts, the developers are forced to open another PR and manually close the backport PR opened by Mergify)
* Fix cherry-picking the wrong commit SHA. With the new script, we always take the SHA from the stable branch
* Support backport for enterprise releases (from Enterprise branch)
Fixes: https://github.com/scylladb/scylladb/issues/18973
(cherry picked from commit f9e171c7af)
Closesscylladb/scylladb#21469
To fix a race between split and repair (c1de4859d8), a new sstable
generated during streaming can be split before being attached to the sstable
set. That's to prevent an unsplit sstable from reaching the set after the
tablet map is resized.
So we can think this split is an extension of the sstable writer. A failure
during split means the new sstable won't be added. Also, the duration of split
is also adding to the time erm is held. For example, repair writer will only
release its erm once the split sstable is added into the set.
This single-sstable split is going through run_custom_job(), which serializes
with other maintenance tasks. That was a terrible decision, since the split may
have to wait for ongoing maintenance task to finish, which means holding erm
for longer. Additionally, if split monitor decides to run split on the entire
compaction group, it can cause single-sstable split to be aborted since the
former wants to select all sstables, propagating a failure to the streaming
writer.
That results in new sstable being leaked and may cause problems on restart,
since the underlying tablet may have moved elsewhere or multiple splits may
have happened. We have some fragility today in cleaning up leaked sstables on
streaming failure, but this single-sstable split made it worse since the
failure can happen during normal operation, when there's e.g. no I/O error.
It makes sense to kill the run_custom_job() usage, since the single-sstable split
is offline and an extension of sstable writing, so it makes no sense to
serialize it with maintenance tasks. It must also inherit the sched group of the
process writing the new sstable. The inheritance happens today, but is fragile.
Fixes#20626.
Closesscylladb/scylladb#20737
* github.com:scylladb/scylladb:
tablet: Fix single-sstable split when attaching new unsplit sstables
replica: Fix tablet split execute after restart
(cherry picked from commit bca8258150)
Ref scylladb/scylladb#21415
During split prepare phase, there will be more than 1 compaction group with
overlapping token range for a given replica.
Assume tablet 1 has sstable A containing deleted data, and sstable B containing
a tombstone that shadows data in A.
Then split starts:
1) sstable B is split first, and moved from the main (unsplit) group to a
   split-ready group
2) now compaction runs in the split-ready group before sstable A is split
Tombstone GC logic today only looks at the underlying group, so compaction in step
2 will discard the deleted data in A, since it belongs to another group (the
unsplit one), and so the tombstone can be purged incorrectly.
To fix it, compaction will now work with all uncompacting sstables that belong
to the same replica, since tombstone GC requires all sstables that possibly
contain shadowed data to be available for correct decision to be made.
Fixes https://github.com/scylladb/scylladb/issues/20044.
Branches 6.0, 6.1 and 6.2 are vulnerable, so backport is needed.
(cherry picked from commit bcd358595f)
(cherry picked from commit 93815e0649)
Refs https://github.com/scylladb/scylladb/pull/20939
Closes scylladb/scylladb#21206
* github.com:scylladb/scylladb:
replica: Fix tombstone GC during tablet split preparation
service: Improve error handling for split
During split prepare phase, there will be more than 1 compaction group with
overlapping token range for a given replica.
Assume tablet 1 has sstable A containing deleted data, and sstable B containing
a tombstone that shadows data in A.
Then split starts:
1) sstable B is split first, and moved from main (unsplit) group to a
split-ready group
2) now compaction runs in split-ready group before sstable A is split
tombstone GC logic today only looks at the underlying group, so compaction in step
2 will discard the deleted data in A, since it belongs to another group (the
unsplit one), and so the tombstone can be purged incorrectly.
To fix it, compaction will now work with all uncompacting sstables that belong
to the same replica, since tombstone GC requires all sstables that possibly
contain shadowed data to be available for correct decision to be made.
Fixes#20044.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 93815e0649)
The retry wasn't really happening, since the loop was broken and the sleep
part was skipped on error. Also, we were treating an abort of the split
during shutdown as if it were an actual error, and that confused
longevity tests that parse logs for error-level messages. The fix is
about demoting the level of the logs when we know the exception comes
from shutdown.
Fixes#20890.
(cherry picked from commit bcd358595f)
Fix how regular tasks that have a virtual parent are created
in task_manager::module::make_task: set sequence number
of a task and subscribe to module's abort source.
Fixes: #21278.
Needs backport to 6.2
(cherry picked from commit 1eb47b0bbf)
(cherry picked from commit 910a6fc032)
Refs #21280
Closes scylladb/scylladb#21332
* github.com:scylladb/scylladb:
tasks: fix sequence number assignment
tasks: fix abort source subscription of virtual task's child
Currently, test_repair_succeeds_with_unitialized_bm checks whether
repair finishes successfully and the error is properly handled
if batchlog_manager isn't initialized. The error handling check depends on
logs, making the test fragile to external conditions and flaky.
Drop the error handling check; a successful repair is a sufficient
passing condition.
Fixes: #21167.
(cherry picked from commit 85d9565158)
Closesscylladb/scylladb#21330
The skipped ranges should be multiplied by the number of tables.
Otherwise, the finished ranges ratio will not reach 100%.
Fixes#21174
(cherry picked from commit cffe3dc49f)
(cherry picked from commit 1392a6068d)
(cherry picked from commit 9868ccbac0)
Refs #21252
Closes scylladb/scylladb#21313
* github.com:scylladb/scylladb:
test: Add test_node_ops_metrics.py
repair: Make the ranges more consistent in the log
repair: Fix finished ranges metrics for removenode
Although OSS doesn't limit the number of created service levels, match the
enterprise limit to decrease divergence in the test between OSS and
enterprise.
Fixesscylladb/scylladb#21044
(cherry picked from commit 846d94134f)
Closesscylladb/scylladb#21282
Fixes#21159
When an exception is thrown in sstable write etc such that
storage_manager::isolate is initiated, we start a shutdown chain
for message service, gossip etc. These are synced (properly) in
storage_manager::stop, but if we somehow call gossiper::shutdown
outside the normal service::stop cycle, we can end up running the
method simultaneously, intertwined (missing the guard because of
the state change between check and set). We then end up co_awaiting
an invalid future (_failure_detector_loop_done) - a second wait.
Fixed by
a.) Remove superfluous gossiper::shutdown in cql_test_env. This was added
in 20496ed, ages ago. However, it should not be needed nowadays.
b.) Ensure _failure_detector_loop_done is always waitable. Just to be sure.
(cherry picked from commit c28a5173d9)
Closesscylladb/scylladb#21393
When a compaction_group is removed via `compaction_manager::remove`,
it is erased from `_compaction_state`, and therefore compaction
is definitely not enabled on it.
This triggers an internal error if tablets are cleaned up
during drop/truncate, which checks that compaction is disabled
in all compaction groups.
Note that the callers of `compaction_disabled` aren't really
interested in compaction being actively disabled on the
compaction_group, but rather if it's enabled or not.
A follow-up patch can be considered to reverse the logic
and expose `compaction_enabled` rather than `compaction_disabled`.
Fixesscylladb/scylladb#20060
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 1c55747637)
Closesscylladb/scylladb#21404
The current code takes a reference and holds it past preemption points. And
while the state itself is not supposed to change, the reference may
become stale because the state is re-created on each raft topology
command.
Fix it by taking a copy instead. This is a slow path anyway.
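A minimal sketch of the pattern; the type and accessor names below are illustrative assumptions:
```
// Hypothetical sketch.
seastar::future<> handle_topology_request(topology_state_machine& tsm) {
    // Risky: a reference can dangle, because the state object is re-created on each
    // raft topology command while this coroutine is suspended at the co_await below.
    // const topology_state& state = tsm.current_state();

    // Safe: take a copy instead. This is a slow path, so the extra copy is acceptable.
    topology_state state = tsm.current_state();
    co_await process(state);  // preemption point: the original state may be replaced here
    co_await reply(state);    // the copy is still valid after resuming
}
```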
Fixes: scylladb/scylladb#21220
(cherry picked from commit fb38bfa35d)
Closesscylladb/scylladb#21361
In the current scenario, nodetool status doesn't display information regarding zero token nodes. For example, if 5 nodes are spun up by the administrator, out of which 2 nodes are zero token nodes, then nodetool status only shows information regarding the 3 non-zero-token nodes.
This commit intends to fix this issue by leveraging the "/storage_service/host_id" API and adding appropriate logic in scylla-nodetool.cc to support zero token nodes.
A test is also added in nodetool/test_status.py to verify this logic. This test fails without this commit's zero token node support logic, hence verifying the behavior.
This PR fixes a bug. Hence we need to backport it. Backporting needs to be done only
to version 6.2, since earlier versions don't support zero token nodes.
Fixes: scylladb/scylladb#19849
Fixes: scylladb/scylladb#17857
(cherry picked from commit 72f3c95a63)
(cherry picked from commit 39dfd2d7ac)
(cherry picked from commit c00d40b239)
Refs scylladb/scylladb#20909
Closes scylladb/scylladb#21334
* github.com:scylladb/scylladb:
fix nodetool status to show zero-token nodes
test: move `wait_for_first_completed` to pylib/util.py
token_metadata: rename endpoint_to_host_id_map getter and add support for joining nodes
In the current scenario, nodetool status doesn't display information
regarding zero token nodes. For example, if 5 nodes are spun up by the
administrator, out of which 2 nodes are zero token nodes, then nodetool
status only shows information regarding the 3 non-zero-token nodes.
This commit intends to fix this issue by leveraging the "/storage_service/host_id"
API and adding appropriate logic in scylla-nodetool.cc to support zero token nodes.
Robust topology tests are added, which spin up scylla nodes and confirm the nodetool
status output for various cases, providing good coverage.
A test is also added in nodetool/test_status.py to verify this logic. These tests fail
without this commit's zero token node support logic, hence verifying the behavior.
The test `test_status_keyspace_joining_node` has been removed. This test is
based on a case where host_id=None, which is impossible. Since we now use
host_id_map for node discovery in nodetool, nodes with "host_id=None"
go undetected. Since this case is impossible anyway, we can get rid of it.
This PR fixes a bug, hence we need to backport it. Backporting needs to be done only
to version 6.2, since earlier versions don't support zero token nodes.
Fixes: scylladb/scylladb#19849
(cherry picked from commit c00d40b239)
Rename the host_id map getter 'get_endpoint_to_host_id_map_for_reading' to 'get_endpoint_to_host_id_map_'.
Also modify the getter to return information regarding joining nodes as well.
This getter will later be used for retrieving the nodes in nodetool status, hence it needs to show all nodes,
including joining ones.
The function name suffix `_for_reading` suggests that the function was used
in some other places in the past, and indeed if we need endpoints
"for reading" then we cannot show joining endpoints. But it was confirmed
that this function is currently only used by "/storage_service/host_id" endpoint,
hence it can be modified as required.
Fixes: scylladb/scylladb#17857
(cherry picked from commit 72f3c95a63)
Currently, if a regular task does not have a parent or its parent
is a virtual task, then it subscribes to the module's abort source
in the task_manager::task::impl constructor. However, at this point
the kind of the task's parent isn't set. Due to that, children
of virtual tasks aren't aborted on shutdown.
Subscribe to module's abort source in task::impl::set_virtual_parent.
(cherry picked from commit 1eb47b0bbf)
This collector reads the NVMe temperature sensor, which was observed to
cause bad performance on Azure cloud for ~6 seconds following each reading of
the sensor. During the event, we can see elevated system
time (up to 30%) and softirq time. CPU utilization is high, with
nvm_queue_rq taking several orders of magnitude more time than
normal. There are signs of contention: we can see
__pv_queued_spin_lock_slowpath being called in the perf profile. This
manifests as latency spikes and potentially also a throughput drop due
to reduced CPU capacity.
By default, the monitoring stack queries it once every 60s.
(cherry picked from commit 93777fa907)
Closesscylladb/scylladb#21304
Consider the number of tables when logging the number of ranges. This makes it
more consistent with the log emitted when the operation starts.
(cherry picked from commit 1392a6068d)
The skipped ranges should be multiplied by the number of tables.
Otherwise the finished ranges ratio will not reach 100%.
Fixes#21174
(cherry picked from commit cffe3dc49f)
The stream-session is the receiving end of streaming, it reads the
mutation fragment stream from an RPC stream and writes it onto the disk.
As such, this part does no disk IO and therefore, using a permit with
count resources is superfluous. Furthermore, after
d98708013c, the count resources on this
permit can cause a deadlock on the receiver end, via the
`db::view::check_view_update_path()`, which wants to read the content of
a system table and therefore has to obtain a permit of its own.
Switch to a tracking-only permit, primarily to resolve the deadlock, but
also because admission is not necessary for a read which does no IO.
Refs: scylladb/scylladb#20885 (partial fix, solves only one of the deadlocks)
Fixes: scylladb/scylladb#21264
(cherry picked from commit dbb26da2aa)
Closesscylladb/scylladb#21303
ALTER tablets-enabled KEYSPACES (KS) may fail due to
group0_concurrent_modification, in which case it's repeated by a for
loop surrounding the code. But because raft's add_entry consumes the
raft's guard (by std::move'ing the guard object), retries of ALTER KS
will use a moved-from guard object, which is UB, potentially a crash.
The fix is to remove the aforementioned for loop altogether and rethrow the exception, as the rf_change event
will be repeated by the topology state machine if it receives the
concurrent modification exception, because the event will remain present
in the global requests queue, hence it's going to be executed as the
very next event.
Note: refactor is implemented in the follow-up commit.
Fixes: https://github.com/scylladb/scylladb/issues/21102
Should be backported to every 6.x branch, as it may lead to a crash.
(cherry picked from commit de511f56ac)
(cherry picked from commit 3f4c8a30e3)
(cherry picked from commit 522bede8ec)
Refs https://github.com/scylladb/scylladb/pull/21121
Closes scylladb/scylladb#21256
* github.com:scylladb/scylladb:
test: topology: add disable_schema_agreement_wait utility function
test: add UT to test retrying ALTER tablets KEYSPACE
cql/tablets: fix indentation in `rf_change` event handler
cql/tablets: fix retrying ALTER tablets KEYSPACE
Passing an admitted permit -- i.e. one with count resources on it -- to the multishard reader, will possibly result in a deadlock, because the permit of the multishard reader is destroyed after the permits of its child readers. Therefore its semaphore resources won't be automatically released until children acquire their own resources. This creates a dependency (an edge in the "resource allocation graph"), where the semaphore used by the multishard reader depends on the semaphores used by children. When such dependencies create a cycle, and permits are acquired by different reads in just the right order, a deadlock will happen.
Users of the multishard reader have to be aware of this gotcha -- and of course they aren't. This is small wonder, considering that not even the documentation on the multishard reader mentions this problem. To work around this, the user has to call `reader_permit::release_base_resources()` on the permit, before passing it to the multishard reader. On multiple occasions, developers (including the very author of the multishard reader), forgot or didn't know about this and this resulted in deadlocks down the line. This is a design-flaw of the multishard reader, which is addressed in this PR, after which, it is safe to pass admitted or not admitted permits to the multishard reader, it will handle the call to `release_base_resources()` if needed.
After fixing the problem in the multishard reader, the existing calls to `release_base_resources()` on permits passed to multishard readers are removed. A test is added which reproduces the problem and ensures we don't regress.
Refs: https://github.com/scylladb/scylladb/issues/20885 (partial fix, there is another deadlock in that issue, which this PR doesn't fix)
Fixes: https://github.com/scylladb/scylladb/issues/21263
This fixes (indirectly) a regression introduced by d98708013c so it has to be backported to 6.2
(cherry picked from commit e1d8cddd09)
Refs scylladb/scylladb#21058
Closes scylladb/scylladb#21178
* github.com:scylladb/scylladb:
test/boost/mutation_test: add test for multishard permit safety
test/lib/reader_lifecycle_policy: add semaphore factory to constructor
test/lib/reader_lifecycle_policy: rename factory_function
repair/row_level: drop now unneeded release_base_resource() calls
readers/multishard: make multishard reader safe to create with admitted permits
The test_view_build_status_migration_to_v2 test case creates a new view
(vt2) after performing the view_build_status -> view_build_status_v2
migration and waits until it is built by `wait_for_view_v2` function. It
works by waiting until a SELECT from view_build_status_v2 will return
the expected number of rows for a given view.
However, if the host parameter is unspecified, it will query only one
node on each attempt. Because `view_build_status_v2` is managed via
raft, queries always return data from the queried node only. It might
happen that `wait_for_view_v2` fetches expected results from one node
while a different node might be lagging behind the group0 coordinator
and might not have all data yet.
In case of test_view_build_status_migration_to_v2 this is a problem - it
first uses `wait_for_view_v2` to wait for view, later it queries
`view_build_status_v2` on a random node and asserts its state - and
might fail because that node didn't have the newest state yet.
Fix the issue by issuing `wait_for_view_v2` in parallel for all nodes in
the cluster and waiting until all nodes have the most recent state.
Fixes: scylladb/scylladb#21060
(cherry picked from commit a380a2efd9)
Closesscylladb/scylladb#21129
When there are zero tablets, tablet_metadata::_balancing_enabled
is ignored in the copy.
The property not being preserved can result in the balancer not
respecting the user's wish to disable balancing when a replica is
created later on.
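One plausible shape of such a copy bug, sketched with simplified standalone types (not the real tablet_metadata):
```
#include <map>

struct tablet_metadata {
    bool balancing_enabled = true;
    std::map<int, int> tablets;  // stand-in for per-table tablet maps

    tablet_metadata copy() const {
        tablet_metadata ret;
        // BUG (sketched): returning early when there are no tablets skips copying
        // balancing_enabled, silently re-enabling balancing in the copy.
        // if (tablets.empty()) { return ret; }
        ret.balancing_enabled = balancing_enabled;  // must be preserved unconditionally
        ret.tablets = tablets;
        return ret;
    }
};
```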
Fixes#21175.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit dfc217f99a)
Closesscylladb/scylladb#21190
Add a test checking that the multishard reader will not deadlock, when
created with an admitted permit, on a semaphore with a single count
resource.
(cherry picked from commit e1d8cddd09)
Allowing callers to specify how the semaphore is created and stopped,
instead of doing so via boolean flags like it is done currently. This
method doesn't scale, so use a factory instead.
(cherry picked from commit 5a3fd69374)
To reader_factory_function. We are about to add new factory function
parameters, so the current factory_function has to be renamed to
something more specific.
(cherry picked from commit c8598e21e8)
Passing an admitted permit -- i.e. one with count resources on it -- to
the multishard reader, will possibly result in a deadlock, because the
permit of the multishard reader is destroyed after the permits of its
child readers. Therefore its semaphore resources won't be automatically
released until children acquire their own resources.
This creates a dependency (an edge in the "resource allocation graph"),
where the semaphore used by the multishard reader depends on the
semaphores used by children. When such dependencies create a cycle, and
permits are acquired by different reads in just the right order, a
deadlock will happen.
Users of the multishard reader have to be aware of this gotcha -- and of
course they aren't. This is small wonder, considering that not even the
documentation on the multishard reader mentions this problem.
To work around this, the user has to call
`reader_permit::release_base_resources()` on the permit, before passing
it to the multishard reader.
On multiple occasions, developers (including the very author of the
multishard reader), forgot or didn't know about this and this resulted
in deadlocks down the line.
This is a design-flaw of the multishard reader, which is addressed in
this patch, after which, it is safe to pass admitted or not admitted
permits to the multishard reader, it will handle the call to
`release_base_resources()` if needed.
(cherry picked from commit 218ea449a5)
On the read path, the compacting reader is applied only to the sstable
reader. This can cause an expired tombstone from an sstable to be purged
from the request before it has a chance to merge with deleted data in
the memtable, leading to data resurrection.
Fix this by checking the memtables before deciding to purge tombstones
from the request on the read path. A tombstone will not be purged if a
key exists in any of the table's memtables with a minimum live timestamp
that is lower than the maximum purgeable timestamp.
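A self-contained sketch of that decision rule with simplified types (the real code works on memtables, partition keys and api::timestamp_type):
```
#include <cstdint>
#include <optional>
#include <vector>

using timestamp = int64_t;

struct memtable_view {
    // Minimum live timestamp for the key in this memtable, if the key is present.
    std::optional<timestamp> min_live_timestamp_for_key;
};

bool can_purge_tombstone_on_read(timestamp tombstone_ts,
                                 timestamp max_purgeable_ts,
                                 const std::vector<memtable_view>& memtables) {
    for (const auto& mt : memtables) {
        if (mt.min_live_timestamp_for_key
                && *mt.min_live_timestamp_for_key < max_purgeable_ts) {
            // The key still has live data in a memtable that the tombstone may shadow:
            // purging now could resurrect that data, so keep the tombstone.
            return false;
        }
    }
    // No memtable holds shadowed data for this key; the usual purgeability rule applies.
    return tombstone_ts < max_purgeable_ts;
}
```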
Fixes#20916
`perf-simple-query` stats before and after this fix :
`build/Dev/scylla perf-simple-query --smp=1 --flush` :
```
// Before this Fix
// ---------------
94941.79 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59393 insns/op, 24029 cycles/op, 0 errors)
97551.14 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59376 insns/op, 23966 cycles/op, 0 errors)
96599.92 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59367 insns/op, 23998 cycles/op, 0 errors)
97774.91 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59370 insns/op, 23968 cycles/op, 0 errors)
97796.13 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59368 insns/op, 23947 cycles/op, 0 errors)
throughput: mean=96932.78 standard-deviation=1215.71 median=97551.14 median-absolute-deviation=842.13 maximum=97796.13 minimum=94941.79
instructions_per_op: mean=59374.78 standard-deviation=10.78 median=59369.59 median-absolute-deviation=6.36 maximum=59393.12 minimum=59367.02
cpu_cycles_per_op: mean=23981.67 standard-deviation=32.29 median=23967.76 median-absolute-deviation=16.33 maximum=24029.38 minimum=23947.19
// After this Fix
// --------------
95313.53 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59392 insns/op, 24058 cycles/op, 0 errors)
97311.48 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59375 insns/op, 24005 cycles/op, 0 errors)
98043.10 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59381 insns/op, 23941 cycles/op, 0 errors)
96750.31 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59396 insns/op, 24025 cycles/op, 0 errors)
93381.21 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59390 insns/op, 24097 cycles/op, 0 errors)
throughput: mean=96159.93 standard-deviation=1847.88 median=96750.31 median-absolute-deviation=1151.55 maximum=98043.10 minimum=93381.21
instructions_per_op: mean=59386.60 standard-deviation=8.78 median=59389.55 median-absolute-deviation=6.02 maximum=59396.40 minimum=59374.73
cpu_cycles_per_op: mean=24025.13 standard-deviation=58.39 median=24025.17 median-absolute-deviation=32.67 maximum=24096.66 minimum=23941.22
```
This PR fixes a regression introduced in ce96b472d3 and should be backported to older versions.
Closesscylladb/scylladb#20985
* github.com:scylladb/scylladb:
topology-custom: add test to verify tombstone gc in read path
replica/table: check memtable before discarding tombstone during read
compaction_group: track maximum timestamp across all sstables
(cherry picked from commit 519e167611)
Backported from #20985 to 6.2.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closesscylladb/scylladb#21251
The newly added testcase is based on the already existing
`test_alter_dropped_tablets_keyspace`.
A new error injection is created, which stops the ALTER execution just
before the changes are submitted to RAFT. In the meantime, a new schema
change is performed using the 2nd node in the cluster, thus causing the
1st node to retry the ALTER statement.
(cherry picked from commit 522bede8ec)
ALTER tablets-enabled KEYSPACES (KS) may fail due to
`group0_concurrent_modification`, in which case it's repeated by a `for`
loop surrounding the code. But because raft's `add_entry` consumes the
raft's guard (by `std::move`'ing the guard object), retries of ALTER KS
will use a moved-from guard object, which is UB, potentially a crash.
The fix is to remove the aforementioned `for` loop altogether and rethrow the exception, as the `rf_change` event
will be repeated by the topology state machine if it receives the
concurrent modification exception, because the event will remain present
in the global requests queue, hence it's going to be executed as the
very next event.
`topology_coordinator::handle_topology_coordinator_error` handling the
case of `group0_concurrent_modification` has been extended with logging
in order not to write catch-log-throw boilerplate.
Note: refactor is implemented in the follow-up commit.
Fixes: scylladb/scylladb#21102
(cherry picked from commit de511f56ac)
Having tablet metadata with more than 1 pending replica will prevent this metadata from being (re)loaded due to a sanity check on load. This patch fails the operation which tries to save the wrong metadata, with a similar sanity check. For that, changes submitted to raft are validated, and if it's a topology_change that affects system.tablets, the new "replicas" and "new_replicas" values are checked similarly to how they will be checked on (re)load.
Fixes#20043
(cherry picked from commit f09fe4f351)
(cherry picked from commit e5bf376cbc)
(cherry picked from commit 1863ccd900)
Refs #21020
Closes scylladb/scylladb#21111
* github.com:scylladb/scylladb:
tablets: Validate system.tablets update
group0_client: Introduce change validation
group0_client: Add shared_token_metadata dependency
Implement change validation for the raft topology_change command. For now
the only check is that "pending replicas" contains at most one
entry. The check mirrors a similar one in the `process_one_row` function.
If the check does not pass, this prevents system.tablets from being updated with
mutation(s) that would not be loadable later.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Add validate_change() methods (well, a template and an overload) that
are called by prepare_command() and are supposed to validate the
proposed change before it hits persistent storage
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It will be needed later to get tablet_metadata from.
The dependency is "OK", shared_token_metadata is low-level sharded
service. Client already references db::system_keyspace, which in turn
references replica::database which, finally, references token_metadata
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
On RHEL9, systemd-coredump fails to dump cores to /var/lib/scylla/coredump because the service only has write access with systemd_coredump_var_lib_t. To make it writable, we need to add a file context rule for /var/lib/scylla/coredump and run restorecon on /var/lib/scylla.
Fixes#19325
(cherry picked from commit 56c971373c)
(cherry picked from commit 0ac450de05)
Refs #20528
Closes scylladb/scylladb#21211
* github.com:scylladb/scylladb:
scylla_raid_setup: configure SELinux file context
scylla_coredump_setup: fix SELinux configuration for RHEL9
On RHEL9, systemd-coredump fails to dump cores to /var/lib/scylla/coredump
because the service only has write access with systemd_coredump_var_lib_t.
To make it writable, we need to add a file context rule for
/var/lib/scylla/coredump, and run restorecon on /var/lib/scylla.
Fixes#20573
(cherry picked from commit 0ac450de05)
It seems that a specific version of the systemd package on RHEL9 has a bug in its
SELinux configuration: it introduced the "systemd-container-coredump" module
to provide a rule for systemd-coredump, but the module is not enabled by default.
We have to load it manually, otherwise it causes a permission error.
Fixes#19325
(cherry picked from commit 56c971373c)
Deselect the remove_data_dir_of_dead_node event from test_random_failures
due to issue #20751.
(cherry picked from commit 9b0e15678e)
Closesscylladb/scylladb#21138
The testcase is flaky due to a known python driver issue:
https://github.com/scylladb/python-driver/issues/317.
This issue causes the `CREATE KEYSPACE` statement to be sometimes
executed twice in a row, and the 2nd CREATE statement causes the test to
fail.
In order to work around it, it's enough to add `if not exists` when
creating a ks.
Fixes: #21034
Needs to be backported to all 6.x branches, as the PR introducing this flakiness is backported to every 6.x branch.
(cherry picked from commit f8475915fb)
Closesscylladb/scylladb#21107
The SCYLLA-VERSION-GEN file skips updating the SCYLLA-*-FILE files if
the commit hash from SCYLLA-RELEASE-FILE is the same. The original
reason for this was to prevent the date in the version string from
changing if multiple modes are built across midnight
(scylladb/scylla-pkg#826). However - intentionally or not - it serves
another purpose: it prevents an infinite loop in the build process.
If the build.ninja file needs to be rebuilt, the configure.py script
unconditionally calls ./SCYLLA-VERSION-GEN. On the other hand, if one
of the SCYLLA-*-FILE files is updated then this triggers rebuild
of build.ninja. Apparently, this is sufficient for ninja to enter an
infinite loop.
However, the check assumes that the RELEASE is in the format
<build identifier>.<date>.<commit hash>
and assumes that none of the components have a dot inside - otherwise it
breaks and just works incorrectly. Specifically, when building a private
version, it is recommended to set the build identifier to
`count.yourname`.
Previously, before 85219e9, this problem wasn't noticed most likely
because the reconfigure process was broken and stopped overwriting
the build.ninja file after the first iteration.
Fix the problem by fixing the logic that extracts the commit hash -
instead of looking at the third dot-separated field counting from the
left side, look at the last field.
Fixes: scylladb/scylladb#21027
(cherry picked from commit 64ca58125e)
Closesscylladb/scylladb#21103
This is a manual backport of #20788
When tablets are migrated with file-based streaming, we can have a situation where a tombstone is garbage collected before the data it shadows lands. For instance, if we have a tablet replica with 3 sstables:
1. sstable containing an expired tombstone
2. sstable with additional data
3. sstable containing data which is shadowed by the expired tombstone in sstable 1
If this tablet is migrated, and the sstables are streamed in the order listed above, the first two sstables can be compacted before the third sstable arrives. In that case, the expired tombstone will be garbage collected, and data in the third sstable will be resurrected after it arrives to the pending replica.
This change fixes this problem by disabling tombstone garbage collection for pending replicas.
This fixes a problem in Enterprise, but the change is in OSS in order to have as few differences between OSS and Enterprise and to have a common infrastructure for disabling tombstone GC on pending replicas.
Fixes #21090
Closes scylladb/scylladb#21061
* github.com:scylladb/scylladb:
test: test tombstone GC disabled on pending replica
tablet_storage_group_manager: update tombstone_gc_enabled in compaction group
database::table: add tombstone_gc_enabled(locator::tablet_id)
Seastar extracted the `addr2line` python module back in
e078d7877273e4a6698071dc10902945f175e8bc, but `install.sh` was
not updated accordingly. It still installs `seastar-addr2line`
without installing its new dependency. This leaves us with a
broken `seastar-addr2line` in the relocatable tarball.
```console
$ /opt/scylladb/scripts/seastar-addr2line
Traceback (most recent call last):
File "/opt/scylladb/scripts/libexec/seastar-addr2line", line 26, in <module>
from addr2line import BacktraceResolver
ModuleNotFoundError: No module named 'addr2line'
```
In this change, we redistribute `addr2line.py` as well. This
should address the issue above.
Fixesscylladb/scylladb#21077
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit da433aad9d)
Closesscylladb/scylladb#21085
During the investigation of scylladb/scylladb#20282, it was discovered that implementations of speculating read executors have undefined behavior when called with an incorrect number of read replicas. This PR introduces two levels of condition checking:
- Condition checking in speculating read executors for the number of replicas.
- Checking the consistency of the Effective Replication Map in filter_for_query(): the map is considered incorrect if the list of replicas contains a node from a data center whose replication factor is 0.
Please note: This PR does not fix the issue found in scylladb/scylladb#20282; it only adds condition checks to prevent undefined behavior in cases of inconsistent inputs.
Refs scylladb/scylladb#20625
As this issue applies to released versions and can affect clients, we need backports to 6.0, 6.1, 6.2.
(cherry picked from commit 132358dc92)
(cherry picked from commit ae23d42889)
(cherry picked from commit ad93cf5753)
(cherry picked from commit 8db6d6bd57)
(cherry picked from commit c373edab2d)
Refs #20851
Closes scylladb/scylladb#21067
* github.com:scylladb/scylladb:
Add conditions checking for get_read_executor
Avoid an extra call to block_for in db::filter_for_query.
Improve code readability in consistency_level.cc and storage_proxy.cc
tools: Add build_info header with functions providing build type information
tests: Add tests for alter table with RF=1 to RF=0
Until we automatically support rebuild for tablets-enabled
keyspaces, warn the user about them.
The reason this is not an error is that after
increasing RF in a new datacenter, the current procedure
is to run `nodetool rebuild` on all nodes in that dc
to rebuild the new vnode replicas.
This is not required for tablets, since the additional
replicas are rebuilt automatically as part of ALTER KS.
However, `nodetool rebuild` is also run after local
data loss (e.g. due to corruption and removal of sstables).
In this case, rebuild is not supported for tablets-enabled
keyspaces, as tablet replicas that had lost data may have
already been migrated to other nodes, and a rebuild of the
requested node will not know about them.
It is advised to repair all nodes in the datacenter instead.
Refs scylladb/scylladb#17575
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit ed1e9a1543)
Closes scylladb/scylladb#20722
can_admit_read() returns reason::memory_resources when the permit is queued due
to lack of count resources, and it returns reason::count_resources when the
permit is queued due to lack of memory resources. It's supposed to be the other
way around.
This bug is causing the two counts to be swapped in the stat dumps printed to
the logs when semaphores time out.
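For illustration, a minimal sketch of the corrected mapping, with simplified stand-in names (the real can_admit_read() belongs to the reader concurrency semaphore and looks different):
```cpp
// Simplified stand-ins; not the actual semaphore code.
enum class reason { ready, count_resources, memory_resources };

reason can_admit_read(int available_count, long available_memory, long requested_memory) {
    if (available_count <= 0) {
        return reason::count_resources;    // queued for lack of count resources
    }
    if (available_memory < requested_memory) {
        return reason::memory_resources;   // queued for lack of memory resources
    }
    return reason::ready;
}
```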
(cherry picked from commit 6cf3747c5f)
Closes scylladb/scylladb#21030
During the investigation of scylladb/scylladb#20282, it was discovered that
implementations of speculating read executors have undefined behavior
when called with an incorrect number of read replicas. This PR
introduces two levels of condition checking:
- Condition checking in speculating read executors for the number of replicas.
- Checking the consistency of the Effective Replication Map in
get_endpoints_for_reading(): the map is considered incorrect if the number of
read replica nodes is higher than the replication factor. The check is
applied only in non-release builds.
Please note: This PR does not fix the issue found in scylladb/scylladb#20282;
it only adds condition checks to prevent undefined behavior in cases of
inconsistent inputs.
Refs scylladb/scylladb#20625
(cherry picked from commit c373edab2d)
A new header provides `constexpr` functions to retrieve build
type information: `get_build_type()`, `is_release_build()`,
and `is_debug_build()`. These functions are useful when adding
changes that should be enabled at compile time only for
specific build types.
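A hedged sketch of what such a header could provide; the function names come from the commit, but the NDEBUG-based detection shown here is only an illustrative assumption:
```cpp
// Hypothetical build_info-style header; detection mechanism is illustrative.
#pragma once
#include <string_view>

namespace build_info {

constexpr std::string_view get_build_type() {
#ifdef NDEBUG
    return "release";
#else
    return "debug";
#endif
}

constexpr bool is_release_build() { return get_build_type() == "release"; }
constexpr bool is_debug_build() { return get_build_type() == "debug"; }

} // namespace build_info
```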
(cherry picked from commit ae23d42889)
Adding Vnodes and Tablets tests for the alter keyspace operation that decreases the replication factor
from 1 to 0 for one of two data centers. The Tablets version fails due to the issue described in
scylladb/scylladb#20625.
Test for scylladb/scylladb#20625
(cherry picked from commit 132358dc92)
In order to avoid cases during tablet migrations where we garbage
collect tombstones before the data they shadow arrives, we will
disable tombstone GC on pending replicas.
To achieve this we added a tombstone_gc_enabled flag to compaction_group.
This flag is updated from the update_effective_replication_map method of the
tablet_storage_group_manager class.
It was not possible to link to configuration parameter groups in docs/reference/configuration-parameters.rst if they contained a space.
(cherry picked from commit 2247bdbc8c)
Closes scylladb/scylladb#21037
This change adds the flag tombstone_gc_enabled to compaction_group.
The value of this flag will be set in
tablet_storage_group_manager::update_effective_replication_map().
ALTERing tablets-enabled KEYSPACES (KS) didn't account for materialized
views (MV), and only produced tablets mutations changing tables.
With this patch we're producing tablets mutations for both tables and
MVs, hence when e.g. we change the replication factor (RF) of a KS, both the
tables' RFs and MVs' RFs are updated along with tablets replicas.
The `test_tablet_rf_change` testcase has been extended to also verify
that MVs' tablets replicas are updated when RF changes.
Fixes: #20240
(cherry picked from commit e0c1a51642)
Closes scylladb/scylladb#21022
This patch series fixes a couple of bugs around validating that RF is not changed by too much when performing ALTER tablets KS.
RF cannot change by more than 1 in total, because the tablets load balancer cannot handle more work at once.
Fixes: #20039
Should be backported to 6.0 & 6.1 (wherever tablets feature is present), as this bug may break the cluster.
(cherry picked from commit 042825247f)
(cherry picked from commit adf453af3f)
(cherry picked from commit 9c5950533f)
(cherry picked from commit 47acdc1f98)
(cherry picked from commit 93d61d7031)
(cherry picked from commit 6676e47371)
(cherry picked from commit 2aabe7f09c)
(cherry picked from commit ee56bbfe61)
Refs #20208
Closes scylladb/scylladb#21009
* github.com:scylladb/scylladb:
cql: sum of abs RFs diffs cannot exceed 1 in ALTER tablets KS
cql: join new and old KS options in ALTER tablets KS
cql: fix validation of ALTERing RFs in tablets KS
cql: harden `alter_keyspace_statement.cc::validate_rf_difference`
cql: validate RF change for new DCs in ALTER tablets KS
cql: extend test_alter_tablet_keyspace_rf
cql: refactor test_tablets::test_alter_tablet_keyspace
cql: remove unused helper function from test_tablets
- As part of the deprecation of IP address usage, warning messages were added when IP addresses are specified in the `ignore-dead-nodes` and `--ignore-dead-nodes-for-replace` options for scylla and nodetool.
- Slight optimizations for `utils::split_comma_separated_list`, `host_id_or_endpoint` lists and `storage_service` remove node operations, replacing `std::list` usage with `std::vector`.
Fixes scylladb/scylladb#19218
Backport: 6.2 as it's not yet released.
(cherry picked from commit 3b9033423d)
(cherry picked from commit a871321ecf)
(cherry picked from commit 9c692438e9)
(cherry picked from commit 6398b7548c)
Refs #20756
Closes scylladb/scylladb#20958
* github.com:scylladb/scylladb:
config: Add a warning about use of IP address for join topology and replace operations.
nodetool: Add IP address usage warning for 'ignore-dead-nodes'.
tests: Fix incorrect UUIDs in test_nodeops
utils: Optimizations for utils::split_comma_separated_list and usage of host_id_or_endpoint lists
This timeout was added to catch reader related deadlocks. We have not
seen such deadlocks for a long time, but we did see false-timeouts
caused by this, see explanation below. Since the cost now outweighs the
benefit, remove the timeout altogether.
The false timeout happens during mixed-shard repair. The
`reader_permit::set_timeout()` call is called on the top-level permit
which repair has a handle on. In the case of the mixed-shard repair,
this belongs to the multishard reader. Calling set_timeout() on the
multishard reader has no effect on the actual shard readers, except in
one case: when the shard reader is created, it inherits the multishard
reader's current timeout. As the shard reader can be alive for a long
time, this timeout is not refreshed and ultimately causes a timeout and
fails the repair.
Refs: #18269
(cherry picked from commit 3ebb124eb2)
Closes scylladb/scylladb#20955
During migration cleanup, there's a small window in which the storage
group was stopped but not yet removed from the list. So concurrent
operations traversing the list could work with stopped groups.
During a test which emitted schema changes during migrations,
a failure happened when updating the compaction strategy of a table,
but since the group was stopped, the compaction manager was unable
to find the state for that group.
In order to fix it, we'll skip stopped groups when traversing the
list since they're unused at this stage of migration and going away
soon.
Fixes #20699.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit cf58674029)
Closes scylladb/scylladb#20899
Refs #20686
Refs #15607
In #15060 we added forcing a new commitlog segment on user-initiated flush,
mainly so that tests can verify tombstone gc and other compaction related
things, without having to wait for "organic" segment deletion.
Schema commitlog was not included, mainly because we did not have tests
featuring compaction checks of schema related tables, but also because
it was assumed to have lower general throughput.
There is however no real reason to not include it, and it will make some
testing much quicker and more predictable.
(cherry picked from commit 60f8a9f39d)
Closes scylladb/scylladb#20705
storage_proxy::cancellable_write_handlers_list::update_live_iterators
assumes that iterators in _live_iterators can be dereferenced, but
the code does not make any attempt to make sure this is the case. The
iterator can be the end iterator which cannot be dereferenced.
The patch makes sure that there is no end iterator in _live_iterators.
Fixes scylladb/scylladb#20874
(cherry picked from commit da084d6441)
Closes scylladb/scylladb#21003
in a3db5401, we introduced the TLS cert authenticator, which is
configured using the `auth_certificate_role_queries` option. the
value of this option contains a regular expression. so there are
chances the regular expression is malformed. in that case,
when converting its value representing the regular expression to an
instance of `boost::regex`, Boost.Regex throws a `boost::regex_error`
exception, not `std::regex_error`.
since we decided to use Boost.Regex, let's catch `boost::regex_error`.
Refs a3db5401
Fixes scylladb/scylladb#20941
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 439c52c7c5)
Closes scylladb/scylladb#20952
Commit aa1270a00c changed most uses
of `assert` in the codebase to `SCYLLA_ASSERT`.
But the comment fixed in this patch is talking specifically about
`assert`, and shouldn't have been changed. It doesn't make sense
after the change.
(cherry picked from commit da7edc3a08)
Closes scylladb/scylladb#20976
Group0 server is often used in asynchronous contexts, but we do not wait
for those operations to complete before destroying the server. We already have
a shutdown gate for it, so let's use it in those async functions.
Also make sure to signal the group0 abort source if initialization fails.
Fixes scylladb/scylladb#20701
Backport to 6.2 since it contains af83c5e53e and it made the race easier to hit, so tests became flaky.
(cherry picked from commit ba22493a69)
(cherry picked from commit e642f0a86d)
Refs #20891
Closes scylladb/scylladb#21008
* github.com:scylladb/scylladb:
group: hold group0 shutdown gate during async operations
group0: Stop group0 if node initialization fails
Tablets load balancer is unable to process more than a single pending
replica, thus ALTER tablets KS cannot accept an ALTER statement which
would result in creating 2+ pending replicas, hence it has to validate
that the sum of absolute differences of RFs specified in the statement is
not greater than 1.
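As an illustration, a minimal sketch of that validation rule, assuming hypothetical RF-per-DC maps; the actual alter_keyspace_statement validation code differs:
```cpp
// Hypothetical helper; the real validation lives in alter_keyspace_statement.cc.
#include <cstdlib>
#include <map>
#include <stdexcept>
#include <string>

void validate_rf_difference(const std::map<std::string, int>& old_rfs,
                            const std::map<std::string, int>& new_rfs) {
    int total_diff = 0;
    for (const auto& [dc, new_rf] : new_rfs) {
        auto it = old_rfs.find(dc);
        int old_rf = (it == old_rfs.end()) ? 0 : it->second; // new DCs start at RF=0
        total_diff += std::abs(new_rf - old_rf);
    }
    if (total_diff > 1) {
        throw std::invalid_argument("RF cannot change by more than 1 in total");
    }
}
```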
(cherry picked from commit ee56bbfe61)
A bug has been discovered while trying to ALTER tablets KS and
specifying only 1 out of 2 DCs - the unspecified DC's RF was
zeroed. This is because ALTER tablets KS updated the KS only with the
RF-per-DC mapping specified in the ALTER tablets KS statement, so if a
DC was omitted, it was assigned a value of RF=0.
This commit fixes that plus additionally passes all the KS options, not
only the replication options, to the topology coordinator, where the KS
update is performed.
`initial_tablets` is a special case, which requires special handling
in the source code, as we cannot simply update the old initial_tablets
settings with the new ones, because if only ` and TABLETS = {'enabled':
true}` is specified in the ALTER tablets KS statement, we should not zero the `initial_tablets`, but
rather keep the old value - this is tested by the
`test_alter_preserves_tablets_if_initial_tablets_skipped` testcase.
Other than that, the above mentioned testcase started to fail with
these changes, and it appeared to be an issue with the test not waiting
until ALTER is completed, and thus reading the old value, hence the
test's body has been modified to wait for ALTER to complete before
performing validation.
(cherry picked from commit 2aabe7f09c)
The validation has been corrected with:
1. Checking if a DC specified in ALTER exists.
2. Removing `REPLICATION_STRATEGY_CLASS_KEY` key from a map of RFs that
needs their RFs to be validated.
(cherry picked from commit 6676e47371)
This function assumed that strings passed as arguments would be of
integer types, but that wasn't the case, and we missed that because this
function didn't have any validation, so this change adds proper
validation and error logging.
Arguments passed to this function were forwarded from a call to
`ks_prop_defs::get_replication_options`, which, in addition to the rf-per-dc mapping, also returns a
`class:replication_strategy` pair. The pair's second member was cast
to an `int` type and somehow the code was still running fine; only
extra testing added later discovered a bug here.
(cherry picked from commit 93d61d7031)
Wait for all outstanding async work that uses group0 to complete before
destroying group0 server.
Fixes scylladb/scylladb#20701
(cherry picked from commit e642f0a86d)
ALTER tablets KS validated if RF is not changed by more than 1 for DCs
that already had replicas, but not for DCs that didn't have them yet, so
specifying an RF jump from 0 to 2 was possible when listing a new DC in
ALTER tablets KS statement, which violated internal invariants of
tablets load balancer.
This PR fixes that bug and adds multi-DC testcases to check if adding
replicas to a new DC and removing replicas from a DC is honoring the RF
change constraints.
Refs: #20039
(cherry picked from commit 47acdc1f98)
Commit af83c5e53e moved aborting of group0 into the storage service
drain function. But it is not called if the node fails during initialization
(if it failed to join the cluster, for instance). So let's abort on both
paths (but only once).
(cherry picked from commit ba22493a69)
1. Renamed the testcase to emphasize that it only focuses on testing
changing RF - there are other tests that test ALTER tablets KS
in general.
2. Fixed whitespaces according to PEP8
(cherry picked from commit adf453af3f)
`change_default_rf` is not used anywhere; moreover, it uses the
`replication_factor` tag, which is forbidden in an ALTER tablets KS
statement.
(cherry picked from commit 042825247f)
When the '--ignore-dead-nodes-for-replace' config option contains
IP addresses, a warning will be logged, notifying the user that
using IP addresses with this option is deprecated and will no
longer be supported in the next release.
Fixes scylladb/scylladb#19218
(cherry picked from commit 6398b7548c)
Since we are deprecating the use of IP addresses, a warning message will be printed
if 'nodetool removenode --ignore-dead-nodes' is used with IP addresses.
(cherry picked from commit 9c692438e9)
It was found that the UUIDs used in test_nodeops were
invalid. This update replaces those UUIDs with newly generated
random UUIDs.
(cherry picked from commit a871321ecf)
- utils::split_comma_separated_list now accepts a reference to sstring instead
of a copy to avoid extra memory allocations. Additionally, the results of
trimming are moved to the resulting vector instead of being copied.
- service/storage_service removenode, raft_removenode, find_raft_nodes_from_hoeps,
parse_node_list and api/storage_service::set_storage_service were changed to use
std::vector<host_id_or_endpoint> instead of std::list<host_id_or_endpoint> as
std::vector is a more cache-friendly structure, resulting in better performance.
(cherry picked from commit 3b9033423d)
There are two bits that control whether the replication strategy for a
keyspace will use tablets or not -- the configuration option and the CQL
parameter. This patch tunes their parsing to implement the logic shown
below:
if (strategy.supports_tablets) {
    if (cql.with_tablets == on) {
        if (cfg.enable_tablets) {
            return create_keyspace_with_tablets();
        } else {
            throw "tablets are not enabled";
        }
    } else if (cql.with_tablets == off) {
        return create_keyspace_without_tablets();
    } else { // cql.with_tablets is not specified
        if (cfg.enable_tablets) {
            return create_keyspace_with_tablets();
        } else {
            return create_keyspace_without_tablets();
        }
    }
} else { // strategy doesn't support tablets
    if (cql.with_tablets == on) {
        throw "invalid cql parameter";
    } else if (cql.with_tablets == off) {
        return create_keyspace_without_tablets();
    } else { // cql.with_tablets is not specified
        return create_keyspace_without_tablets();
    }
}
closes: #20088
In order to enable tablets "by default" for NetworkTopologyStrategy
there's an explicit check near ks_prop_defs::get_initial_tablets(), which is
not very nice. It needs more care to fix it, e.g. provide a feature
service reference to the abstract_replication_strategy constructor. But
since the ks_prop_defs code already hijacks options specifically for that
strategy type (see the prepare_options() helper), it's OK for now.
There's also #20768 misbehavior that's preserved in this patch, but
should be fixed eventually as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit ebedc57300)
Closes scylladb/scylladb#20927
Fixes #20862
With the change in 60af2f3cb2 the bookkeeping
for buffer memory was changed subtly. The problem here is that we would
shrink the buffer size before using it, after flush, to
decrement the buffer_list_bytes value, which was previously incremented by the full
allocated size. I.e. we would slowly grow this value instead of adjusting
it properly to the actual bytes used.
Test included.
(cherry picked from commit ee5e71172f)
Closes scylladb/scylladb#20902
Currently, the node ops virtual task gathers its children from all nodes contained
in the union of service::topology::normal_nodes and service::topology::transition_nodes.
The maps may contain nodes that are down but weren't removed yet. So, if a user
requests the status of a node ops virtual task, the task's attempt to retrieve
its children list may fail with seastar::rpc::closed_error.
Filter out the nodes that are down in node_ops::task_manager_module::get_nodes.
Fixes: #20843.
(cherry picked from commit a558abeba3)
Closes scylladb/scylladb#20898
This fixes a use-after-free bug when parsing clustering key across
pages.
Also includes a fix for allocating section retry, which is potentially unsafe (though not in practice yet).
Details of the first problem:
Clustering key index lookup is based on the index file page cache. We
do a binary search within the index, which involves parsing index
blocks touched by the algorithm. Index file pages are 4 KB chunks
which are stored in LSA.
To parse the first key of the block, we reuse clustering_parser, which
is also used when parsing the data file. The parser is stateful and
accepts consecutive chunks as temporary_buffers. The parser is
supposed to keep its state across chunks.
In 93482439, the promoted index cursor was optimized to avoid
a full page copy when parsing index blocks. Instead, the parser is
given a temporary_buffer which is a view on the page.
A bit earlier, in b1b5bda, the parser was changed to keep shared
fragments of the buffer passed to the parser in its internal state (across pages)
rather than copy the fragments into a new buffer. This is problematic
when buffers come from page cache because LSA buffers may be moved
around or evicted. So the temporary_buffer which is a view on the LSA
buffer is valid only around the duration of a single consume() call to
the parser.
If the blob which is parsed (e.g. variable-length clustering key
component) spans pages, the fragments stored in the parser may be
invalidated before the component is fully parsed. As a result, the
parsed clustering key may have incorrect component values. This never
causes parsing errors because the "length" field is always parsed from
the current buffer, which is valid, and component parsing will end at
the right place in the next (valid) buffer.
The problematic path for clustering_key parsing is the one which calls
primitive_consumer::read_bytes(), which is called for example for text
components. Fixed-size components are not parsed like this, they store
the intermediate state by copying data.
This may cause incorrect clustering keys to be parsed when doing
binary search in the index, diverting the search to an incorrect
block.
Details of the solution:
We adapt page_view to a temporary_buffer-like API. For this, a new concept
is introduced called ContiguousSharedBuffer. We also change parsers so that
they can be templated on the type of the buffer they work with (page_view vs
temporary_buffer). This way we don't introduce indirection to existing algorithms.
We use page_view instead of temporary_buffer in the promoted
index parser which works with page cache buffers. page_view can be safely
shared via share() and stored across allocating sections. It keeps hold of the
LSA buffer even across allocating sections by means of cached_file::page_ptr.
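A hedged sketch of what a ContiguousSharedBuffer-like concept could require; the exact requirements in the source may differ:
```cpp
// Illustrative only; the real concept and page_view API live in the cached_file code.
#include <concepts>
#include <cstddef>

template <typename Buf>
concept ContiguousSharedBuffer = requires(Buf buf, std::size_t pos, std::size_t len) {
    { buf.get() } -> std::convertible_to<const char*>;   // contiguous read access
    { buf.size() } -> std::convertible_to<std::size_t>;
    { buf.share(pos, len) } -> std::convertible_to<Buf>; // safe sharing of a sub-range
    buf.trim_front(len);
};
```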
Fixes #20766
(cherry picked from commit 8aca93b3ec)
(cherry picked from commit ac823b1050)
(cherry picked from commit 93bfaf4282)
(cherry picked from commit c0fa49bab5)
(cherry picked from commit 29498a97ae)
(cherry picked from commit c15145b71d)
(cherry picked from commit 7670ee701a)
(cherry picked from commit c09fa0cb98)
(cherry picked from commit 0279ac5faa)
(cherry picked from commit 8e54ecd38e)
(cherry picked from commit b5ae7da9d2)
Refs #20837
Closes scylladb/scylladb#20905
* github.com:scylladb/scylladb:
sstables: bsearch_clustered_cursor: Add trace-level logging
sstables: bsearch_clustered_cursor: Move definitions out of line
test, sstables: Verify parsing stability when allocating section is retried
test, sstables: Verify parsing stability when buffers cross page boundary
sstables: bsearch_clustered_cursor: Switch parsers to work with page_view
cached_file: Adapt page_view to ContiguousSharedBuffer
cached_file: Change meaning of page_view::_size to be relative to _offset rather than page start
sstables, utils: Allow parsers to work with different buffer types
sstables: promoted_index_block_parser: Make reset() always bring parser to initial state
sstables: bsearch_clustered_cursor: Switch read_block_offset() to use the read() method
sstables: bsearch_clustered_cursor: Fix parsing when allocating section is retried
In order to later use the formatter for the inner class
promoted_index_block, which is defined out of line after
cached_promoted_index class definition.
(cherry picked from commit 8e54ecd38e)
This fixes a use-after-free bug when parsing clustering key across
pages.
Clustering key index lookup is based on the index file page cache. We
do a binary search within the index, which involves parsing index
blocks touched by the algorithm. Index file pages are 4 KB chunks
which are stored in LSA.
To parse the first key of the block, we reuse clustering_parser, which
is also used when parsing the data file. The parser is stateful and
accepts consecutive chunks as temporary_buffers. The parser is
supposed to keep its state across chunks.
In b1b5bda, the parser was changed to keep shared fragments of the
buffer passed to the parser in its internal state (across pages)
rather than copy the fragments into a new buffer. This is problematic
when buffers come from page cache because LSA buffers may be moved
around or evicted. So the temporary_buffer which is a view on the LSA
buffer is valid only around the duration of a single consume() call to
the parser.
If the blob which is parsed (e.g. variable-length clustering key
component) spans pages, the fragments stored in the parser may be
invalidated before the component is fully parsed. As a result, the
parsed clustering key may have incorrect component values. This never
causes parsing errors because the "length" field is always parsed from
the current buffer, which is valid, and component parsing will end at
the right place in the next (valid) buffer.
The problematic path for clustering_key parsing is the one which calls
primitive_consumer::read_bytes(), which is called for example for text
components. Fixed-size components are not parsed like this, they store
the intermediate state by copying data.
This may cause incorrect clustering keys to be parsed when doing
binary search in the index, diverting the search to an incorrect
block.
The solution is to use page_view instead of temporary_buffer, which
can be safely shared via share() and stored across allocating
sections. The page_view maintains its hold on the LSA buffer even
across allocating sections.
Fixes #20766
(cherry picked from commit 7670ee701a)
Currently, parsers work with temporary_buffer<char>. This is unsafe
when invoked by bsearch_clustered_cursor, which reuses some of the
parsers, and passes temporary_buffer<char> which is a view onto LSA
buffer which comes from the index file page cache. This view is stable
only around consume(). If parsing requires more than one page, it will
continue with a different input buffer. The old buffer will be
invalid, and it's unsafe for the parser to store and access
it. Unfortunately, the temporary_buffer API allows sharing the buffer
via the share() method, which shares the underlying memory area. This
is not correct when the underlying memory is managed by LSA, because storage
may move. Parser uses this sharing when parsing blobs, e.g. clustering
key components. When parsing resumes in the next page, parser will try
to access the stored shared buffers pointing to the previous page,
which may result in use-after-free on the memory area.
In preparation for fixing the problem, parametrize the parsers to work with
different kinds of buffers. This will allow us to instantiate them
with a buffer kind which supports sharing of LSA buffers properly in a
safe way.
It's not purely mechanical work. Some parts of the parsing state
machine still work with temporary_buffer<char>, and allocate buffers
internally when reading into a linearized destination buffer. They used
to store this destination in the _read_bytes vector, the same field which is
used to store the shared buffers. Now that's not possible, since the shared
buffer type may be different from temporary_buffer<char>. So those
paths were changed to use a new field: _read_bytes_buf.
(cherry picked from commit c0fa49bab5)
When reset() is done due to an allocating section retry, it can theoretically
be at an arbitrary point. So we should not assume that parsing
finished and the state was reset by the previous parsing run. We should
reset all the fields.
(cherry picked from commit 93bfaf4282)
Parser's state was not reset when allocating section was retried.
This doesn't cause problems in practice, because reserves are enough
to cover allocation demands of parsing clustering keys, which are at
most 64K in size. But it's still potentially unsafe and needs fixing.
(cherry picked from commit 8aca93b3ec)
For each new node added to the raft config, populate its ID to IP mapping in the raft address map from the gossiper. The mapping may have expired if a node is added to the raft configuration long after it first appears in the gossiper.
Fixes scylladb/scylladb#20600
Backport to all supported versions since the bug may cause bootstrapping failure.
(cherry picked from commit bddaf498df)
(cherry picked from commit 9e4cd32096)
Refs #20601
Closes scylladb/scylladb#20847
* github.com:scylladb/scylladb:
test: extend existing test to check that a joining node can map addresses of all pre-existing nodes during join
group0: make sure that address map has an entry for each new node in the raft configuration
The ID->IP mapping is added to the raft address map when the mapping first
appears in the gossiper, but it is added as an expiring entry. It becomes
non-expiring when a node is added to the raft configuration. But when a node
joins, those two events may be distant in time (since the node's request
may sit in the topology coordinator queue for a while) and the mapping may
have already expired from the map. This patch makes sure to transfer the
mapping from the gossiper for a node that is added to the raft
configuration instead of assuming that the mapping is already there.
(cherry picked from commit bddaf498df)
Before 17f4a151ce the node was marked as
being replaced in the join_group0 state, before it actually joins group0,
so by the time it actually joins and starts transferring the snapshot/log no
traffic is sent to it. The commit changed this to mark the node as
being replaced after the snapshot/log is already transferred, so we can
get traffic to the node while it still has not caught up with the
leader, and this may cause problems since the state is not complete.
Mark the node as being replaced earlier, but still add the new node to
the topology later as the commit above intended.
Fixes: https://github.com/scylladb/scylladb/issues/20629
Needs to be backported since this is a regression.
(cherry picked from commit 644e7a2012)
(cherry picked from commit c0939d86f9)
(cherry picked from commit 1b4c255ffd)
Refs #20743
Closes scylladb/scylladb#20829
* github.com:scylladb/scylladb:
test: amend test_replace_reuse_ip test to check that there is no stale writes after snapshot transfer starts
topology coordinator: mark node as being replaced earlier
topology coordinator: do metadata barrier before calling finish_accepting_node() during replace
What it called "leader" is actually the destination of the RPC.
Trivial fix, should be backported to all affected versions.
(cherry picked from commit 09c68c0731)
Closes scylladb/scylladb#20826
This commit modifies the Features page in the following way:
- It adds a short introduction and descriptions to each listed feature.
- It hides the ToC (required to control and modify the information on the page,
e.g., to add descriptions, have full control over what is displayed, etc.)
- Removes the info about Enterprise features (following the request not to include
Enterprise info in the OSS docs)
Fixes https://github.com/scylladb/scylladb/issues/20617
Blocks https://github.com/scylladb/scylla-enterprise/pull/4711
(cherry picked from commit da8047a834)
Closes scylladb/scylladb#20811
This PR addresses multiple issues with alternator batch metrics:
1. Rename the metrics to scylla_alternator_batch_item_count with op=BatchGetItem/BatchWriteItem
2. The batch size calculation was wrong and didn't count all items in the batch.
3. Add a test to validate that the metrics values increase by the correct value (not just increase). This also requires an addition to the testing to validate ops of different metrics and an exact value change.
Needs backporting to allow the monitoring to use the correct metrics names.
Fixes #20571
(cherry picked from commit 515857a4a9)
(cherry picked from commit 905408f764)
(cherry picked from commit 4d57a43815)
(cherry picked from commit 8dec292698)
Refs #20646
Closes scylladb/scylladb#20758
* github.com:scylladb/scylladb:
alternator:test_metrics test metrics for batch item count
alternator:test_metrics Add validating the increased value
alternator: Fix item counting in batch operations
Alternator rename batch item count metrics
This commit fixes a link to the Manager by adding a missing underscore
to the external link.
(cherry picked from commit aa0c95c95c)
Closes scylladb/scylladb#20710
Before 17f4a151ce the node was marked as
being replaced in the join_group0 state, before it actually joins group0,
so by the time it actually joins and starts transferring the snapshot/log no
traffic is sent to it. The commit changed this to mark the node as
being replaced after the snapshot/log is already transferred, so we can
get traffic to the node while it still has not caught up with the
leader, and this may cause problems since the state is not complete.
Mark the node as being replaced earlier, but still add the new node to
the topology later as the commit above intended.
(cherry picked from commit c0939d86f9)
During replace with the same IP a node may get queries that were intended
for the node it was replacing, since the new node declares itself UP
before it advertises that it is a replacement. But after the node
starts the replacing procedure the old node is marked as "being replaced"
and queries are no longer sent there. It is important to do so before the
new node starts to get the raft snapshot, since the snapshot application is
not atomic and queries that run in parallel with it may see partial state
and fail in weird ways. Queries that are sent before that will fail
because the schema is empty, so they will not find any tables in the first
place. This is pre-existing and not addressed by this patch.
(cherry picked from commit 644e7a2012)
The test performs consecutive schema changes in RECOVERY mode. The
second change relies on the first. However the driver might route the
changes to different servers and we don't have group 0 to guarantee
linearizability. We must rely on the first change coordinator to push
the schema mutations to other servers before returning, but that only
happens when it sees other servers as alive when doing the schema
change. It wasn't guaranteed in the test. Fix this.
Fixes scylladb/scylladb#20791
Should be backported to all branches containing this test to reduce
flakiness.
(cherry picked from commit f390d4020a)
Closes scylladb/scylladb#20807
In the current scenario, we check whether a node being removed is normal
on the node initiating the removenode request. However, we don't have a
similar check on the topology coordinator. The node being removed could be
normal when we initiate the request, but it doesn't have to be normal when
the topology coordinator starts handling the request.
For example, the topology coordinator could have removed this node while handling
another removenode request that was added to the request queue earlier.
This commit intends to fix this issue by adding more checks in the enqueuing phase
and returning errors for duplicate node removal requests.
This PR fixes a bug. Hence we need to backport it.
Fixes: scylladb/scylladb#20271
(cherry picked from commit b25b8dccbd)
Closes scylladb/scylladb#20799
Currently the function calls boost::partial_sort with a middle
iterator that might be out of bounds and cause undefined behavior.
Check the vector size, and do a partial sort only if it is longer
than `max_sstables`, otherwise sort the whole vector.
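A minimal sketch of the guarded sort described above; `candidates` and `max_sstables` are illustrative names, not the actual compaction code:
```cpp
// Illustrative sketch of the bound check; not the actual compaction code.
#include <algorithm>
#include <cstddef>
#include <vector>

template <typename T, typename Compare>
void sort_candidates(std::vector<T>& candidates, std::size_t max_sstables, Compare cmp) {
    if (candidates.size() > max_sstables) {
        // Only the first max_sstables elements need to be ordered.
        std::partial_sort(candidates.begin(), candidates.begin() + max_sstables,
                          candidates.end(), cmp);
    } else {
        std::sort(candidates.begin(), candidates.end(), cmp);
    }
}
```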
Fixes scylladb/scylladb#20608
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#20609
(cherry picked from commit 39ce358d82)
Refs: scylladb/scylladb#20609
This patch adds tests for the batch operations item count.
The tests validate that the metrics tracking the number of items
processed in a batch increase by the correct amount.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 8dec292698)
The `check_increases_operation` helper now allows overriding the checked metric.
Additionally, a custom validation value can now be passed, which makes it
possible to validate the amount by which a value has changed, rather
than just validating that the value increased.
The default behavior of validating that values have increased remains
unchanged, ensuring backward compatibility.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 4d57a43815)
This patch fixes the logic for counting items in batch operations.
Previously, the item count in requests was inaccurate: it counted the
number of tables in get_item and the request_items in write_items.
The new logic correctly counts each individual item in `BatchGetItem`
and `BatchWriteItem` requests.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
(cherry picked from commit 905408f764)
This patch renames metrics tracking the total number of items in a batch
to `scylla_alternator_batch_item_count`. It uses the existing `op` label to
differentiate between `BatchGetItem` and `BatchWriteItem` operations.
Ensures better clarity and distinction for batch operations in monitoring.
This is an example of how it looks:
# HELP scylla_alternator_batch_item_count The total number of items processed across all batches
# TYPE scylla_alternator_batch_item_count counter
scylla_alternator_batch_item_count{op="BatchGetItem",shard="0"} 4
scylla_alternator_batch_item_count{op="BatchWriteItem",shard="0"} 4
(cherry picked from commit 515857a4a9)
In https://github.com/scylladb/scylladb/pull/18729, we introduced a new statement tenant $maintenance, but the change wasn't protected by any cluster feature.
This wasn't a problem for OSS, since an unknown isolation cookie just uses the default scheduling group. However, in enterprise that leads to creating a service level on not-upgraded nodes, which may end up in an error if the user has created the maximum number of service levels.
This patch adds a cluster feature to guard adding the new tenant. It's done in the way to handle two upgrade scenarios:
version without $maintenance tenant -> version with $maintenance tenant guarded by a feature
version with $maintenance tenant but not guarded by a feature -> version with $maintenance tenant guarded by a feature
The PR adds enabled flag to statement tenants.
This way, when the tenant is disabled, it cannot be used to create a connection, but it can be used to accept an incoming connection.
The $maintenance tenant is added to the config as disabled and it gets enabled once the corresponding feature is enabled.
Fixes https://github.com/scylladb/scylladb/issues/20070
Refs https://github.com/scylladb/scylla-enterprise/issues/4403
(cherry picked from commit d44844241d)
(cherry picked from commit 71a03ef6b0)
(cherry picked from commit b4b91ca364)
Refs https://github.com/scylladb/scylladb/pull/19802
Closes scylladb/scylladb#20690
* github.com:scylladb/scylladb:
message/messaging_service: guard adding maintenance tenant under cluster feature
message/messaging_service: add feature_service dependency
message/messaging_service: add `enabled` flag to statement tenants
Adding a new tenant needs to be done under cluster feature protection.
However it wasn't the case for adding `$maintenance` statement tenant
and to fix it we need to support an upgrade from node which doesn't
know about maintenance tenant at all and from one which uses it without
any cluster feature protection.
This commit adds `enabled` flag to statement tenants.
This way, when the tenant is disabled, it cannot be used to create
a connection, but it can be used to accept an incoming connection.
(cherry-picked from d44844241d)
Allow create_pending_deletion_log to delete a bunch of sstables
potentially residing in different prefixes (e.g. in the base directory
and under staging/).
The motivation arises from table::cleanup_tablet that calls compaction_group::cleanup on all cg:s via cleanup_compaction_groups. Cleanup, in turn, calls delete_sstables_atomically on all sstables in the compaction_group, in all states, including the normal state as well as staging - hence the requirement to support deleting sstables in different sub-directories.
Also, apparently truncate calls delete_atomically for all sstables too, via table::discard_sstables, so if it happened to be executed during view update generation, i.e. when there are sstables in staging, it should hit the assertion failure reported in https://github.com/scylladb/scylladb/issues/18862 as well (although I haven't seen it yet, I see no reason why it would not happen). So the issue was apparently present since the initial implementation of the pending_delete_log. It's just that with tablet migration it is more likely to be hit.
Fixes scylladb/scylladb#18862
Needs backport to 6.0 since tablets require this capability
Closes scylladb/scylladb#19555
* github.com:scylladb/scylladb:
sstable_directory: create_pending_deletion_log: place pending_delete log under the base directory
sstables: storage: keep base directory in base class
sstables: storage: define opened_directory in header file
sstable_directory: use only dirlog
The cleanup compaction task is a maintenance operation that runs after
topology changes. So, run it under the maintenance scheduling group to
avoid interference with regular compaction tasks. Also remove the share
allocations done by the cleanup task, as they are unnecessary when
running under the maintenance group.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#20582
Since 3c7af28725, the cqlsh submodule no longer contains a
bin/cqlsh shell script. This broke the supermodule's bin/cqlsh
shortcut.
Fix it by invoking cqlsh.py directly.
Closes scylladb/scylladb#20591
Cleanup of a deallocated tablet throws an exception.
Since failed cleanup is retried, we end up in an infinite loop.
Ignore cleanup of deallocated storage groups.
Fixes: #19752.
Needs to be backported to all branches with tablets (6.0 and later)
Closes scylladb/scylladb#20584
* github.com:scylladb/scylladb:
test: check if cleanup of deallocated sg is ignored
replica: ignore cleanup of deallocated storage group
To drop a semaphore it should not be held by anyone, so we need to
release our units before checking if a semaphore can be dropped.
Fixes: scylladb/scylladb#20602
Closes scylladb/scylladb#20607
as `_bucket` is an `unordered_map<bucket_id, timestamp_bucket_writer>`,
when writing to a given bucket, we try to create a writer with the
specified bucket id, so the returned iterator should point to a node
whose `first` element is always the bucket id.
so, there is no need to reference `it` for the bucket id, let's just
reference the parameter. simpler this way.
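A small illustration of that point, with simplified stand-in types: with try_emplace the returned node's key equals the bucket id that was passed in, so the parameter can be referenced directly:
```cpp
// Simplified stand-in types; the real code uses its own bucket_id and writer types.
#include <unordered_map>

using bucket_id = unsigned;
struct timestamp_bucket_writer {};

timestamp_bucket_writer& get_writer(std::unordered_map<bucket_id, timestamp_bucket_writer>& buckets,
                                    bucket_id id) {
    auto [it, inserted] = buckets.try_emplace(id);
    // it->first == id always holds here, so referencing `id` instead of
    // `it->first` is equivalent and simpler.
    return it->second;
}
```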
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20598
If a regular row isn't present, no regular column restriction
(say, r=3) can pass since all regular columns are presented as NULL,
and we don't have an IS NULL predicate. Yet we just ignore it.
Handle the restriction on a missing column by returning false, signifying
the row was filtered out.
We have to move the check after the conditional checking whether there's
any restriction at all, otherwise we exit early with a false failure.
Unit tests marked xfail on this issue are now unmarked.
A subtest of test_tombstone_limit is adjusted since it depended on this
bug. It tested a regular column which wasn't there, and this bug caused
the filter to be ignored. Change to test a static column that is there.
A test for a bug found while developing the patch is also added. It is
also tested by test_tombstone_limit, but better to have a dedicated test.
Fixes #10357
Closes scylladb/scylladb#20486
This option was silently broken when --enable-tablets' default changed
from false to true. The reason is that when --vnodes is passed, run only
removes --enable-tablets=true from scylla's command line. With the new
default this is not enough; we need to explicitly disable tablets to
override the default.
Closes scylladb/scylladb#20462
This is a translation of Cassandra's CQL unit test source file
OperationFctsTest.java into our cql-pytest framework.
This is a massive test suite (over 800 lines of code) for Cassandra's
"arithmetic operators" CQL feature (CASSANDRA-11935), which was added
to Cassandra almost 8 years ago (and reached Cassandra 4.0), but we
never implemented it in Scylla.
All of the tests in the suite fail in ScyllaDB due to our lack of this
feature:
Refs #2693: Support arithmetic operators
One test also discovered a new issue:
Refs #20501: timestamp column doesn't allow "UTC" in string format
All the tests pass on Cassandra.
Some of the tests insist on specific error message strings and specific
precision for decimal arithmetic operations - where we may not necessarily
want to be 100% compatible with Cassandra in our eventual implementation.
But at least the test will allow us to make deliberate - and not accidental -
deviations from compatibility with Cassandra.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#20502
Currently, an attempt to clean up a deallocated storage group throws
an exception. Failed tablet cleanup is retried, getting stuck
in an endless loop.
Ignore cleanup of deallocated storage groups.
Currently, test.py will throw an error if a test uses
BOOST_DATA_TEST_CASE. As a first step, test.py gets all test
functions in the file, but when BOOST_DATA_TEST_CASE is used the
output has additional lines indicating parametrized tests that
test.py cannot handle. This commit adds handling for this case; as a caveat,
all test names should start with 'test' or they will be ignored.
Closes: #20530
Closes scylladb/scylladb#20556
This function was obsoleted by schema_builder some time ago. To avoid patching all its callers, that helper became a wrapper around it. The remaining users are all in tests, and patching them to use the builder directly makes the code shorter in many cases.
Closes scylladb/scylladb#20466
* github.com:scylladb/scylladb:
schema: Ditch make_shared_schema() helper
test: Tune up indentation in uncompressed_schema()
test: Make tests use schema_builder instead of make_shared_schema
Since #14152 creation of an sstable takes the table dir and its state. The
test in question wants to create an sstable in the upload/ subdir and for
that it used to maintain the full "cf.dir/upload" path, which is not
required any more.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#20514
There are two currently -- upload_sink_base and do_upload_file. This PR merges as much code as possible (spoiler: it's already mostly copy-and-pasted, so squashing is pretty straightforward)
Closes scylladb/scylladb#20568
* github.com:scylladb/scylladb:
s3/client: Reuse class multipart_upload in do_upload_file
s3/client: Split upload_sink_base class into two
the `seastar/core/print.hh` header is no longer required by
`auth/resource.hh`. this was identified by clang-include-cleaner.
As the code is audited, we can safely remove the #include directive.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20575
When purging a regular tombstone, consult the min_live_timestamp, if available.
This is safe since we don't need to protect dead data from resurrection, as it is already dead.
For shadowable_tombstones, consult the min_memtable_live_row_marker_timestamp,
if available, otherwise fallback to the min_live_timestamp.
If we see in a view table a shadowable tombstone with time T, then in any row where the row marker's timestamp is higher than T the shadowable tombstone is completely ignored and it doesn't hide any data in any column, so the shadowable tombstone can be safely purged without any effect or risk resurrecting any deleted data.
In other words, rows which might cause problems for purging a shadowable tombstone with time T are rows with row markers older or equal T. So to know if a whole sstable can cause problems for shadowable tombstone of time T, we need to check if the sstable's oldest row marker (and not oldest column) is older or equal T. And the same check applies similarly to the memtable.
If both extended timestamp statistics are missing, fallback to the legacy (and inaccurate) min_timestamp.
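A hedged sketch of the purge condition described above; the names are illustrative, not the actual compaction code:
```cpp
// Illustrative names only; the real check is part of the max-purgeable logic.
#include <cstdint>

using timestamp_type = int64_t;

// A shadowable tombstone with timestamp T can be purged only if every other
// source (memtable or sstable) holds row markers strictly newer than T;
// a row marker older than or equal to T could still be shadowed by it.
bool can_purge_shadowable(timestamp_type tombstone_ts,
                          timestamp_type min_live_row_marker_ts_of_other_sources) {
    return min_live_row_marker_ts_of_other_sources > tombstone_ts;
}
```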
Fixes scylladb/scylladb#20423
Fixes scylladb/scylladb#20424
> [!NOTE]
> no backport needed at this time
> We may consider backport later on after given some soak time in master/enterprise
> since we do see tombstone accumulation in the field under some materialized views workloads
Closes scylladb/scylladb#20446
* github.com:scylladb/scylladb:
cql-pytest: add test_compaction_tombstone_gc
sstable_compaction_test: add mv_tombstone_purge_test
sstable_compaction_test: tombstone_purge_test: test that old deleted data do not inhibit tombstone garbage collection
sstable_compaction_test: tombstone_purge_test: add testlog debugging
sstable_compaction_test: tombstone_purge_test: make_expiring: use next_timestamp
sstable, compaction: add debug logging for extended min timestamp stats
compaction: get_max_purgeable_timestamp: use memtable and sstable extended timestamp stats
compaction: define max_purgeable_fn
tombstone: can_gc_fn: move declaration to compaction_garbage_collector.hh
sstables: scylla_metadata: add ext_timestamp_stats
compaction_group, storage_group, table_state: add extended timestamp stats getters
sstables, memtable: track live timestamps
memtable_encoding_stats_collector: update row_marker: do nothing if missing
Since the JMX server is deprecated, drop it from the submodule, build system
and package definition.
Related scylladb/scylla-tools-java#370
Related #14856
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Closes scylladb/scylladb#17969
Uploading a file is implemented by the do_upload_file class. This class
re-implements a big portion of what's currently in the multipart_upload one.
This patch makes the former class inherit from the latter and removes
all the duplication from it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This class implements two facilities -- multipart upload protocol itself
plus some common parts of upload_sink_impl (in fact -- only close() and
plugs put(packet)).
This patch splits those two facilities into two classes. One of them
will be re-used later.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When the server_impl::applier_fiber is paused by a co_await at line raft/server.cc:1375:
```
co_await override_snapshot_thresholds();
```
a new snapshot may be applied, which updates the actual values of the log's last applied
and snapshot indexes. As a result, the new snapshot index could become higher than the
old value stored in _applied_idx at line raft/server.cc:1365, leading to an assertion
failure in log::last_conf_for().
Since error injection is disabled in release builds, this issue does not affect production releases.
This issue was introduced in the following commit
9dfa041fe1,
when error injection was added to override the log snapshot configuration parameters.
How to reproduce:
1. Build debug version of randomized_nemesis_test
```
ninja-build build/debug/test/raft/randomized_nemesis_test
```
2. Run
```
parallel --halt now,fail=1 -j20 'build/debug/test/raft/randomized_nemesis_test \
--run_test=test_frequent_snapshotting -- -c2 -m2G --overprovisioned --unsafe-bypass-fsync 1 \
--kernel-page-cache 1 --blocked-reactor-notify-ms 2000000 --default-log-level \
trace > tmp/logs/eraseme_{}.log 2>&1 && rm tmp/logs/eraseme_{}.log' ::: {1..1000}
```
Fixes scylladb/scylladb#20363
Closes scylladb/scylladb#20555
This PR hides the ToC on the Alternator page, as we don't need it, especially at the end of the page.
The ToC must be hidden rather than removed because removing it would, in turn, remove the "Getting Started With ScyllaDB Alternator" and "ScyllaDB Alternator for DynamoDB users" from the page tree and make them inaccessible.
In addition, this PR moves Alternator higher in the page tree.
Fixes https://github.com/scylladb/scylladb/issues/19823
Closes scylladb/scylladb#20565
* github.com:scylladb/scylladb:
doc: move Alternator higher in the page tree
doc: hide the redundant ToC on the Alternator page
A CreateTable request defines the KeySchema of the base table and each
of its GSIs and LSIs. It also needs to give an AttributeDefinition for
each attribute used in a KeySchema - which among other things specifies
this attribute's type (e.g., S, N, etc.). Other, non-key, attributes *do
not* have a specified type, and accordingly must not be mentioned in
AttributeDefinitions.
Before this patch, Alternator just ignored unused AttributeDefinitions
entries, whereas DynamoDB throws an error in this case. This patch fixes
Alternator's behavior to match DynamoDB's - and adds a test to verify this.
Besides being more error-path-compatible with DynamoDB, this extra check
can also help users: We already had one user complaining that an
AttributeDefinitions setting he was using was ignored, not realizing
that it wasn't used by any KeySchema. A clear error message would have
saved this user hours of investigation.
Fixes #19784.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#20378
recently, we are observing errors like:
```
stderr: error running operation: rjson::error (JSON SCYLLA_ASSERT failed on condition 'false', at: 0x60d6c8e 0x4d853fd 0x50d3ac8 0x518f5cd 0x51c4a4b 0x5fad446)
```
we only passed `false` to the `RAPIDJSON_ASSERT()` macro, so all we
have is the type of the error (rjson::error) and a backtrace.
it would be better if we could have more information without recompiling
or fetching the debug symbols to decipher the backtrace.
Refs scylladb/scylladb#20533
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20539
This commit hides the ToC, as we don't need it, especially at the end of the page.
The ToC must be hidden rather than removed because removing it would, in turn,
remove the "Getting Started With ScyllaDB Alternator" and "ScyllaDB Alternator for DynamoDB users"
from the page tree and make them inaccessible.
add a test for the issue where writes in commitlog segments pinned to another table can be resurrected.
This test is based on dtest code published in #14870 and adapted for the community version.
It's a regression test for the #15060 fix and should fail before this patch and succeed afterwards.
Refs #14870, #15060
Closes scylladb/scylladb#20331
When we introduced optimized clang at 6e487a4, we dropped the multiarch build on the frozen toolchain, because building clang under QEMU emulation is too heavy.
Actually, even after the patch was merged, there are two modes which do not build clang: --clang-build-mode INSTALL_FROM and --clang-build-mode SKIP.
So we should restore the multiarch build only for these modes, and keep skipping it in INSTALL mode since it builds clang.
Since we apply multiarch in INSTALL_FROM mode, --clang-archive is replaced
with --clang-archive-x86_64 and --clang-archive-aarch64.
Note that this breaks compatibility of existing clang archive, since it
changes clang root directory name from llvm-project to llvm-project-$ARCH.
Closes #20442
Closes scylladb/scylladb#20444
before this change, we rely on `using namespace seastar` to use
`seastar::format()` without qualifying the `format()` with its
namespace. this works fine until we changed the parameter type
of format string `seastar::format()` from `const char*` to
`fmt::format_string<...>`. this change practically invited
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
where all members accept a templated parameter as its `fmt`
parameter. and `seastar::format()` is not the best candidate anymore.
although argument-dependent lookup (ADL for short) favors the
function which is in the same namespace as its parameter,
`using namespace` makes `seastar::format()` more competitive,
so both `std::format()` and `seastar::format()` are considered
as candidates.
that is what is happening in scylladb in quite a few call sites of
`format()`, hence ADL is not able to tell which function is the winner
in the name lookup:
```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
265 | return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
| ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
4290 | format(format_string<_Args...> __fmt, _Args&&... __args)
| ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
143 | format(fmt::format_string<A...> fmt, A&&... a) {
| ^
```
in this change, we
change all `format()` calls to either `fmt::format()` or `seastar::format()`
with the following rules:
- if the caller expects an `sstring` or `std::string_view`, change to
`seastar::format()`
- if the caller expects an `std::string`, change to `fmt::format()`.
because `sstring::operator std::basic_string` would incur a deep
copy.
we will need another change to enable scylladb to compile with the
latest seastar. namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
to minimize the scope of this change, let's include that change when
bumping up the seastar submodule, as that change will depend on
the seastar change.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This PR introduces a new file data source implementation for uncompressed SSTables that validates the checksum of each chunk being read. Unlike for compressed SSTables, checksum validation for uncompressed SSTables will be active for scrub/validate reads but not for normal user reads, to ensure we will not have any performance regression.
It consists of:
* A new file data source for uncompressed SSTables.
* Integration of checksums into SSTable's shareable components. The validation code loads the component on demand and manages its lifecycle with shared pointers.
* A new `integrity_check` flag to enable the new file data source for uncompressed SSTables. The flag is currently enabled only through the validation path, i.e., it does not affect normal user reads.
* New scrub tests for both compressed and uncompressed SSTables, as well as improvements in the existing ones.
* A change in JSON response of `scylla validate-checksums` to report if an uncompressed SSTable cannot be validated due to lack of checksums (no `CRC.db` in `TOC.txt`).
Refs #19058.
New feature, no backport is needed.
Closes scylladb/scylladb#20207
* github.com:scylladb/scylladb:
test: Add test to validate SSTables with no checksums
tools: Fix typo in help message of scylla validate-checksums
sstables: Allow validate_checksums() to report missing checksums
test: Add test for concurrent scrub/validate operations
test: Add scrub/validate tests for uncompressed SSTables
test/lib: Add option to create uncompressed random schemas
test: Add test for scrub/validate with file-level corruption
test: Check validation errors in scrub tests
sstables: Enable checksum validation for uncompressed SSTables
sstables: Expose integrity option via crawling mutation readers
sstables: Expose integrity option via data_consume_rows()
sstables: Add option for integrity check in data streams
sstables: Remove unused variable
sstables: Add checksum in the SSTable components
sstables: Introduce checksummed file data source implementation
sstables: Replace assert with on_internal_error
Fixes #20543
In cql_test_env, if cfg_in.ms_listen is set, we try to get a free port for the current test on
which the message service rpc can bind. This is to allow running multiple tests in parallel.
However, we just do this by picking a random number, without actually verifying it against
the host ports in use.
This is complicated further by the fact that port reuse is effectively disabled in seastar
(see reactor::posix_reuseport_detect()). Due to this, the solution applied here is a combo
of (sketched below):
* Create a temporary socket with port = 0 to get a currently free port
* Close the socket right before listen (to handle reuse not working)
* Retry on EADDRINUSE
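a rough sketch of the first two steps using plain POSIX calls (the actual test env uses its own socket wrappers; error handling and the retry loop are omitted here):
```
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>

// Ask the kernel for a currently free port by binding to port 0, then close
// the socket just before the real listen happens (since port reuse is
// effectively disabled, the caller still retries on EADDRINUSE).
static uint16_t pick_free_port() {
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                                     // kernel picks a free port
    ::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    socklen_t len = sizeof(addr);
    ::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len);
    uint16_t port = ntohs(addr.sin_port);
    ::close(fd);                                           // release it for the real listen
    return port;
}
```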
Closes scylladb/scylladb#20547
In 880058073b a new column (request_type)
was added to the topology_requests table, but the table's schema version
wasn't changed. Due to that, during a cluster upgrade the old and the new
versions coexist but are not distinguishable.
Add an offset to the schema version of the topology_requests table if it contains
the request_type column.
Fixes: #20299.
Closes scylladb/scylladb#20402
Migrate the `system_distributed.view_build_status` table to `system.view_build_status_v2`. The writes to the v2 table are done via raft group0 operations.
The new parameter `view_builder_version` stored in `scylla_local` indicates whether nodes should use the old or the new table.
New clusters use v2. Otherwise, the migration to v2 is initiated by the topology coordinator when the feature is enabled. It reads all the rows from the old table and writes them to the new table, and sets `view_builder_version` to v2. When the change is applied, all view_builder services are updated to write and read from the v2 table.
The old table `system_distributed.view_build_status` is set to read virtually from the new table in order to maintain compatibility.
When removing a node from the cluster, we remove its rows from the table atomically (fixes https://github.com/scylladb/scylladb/issues/11836). Also, during the migration, we remove all invalid rows.
Fixes scylladb/scylladb#15329
dtest: https://github.com/scylladb/scylla-dtest/pull/4827
Closes scylladb/scylladb#19745
* github.com:scylladb/scylladb:
view: test view_build_status table with node replace
test/pylib: use view_build_status_v2 table in wait_for_view
view_builder: common write view_build_status function
view_builder: improve migration to v2 with intermediate phase
view: delete node rows from view_build_status on node removal
view: sanitize view_build_status during migration
view: make old view_build_status table a virtual table
replica: move streaming_reader_lifecycle_policy to header file
view_builder: test view_build_status_v2
storage_service: add view_build_status to raft snapshot
view_builder: migration to v2
db:system_keyspace: add view_builder_version to scylla_local
view_builder: read view status from v2 table
view_builder: introduce writing status mutations via raft
view_builder: pass group0_client and qp to view_builder
view_builder: extract sys_dist status operations to functions
db:system_keyspace: add view_build_status_v2 table
In a previous patch we extended the return status of
`sstables::validate_checksums()` to report if an SSTable cannot be
validated due to a missing CRC component (i.e., CRC.db does not appear
in TOC.txt).
Add a test case for this.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Change the return type of `sstable::validate_checksums()` from binary
(valid/invalid) to a ternary (valid/invalid/no_checksums). The third
status represents uncompressed SSTables without a CRC component (no
entry for CRC.db in the TOC).
Also, change the JSON response of `sstable validate-checksums` to expose
the new status. Replace the boolean value for valid/invalid checksums
with an object that contains two boolean keys: one that indicates if the
SSTable has checksums, and one that indicates if the checksums are valid
or not. The second key is optional and appears only if the SSTable has
checksums.
Finally, update the documentation to reflect the changes in the API.
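a hedged sketch of the ternary status and of how it could map to the JSON response described above (the enum and function names are illustrative, not the ones in the tree):
```
#include <string>

enum class validation_status {
    valid,         // checksums present, all chunks match
    invalid,       // checksums present, at least one mismatch
    no_checksums,  // uncompressed SSTable with no CRC.db entry in the TOC
};

// "has_checksums" is always reported; "valid" appears only when checksums exist.
std::string to_json(validation_status s) {
    switch (s) {
    case validation_status::valid:        return R"({"has_checksums": true, "valid": true})";
    case validation_status::invalid:      return R"({"has_checksums": true, "valid": false})";
    case validation_status::no_checksums: return R"({"has_checksums": false})";
    }
    return {};
}
```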
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Theoretically it is possible to launch more than one scrub instance
simultaneously. Since the checksum component is a shared resource,
accesses have to be synchronized.
Add a test that launches two scrub operations in validate mode and
ensures that the checksum component is loaded once, referenced by all
scrub instances via shared pointers, and deleted once the scrub
operations finish. Introduce an injection point to achieve concurrent
execution of scrubs.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Currently the unit tests check scrub in validate mode against compressed
SSTables only. Mirror the tests for uncompressed SSTables as well.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Extend the `random_schema_specification` to support creating both
compressed and uncompressed schemas.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Currently, we test scrub/validate only against a corrupted SSTable with
content-level corruption (out-of-order partition key).
Add a test for file-level corruption as well. This should trigger the
checksum check in the underlying compressed file data source
implementation.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Scrub was extended in PR #11074 to report validation errors but the
unit tests were not updated.
Update the tests to check the validation errors reported by scrub.
Validation errors must be zero for valid SSTables and non-zero for
invalid SSTables.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Extend the `sstable::validate()` to validate the checksums of
uncompressed SSTables. Given that this is already supported for
compressed SSTables, this allows us to provide consistent behavior
across any type of SSTable, be it either compressed or uncompressed.
The most prominent use case for this is scrub/validate, which is now
able to detect file-level corruption in uncompressed SSTables as
well.
Note that this change will not affect normal user reads which skip
checksum validation altogether.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Add a new boolean parameter in `sstable::data_stream()` to
enable/disable integrity mechanisms in the underlying data streams.
Currently, this only affects uncompressed SSTables and it allows to
enable/disable checksum validation on each chunk. The validation happens
transparently via the checksummed data source implementation.
The reason we need this option is to allow differentiating the behavior
between normal user reads and scrub/validate reads. We would like to
enable scrub to verify checksums for uncompressed SSTables, while
leaving normal user reads unchanged for performance reasons (read
amplification due to round up of reads to chunk size and loading of the
CRC component).
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Remove unused stream variable from `sstable::data_stream()`. This was
introduced in commit 47e07b787e but never used.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Uncompressed SSTables store their checksums in a separate CRC.db file.
Add this in the list of SSTable components.
Since this component is used only for validation, load the component
on-demand for validation tasks and delete it when all validation tasks
finish. In more detail:
- Make the checksum component shareable and weakly referencable.
Also, add a constructor since it is no longer an aggregate.
- Use a weak pointer to store a non-owning reference in the components
and a shared pointer to keep the object alive while validation runs.
Once validation finishes, the component should be cleaned up
automatically.
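a hedged illustration of that lifecycle using the standard smart pointers (the real component uses the project's own shareable/weakly-referencable machinery, so this only shows the pattern):
```
#include <memory>

struct checksum_component { /* per-chunk checksums loaded from CRC.db */ };

struct components {
    std::weak_ptr<checksum_component> checksums;   // non-owning reference kept in the components
};

std::shared_ptr<checksum_component> get_or_load_checksums(components& c) {
    if (auto existing = c.checksums.lock()) {
        return existing;                            // another validation task already loaded it
    }
    auto loaded = std::make_shared<checksum_component>();   // load CRC.db contents here
    c.checksums = loaded;                           // components keep only a weak reference
    return loaded;                                  // the owning reference lives while validation runs,
                                                    // so the component is freed when the last task finishes
}
```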
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Introduce a new data source implementation for uncompressed SSTables.
This is just a thin wrapper for a raw data source that also performs
checksum validation for each chunk. This way we can have consistent
behavior for compressed and uncompressed SSTables.
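a hedged sketch of the wrapper idea: verify each chunk read from the raw source against the expected per-chunk checksum before handing it on (names and the CRC routine are illustrative; the real implementation plugs into the input-stream machinery):
```
#include <cstdint>
#include <span>
#include <stdexcept>
#include <string>
#include <vector>
#include <zlib.h>   // crc32(), used here only for illustration

struct checksummed_source {
    std::vector<uint32_t> expected;   // per-chunk checksums, e.g. loaded from CRC.db

    // Verify one chunk read from the underlying raw source, then pass it through.
    std::span<const char> get_chunk(std::span<const char> raw, size_t chunk_index) const {
        uint32_t actual = ::crc32(0L, reinterpret_cast<const Bytef*>(raw.data()),
                                  static_cast<uInt>(raw.size()));
        if (actual != expected.at(chunk_index)) {
            throw std::runtime_error("checksum mismatch in chunk " + std::to_string(chunk_index));
        }
        return raw;   // chunk verified; corruption is reported instead of silently read
    }
};
```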
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
The `database::get_all_tables_flushed_at` method returns a variable
without setting it to the computed all_tables_flushed_at value. This causes
its caller, `maybe_flush_all_tables`, to flush all the tables every time,
regardless of when they were last flushed. Fix this by returning
the computed value from `database::get_all_tables_flushed_at`.
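a simplified, hedged illustration of the fix (real types and signatures differ):
```
#include <algorithm>
#include <chrono>
#include <vector>

using time_point = std::chrono::system_clock::time_point;

// The function used to return a stale/default value instead of the minimum it
// had just computed, so the caller always decided to flush everything.
time_point get_all_tables_flushed_at(const std::vector<time_point>& per_table_flush_times) {
    auto all_tables_flushed_at = time_point::max();
    for (auto t : per_table_flush_times) {
        all_tables_flushed_at = std::min(all_tables_flushed_at, t);
    }
    return all_tables_flushed_at;   // the fix: return the computed value
}
```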
Fixes #20301
Requires a backport to 6.0 and 6.1 as they have the same issue.
Closes scylladb/scylladb#20471
* github.com:scylladb/scylladb:
cql-pytest: add test to verify compaction_flush_all_tables_before_major_seconds config
database::get_all_tables_flushed_at: fix return value
Test tombstone garbage collection with:
1. conflicting live data in memtable (verifying there is no regression
in this area)
2. deletion in memtable (reproducing scylladb/scylladb#20423)
3. materialized view update in memtable (reproducing scylladb/scylladb#20424)
in materialized_views
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than forging a timestamp from the gc_clock,
just use `next_timestamp` so it can be considered
for tombstone purging purposes.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When purging a regular tombstone, consult the min_live_timestamp,
if available.
For shadowable_tombstones, consult the
min_memtable_live_row_marker_timestamp,
if available, otherwise fall back to the min_live_timestamp.
If both are missing, fall back to the legacy
(and inaccurate) min_timestamp.
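a hedged sketch of the timestamp-selection rule described above (illustrative names; this is not the actual purge-path code):
```
#include <cstdint>
#include <optional>

using api_timestamp = int64_t;

api_timestamp min_timestamp_for_purge_check(bool is_shadowable,
                                            std::optional<api_timestamp> min_live_timestamp,
                                            std::optional<api_timestamp> min_live_row_marker_timestamp,
                                            api_timestamp legacy_min_timestamp) {
    if (is_shadowable && min_live_row_marker_timestamp) {
        // shadowable tombstones care about live row markers specifically
        return *min_live_row_marker_timestamp;
    }
    if (min_live_timestamp) {
        // regular tombstones (and shadowable ones without the marker stat)
        return *min_live_timestamp;
    }
    // neither extended stat is available: fall back to the legacy, inaccurate bound
    return legacy_min_timestamp;
}
```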
Fixes scylladb/scylladb#20423
Fixes scylladb/scylladb#20424
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Before we add a new is_shadowable parameter to it.
Also define global `can_always_purge` and `can_never_purge`
functions, a-la `always_gc` and `never_gc`.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
And define `never_gc` globally, same as `always_gc`,
before adding a new is_shadowable parameter to it.
Since it is used in the context of compaction,
it better fits the compaction_garbage_collector header
rather than tombstone.hh.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Store and retrieve the optional extended timestamp statistics
(min_live_timestamp and min_live_row_marker_timestamp)
in the scylla_metadata component.
Note that there is no need for a cluster feature to
store those attributes since the scylla_metadata
on-disk format is extensible: old sstables
can be read by new versions, which see that the extra stats
are missing, and new sstables can be read by old
versions, which ignore unknown scylla metadata section types.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
To return the minimum live timestamp and live row-marker
timestamp across a compaction_group, storage_group, or
table_state.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When garbage collecting tombstones, we care only
about shadowing of live data. However, currently
we track the min/max timestamp of both live and dead
data, even though there is no problem with purging tombstones
that shadow dead data (expired or shadowed by other
tombstones in the sstable/memtable).
Also, for shadowable tombstones, we track live row marker timestamps
separately since, if the live row marker timestamp is greater than
a shadowable tombstone timestamp, then the row marker
would shadow the shadowable tombstone, thus exposing the cells
in that row, even if their timestamps may be smaller
than the shadowable tombstone's.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Refs #18161
Yet another approach to dealing with large commitlog submissions.
We handle an oversized single mutation by adding yet another entry
type: fragmented. In this case we only add a fragment (aha) of
the data that needs storing into each entry, along with metadata
to correlate and reconstruct the full entry on replay.
Because these fragmented entries are spread over N segments, we
also need to add references from the first segment in a chain
to the subsequent ones. These are released once we clear the
relevant cf_id count in the base.
This approach has the downside that due to how serialization etc
works w.r.t. mutations, we need to create an intermediate buffer
to hold the full serialized target entry. This is then incrementally
written into entries of < max_mutation_size, successively requesting
more segments.
On replay, when encountering a fragment chain, the fragment is
added to a "state", i.e. a mapping of currently processing
frag chains. Once we've found all fragments and concatenated
the buffers into a single fragmented one, we can issue a
replay callback as usual.
Note that a replay caller will need to create and provide such
a state object. Old signature replay function remains for tests
and such.
This approach bumps the file format (docs to come).
To ensure "atomicity" we both force synchronization, and should
the whole op fail, we restore segment state (rewinding), thus
discarding data all we wrote.
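a hedged sketch of the replay-side state described above: fragments sharing an entry id are collected until the chain is complete, then the reassembled buffer is handed to the usual replay callback (all names are illustrative):
```
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

struct replay_state {
    struct pending {
        size_t total_fragments = 0;
        std::map<uint32_t, std::vector<char>> fragments;   // fragment index -> buffer
    };
    std::map<uint64_t, pending> in_flight;                  // entry id -> partial entry

    // Returns the reassembled entry when the last fragment arrives, nullopt otherwise.
    std::optional<std::vector<char>> add_fragment(uint64_t entry_id, uint32_t index,
                                                  size_t total, std::vector<char> buf) {
        auto& p = in_flight[entry_id];
        p.total_fragments = total;
        p.fragments.emplace(index, std::move(buf));
        if (p.fragments.size() < p.total_fragments) {
            return std::nullopt;                             // chain not complete yet
        }
        std::vector<char> whole;
        for (auto& [idx, frag] : p.fragments) {              // map keeps fragments ordered
            whole.insert(whole.end(), frag.begin(), frag.end());
        }
        in_flight.erase(entry_id);
        return whole;                                        // ready for the replay callback
    }
};
```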
Closes scylladb/scylladb#19472
* github.com:scylladb/scylladb:
commitlog/database: Make some commitlog options updatable + add feature listener
features/config: Add feature for fragmented commitlog entries
docs: Add entry on commitlog file format v4
commitlog_test: Add more oversized cases
commitlog_replayer: Replay segments in order created
commitlog_replayer: Use replay state to support fragmented entries
commitlog_replayer: coroutinize partly
commitlog: Handle oversized entries
If the row_marker is missing then its timestamp
is missing as well, so there's no point
calling update_timestamp for it. Better return early.
This should cause no functional change.
The following patch will add more logic
for tracking extended timestamp stats.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently storage service is drained while group0 is still active. The
draining stops commitlogs, so after this point no more writes are
possible, but if group0 is still active it may try to apply commands
which will try to do writes and they will fail causing group0 state
machine errors. This is benign since we are shutting down anyway, but
better to fix shutdown order to keep logs clean.
Fixes scylladb/scylladb#19665
The `database::get_all_tables_flushed_at` method returns a variable
without setting it to the computed all_tables_flushed_at value. This causes
its caller, `maybe_flush_all_tables`, to flush all the tables every time,
regardless of when they were last flushed. Fix this by returning
the computed value from `database::get_all_tables_flushed_at`.
Fixes #20301
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
If you are using dbuild, that's where test.py needs to run.
Also, replace 'Docker image' with the more generic 'container' term.
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Closes scylladb/scylladb#20336
ScyllaDB doesn't support custom compressors. The available compressors
are the only available ones, not the default ones.
Adjust the text to reflect this.
Closes scylladb/scylladb#20225
The test-env in question is mostly started in one-shard mode. Also, there
are several boost tests that start a sharded<> environment. In that case
instances on different shards live in different temp dirs. That's not
critical yet, but it's better to have a single directory for the whole test.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#20412
In test.py every asyncio task spawned during the test must be finished before the next test, otherwise tests might affect each other's results.
The developers are responsible for writing asyncio code in a way that doesn’t leave task objects unfinished.
Test.py has a mechanism that helps test writers avoid such tasks. At the end of each test case, it verifies that the test did not produce/leave any tasks, and sets an event object that fails the next test at the start if this is the case (issue https://github.com/scylladb/scylladb/issues/16472).
The problem with this was that breaking the next test was counterintuitive, and the logging for this situation was insufficient and non-obvious.
Notes: Task.cancel() is not an option to avoid task leakage:
1) Calling cancel() does not cancel the task: the cancel() method just requests that the target task cancel itself.
2) Calling cancel() does not block until the task is cancelled: if the caller needs to know the task is cancelled and done, it should await the target task.
3) In this particular PR, task.cancel() cancels the task on the client (ManagerClient) but not on the HTTP server (ScyllaManager), so "await" is needed.
Closes scylladb/scylladb#20012
The test in question generates a bunch of table_for_tests objects and
creates sstables for each. For that it calls test_env::make_sstable(),
but it can be made shorter by calling the table method directly.
The hidden goal of this change is to remove the explicit caller of
the table::dir() method. The latter is going away.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#20451
This includes
- coroutinization
- elimination of unused overload
Closes scylladb/scylladb#20456
* github.com:scylladb/scylladb:
test: Squash two open_sstables() helper together
test: Coroutinize open_sstables() helper
This series prepares us for working on #11567 - allow adding a GSI to a pre-existing table. This will require changing the implementation of GSIs in Alternator to not use real columns in the schema for the materialized view, and instead of a computed column - a function which extracts the desired member from the `:attrs` map and de-serializes it.
This series does not contain the GSI re-implementation itself. Rather, it contains a few small cleanups and mostly new regression tests that cover this area of adding, removing, and **using** a GSI in more detail than the tests we already had. I developed most of these tests while working on **buggy** fixes for #11567; the bugs in those implementations were exposed by the tests added here - they exposed bugs both in the new feature of adding or removing a GSI, and also regressions to the ordinary operation of GSI. So these tests should be helpful for whoever ends up fixing #11567, be it me based on my buggy implementation (which is _not_ included in this patch series), or someone else.
No backports needed - this is part of a new feature, which we don't usually backport.
Closes scylladb/scylladb#20383
* github.com:scylladb/scylladb:
test/alternator: more extensive tests for GSI with two new key attributes
test/alternator: test invalid key types for GSI
test/alternator: test combination of LSI and GSI
test/alternator: expand another test to use different write operations
test/alternator: test GSIs with different key types
alternator: better error message in some cases of key type mismatch
test/alternator: test for more elaborate GSI updates
test/alternator: strengthen tests for empty attribute values
test/alternator: fix typo in test_batch.py
test/alternator: more checks for GSI-key attribute validation
Alternator: drop unneeded "IS NOT NULL" clauses in MV of GSI/LSI
test/alternator: add more checks for adding/deleting a GSI
test/alternator: ensure table deletions in test_gsi.py
There are some test cases in the sstable_directory_test test that actually create
a table with CQL and then try to manipulate its sstables with the help
of sstable_directory. Those tests use an existing local helper that starts
sharded<sstable_directory>, and this helper passes a test-local static
schema to the sstable_directory constructor. As a result, the schema of the
table that the test case created and the schema that sstable_directory works
with are different. They match in the column layout, which helps the
test cases pass, but otherwise they are two different schema objects with
different IDs. It's more correct to use the table schema for those runs.
The fix introduces another helper to start sharded<sstable_directory>,
and the older wrapper around cql_test_env becomes unused. Drop it too,
so as not to encourage future tests to use it and re-introduce the schema
mismatch again.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#20499
To be able to atomically delete sstables both in
the base table directory and in its sub-directories,
like `staging/`, use a shared pending_delete_dir
under the base directory.
Note that this requires loading and processing
the base directory first.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, there are leftover log messages using
sstlog rather than dirlog, that was introduced
in aebd965f0e,
and that makes debugging harder.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When split, scrub, and upgrade compactions ran under the compaction
group, they had to bump up their shares to a minimum of 200 to prevent
slow progress as they neared completion, especially in workloads with
inconsistent ingestion rates. Since commit e86965c2 moved these
compactions to the maintenance group, this share bump is no longer
necessary. This patch removes the unnecessary share allocation.
Fixes #20224
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#20495
Any expired tombstone can be garbage collected if it doesn't shadow data in the commit log, memtable, or uncompacting SSTables.
This PR introduces a new mode to major compaction, enabled by the `consider_only_existing_data` flag that bypasses these checks. When enabled, memtables and old commitlog segments are cleared with a system-wide flush and all the sstables (after flush) are included in the compaction, so that it works with all data generated up to a given time point.
This new mode works with the assumption that newly written data will not be shadowed by expired tombstones. So it ignores new sstables (and new data written to memtable) created after compaction started. Since there was a system wide flush, commitlog checks can also be skipped when garbage collecting tombstones. Introducing data shadowed by a tombstone during compaction can lead to undefined behavior, even without this PR, as the tombstone may or may not have already been garbage collected.
Fixes #19728
Closes scylladb/scylladb#20031
* github.com:scylladb/scylladb:
cql-pytest: add test to verify consider_only_existing_data compaction option
tools/scylla-nodetool: add consider-only-existing-data option to compact command
api: compaction: add `consider_only_existing_data` option
compaction: consider gc_check_only_compacting_sstables when deducing max purgeable timestamp
compaction: do not check commitlog if gc_check_only_compacting_sstables is enabled
tombstone_gc_state: introduce with_commitlog_check_disabled()
compaction: introduce new option to check only compacting sstables for gc
compaction: rename maybe_flush_all_tables to maybe_flush_commitlog
compaction: maybe_flush_all_tables: add new force_flush param
Currently, when attempting to send a hint, we might choose its recipients in one of two ways:
- If the original destination is a natural endpoint of the hint, we only send the hint to that node and none other,
- Otherwise, we send the hint to all current replicas of the mutation.
There is a problem when we decommission a node: while data is streamed away from that node, it is still considered to be a natural endpoint of the data that it used to own. Because of that, it might happen that a hint is sent directly to it but streaming will miss it, effectively resulting in the hint being discarded.
As sending the hint _only_ to the leaving replica is a rather bad idea, send the hint to all replicas also in the case when the original destination of the hint is leaving.
Note that this is a conservative fix written only with the decommission + vnode-based keyspaces combo in mind. In general, such "data loss" can occur in other situations where the replica set is changing and we go through a streaming phase, i.e. other topology operations in case of vnodes and tablet load balancing. However, the consistency guarantees of hinted handoff in the face of topology changes are not defined and it is not clear what they should be, if there should be any at all. The picture is further complicated by the fact that hints are used by materialized views, and sending view updates to more replicas than necessary can introduce inconsistencies in the form of "ghost rows". This fix was developed in response to a failing test which checked the hint replay + decommission scenario, and it makes it work again.
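a hedged sketch of the recipient-selection rule after this change (illustrative types; the real code works on host ids and the effective replication map):
```
#include <algorithm>
#include <string>
#include <vector>

using endpoint = std::string;

std::vector<endpoint> hint_recipients(const endpoint& original_target,
                                      const std::vector<endpoint>& current_replicas,
                                      bool target_is_leaving) {
    bool is_natural = std::find(current_replicas.begin(), current_replicas.end(),
                                original_target) != current_replicas.end();
    if (is_natural && !target_is_leaving) {
        return {original_target};     // fast path: replay only to the original target
    }
    // target is no longer a replica, or it is being decommissioned:
    // replay to all current replicas so streaming cannot lose the write
    return current_replicas;
}
```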
Fixes scylladb/scylla-dtest#4582
Refs scylladb/scylladb#19835
Should be backported to 6.0 and 6.1; the dtest started failing due to topology on raft, which sped up execution of the test and exposed the preexisting problem.
Closes scylladb/scylladb#20488
* github.com:scylladb/scylladb:
test: topology_custom/test_hints: consistency test for decommission
test: topology_custom/test_hints: move sync point helpers to top level
test: topology/util: extract find_server_by_host_id
hints: send hints with CL=ALL if target is leaving
hints: inline do_send_one_mutation
~~~
What we have today in "docs/dev/docker-hub.md" on "aio-max-nr" dates back
to scylla commit f4412029f4 ("docs/docker-hub.md: add quickstart section
with --smp 1", 2020-09-22). Problems with the current language:
- The "65K" claim as default value on non-production systems is wrong;
"fs/aio.c" in Linux initializes "aio_max_nr" to 0x10000, which is 64K.
- The section in question uses equal signs (=) incorrectly. The intent was
probably to say "which means the same as", but that's not what equality
means.
- In the same section, the relational operator "<" is bogus. The available
AIO count must be at least as high (>=) as the requested AIO count.
- Clearer names should be used;
adjust_max_networking_aio_io_control_blocks() in "src/core/reactor.cc"
sets a great example:
- "reactor::max_aio" should be called "storage_iocbs",
- "detect_aio_poll" should be called "preempt_iocbs",
- "reactor_backend_aio::max_polls" should be called "network_iocbs".
- The specific value 10000 for the last one ("network_iocbs") is not
correct in scylla's context. It is correct as the Seastar default, but
scylla has used 50000 since commit 2cfc517874 ("main, test: adjust
number of networking iocbs", 2021-07-18).
Rewrite the section to address these problems.
See also:
- https://github.com/scylladb/scylladb/issues/5981
- https://github.com/scylladb/seastar/pull/2396
- https://github.com/scylladb/scylladb/pull/19921
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
~~~
No need for backporting; the documentation being refreshed targets developers as audience, not end-users.
Closes scylladb/scylladb#20398
* github.com:scylladb/scylladb:
docs/dev/docker-hub.md: refresh aio-max-nr calculation
docs/dev/docker-hub.md: strip trailing whitespace
The yielding lister is considered to be a better replacement for the scan_dir(lambda) one.
Also, the sstable directory will be patched to scan the contents of an S3 bucket, and the yielding lister fits better for that generalization.
Closes scylladb/scylladb#20114
* github.com:scylladb/scylladb:
sstable_directory: Fix indentation after previous patches
sstable_directory: Use yielding lister in .handle_sstables_pending_delete()
sstable_directory: Use yielding lister in .cleanup_column_family_temp_sst_dirs()
sstable_directory: Use yielding lister in .prepare()
sstable_directory: Shorten lister loop
sstable_directory: Use with_closeable() in .process()
directory_lister: Add noexcept default move-constructor
In sstable directory test there are two of those -- one that works on
path, state, env and callback, and the other one that just needs env and
callback, getting path from env and assuming state is normal.
Two test cases in this test can enjoy the shorter one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#20395
On a very large node, LimitNOFILES=80000 may not be enough; it can cause
a "Too many open files" error.
To avoid that, let's increase LimitNOFILES at the scylla_setup stage and
generate an optimal value calculated from the memory size and number of cpus.
Closes scylladb/scylla-enterprise#4304
Closes scylladb/scylladb#20443
c70f321c6f added an extra check if KS
exists. This check can throw `data_dictionary::no_such_keyspace`
exception, which is supposed to be caught and a more user-friendly
exception should be thrown instead.
This commit fixes the above problem and adds a testcase to validate it
doesn't appear ever again.
Also, I moved the check for the keyspace outside of the `for` loop, as
it doesn't need to be checked repeatedly.
Fixes: scylladb/scylladb#20097
Closes scylladb/scylladb#20404
The case of a GSI with two key attributes (hash and range) which were both
not keys in the base table is a special case, not supported by CQL but
allowed in Alternator. We have several tests for this case, but they don't
cover all the strange possibilities that a GSI row disappears / reappears
when one or two of the attributes is updated / inserted / deleted.
So this patch includes a more extensive test for this case.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a test that types which are not allowed for GSI keys -
basically any type except S(tring), B(ytes) or N(number), are rejected
as expected - an error path that we didn't cover in existing tests.
The new test passes - Alternator doesn't have a bug in this area, and
as usual, also passes on DynamoDB.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
To allow adding a GSI to an existing table (refs #11567), we plan to
re-implement GSIs to stop forcing their key attribute to become a real
column in the schema - and let it remain a member of the map ":attrs"
like all non-key attributes. But since LSIs can only be defined at table
creation time, we don't have to change the LSI implementation, and these
can still force their key to become a real column.
What the test in this patch does is to verify that using the same
attribute as a key of *both* GSI and LSI on the same table works.
There's a high risk that it won't work: After all, the LSI should force the
attribute to become a real column (to which base reads and writes go), but
the GSI will use a computed column which reads from ":attrs", no? Well,
it turns out that view.cc's value_getter::operator() always had a
surprising exception which "rescues" this test and makes it pass: Before
using a computed column, this code checks if a base-table column with the
same name exists, and if it does, it is used instead of the computed column!
It's not clear why this logic was chosen, but it turns out to be really
useful for making the test in this patch pass. And it's important that if
we ever change that unintuitive behavior, we will have this test as a
regression test.
The new test unsurprisingly passes on current Scylla because its
implementation of GSI and LSI is still the same. But it's an important
regression test for when we change the GSI implementation.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Expand another Alternator test (test_gsi.py::test_gsi_missing_attribute)
to write items not just using PutItem, but also using UpdateItem and
BatchWriteItem. There is a risk that these different operations use
slightly different code paths - so better check all of them and not
just PutItem.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
All of the tests in test/alternator/test_gsi.py use strings as the GSI's
keys. This tests a lot of GSI functionality, but we implicitly assumed that
our implementation used an already-correct and already-tested implementation
of key columns and MV, which if it works for one type, works for other types
as well.
This assumption will no longer hold if we reimplement GSI on a "computed
column" implementation, which might run different code for different types
of GSI key attributes (the supported types are "S"tring, "B"ytes, and
"N"umber).
So in this patch we add tests for writing and reading different types of
GSI key attributes. These tests showed their importance as regression
tests when the first draft of the GSI reimplementation series failed them.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Alternator uses a common function get_typed_value() to read the values
of key attributes and confirm they have the expected type (key attributes
have a fixed type in the schema). If the type is wrong, we want to print
a "Type mismatch" error message.
But the current implementation did the checks in the wrong order, and
as a result could print a "Malformed value object" message instead of a
"Type mismatch". That could happen if the wrong type is a boolean, map,
list, or basically any type whose JSON representation is not a string.
The allowed key types - bytes, string and number - all have string
representations in JSON, but still we should first report the mismatched
type and only report the "Malformed object" if the type matches but the
JSON is faulty.
In addition to fixing the error message, we fix an existing test which
complained in a comment (but otherwise ignored) that the error message in one
case (when trying to use a map where a key is expected) was the strange
"Malformed value object" instead of the expected "Type mismatch".
The next patch will add an additional reproducer for this problem and
its fix. That test will do:
```
with pytest.raises(ClientError, match='ValidationException.*mismatch'):
    test_table_gsi_6.put_item(Item={'p': p, 's': True})
```
I.e., it tries to set a boolean value for a string key column, and
expect to get the "Type mismatch" error and not the ugly "Malformed
value object".
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Most tests in test_gsi.py involve simple updates to a GSI, just
creating a GSI row. Although a couple of tests did involve more
complex operations (such as an update requiring deleting an old row
from the GSI and inserting a new one), we did not have a single
organized test designed to check all these cases, so we add one in
this patch.
This test (test_update_gsi_pk) will be important for verifying
the low-level implementation of the new GSI implementation that
we plan to base on computed columns. Early versions of that code
passed many of the simpler tests, but not this one.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
We soon plan to refactor Alternator's GSI and change the validation of
values set in attributes which are GSI keys. It's important to test that
when updating attributes that are *not* GSI keys - and are either base-
table keys or normal non-key attributes - the validation didn't change.
For example, empty strings are still not allowed in base-table key
attributes, but are allowed (since May 2020 in DynamoDB) in non-key
attributes.
We did have tests in this area, but this patch strengthens them -
adding a test for non-key attribute, and expanding the key-attribute
test to cover the UpdateItem and BatchWriteItem operations, not just
PutItem.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
To enhance the test reports UX:
1. switching off/on passed/failed/skipped test for better visibility
2. better searching in test results
3. understanding the trends of execution for each test
4. better configurability of the final report
Enable allure adapter for all python tests.
Add tags and parameters to the test to be able to distinguish them across modes and runs.
Related: https://github.com/scylladb/qa-tasks/issues/1665
Related: https://github.com/scylladb/scylladb/pull/19335
Related: https://github.com/scylladb/scylladb/pull/18169
Closes scylladb/scylladb#19942
* github.com:scylladb/scylladb:
[test.py] Clean duplicated arg for test suite
[test.py] Enable allure for python test
Two tests had a typo 'item' instead of 'Item'. If Scylla had a bug, this
could have caused these tests to miss the bug.
Scylla passes also the fixed test, because Scylla's behavior is correct.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
When an attribute is a GSI key, DynamoDB imposes certain rules when
writing values for it - it must be of the declared type for that key,
and can't be an empty string. We had tests for this, but all of them
did the write using the PutItem operation.
In this patch we also test the same things using the UpdateItem and
BatchWriteItem operations. Because Scylla has different code paths
for these three operations, and each code path needs to remember to
call the validation function, all three should be checked and not just
PutItem.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Scylla's materialized views naturally skip any base rows where the view's
key isn't set (is NULL), because we can't create a view row with a null
key. To make the user aware that this is happening, the user is required
to add "WHERE ... IS NOT NULL" for the view's key columns when defining
the view. However, the only place that these extra IS NOT NULL clauses
are checked are in the CQL "CREATE MATERIALIZED VIEWS" statement - they
are completely ignored in all other places in the code.
In particular, when we create a materialized view in Alternator (GSI or
LSI), we don't have to add these "IS NOT NULL" clauses, as they are
outright ignored. We didn't know they were ignored, and made an effort
to add them - but no matter how incorrectly we did it, it didn't matter :-)
In commit 2bf2ffd3ed it turned out we had a
typo that caused the wrong column name to be printed. Also, even today we
are still missing base key columns that aren't listed as a view key in
Alternator but still added as view clustering keys in Scylla - and again
the fact these were missing also didn't matter. So I think it's time to
stop pretending, and stop calculating these "IS NOT NULL" strings, so
this patch outright removes them from the Alternator view-creation code.
Beyond being a nice cleanup of unnecessary and inaccurate code, it
will also be necessary when, in later patches, we allow indexing
an Alternator attribute "x" that is not a real column x in the base table but
rather an element in the ":attrs" map - so adding an "x IS NOT NULL" isn't
only unnecessary, it is outright illegal: the expression evaluation code,
even though it doesn't do anything with the "IS NOT NULL" expression,
still verifies that "x" is a valid column, which it isn't.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
We already have tests for the feature of adding or removing a GSI from
an existing table, which Alternator doesn't yet support (issue #11567).
In this patch we add another check: that after a GSI is added, you can
no longer add items with the wrong type for the indexed attribute, and after
removing the GSI, you can again. The expanded tests pass on DynamoDB, and
obviously still xfail on Alternator because the feature is not yet
implemented.
Refs #11567.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Most of the Alternator tests are careful to unconditionally remove the test
tables, even if the test fails. This is important when testing on a shared
database (e.g., DynamoDB) but also useful to make clean shutdown faster
as there should be no user table to flush.
We missed a few such cases in test_gsi.py, and fixed some of them in
commit 59c1498338 but still missed a few,
and this patch fixes some more instances of this problem.
We do this by using the context manager new_test_table() - which
automatically deletes the table when done - instead of the function
create_test_table() which needs an explicit delete at the end.
There are no functional changes in this patch - most of the lines
changed are just reindents.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
in main.cc, we start redis with `ss.local().register_protocol_server()`
only if it is enabled. but `storage_service` always calls
`stop_server()` on _all_ registered servers, no matter whether they have
started or not. in general, it does not hurt. for instance,
`redis::controller::stop_server()` is a noop if the controller
is not started. but `storage_service` still prints logging messages
like:
```
INFO 2024-09-04 11:20:02,224 [shard 0:main] storage_service - Shutting down redis server
INFO 2024-09-04 11:20:02,224 [shard 0:main] storage_service - Shutting down redis server was successful
```
this could be confusing or at least distracting when a field engineer
looks at the log. also, please note, `redis_port` and `redis_ssl_port`
cannot be changed dynamically once the scylla server is up, so we do not
need to worry about "what if the redis server is started at runtime,
how can it be stopped?".
the same applies to the alternator service.
in this change, to avoid surprises, we conditionally register the
protocol servers with the storage service based on their enabled statuses.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20472
after switching over to the new `seastar::format()` which enables
the compile-time format check, the fmt string should be a constexpr,
otherwise `fmt::format()` is not able to perform the check at compile
time (see the sketch below).
to prepare for bumping up the seastar submodule to a version which
contains the change to `seastar::format()`, let's mark the format
string with `constexpr const`.
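a small, hedged illustration of the idea using `fmt::format()` (the same applies to call sites that will use the new `seastar::format()`):
```
#include <string>
#include <fmt/format.h>

// A non-constexpr format string cannot take part in the consteval
// compile-time check, so it would be rejected (or need fmt::runtime()).
// Marking it `constexpr const` keeps the check working:
static constexpr const char fmt_str[] = "{} of {} chunks validated";

std::string report(int done, int total) {
    return fmt::format(fmt_str, done, total);
}
```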
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20484
Adds the test_hints_consistency_during_decommission test which
reproduces the failure observed in scylladb/scylla-dtest#4582. It uses
error injections, including the newly added
topology_coordinator_pause_after_streaming injection, to reliably
orchestrate the scenario observed there.
In a nutshell, the test makes sure to replay hints after streaming
during decommission has finished, but before the cluster switches to
reading from new replicas. Without the fix, hints would be replayed to
the decommissioned node and then would be lost forever after the cluster
starts reading from new replicas.
Move create_sync_point and await_sync_point from the scope of the
test_sync_point test to the file scope. They will be used in a test that
will be introduced in the commit that follows.
Currently, when attempting to send a hint, we might choose its
recipients in one of two ways:
- If the original destination is a natural endpoint of the hint, we only
send the hint to that node and none other,
- Otherwise, we send the hint to all current replicas of the mutation.
There is a problem when we decommission a node: while data is streamed
away from that node, it is still considered to be a natural endpoint of
the data that it used to own. Because of that, it might happen that a
hint is sent directly to it but streaming will miss it, effectively
resulting in the hint being discarded.
As sending the hint _only_ to the leaving replica is a rather bad idea,
send the hint to all replicas also in the case when the original
destination of the hint is leaving.
Note that this is a conservative fix written only with the decommission
+ vnode-based keyspaces combo in mind. In general, such "data loss" can
occur in other situations where the replica set is changing and we go
through a streaming phase, i.e. other topology operations in case of
vnodes and tablet load balancing. However, the consistency guarantees of
hinted handoff in the face of topology changes are not defined and it is
not clear what they should be, if there should be any at all. The
picture is further complicated by the fact that hints are used by
materialized views, and sending view updates to more replicas than
necessary can introduce inconsistencies in the form of "ghost rows".
This fix was developed in response to a failing test which checked the
hint replay + decommission scenario, and it makes it work again.
Fixes scylladb/scylla-dtest#4582
Refs scylladb/scylladb#19835
It's a small method and it is only used once in send_one_mutation.
Inlining it lets us get rid of its declaration in the header - now, if
one needs to change the variables passed from one function to another,
it is no longer necessary to change the header.
Shorter and simpler this way. Hopefully it doesn't sit on critical paths
Closes scylladb/scylladb#20460
* github.com:scylladb/scylladb:
sstables: Fix indentation after previous patch
sstables: Coroutinize sstable::read_summary()
The idea of the test is to have a cluster where one node is stressed with injections and failures and the rest of the cluster is used to make progress of the raft state machine.
To achieve this, the following two lists are introduced in the PR:
- ERROR_INJECTIONS in error_injections.py
- CLUSTER_EVENTS in cluster_events.py
Each cluster event is an async generator which has 2 yields and should be used in the following way:
0. Start the generator:
```python
>>> cluster_event_steps = cluster_event(manager, random_tables, error_injection)
```
1. Run the prepare part (before the first yield)
```python
>>> await anext(cluster_event_steps)
```
2. Run the cluster event itself (between the yields)
```python
>>> await anext(cluster_event_steps)
```
3. Run the check part (after the second yield)
```python
>>> await anext(cluster_event, None)
```
Closes scylladb/scylladb#16223
* github.com:scylladb/scylladb:
test: randomized failure injection for Raft-based topology
test: error injections for Raft-based topology
[test.py] topology.util: add get_non_coordinator_host() function
[test.py] random_tables: add UDT methods
[test.py] random_tables: add CDC methods
[test.py] api: get scylla process status
[test.py] api: add expected_server_up_state argument to server_add()
The `service_level::marked_for_deletion` field is always set to `false`. It might have served some purpose in the past, but now it can be just removed, simplifying the code and eliminating confusion about the field.
This is just code cleanup, no backport is needed.
Closes scylladb/scylladb#20452
* github.com:scylladb/scylladb:
service/qos: remove the marked_for_deletion parameter
service/qos: add constructors to service_level
The test cases in this file use an error injection to reduce raft group
0 timeouts (from the default 1 minute), in order to speed up the tests;
the scenarios expect these timeouts to happen, so we want them to happen
as quick as possible, but we don't want to reduce timeouts so much that
it will make other operations fail when we don't expect them to (e.g.
when the test wants to add a node to the cluster).
Unfortunately the selected 5 seconds in debug mode was not enough and
made the tests flaky: scylladb/scylladb#20111.
Increase it to 10 seconds. This unfortunately will slow down these tests
as they have to sometimes wait for 10 seconds for the timeout to happen.
But better to have this than a flaky test.
Fixes: scylladb/scylladb#20111
Closes scylladb/scylladb#20320
During review of 0857b63259 it was noticed that the function
repair_get_row_diff_with_rpc_stream_process_op()
and its _slow_path callee only ever return stop_iteration::no (or throw
an exception). As such, its return value is useless, and in fact the
only caller ignores it. Simplify by returning a plain future<>.
Closes scylladb/scylladb#20441
so that it is accessible from its caller. if we enforce the
compile-time format string check, the formatter would need access to
the specialization of `fmt::formatter` for the arguments being formatted.
to be prepared for this change, let's move the `fmt::formatter`
specialization up, otherwise we'd have the following error after switching
to the compile-time format string check introduced by a recent seastar
change:
```
In file included from ./auth/authenticator.hh:22: ./auth/authentication_options.hh:50:49: error: call to consteval function 'fmt::basic_format_string<char, auth::authentication_option &>::basic_format_string<
char[32], 0>' is not a constant expression
50 | : std::invalid_argument(fmt::format("The {} option is not supported.", k)) {
| ^ ./auth/authentication_options.hh:57:13: error: explicit specialization of 'fmt::formatter<auth::authentication_option>' after instantiation
57 | struct fmt::formatter<auth::authentication_option> : fmt::formatter<string_view> {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/base.h:1228:17: note: implicit instantiation first required here
1228 | -> decltype(typename Context::template formatter_type<T>().format(
| ^
In file included from replica/distributed_loader.cc:30:
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20447
And move the comment inside the if while at it; it looks better in there
(and makes less churn in the patch itself).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The idea of the test is to have a small cluster, where one node
is stressed with injections and failures and the rest of
the cluster is used to make progress of the Raft state machine.
To achieve this, the following two lists are introduced in the commit:
- ERROR_INJECTIONS in error_injections.py
- CLUSTER_EVENTS in cluster_events.py
Each cluster event is an async generator which has 2 yields and should be
used in the following way:
0. Start the generator:
>>> cluster_event_steps = cluster_event(manager, random_tables, error_injection)
1. Run the prepare part (before the first yield)
>>> await anext(cluster_event_steps)
2. Run the cluster event itself (between the yields)
>>> await anext(cluster_event_steps)
3. Run the check part (after the second yield)
>>> await anext(cluster_event, None)
Add get_non_coordinator_host() function which returns
ServerInfo for the first host which is not a coordinator
or None if there is no such host.
Also rework get_coordinator_host() to not fail if some
of the hosts don't have a host id.
Allow to return from server_add() when a server reaches specified state.
One of:
- PROCESS_STARTED
- HOST_ID_QUERIED (previously called NOT_CONNECTED)
- CQL_CONNECTED (renamed from CONNECTED)
- CQL_QUERIED (was just QUERIED)
Also, rename CqlUpState to ServerUpState and move to internal_types.
After it was switched to use the schema builder, the indentation of untouched
lines deserves one extra space.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Everything but the perf test is a straightforward switch.
The perf test generated regular columns dynamically via a vector; with the
builder the vector goes away.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When creating sstables this test allocates temporary local options.
That works, because this test doesn't run on object storage, but it's
more correct to pick storage options from the table at hand.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#20440
Add a test replacing a node and verifying the contents of the
view_build_status table are updated as expected, having rows for the new
node and no rows for the old node.
Change the util function wait_for_view to read the view build status
from the system.view_build_status_v2 table which replaces
system_distributed.view_build_status.
The old table can still be used but it is less efficient because it's
implemented as a virtual table which reads from the v2 table, so it's
better to read directly from the v2 table. This can cause slowness in
tests.
The additional util function wait_for_view_v1 reads from the old table.
This may be needed in upgrade tests if the v2 table is not available
yet.
When writing to the view_build_status we have common logic related to
upgrade and deciding whether to write to sys_dist ks or group0.
Move this common logic to a generic function used by all functions
writing to the table.
Add an intermediate phase to the view builder migration to v2 where we
write to both the old and new table in order to not lose writes during
the migration.
We add an additional view builder version v1_5 between v1 and v2 where
we write to both tables. We perform a barrier before moving to v2 to
ensure all the operations to the old table are completed.
When a node is removed we want to clean its rows from the
view_build_status table.
Now when removing a node and generating the topology state update, we
generate also the mutations to delete all the possible rows belonging to
the node from the table.
When migrating the view_build_status to v2, skip adding any leftover
rows that don't correspond to an existing node or an existing view.
Previously such rows could have been created and not cleaned, for
example when a node is removed.
After migrating the view build status from
system_distributed.view_build_status to system.view_build_status_v2, we
set system_distributed.view_build_status to be a virtual table, such
that reading from it is actually reading from the underlying new table.
The reason for this is that we want to keep compatibility with the old
table, since it exists also in Cassandra and it is used by various external
tools to check the view build status. Making the table virtual makes the
transition transparent for external users.
The two tables are in different keyspaces and have different shard
mapping. The v1 table is a distributed table with a normal shard
mapping, and the v2 table is a local table using the null sharder. The
virtual reader works by constructing a multishard reader which reads the rows
from shard zero, and then filtering it to get only the rows owned by the
current shard.
Add tests to verify the new view_build_status_v2 is used by the
view_builder and can be read from all nodes with the expected values.
Also test a migration from the v1 layout to v2.
Migrate view_builder to v2, to store the view build status of all nodes
in the group0 based table view_build_status_v2.
Introduce a feature view_build_status_on_group0 so we know when all
nodes are ready to migrate and use the new table.
A new cluster is initialized to use v2. Otherwise, the topology coordinator
initiates the migration when the feature is enabled, if it was not done
already.
The migration reads all the rows in the v1 table and writes it via
group0 to the v2 table, together with a mutation that updates the
view_builder parameter in scylla_local to v2. When this mutation is
applied, it updates the view_builder service to start using the v2
table.
Add a new scylla_local parameter view_builder_version, and functions to
read and mutate the value.
The version value defaults to v1 if it doesn't exist in the table.
Update the view_status function to read from the new
view_build_status_v2 table when enabled.
The code to read and extract the values is identical to v1 and v2 except it
accesses different keyspace and table, so the common code is extracted
to the view_status_common function and used by both v1 and v2 flows with
appropriate parameters.
Introduce the announce_with_raft function as alternative to writing view build
status mutations to the table in system_distributed. Instead, we can
apply the mutations via group0 operation to the view_build_status_v2
table.
All the view_builder functions that write to the view_build_status table
can be configured by a flag to either write the legacy way or via raft.
Store references of group0_client and query_processor in the
view_builder service.
They are required for generating mutations and writing them via group0.
Because of https://github.com/scylladb/scylladb/issues/9285 the heat-weighted
load balancer may sometimes return the same node twice. It may cause wrong
data to be read or unexpected errors to be returned to a client. Since
the original bug is not easy to fix and it is rare, let's introduce a
workaround. We will check for duplicates and will use the non-HWLB one if
a duplicate is found.
Fixes scylladb/scylladb#20430
Closes scylladb/scylladb#20414
When testing mv admission control, we perform a large view update
and check if the following view update can be admitted due to the
high view backlog usage. We rely on a delay which keeps the backlog
high for longer to make sure the backlog is still increased during
the second write. However, in some test runs the delay is not long
enough, causing the second write to miss the large backlog and not
hit admission control.
In this patch we keep the increased backlog high using another
injection instead of relying on a delay, to make absolutely sure
that the backlog is still high during the second write.
Fixes scylladb/scylladb#20382
Closes scylladb/scylladb#20445
Added a new parameter `consider_only_existing_data` to major compaction
API endpoints. When enabled, major compaction will:
- Force-flush all tables.
- Force a new active segment in the commit log.
- Compact all existing SSTables and garbage-collect tombstones by only
checking the SSTables being compacted. Memtables, commit logs, and
other SSTables not part of the compaction will not be checked, as they
will only contain newer data that arrived after the compaction
started.
The `consider_only_existing_data` parameter is passed down to the compaction
descriptor's `gc_check_only_compacting_sstables` option to ensure that
only the existing data is considered for garbage collection.
The option is also passed to the `maybe_flush_commitlog` method to make
sure all the tables are flushed and a new active segment is created in
the commit log.
Fixes #19728
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
When gc_check_only_compacting_sstables is enabled,
get_max_purgeable_timestamp should not check memtables and other
sstables that are not part of the compaction to deduce the max purgeable
timestamp.
Refs #19728
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
When the compaction_descriptor's gc_check_only_compacting_sstables flag
is enabled, create and pass a copy of the get_tombstone_gc_state that
will skip checking the commitlog.
Refs #19728
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Added a new method, `with_commitlog_check_disabled`, that returns a new
copy of the tombstone_gc_state but with commitlog check disabled. This
will be used by a following patch to disable commitlog checks during
compaction.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Added new option, `gc_check_only_compacting_sstables`, to
compaction_descriptor to control the garbage collection behavior. The
subsequent patches will use this flag to decide if the garbage
collection has to check only the SSTables being compacted to collect
tombstones. This option is disabled for now and will be enabled based on
a new compaction parameter that will be added later in this patch
series.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Major compaction flushes all tables as a part of flushing the commitlog.
After forcing new active segments in the commitlog, all the tables are
flushed to enable reclaim of older commitlog segments. The main goal is
to flush the commitlog; flushing all the tables is just a dependency.
Rename maybe_flush_all_tables to maybe_flush_commitlog so that it
reflects the actual intent of the major compaction code. Added a new
wrapper method to database::flush_all_tables(),
database::flush_commitlog(), that is now called from
maybe_flush_commitlog.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Add a new parameter, `force_flush` to the maybe_flush_all_tables()
method. Setting `force_flush` to true will flush all the tables
regardless of when they were flushed last. This will be used by the new
compaction option in a following patch.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Before the introduction of "scripts/refresh-submodules.sh", there was
indeed some manual work for the maintainer to do, hence "publish your
work" must have sounded correct. Today, the phrase "publish your work"
sounds confusing.
Commit 71da4e6e79 ("docs: Document sync-submodules.sh script in
maintainer.md", 2020-06-18) should have arguably reworded the last step of
the submodule refresh procedure; let's do it now.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Closes scylladb/scylladb#20333
The with_sstable_dir() helper no longer needs a semaphore. It used to pass one as an
argument to the sstable_directory constructor, but now the directory doesn't
need it (it takes the semaphore via the table object).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#20396
Squash the call to lister.get() and the check of the returned value into
the while() condition. This saves a few more lines of code as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method already uses a yielding lister, but handles the exceptions
explicitly. Use the with_closeable() helper; it makes the code shorter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is required to make it possible to push the lister into with_closeable().
Its requirement of nothrow-move-constructibility isn't satisfied by the
default-generated move constructor.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The `skip()` method of the compressed data source implementation uses an
assert statement to check if the given offset is valid.
Replace this with `on_internal_error()` to fail gracefully. An invalid
offset shouldn't bring the whole server down.
Also, enhance the error message for unsynced compressed readers.
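A rough sketch of the pattern, assuming Seastar's on_internal_error() helper; the surrounding names and the validity condition are illustrative, not the actual data source code:
```cpp
#include <cstdint>
#include <seastar/util/log.hh>

static seastar::logger sstlog("compressed-reader-sketch");

// Instead of assert(), report an internal error: depending on configuration
// this throws (failing just the request) rather than bringing the server down.
void validate_skip_offset(uint64_t offset, uint64_t remaining) {
    if (offset > remaining) {   // hypothetical validity condition
        seastar::on_internal_error(sstlog,
            "compressed data source skip(): offset is past the remaining data");
    }
}
```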
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
It used to rely on a bool (wrapped in a pointer) and a future<>-based loop
helper; now it can just break from the while loop.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
And update its callers again.
Preserve the no-longer-relevant local smart pointers until the next patch.
Indentation is deliberately left broken.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
One accepts integer generations, the other accepts "generic" ones. The
latter is only called by the former, so there's no sense in keeping it around.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Add a default constructor and a constructor which explicitly
initializes all fields of the service_level structure.
This is done in order to make sure that removal of the
marked_for_deletion field can be done safely - otherwise, for example,
service_level could be aggregate-initialized with an incomplete list of
values for the fields, and removing marked_for_deletion which is in the
middle of the struct would cause the is_static field to be initialized
with the value that was designated for marked_for_deletion.
As a bonus, make sure that marked_for_deletion and is_static bool fields
are initialized in the default constructor to false in order to avoid
potential undefined behavior.
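A simplified stand-in showing the hazard the explicit constructors protect against (the structs below are illustrative, not the real service_level definition):
```cpp
#include <iostream>

struct service_level_before {      // field order as described above
    int shares;
    bool marked_for_deletion;      // the field being removed
    bool is_static;
};

struct service_level_after {       // same struct with the middle field removed
    int shares;
    bool is_static;
};

int main() {
    // With aggregate initialization and an incomplete initializer list, the
    // value intended for marked_for_deletion silently starts initializing
    // is_static once the middle field is gone.
    service_level_before a{1000, true};   // is_static is value-initialized to false
    service_level_after  b{1000, true};   // the same initializer now sets is_static = true
    std::cout << a.is_static << ' ' << b.is_static << '\n';   // prints: 0 1
    // Declaring a default constructor plus a constructor taking every field
    // makes the type non-aggregate, so such an initializer stops compiling.
}
```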
There are two versions of `raft_group0_client::hold_read_apply_mutex`, one takes `abort_source&`, the other doesn't. Modify all call sites that used the non-abort-source version to pass an `abort_source&`, allowing us to remove the other overload.
If there is no explicit reason not to pass an `abort_source&`, then one should be passed by default -- it often prevents hangs during shutdown.
---
No backport needed -- no known issues affected by this change.
Closes scylladb/scylladb#19996
* github.com:scylladb/scylladb:
raft_group0_client: remove `hold_read_apply_mutex` overload without `abort_source&`
storage_service: pass `_abort_source` to `hold_read_apply_mutex`
group0_state_machine: pass `_abort_source` to `hold_read_apply_mutex`
api: move `reload_raft_topology_state` implementation inside `storage_service`
for the following reasons:
1. the ppa in question does not provide builds for the latest Ubuntu LTS release. it only builds for trusty, xenial, bionic and jammy. according to https://wiki.ubuntu.com/Releases, the latest LTS release is Ubuntu Noble at the time of writing.
2. the ppa in question does not provide the packages used in production. it only provides the packages for *building* scylla.
3. after we introduced the relocatable package, there is no need to provide extra user-space dependencies apart from the scylla packages.
so, in this change, we remove all references to enabling the Scylla/PPA repository.
Fixes scylladb/scylladb#20449
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20450
for better readability.
presumably, `sstable::seal_sstable()` is not on the critical path,
and we don't need to worry about the overhead of using C++20 coroutine.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20410
instead of chaining the conditions with '&&', break them down.
for two reasons:
* for better readability: to group the conditions with the same
purpose together
* so we don't look up the table twice. it's an anti-pattern of
using STL, and it could be confusing at first glance.
this change is a cleanup, so it does not change the behavior.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20369
The Alternator TTL scanning code uses an object "scan_ranges_context"
to hold the scanning context. One of the members of this object is
a service::query_state, and that in turn holds a reference to a
service::client_state. The existing constructor created a temporary
client_state object and saved a reference to it - which can result
in use after free as the temporary object is freed as soon as the
constructor ends.
The fix is to save a client_state in the scan_ranges_context object,
instead of a temporary object.
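A simplified illustration of the dangling reference and of the fix, using hypothetical shapes rather than the actual Alternator types:
```cpp
#include <string>

struct client_state {
    std::string user;
};

// Buggy shape: the reference member is bound to a temporary created in the
// constructor; the temporary dies when the constructor returns, so the
// reference dangles for the rest of the object's lifetime.
struct scan_ranges_context_buggy {
    const client_state& cs;
    scan_ranges_context_buggy() : cs(client_state{"ttl-scanner"}) {}
};

// Fixed shape: the context owns its client_state, so anything holding a
// reference to it (e.g. a query_state member) stays valid.
struct scan_ranges_context_fixed {
    client_state cs{"ttl-scanner"};
};
```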
Fixes #19988
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#20418
Makes some commitlog options runtime-updatable. Most important for this case,
the usage of fragmented entries. Also adds a subscription in database on said
feature, to possibly enable it once the cluster enables it.
Hides the functionality behind a cluster feature, i.e. postpones
using it until an upgrade is complete etc. This is to allow rolling back
even with dirty nodes, at least until the cluster is committed.
The feature can also be disabled by a scylla option, just in case. This will
lock it out of the whole cluster, but this is probably good, because depending
on whether it's off or on, certain schema/raft ops might fail or succeed (due to large
mutations), and this should probably be equivalent across nodes.
Refs #18161
Yet another approach to dealing with large commitlog submissions.
We handle oversize single mutation by adding yet another entry
type: fragmented. In this case we only add a fragment (aha) of
the data that needs storing into each entry, along with metadata
to correlate and reconstruct the full entry on replay.
Because these fragmented entries are spread over N segments, we
also need to add references from the first segment in a chain
to the subsequent ones. These are released once we clear the
relevant cf_id count in the base.
*
This approach has the downside that due to how serialization etc
works w.r.t. mutations, we need to create an intermediate buffer
to hold the full serialized target entry. This is then incrementally
written into entries of < max_mutation_size, successively requesting
more segments.
On replay, when encountering a fragment chain, the fragment is
added to a "state", i.e. a mapping of currently processing
frag chains. Once we've found all fragments and concatenated
the buffers into a single fragmented one, we can issue a
replay callback as usual.
Note that a replay caller will need to create and provide such
a state object. Old signature replay function remains for tests
and such.
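A rough sketch of that replay-side reassembly state; the names below are hypothetical and the real code works on commitlog buffers rather than strings:
```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

struct fragment {
    uint64_t entry_id;   // correlates fragments of one oversized entry
    uint32_t index;      // position of this fragment in the chain
    uint32_t total;      // total number of fragments in the chain
    std::string data;
};

class replay_state {
    std::map<uint64_t, std::map<uint32_t, std::string>> _chains;
public:
    // Returns the reconstructed entry once every fragment of its chain arrived.
    std::optional<std::string> add(const fragment& f) {
        auto& chain = _chains[f.entry_id];
        chain[f.index] = f.data;
        if (chain.size() < f.total) {
            return std::nullopt;          // still waiting for more fragments
        }
        std::string whole;
        for (auto& [index, part] : chain) {
            whole += part;                // fragments concatenated in order
        }
        _chains.erase(f.entry_id);
        return whole;
    }
};
```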
This approach bumps the file format (docs to come).
To ensure "atomicity" we both force synchronization, and should
the whole op fail, we restore segment state (rewinding), thus
discarding all the data we wrote.
v2:
* Improve some bookkeeping, ensure we keep track of segments and flush
properly, to get the counter correct
This commit temporarily disables redirections for all pages under Features
that were moved with this PR: https://github.com/scylladb/scylladb/pull/20401
Redirections work for all versions. This means that pages in 6.1 are redirected
to URLs that are not available yet (because 6.2 has not been released yet).
The redirections are correct and should be enabled when 6.2 is released:
I've created an issue to do it: https://github.com/scylladb/scylladb/issues/20428
Closes scylladb/scylladb#20429
Tests that try to access sstables from test/resource/ typically call sstable::load() on them after object creation. There's a reusable_sst() helper for that. This PR fixes one more caller that still goes the longer route by creating the sstable and loading it on its own.
Closes scylladb/scylladb#20420
* github.com:scylladb/scylladb:
test: Call reusable sst from ka_sst() helper
test: Move sstable_open_config to reusable_sst()'s argument
There's no point waiting for this lock if `storage_service` is being
aborted. In theory the lock, if held, should be eventually released by
whatever is holding it during shutdown -- but if there is some cyclic
reference between the services, and e.g. whatever holds the lock is
stuck because of ongoing shutdown and would only be unstuck by
`storage_service` getting stopped (which it can't because it's waiting
on the lock), that would cause a shutdown deadlock. Better to be safe
than sorry.
In a later commit we'll want to access more `storage_service` internals
in the API's implementation (namely, `_abort_source`).
Also, moving the implementation there allows making
`service::topology_transition()` private again (it was made public in
992f1327d3 only for this API
implementation).
Run the reversed queries on a 2-node cluster with CL=ALL with and
without NATIVE_REVERSE_QUERIES feature flag. When the flag is enabled,
the native reversed format is used, otherwise the legacy format.
The NATIVE_REVERSE_QUERIES feature flag is suppressed with an error
injection that simulates cluster upgrade process.
Backport is not required. The patch adds additional upgrade tests
for https://github.com/scylladb/scylladb/pull/18864
Closes scylladb/scylladb#20179
This patch addresses two requests made by reviewers of the original "Add
CQL-based RBAC support to Alternator" series. Both requests were about
the error messages produced when access is denied:
1. The error message is improved to use more proper English, and also
to include the name of the role which was denied access.
2. The permission-check and error-message-formatting code is
de-duplicated, using a common function verify_permission().
This de-duplication required moving the access-denied error path to
throwing an exception instead of the previous exception-free
implementation. However, it can be argued that this change is actually
a good thing, because it makes the successful case, when access is
allowed, faster.
The de-duplicated code is shorter and simpler, and allowed changing
the text of the error message in just one place.
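A sketch of that de-duplicated check, with hypothetical names rather than the actual Alternator implementation: one helper formats the error (including the denied role's name) and throws, so the allowed path stays cheap and exception-free and the message lives in one place.
```cpp
#include <stdexcept>
#include <string>

struct access_denied_error : public std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Shared helper: every permission check funnels through here, so the message
// is formatted in exactly one place.
inline void verify_permission(bool allowed, const std::string& role_name,
                              const std::string& operation) {
    if (!allowed) {
        throw access_denied_error("Role '" + role_name +
                                  "' is not authorized to perform " + operation);
    }
}
```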
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#20326
in 372a4d1b79, we introduced a change
which was for debugging the logging message. but the logging message
intended for printing the temp_dir now prints an `optional<int>` instead. this
is both confusing, and more importantly, it hurts the debuggability.
in this change, the related change is reverted.
Fixes scylladb/scylladb#20408
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20409
The sstable_mutation_test wants to load pre-existing sstables from the
resource/ subdir. For that there's the reusable_sst() helper on env.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
What we have today in "docs/dev/docker-hub.md" on "aio-max-nr" dates back
to scylla commit f4412029f4 ("docs/docker-hub.md: add quickstart section
with --smp 1", 2020-09-22). Problems with the current language:
- The "65K" claim as default value on non-production systems is wrong;
"fs/aio.c" in Linux initializes "aio_max_nr" to 0x10000, which is 64K.
- The section in question uses equal signs (=) incorrectly. The intent was
probably to say "which means the same as", but that's not what equality
means.
- In the same section, the relational operator "<" is bogus. The available
AIO count must be at least as high (>=) as the requested AIO count.
- Clearer names should be used;
adjust_max_networking_aio_io_control_blocks() in "src/core/reactor.cc"
sets a great example:
- "reactor::max_aio" should be called "storage_iocbs",
- "detect_aio_poll" should be called "preempt_iocbs",
- "reactor_backend_aio::max_polls" should be called "network_iocbs".
- The specific value 10000 for the last one ("network_iocbs") is not
correct in scylla's context. It is correct as the Seastar default, but
scylla has used 50000 since commit 2cfc517874 ("main, test: adjust
number of networking iocbs", 2021-07-18).
Rewrite the section to address these problems.
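As a rough back-of-the-envelope illustration of the ">=" relation described above -- assuming, purely for this sketch, that per-shard iocb requests simply add up across shards; all constants except network_iocbs = 50000 and the 64K default are placeholders:
```cpp
#include <cstdint>
#include <iostream>

int main() {
    // Kernel default for fs.aio-max-nr on a stock system: 0x10000, i.e. 64K.
    const uint64_t aio_max_nr = 0x10000;

    // Per-shard requests. Only network_iocbs = 50000 follows the text above;
    // the other figures are placeholders for storage and preempt iocbs.
    const uint64_t network_iocbs = 50000;
    const uint64_t storage_iocbs = 10000;   // placeholder
    const uint64_t preempt_iocbs = 2;       // placeholder
    const uint64_t shards = 4;              // placeholder

    const uint64_t requested = shards * (network_iocbs + storage_iocbs + preempt_iocbs);

    // The requirement is "available >= requested" -- not "<" and not "=".
    if (aio_max_nr >= requested) {
        std::cout << "aio-max-nr is sufficient\n";
    } else {
        std::cout << "raise fs.aio-max-nr to at least " << requested << '\n';
    }
}
```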
See also:
- https://github.com/scylladb/scylladb/issues/5981
- https://github.com/scylladb/seastar/pull/2396
- https://github.com/scylladb/scylladb/pull/19921
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
This commit is one of a series to remove the FAQ page by removing irrelevant/outdated entries
or moving them to the forum.
The question about seeds is irrelevant, not frequently asked, and covered in other sections
of the docs. Also, it mentions versions that are no longer supported.
Closes scylladb/scylladb#20403
Even after 13caac7, we still have more files with incorrect permissions, since
we use "cp -r" and create new files with redirection.
To fix this, we need to replace "cp -r" with "cp -pr", and "chmod <perm>" the
newly created files.
Fixes #14383
Related #19775
Closes scylladb/scylladb#19786
This commit moves the Features page from the section for developers
to the top level in the page tree. This involves:
- Moving the source files to the *features* folder from the *using-scylla* folder.
- Moving images into *features/images* folder.
- Updating references to the moved resources.
- Adding redirections to the moved pages.
Closes scylladb/scylladb#20401
this change contains two improvements to "backup" and "restore" commands:
- let them print task id
- let them return 1 as the exit status code upon operation failure
----
these changes are improvements to the newly introduced commands, which are not in any LTS branches yet, so no need to backport.
Closes scylladb/scylladb#20371
* github.com:scylladb/scylladb:
tools/scylla-nodetool: return failure with exit code in backup/restore
tools/scylla-nodetool: let backup/restore print task id
before this change, "backup" and "restore" commands always return 0 as
their exit code no matter whether the performed operation fails or not.
inspired by the "task" commands of nodetool, let's return 1 as the
exit code if the operation fails.
the tests are updated accordingly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
in 20fffcdc, we added the "task wait" subcommand, so the user is allowed to
interact with a task using its task id. and in the existing implementation of
the "backup" and "restore" commands, if the user does not pass `--nowait`, the
command just exits without any output upon sending the request to
scylladb.
in this change, we print out the task_id if the user does not pass the
`--nowait` command line option to the "backup" or "restore" command. this
allows the user to follow up on the operation if necessary.
the tests are updated accordingly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This patch adds functional testing for the role-based access control
(RBAC) "auto-grant" feature, where a user that is allowed to create
a table also receives full permissions over the table it just
created. We also test permissions over new materialized views created
by a user, and over CDC logs. The test for CDC logs reproduces an
already suspected bug, #19798: A user may be allowed to create a table
with CDC enabled, but then is not allowed to read the CDC log just
created. The tests show that the other cases (base tables and views)
do not have this bug, and the creating user does get appropriate
permissions over the new table and views.
In addition to testing auto-grant, the patch also includes tests for
the opposite feature, "auto-revoke" - that permissions are removed when
the table/view/cdc is deleted. If we forget to do that while implementing
auto-grant, we risk that users may be able to use tables created by
other users just because they used the same table _name_ earlier.
It's important to have these auto-revoke tests together with the
auto-grant tests that reproduce #19798 - so we don't forget this
part when finally fixing #19798.
Refs #19798.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#19845
Bind variables in CQL have two formats: positional (`?`) where a variable is referred to by its relative position in the statement, and named (`:var`), where the user is expected to supply a name->value mapping.
In 19a6e69001 we identified the case where a named bind variable appears twice in a query, and collapsed it to a single entry in the statement metadata. Without this, a driver using the named variable syntax cannot disambiguate which variable is referred to.
However, it turns out that users can use the positional call form even with the named variable syntax, by using the positional API of the driver. To support this use case, we add a configuration variable to disable the same-variable detection.
Because the detection has to happen when the entire statement is visible, we have to supply the configuration to the parser. We call it the `dialect` and pass it from all callers. The alternative would be to add a pre-prepare call similar to fill_prepare_context that rewrites all expressions in a statement to deduplicate variables.
A unit test is added.
Fixes #15559
This may be useful to users transitioning from Cassandra, so it merits a backport.
Closes scylladb/scylladb#19493
* github.com:scylladb/scylladb:
cql3: add option to not unify bind variables with the same name
cql3: introduce dialect infrastructure
cql3: prepared_statement_cache: drop cache key default constructor
* in the "Backporting Seastar commits" section, there's a single quote
instead of a backtick in this line, so fix it.
* add backticks around `refresh-submodules.sh`, which is a filename.
* correct the command line setting a git config option, because `git-config`
does not support this command line syntax,
```console
$ git config --global diff.conflictstyle = diff3
$ git config --global get diff.conflictstyle
=
$ git config --global diff.conflictstyle diff3
$ git config --global get diff.conflictstyle
diff3
```
quote from git-config(1)
> ```
> git config set [<file-option>] [--type=<type>] [--all] [--value=<value>] [--fixed-value] <name> <value>
> ```
* stop using the deprecated mode of the `git-config` command, and use
subcommand instead. as git-config(1) puts:
> git config <name> <value> [<value-pattern>]
> Replaced by git config set [--value=<pattern>] <name> <value>.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20328
Check if podman is available before docker. If it is, use it. Otherwise, check for docker.
1. Podman is better. It runs with fewer resources, and I've had display issues with Docker (output was not shown consistently)
2. 'which docker' works even when the docker service and socket are turned off.
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Closes scylladb/scylladb#20342
Triggers the "Build Docs" PR workflow whenever the `db/config.cc` or `db/config.h` files are edited. These files are used to produce documentation, and this change will help prevent the introduction of breaking changes to the documentation build when they are modified.
Closes scylladb/scylladb#20347
for testing the load performance of the load_and_stream operation.
Refs #19989
---
no need to backport. it adds two new tests to the existing `perf_sstable` tool for evaluating the load performance when performing the "load_and_streaming" operation. hence it has no impact on production.
Closes scylladb/scylladb#20186
* github.com:scylladb/scylladb:
perf/perf_sstable: add {crawling,partitioned}_streaming modes
test/perf/perf_sstable: use switch-case when appropriate
instead of evaluating the constants in-class, accessing them via
a cached class property.
it would be handy if we could source `scylla-gdb.py` in `.gdbinit`,
but this script accesses some symbols which are not available without
a file being debugged. that's why gdb fails to load the init script:
```
Traceback (most recent call last):
File "/home/kefu/dev/scylladb/scylla-gdb.py", line 167, in <module>
class intrusive_slist:
File "/home/kefu/dev/scylladb/scylla-gdb.py", line 168, in intrusive_slist
size_t = gdb.lookup_type('size_t')
^^^^^^^^^^^^^^^^^^^^^^^^^
gdb.error: No type named size_t.
```
so we have to `file path/to/scylla` and *then*
`source scylla-gdb.py` every time when we debug scylla or a seastar
application, instead of loading `scylla-gdb.py` in `.gdbinit`.
the reason is that the script accesses the debug symbols like
`gdb.lookup_type('size_t')` in-class. so when the python interpreter
reads the script, it evaluates this statement, but at that moment,
the debug symbols are not loaded, so `source scylla-gdb.py` fails
in `.gdbinit`.
in this change, we transform all these class variables to cached
properties, so that they
* are evaluated on-demand
* are evaluated only once at most
this addresses the pain at the expense of verbosity.
---
this change intends to improve the developer's user experience, and has no impacts on product, so no need to backport.
Closes scylladb/scylladb#20334
* github.com:scylladb/scylladb:
test/scylla_gdb: test the .gdb init use case
scylla-gdb.py: lazy-evaluate the constants
All users of it have sstable_test_env at hand (in fact -- they call env
method to get table_for_test). And since sstable_test_env already has a
bunch of methods to create sstable, the table_for_test wrapper doesn't
need to duplicate this code.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#20360
so that we can set the parameter passed to `-inline-threshold` with
`configure.py` when building with CMake.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20364
before this change, if the user does not have `/bin/sh` around, when
installing scylla packages, the script in `%pretrans` is executed,
and fails due to missing `/bin/sh`. per
https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#pretrans
> Note that the %pretrans scriptlet will, in the particular case of
> system installation, run before anything at all has been installed.
> This implies that it cannot have any dependencies at all. For this
> reason, %pretrans is best avoided, but if used it MUST (by necessity)
> be written in Lua. See
> https://rpm-software-management.github.io/rpm/manual/lua.html for more
> information.
but we were trying to warn users upgrading from scylla < 1.7.3, which
was released 7 years ago at the time of writing.
in this change, we drop the `%pretrans` section. hopefully those users will
find their way out if they still exist.
Fixes scylladb/scylladb#20321
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20365
before this change, when running `scylla-housekeeping`:
```
/opt/scylladb/scripts/libexec/scylla-housekeeping:122: SyntaxWarning: invalid escape sequence '\s'
match = re.search(".*http.?://repositories.*/scylladb/([^/\s]+)/.*/([^/\s]+)/scylladb-.*", line)
```
we could have the warning above, because `\s` is not a valid escape
sequence; the Python interpreter accepts it as the two separate
characters `\` and `s` after complaining, but it's still annoying.
so, let's use a raw string here.
Refs scylladb/scylladb#20317
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20359
before this change, when building the `view_build_test` test with
clang-20, we can get the following build failure:
```
FAILED: test/boost/CMakeFiles/view_build_test.dir/Debug/view_build_test.cc.o
/home/kefu/.local/bin/clang++ -DBOOST_ALL_DYN_LINK -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TESTING_MAIN -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -isystem /home/kefu/dev/scylladb/build/rust -g -Og -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb/build=. -march=westmere -Xclang -fexperimental-assignment-tracking=disabled -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT test/boost/CMakeFiles/view_build_test.dir/Debug/view_build_test.cc.o -MF test/boost/CMakeFiles/view_build_test.dir/Debug/view_build_test.cc.o.d -o test/boost/CMakeFiles/view_build_test.dir/Debug/view_build_test.cc.o -c /home/kefu/dev/scylladb/test/boost/view_build_test.cc
/home/kefu/dev/scylladb/test/boost/view_build_test.cc:998:5: error: unknown type name 'simple_schema'
998 | simple_schema ss;
| ^
```
apparently, `simple_schema`'s declaration is not available in this
translation unit.
in this change
* we include the header where `simple_schema` is defined, so that
the build passes with clang-20.
* also take this opportunity to reorder the headers a little bit,
so the testing headers are grouped together.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20367
in a recent seastar change (644bb662), we no longer include
`seastar/testing/random.hh` in `seastar/testing/test_runner.hh`,
as the latter is not a facade of the former, and neither does it use the
former. as a consequence, some tests which took advantage of the
transitively included `seastar/testing/random.hh` do not build with the latest
seastar:
```
FAILED: test/lib/CMakeFiles/test-lib.dir/key_utils.cc.o
/usr/bin/clang++ -DBOOST_REGEX_DYN_LINK -DBOOST_REGEX_NO_LIB -DBOOST_UNIT_TEST_FRAMEWORK_DYN_LINK -DBOOST_UNIT_TEST_FRAMEWORK_NO_LIB -DDEVEL -DFMT_SHARED -DSCYLLA_BUILD_MODE=dev -DSCYLLA_ENABLE_ERROR_INJECTION -DSCYLLA_ENABLE_PREEMPTION_SOURCE -DSEASTAR_API_LEVEL=7 -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/__w/scylladb/scylladb -I/__w/scylladb/scylladb/build/gen -I/__w/scylladb/scylladb/seastar/include -I/__w/scylladb/scylladb/build/seastar/gen/include -I/__w/scylladb/scylladb/build/seastar/gen/src -I/__w/scylladb/scylladb/build -isystem /__w/scylladb/scylladb/abseil -isystem /__w/scylladb/scylladb/build/rust -O2 -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/__w/scylladb/scylladb/build=. -march=westmere -Xclang -fexperimental-assignment-tracking=disabled -Werror=unused-result -fstack-clash-protection -MD -MT test/lib/CMakeFiles/test-lib.dir/key_utils.cc.o -MF test/lib/CMakeFiles/test-lib.dir/key_utils.cc.o.d -o test/lib/CMakeFiles/test-lib.dir/key_utils.cc.o -c /__w/scylladb/scylladb/test/lib/key_utils.cc
In file included from /__w/scylladb/scylladb/test/lib/key_utils.cc:11:
/__w/scylladb/scylladb/test/lib/random_utils.hh:25:30: error: no member named 'local_random_engine' in namespace 'seastar::testing'
25 | return seastar::testing::local_random_engine;
| ~~~~~~~~~~~~~~~~~~^
1 error generated.
```
in this change, we include `seastar/testing/random.hh` when the random
facility is used, so that they can be compiled with the latest seastar
library.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20368
under the hood, std::map::count() and std::map::contains() are nearly
identical. both operations search for the given key within the map.
however, the former finds an equal range for the given
key and computes the distance between the begin
and the end of that range, while the latter just searches for the given
key.
since scylla-nodetool is not a performance-critical application, the
minor difference in efficiency between these two operations is unlikely
to have a significant impact on its overall performance.
while std::map::count() is generally suitable for our need, it might be
beneficial to use a more appropriate API.
in this change, we use std::map::contains() in place of
std::map::count() when checking for the existence of a parameter with
a given name.
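The two equivalent spellings, side by side; contains() states the intent directly, while count() answers "how many" for a question that is really yes/no:
```cpp
#include <map>
#include <string>

bool has_parameter(const std::map<std::string, std::string>& params,
                   const std::string& name) {
    // before: return params.count(name) > 0;
    return params.contains(name);   // C++20, expresses the yes/no question directly
}
```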
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20350
for better readability
---
it's a cleanup, hence no need to backport.
Closes scylladb/scylladb#20366
* github.com:scylladb/scylladb:
compaction: use std::views::reverse when appropriate
compaction: use structured binding when appropriate
Bind variables in CQL have two formats: positional (`?`) where a
variable is referred to by its relative position in the statement,
and named (`:var`), where the user is expected to supply a
name->value mapping.
In 19a6e69001 we identified the case where a named bind variable
appears twice in a query, and collapsed it to a single entry in the
statement metadata. Without this, a driver using the named variable
syntax cannot disambiguate which variable is referred to.
However, it turns out that users can use the positional call form
even with the named variable syntax, by using the positional
API of the driver. To support this use case, we add a configuration
variable to disable the same-variable detection.
Because the detection has to happen when the entire statement is
visible, we have to supply the configuration to the parser. We
call it the `dialect` and pass it from all callers. The alternative
would be to add a pre-prepare call similar to fill_prepare_context that
rewrites all expressions in a statement to deduplicate variables.
A unit test is added.
Fixes #15559
All seed hostname resolution errors will be ignored during a node
restart in case the node had already joined a cluster. This will
prevent restart errors if some seed names are not resolvable.
Fixes scylladb/scylladb#14945
Closes scylladb/scylladb#20292
* github.com:scylladb/scylladb:
Ignore seed name resolution errors on restart.
Add a test for starting with a wrong seed.
Currently if a coordinator and a node being replaced are in the same DC
while inter-DC encryption is enabled (connections between nodes in the
same DC should not be encrypted) the replace operation will fail. It
fails because the coordinator uses a non-encrypted connection to push raft
data to the new node, but the new node will not accept such a connection
until it knows which DC the coordinator belongs to, and for that the raft
data needs to be transferred.
The series adds the test for this scenario and the fix for the
chicken&egg problem above.
The series (or at least the fix itself) needs to be backported because
this is a serious regression.
Fixes: scylladb/scylladb#19025
Closes scylladb/scylladb#20290
* github.com:scylladb/scylladb:
topology coordinator: fix indentation after the last patch
topology coordinator: do not add replacing node without a ring to topology
test: add test for replace in clusters with encryption enabled
test.py: add server encryption support to cluster manager
.gitignore: fix pattern for resources to match only one specific directory
before this change, we run all the tests in a single pytest session,
with scylladb debug symbols loaded. but we want to test another use
case, where the scylladb debug symbols are missing.
in this change,
* we do not check for the existence of debug symbols until necessary
* add a mark named "without_scylla"
* run the tests in two pytest sessions
- one with "without_scylla" mark
- one with "not without_scylla" mark
* add a test which is marked with the "without_scylla" mark. the test
verifies that the scylla-gdb.py script can be loaded even without
scylladb debug symbols.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
instead of evaluating the constants in-class, accessing them via
a cached class property.
it would be handy if we could source `scylla-gdb.py` in `.gdbinit`,
but this script accesses some symbols which are not available without
a file being debugged. that's why gdb fails to load the init script:
```
Traceback (most recent call last):
File "/home/kefu/dev/scylladb/scylla-gdb.py", line 167, in <module>
class intrusive_slist:
File "/home/kefu/dev/scylladb/scylla-gdb.py", line 168, in intrusive_slist
size_t = gdb.lookup_type('size_t')
^^^^^^^^^^^^^^^^^^^^^^^^^
gdb.error: No type named size_t.
```
so we have to `file path/to/scylla` and *then*
`source scylla-gdb.py` every time when we debug scylla or a seastar
application, instead of loading `scylla-gdb.py` in `.gdbinit`.
the reason is that the script accesses the debug symbols like
`gdb.lookup_type('size_t')` in-class. so when the python interpreter
reads the script, it evaluates this statement, but at that moment,
the debug symbols are not loaded, so `source scylla-gdb.py` fails
in `.gdbinit`.
in this change, we transform all these class variables to cached
properties, so that they
* are evaluated on-demand
* are evaluated only once at most
this addresses the pain at the expense of verbosity.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
repair_service::repair_flush_hints_batchlog_handler may access batchlog
manager while it is uninitialized.
Throw if batchlog manager isn't initialized.
Fixes: #20236.
Needs backport to 6.0 and 6.1 as they suffer from the uninitialized bm access.
Closes scylladb/scylladb#20251
* github.com:scylladb/scylladb:
test: add test to ensure repair won't fail with uninitialized bm
repair: throw if batchlog manager isn't initialized
This commit replaces the 6.0-to-6.1 upgrade guide with the 6.1-to-6.2 upgrade guide.
The new guide is a template that covers the basic procedure.
If any 6.2-specific updates are required, they will have to be added along with development.
Closes scylladb/scylladb#20178
In scylladb/scylladb@7301a96, in the function `hint_endpoint_manager::store_hint()`,
we transformed the lambda passed to `seastar::with_gate()` to a coroutine lambda
to improve the readability. However, there was a subtle problem related to
lifetimes of the captures that needed to be addressed:
* Since we started `co_await`ing in the lambda, the captures were at risk of
being destructed too soon. The usual solution is to wrap a coroutine lambda
within a `seastar::coroutine::lambda` object and rely on the extended lifetime
enforced by the semantics of the language.
See `docs/dev/lambda-coroutine-fiasco.md` for more context.
* However, since we don't immediately `co_await` the future returned by
`with_gate()`, we cannot rely on the extended lifetime provided by the wrapper.
The document linked in the previous bullet point suggests keeping the passed
coroutine lambda as a variable and pass it as a reference to `with_gate()`.
However, that's not feasible either because we discard the returned future and
the function returns almost instantly -- destructing every local object, which
would encompass the lambda too.
The solution used in the commit was to move captures of the lambda into
the lambda's body. That helped because Seastar's backend is responsible for
keeping all of the local variables alive until the lambda finishes its execution.
However, we didn't move all of the captures into the lambda -- the missing one
was the `this` pointer that was implicitly used in the lambda.
Address sanitiser hasn't reported any bugs related to the pointer yet, but
the bug is most likely there.
In this commit, we transform the lambda's body into a new member function
and only call it from the lambda. This way, we don't need to care about
the lifetimes of the captures because Seastar ensures that the function's
arguments stay alive until the coroutine finishes.
Choosing this solution instead of assigning `this` to a pointer variable
inside the lambda's body and using it to refer to the object's members
has actual benefit: it's not possible to accidentally forget to refer
to a member of the object via the pointer; it also makes the code less
awkward.
Fixes scylladb/scylladb#20306
Closes scylladb/scylladb#20258
* github.com:scylladb/scylladb:
db/hints: Fix indentation in `do_store_hint()`
db/hints: Move code for writing hints to separate function
Add nodetool commands to manage task manager tasks:
- tasks abort - aborts the task
- tasks list - lists all tasks in the module
- tasks modules - lists all modules
- tasks set-ttl - sets task ttl
- tasks status - gets status of the task
- tasks tree - gets statuses of the task and all its descendants
- tasks ttl - gets task ttl
- tasks wait - waits for the task and gets its status
Fixes: https://github.com/scylladb/scylladb/issues/19201.
Closes scylladb/scylladb#19614
* github.com:scylladb/scylladb:
test: nodetool: add tests for tasks commands
nodetool: tasks: add nodetool commands to track task manager tasks
api: task_manager: return status 403 if a task is not abortable
api: task_manager: return none instead of empty task id
api: task_manager: add timeout to wait_task
api: task_manager: add operation to get ttl
nodetool: add suboperations support
nodetool: change operations_with_func type
nodetool: prepare operation related classes for suboperations
A dialect is a different way to interpret the same CQL statement.
Examples:
- how duplicate bind variable names are handled (later in this series)
- whether `column = NULL` in LWT can return true (as is now) or
whether it always returns NULL (as in SQL)
Currently, dialect is an empty structure and will be filled in later.
It is passed to query_processor methods that also accept a CQL string,
and from there to the parser. It is part of the prepared statement cache
key, so that if the dialect is changed online, previous parses of the
statement are ignored and the statement is prepared again.
The patch is careful to pick up the dialect at the entry point (e.g.
CQL protocol server) so that the dialect doesn't change while a statement
is parsed, prepared, and cached.
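A sketch of the idea that the dialect participates in the prepared-statement cache key; the shapes below are hypothetical simplifications, not the real cql3 types:
```cpp
#include <map>
#include <string>
#include <utility>

struct dialect {
    bool unify_duplicate_bind_names = true;   // knob added later in this series
    auto operator<=>(const dialect&) const = default;
};

struct prepared_statement { /* parse tree, metadata, ... */ };

// Keying the cache on (CQL text, dialect) means a statement prepared under one
// dialect is never served to a caller using another; changing the dialect
// online simply misses the cache and the statement is prepared again.
using statement_cache = std::map<std::pair<std::string, dialect>, prepared_statement>;
```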
~~~
generic_server: convert connection tracking to seastar::gate
If we call server::stop() right after "server" construction, it hangs:
With the server never listening (never accepting connections and never
serving connections), nothing ever calls server::maybe_stop().
Consequently,
co_await _all_connections_stopped.get_future();
at the end of server::stop() deadlocks.
Such a server::stop() call does occur in controller::do_start_server()
[transport/controller.cc], when
- cserver->start() (sharded<cql_server>::start()) constructs a
"server"-derived object,
- start_listening_on_tcp_sockets() throws an exception before reaching
listen_on_all_shards() (for example because it fails to set up client
encryption -- certificate file is inaccessible etc.),
- the "deferred_action"
cserver->stop().get();
is invoked during cleanup.
(The cserver->stop() call exposing the connection tracking problem dates
back to commit ae4d5a60ca ("transport::controller: Shut down distributed
object on startup exception", 2020-11-25), and it's been triggerable
through the above code path since commit 6b178f9a4a
("transport/controller: split configuring sockets into separate
functions", 2024-02-05).)
Tracking live connections and connection acceptances seems like a good fit
for "seastar::gate", so rewrite the tracking with that. "seastar::gate"
can be closed (and the returned future can be waited for) without anyone
ever having entered the gate.
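A minimal sketch, assuming Seastar's gate API, of the property this rewrite relies on: a gate can be closed, and the returned future awaited, even if nothing ever entered it.
```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/gate.hh>

seastar::future<> stop_never_started_server() {
    seastar::gate connections;     // nothing ever enters this gate
    co_await connections.close();  // resolves immediately instead of hanging
}
```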
NOTE: this change makes it quite clear that neither server::stop() nor
server::shutdown() must be called multiple times. The permitted sequences
are:
- server::shutdown() + server::stop()
- or just server::stop().
Fixes #10305
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
~~~
Fixes #10305.
I think we might want to backport this -- it fixes a hang-on-misconfiguration which affects `scylla-6.1.0-0.20240804.abbf0b24a60c.x86_64` minimally. Basically every release that contains commit ae4d5a60ca has a theoretical chance for the hang, and every release that contains commit 6b178f9a4a has a practical chance for the hang.
Focusing on the more practical symptom (i.e., releases containing commit 6b178f9a4a), `git tag --contains 6b178f9a4a90` gives us (ignoring candidates and release candidates):
- scylla-6.0.0
- scylla-6.0.1
- scylla-6.0.2
- scylla-6.1.0
Closes scylladb/scylladb#20212
* github.com:scylladb/scylladb:
generic_server: make server::stop() idempotent
generic_server: coroutinize server::shutdown()
generic_server: make server::shutdown() idempotent
test/generic_server: add test case
configure, cmake: sort the lists of boost unit tests
generic_server: convert connection tracking to seastar::gate
If there's a token metadata for a given table, and it is in split mode,
it will be registered such that the split monitor can look at it, for
example, to start split work, or do nothing if the table completed it.
During a topology change, e.g. drain, the split is stalled since it cannot
take over the state machine.
It was noticed that the log is being spammed with a message saying the
table completed split work, since every tablet metadata update means
waking up the monitor on behalf of a table. So it makes sense to
demote the logging level to debug. That persists until drain completes
and the split can finally complete.
Another thing that was noticed is that during drain, a table can be
submitted for processing faster than the monitor can handle, so the
candidate queue may end up with multiple duplicated entries for the same
table, which means unnecessary work. That is fixed by using a
sequenced set, which keeps the current FIFO behavior.
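A sketch of such a "sequenced set", with a hypothetical table_id type: FIFO order is preserved, but a table that is already queued is not queued again.
```cpp
#include <cstdint>
#include <deque>
#include <optional>
#include <unordered_set>

using table_id = uint64_t;

class candidate_queue {
    std::deque<table_id> _fifo;
    std::unordered_set<table_id> _queued;
public:
    void push(table_id t) {
        if (_queued.insert(t).second) {   // ignore duplicates of an already-pending table
            _fifo.push_back(t);
        }
    }
    std::optional<table_id> pop() {
        if (_fifo.empty()) {
            return std::nullopt;
        }
        table_id t = _fifo.front();
        _fifo.pop_front();
        _queued.erase(t);
        return t;
    }
};
```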
Fixes#20339.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#20029
Handed over from https://github.com/scylladb/scylladb/pull/20149
This adds a minimal implementation of the start-restore API call.
The method starts a task that runs the load-and-stream functionality against sstables from an S3 bucket. Arguments are:
```
endpoint -- the ID in object_store.yaml config file
bucket -- the target bucket to get objects from
keyspace -- the keyspace to work on
table -- the table to work on
snapshot -- the name of the snapshot from which the backup was taken
```
The task runs in the background; its task_id is returned from the method once it's spawned, and it should be used via the /task_manager API to track the task execution and completion.
Remote sstables components are scanned as if they were placed in the local upload/ directory. Then the collected sstables are fed into load-and-stream.
This branch has https://github.com/scylladb/scylladb/pull/19890 (Integrated backup), https://github.com/scylladb/scylladb/pull/20120 (S3 lister) and a few more minor PRs merged in. The restore branch itself starts with [utils: Introduce abstract (directory) lister](29c867b54d) commit.
refs: https://github.com/scylladb/scylladb/issues/18392
Closes scylladb/scylladb#20305
* github.com:scylladb/scylladb:
tools/scylla-nodetool: add restore integration
test/object_store: Add simple restore test
test/object_store: Generalize prepare_snapshot_for_backup()
code: Introduce restore API method
sstable_loader: Add sstables::storage_manager dependency
sstable_loader: Maintain task manager module
sstable_loader: Out-line constructor
distributed_loader: Split get_sstables_from_upload_dir()
sstables/storage: Compose uploaded sstable path simpler
sstable_directory: Prepare FS lister to scan files on S3
sstable_directory: Parse sstable component without full path
s3-client: Add support for lister::filter
utils: Introduce abstract (directory) lister
We revive the `join_ring` option. We support it only in the
Raft-based topology, as we plan to remove the gossip-based topology
when we fix the last blocker - the implementation of the manual
recovery tool. In the Raft-based topology, a node can be assigned
tokens only once when it joins the cluster. Hence, we disallow
joining the ring later, which is possible in Cassandra.
The main idea behind the solution is simple. We make the unsupported
special case of zero tokens a supported normal case. Nodes with zero
tokens assigned are called "zero-token nodes" from now on.
From the topology point of view, zero-token nodes are the same as
token-owning nodes. They can be in the same states, etc. From the
data point of view, they are different. They are not members of
the token ring, so they are not present in
`token_metadata::_normal_token_owners`. Hence, they are ignored in
all non-local replication strategies. The tablet load balancer also
ignores them.
Zero-token nodes can be used as coordinator-only nodes, just like in
Cassandra. They can handle requests just like token-owning nodes.
The main motivation behind zero-token nodes is that they can prevent
the Raft majority loss efficiently. Zero-token nodes are group 0
voters, but they can run on much weaker and cheaper machines because
they do not replicate data and handle client requests by default
(drivers ignore them). For example, if there are two DCs, one with 4
nodes and one with 5 nodes, if we add a DC with 2 zero-token nodes,
every DC will contain less than half of the nodes, so we won't lose
the majority when any DC dies.
Another way of preventing the Raft majority loss is changing the
voter set, which is tracked by scylladb/scylladb#18793. That approach
can be used together with zero-token nodes. In the example above, if
we choose equal numbers of voters in both DCs, then a DC with one
zero-token node will be sufficient. However, in the typical setup of
2 DCs with the same number of nodes it is enough to add a DC with
only one zero-token node without changing the voter set.
Zero-token nodes could also be used as load balancers in the
Alternator.
Additionally, this PR fixes scylladb/scylladb#11087, which turned out to
be a blocker.
This PR introduces a new feature. There is no need to backport it.
Fixes scylladb/scylladb#6527
Fixes scylladb/scylladb#11087
Fixes scylladb/scylladb#15360
Closes scylladb/scylladb#19684
* github.com:scylladb/scylladb:
docs: raft: document using zero-token nodes to prevent majority loss
test: test recovery mode in the presence of zero-token nodes
test: topology: util.py: add cqls parameter to check_system_topology_and_cdc_generations_v3_consistency
test: topology: util.py: accept zero tokens in check_system_topology_and_cdc_generations_v3_consistency
treewide: support zero-token nodes in the recovery mode
storage_proxy: make TRUNCATE work locally for local tables
test: topology: util.py: document that check_token_ring_and_group0_consistency fails with zero-token nodes
test: test zero-token nodes
test: test_topology_ops: move helpers to topology/util.py
feature_service: introduce the ZERO_TOKEN_NODES feature
storage_service: rename join_token_ring to join_topology
storage_service: raft_topology_cmd_handler: improve warnings
topology_coordinator: fix indentation after the previous patch
treewide: introduce support for zero-token nodes in Raft topology
system_keyspace: load_topology_state: remove assertion impossible to hit
treewide: distinguish all nodes from all token owners
gossip topology: make a replacing node remove the replaced node from topology
locator: topology: add_or_update_endpoint: use none as the default node state
test: boost: tablets tests: ensure all nodes are normal token owners
token_metadata: rename get_all_endpoints and get_all_ips
network_topology_strategy: reallocate_tablets: remove unused dc_rack_nodes
virtual_tables: cluster_status_table: execute: set dc regardless of the token ownership
When only inter-DC encryption is enabled, a non-encrypted connection
between two nodes is allowed only if both nodes are in the same DC.
If the node that initiates the connection knows that the destination is in the
same DC and hence uses a non-encrypted connection, but the destination does not
yet know the topology of the source, such a connection will not be allowed,
since the destination cannot guarantee that the source is in the same DC.
Currently, when the topology coordinator is used, a replacing node will
appear in the coordinator's topology immediately after it is added to
group0. The coordinator will try to send a raft message to the new node
and (assuming only inter-DC encryption is enabled and the replacing node and
the coordinator are in the same DC) it will try to open a regular, non-encrypted,
connection to it. But the replacing node will not have the coordinator
in its topology yet (it needs to sync the raft state for that), so it
will reject such a connection.
To solve the problem the patch does not add a replacing node that was
just added to group0 to the topology. It will be added later, when
tokens are assigned to it. At that point the replacing node will
already have made sure that its topology state is up-to-date (since it will
execute a raft barrier in the join_node_response_params handler) and it knows
the coordinator's topology. This aligns replace behaviour with bootstrap,
since bootstrap also does not add a node without a ring to the topology.
The patch effectively reverts b8ee8911ca
Fixes: scylladb/scylladb#19025
In scylladb/scylladb@7301a96, in the function `hint_endpoint_manager::store_hint()`,
we transformed the lambda passed to `seastar::with_gate()` to a coroutine lambda
to improve the readability. However, there was a subtle problem related to
lifetimes of the captures that needed to be addressed:
* Since we started `co_await`ing in the lambda, the captures were at risk of
being destructed too soon. The usual solution is to wrap a coroutine lambda
within a `seastar::coroutine::lambda` object and rely on the extended lifetime
enforced by the semantics of the language.
See `docs/dev/lambda-coroutine-fiasco.md` for more context.
* However, since we don't immediately `co_await` the future returned by
`with_gate()`, we cannot rely on the extended lifetime provided by the wrapper.
The document linked in the previous bullet point suggests keeping the passed
coroutine lambda as a variable and pass it as a reference to `with_gate()`.
However, that's not feasible either because we discard the returned future and
the function returns almost instantly -- destructing every local object, which
would encompass the lambda too.
The solution used in the commit was to move captures of the lambda into
the lambda's body. That helped because Seastar's backend is responsible for
keeping all of the local variables alive until the lambda finishes its execution.
However, we didn't move all of the captures into the lambda -- the missing one
was the `this` pointer that was implicitly used in the lambda.
Address sanitiser hasn't reported any bugs related to the pointer yet, but
the bug is most likely there.
In this commit, we transform the lambda's body into a new member function
and only call it from the lambda. This way, we don't need to care about
the lifetimes of the captures because Seastar ensures that the function's
arguments stay alive until the coroutine finishes.
Choosing this solution instead of assigning `this` to a pointer variable
inside the lambda's body and using it to refer to the object's members
has actual benefit: it's not possible to accidentally forget to refer
to a member of the object via the pointer; it also makes the code less
awkward.
Modify operation and add operation_action class so that information
about suboperations is stored. It's a preparation for adding
suboperations support to nodetool.
before this change, we included `-ffile-prefix-map=${CMAKE_SOURCE_DIR}=.`
in cflags when building the tree with CMake, but this was wrong,
as the "." directory is the build directory used by CMake, and this
directory is specified by the `-B` option when generating the building
system. if `configure.py --use-cmake` is used to build the tree,
the build directory would be "build". so this option instructs the compiler
to replace the directory of the source files in the debug symbols and in
`__FILE__` at compile time.
but, in a typical workspace, `build/main.cc`, for instance, does not exist.
the reason why this is wrong for CMake but correct for the rules
generated by `configure.py` is that `configure.py` puts the generated
`build.ninja` right under the top source directory, so `.` is correct and
it helps to create reproducible builds, because it practically erases
the path prefixes in the build output. CMake, on the other hand, puts
`build.ninja` under the specified build directory, so replacing the source
directory with the build directory via the file prefix map is just wrong.
there are two options to address this problem:
* stop passing this option. but this would lead to non-reproducible
builds, as we would encode the build directory in the "scylla"
executable. if a developer needs to rebuild an executable for debugging
a coredump generated in production, he/she would have to either
build the tree in the same directory as our CI does, or pass
`-ffile-prefix-map=...` to map the local build directory
to the one used by CI. this is not convenient.
* instead of using `${CMAKE_SOURCE_DIR}=.`, add `${CMAKE_BINARY_DIR}=.`.
this erases the build directory in the outputs, but preserves the
debuggability.
so we pick the second solution.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20329
We modify existing tests to verify that the recovery mode works
correctly in the presence of zero-token nodes.
In `test_topology_recovery_basic`, we test the case when a
zero-token node is live. In particular, we test that the
gossip-based restart of such a node works.
In `test_topology_recovery_after_majority_loss`, we test the case
when zero-token nodes are unrecoverable. In particular, we test
that the gossip-based removenode of such nodes works.
Since zero-token nodes are ignored by the Python driver if it also
connects to other nodes, we use different CQL sessions for a
zero-token node in `test_topology_recovery_basic`.
In the following commit, we modify `test_topology_recovery_basic`
to test the recovery mode in the presence of live zero-token nodes.
Unfortunately, it requires a somewhat ugly workaround. Zero-token nodes
are ignored by the Python driver if it also connects to other
nodes because of empty tokens in the `system.peers` table. In that
test, we must connect to a zero-token node to enter the recovery
mode and purge the Raft data. Hence, we use different CQL sessions
for different nodes.
In the future, we may change the Python driver behavior and revert
this workaround. Moreover, the recovery tests will be removed or
significantly changed when we implement the manual recovery tool.
Therefore, we shouldn't worry about this workaround too much.
Before we use `check_system_topology_and_cdc_generations_v3_consistency`
in a test with a zero-token node, we must ensure it doesn't fail
because of zero tokens in a row of the `system.topology` table.
Before we implement the manual recovery tool, we must support
zero-token nodes in the recovery mode. This means that two topology
operations involving zero-token nodes must work in the gossip-based
topology:
- removing a dead zero-token node,
- restarting a live zero-token node.
We make changes necessary to make them work in this patch.
In one of the following patches, we implement support for zero-token
nodes in the recovery mode. To achieve this, we need to be able to
purge all Raft data on live zero-token nodes by using TRUNCATE.
Currently, TRUNCATE works the same for all replication strategies - it
is performed on all token owners. However, zero-token nodes are not
token owners, so TRUNCATE would ignore them. Since zero-token nodes
store only local tables, fixing scylladb/scylladb#11087 is the perfect
solution for the issue with zero-token nodes. We do it in this patch.
Fixes scylladb/scylladb#11087
We add tests to verify the basic properties of zero-token nodes.
`test_zero_token_nodes_no_replication` and
`test_not_enough_token_owners` are more or less deterministic tests.
Running them only in the dev mode is sufficient.
`test_zero_token_nodes_topology_ops` is quite slow, as expected,
considering parameterization and the number of topology operations.
In the future we can think of making it faster or skipping in the
debug mode. For now, our priority is to test zero-token nodes
thoroughly.
In one of the following patches, we reuse the helper functions from
`test_topology_ops` in a new test, so we move them to `util.py`.
Also, we add the `cl` parameter to `start_writes`, as the new test
will use `cl=2`.
Zero-token nodes must be supported by all nodes in the cluster.
Otherwise, the non-supporting nodes would crash on some assertion
that assumes only token-owning normal nodes make sense.
Hence, we introduce the ZERO_TOKEN_NODES cluster feature. Zero-token
nodes refuse to boot if it is not supported.
I tested this patch manually. First, I booted a node built in the
previous patch. Then, I tried to add a zero-token node built in this
patch. It refused to boot as expected.
We revive the `join_ring` option. We support it only in the
Raft-based topology, as we plan to remove the gossip-based topology
when we fix the last blocker - the implementation of the manual
recovery tool. In the Raft-based topology, a node can be assigned
tokens only once when it joins the cluster. Hence, we disallow
joining the ring later, which is possible in Cassandra.
The main idea behind the solution is simple. We make the unsupported
special case of zero tokens a supported normal case. Nodes with zero
tokens assigned are called "zero-token nodes" from now on.
From the topology point of view, zero-token nodes are the same as
token-owning nodes. They can be in the same states, etc. From the
data point of view, they are different. They are not members of
the token ring, so they are not present in
`token_metadata::_normal_token_owners`. Hence, they are ignored in
all non-local replication strategies. The tablet load balancer also
ignores them.
Topology operations involving zero-token nodes are simplified:
- `add` and `replace` finish in the `join_group0` state, so creating
a new CDC generation and streaming are skipped,
- `removenode` and `decommission` skip streaming,
- `rebuild` does not even contact the topology coordinator as there
is nothing to rebuild.
Also, if the topology operation involves a token-owning node,
zero-token nodes are ignored in streaming.
Zero-token nodes can be used as coordinator-only nodes, just like in
Cassandra. They can handle requests just like token-owning nodes.
The main motivation behind zero-token nodes is that they can prevent
the Raft majority loss efficiently. Zero-token nodes are group 0
voters, but they can run on much weaker and cheaper machines because
they do not replicate data and, by default, do not handle client requests
(drivers ignore them). For example, if there are two DCs, one with 4
nodes and one with 5 nodes, adding a DC with 2 zero-token nodes means
every DC will contain less than half of the nodes, so we won't lose
the majority when any DC dies.
Another way of preventing the Raft majority loss is changing the
voter set, which is tracked by scylladb/scylladb#18793. That approach
can be used together with zero-token nodes. In the example above, if
we choose equal numbers of voters in both DCs, then a DC with one
zero-token node will be sufficient. However, in the typical setup of
2 DCs with the same number of nodes it is enough to add a DC with
only one zero-token node without changing the voter set.
Zero-token nodes could also be used as load balancers in the
Alternator.
In one of the following patches, we introduce support for zero-token
nodes. From that point, getting all nodes and getting all token
owners are no longer equivalent. In this patch, we ensure that we consider
only token owners when we want to consider only token owners (for
example, in the replication logic), and we consider all nodes when
we want to consider all nodes (for example, in the topology logic).
The main purpose of this patch is to make the PR introducing
zero-token nodes easier to review. The patch that introduces
zero-token nodes is already complicated. We don't want trivial
changes from this patch to make noise there.
This patch introduces changes needed for zero-token nodes only in the
Raft-based topology and in the recovery mode. Zero-token nodes are
unsupported in the gossip-based topology outside recovery.
Some functions added to `token_metadata` and `topology` are
inefficient because they compute a new data structure in every call.
They are never called in the hot path, so it's not a serious problem.
Nevertheless, we should improve it somehow. Note that it's not
obvious how to do it because we don't want to make `token_metadata`
store topology-related data. Similarly, we don't want to make
`topology` store token-related data. We can think of an improvement
in a follow-up.
We don't remove unused `topology::get_datacenter_rack_nodes` and
`topology::get_datacenter_nodes`. These functions can be useful in the
future. Also, `topology::_dc_nodes` is used internally in `topology`.
In the following patch, we change the gossiper to work the same for
zero-token nodes and token-owning nodes. We replace occurrences of
`is_normal_token_owner` with topology-based conditions. We want to
rely on the invariant that token-owning nodes own tokens if and only
if they are in the normal or leaving state. However, this invariant
is broken by a replacing node because it does not remove the
replaced node from topology. Hence, after joining, the replacing node
has topology with a node that is not a token owner anymore but is
in a leaving state (`being_replaced`). We fix it to prevent the
following patch from introducing a regression.
In one of the following patches, we change the gossiper to work the
same for zero-token nodes and token-owning nodes. We replace
occurrences of `is_normal_token_owner` with topology-based
conditions. We want to rely on the invariant that token-owning nodes
own tokens if and only if they are in the normal or leaving state.
However, this invariant can be broken in the gossip-based topology
when a new node joins the cluster. When a bootstrapping node starts
gossiping, other nodes add it to their topology in
`storage_service::on_alive`. Surprisingly, the state of the new node
is set to `normal`, as it's the default value used by
`add_or_update_endpoint`. Later, the state will be set to
`bootstrapping` or `replacing`, and finally it will be set again to
`normal` when the join operation finishes. We fix this strange
behavior by setting the node state to `none` in
`storage_service::on_alive` for nodes not present in the topology.
Note that we must add such nodes to the topology. Other code needs
their Host ID, IP, and location.
We change the default node state from `normal` to `none` in
`add_or_update_endpoint` to prevent bugs like the one in
`storage_service::on_alive`. Also, we ensure that nodes in the `none`
state are ignored in the getters of `locator::topology`.
In one of the following patches, we make NetworkTopologyStrategy
and the tablet load balancer consider only normal token owners to
ensure they ignore zero-token nodes. Some unit tests would start
failing after this change because they do not ensure that all
nodes are normal token owners. This patch prevents it.
Judging by the logic in the test cases in
`network_topology_strategy_test`, `point++` was probably intended
anyway.
In one of the following patches, we introduce support for zero-token
nodes. A zero-token node that has successfully joined the cluster is
in the normal state but is not a normal token owner. Hence, the names
of `get_all_endpoints` and `get_all_ips` become misleading. They
should specify that the functions return only IDs/IPs of token owners.
before this change, memtable serves as the fixture for 6 test cases,
actually these 6 test cases can be categorized into a matrix of 3 x 2:
{ single_row, multi_row, large_partition } x { single_partition, multi_partition }.
in this change, we break memtable into 3 different fixtures to reflect
this fact. it is more readable this way, and a benefit is that each test does
not have to pay for the overhead of setup it does not use at all.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20177
as part of the efforts to address scylladb/scylladb#2717, we are
switching over to the CMake-based building system, and phasing out the
machinery that creates the rules manually in `configure.py`.
in this change, we add `--no-use-cmake` to `configure.py`, it serves
two purposes:
* prepare for the change which enables cmake by default. by then,
we would set the default value of `use_cmake` to True, and allow
users to keep using the existing machinery in the transition period
using `--no-use-cmake`.
* allows the CI to tell if a tree is able to build with CMake.
the command line option of `--use-cmake` is also used by the CI
workflows, and is passed to `configure.py` if `BUILD_WITH_CMAKE`
jenkins pipeline parameter is set. but not all branches with
`--use-cmake` are ready to build with CMake -- only the latest
master HEAD is ready. so the CI needs to check the capability of
building with CMake by looking at the output of `configure.py --help`,
to see if it includes `--no-use-cmake`.
after this change lands, we will remove the `BUILD_WITH_CMAKE`
parameter, and use cmake as long as `configure.py` supports
the `--no-use-cmake` option.
the existing machinery will stay with us for a short transition
period so that developers can take time to get used to the
new target names and the new directory arrangement.
as a side effect, #20079 will be fixed after switching to CMake.
---
this is a cmake-related change, hence no need to backport.
Closes scylladb/scylladb#20261
* github.com:scylladb/scylladb:
build: add --no-use-cmake option to configure.py
build: let configure.py fail if unknown option is passed to it
The test test_streams.py::test_stream_list_tables reproduces a bug where
enabling streams added a spurious result to ListTables. A reviewer of
that patch asked to also add a check that the name of the table itself
doesn't disappear from ListTables when a stream is enabled, so this is
what this patch adds.
This theoretical scenario (a table's name disappearing from ListTables)
never happened, so the new check doesn't reproduce any known bug, but
I guess it never hurts to make the test stronger for regression testing.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#19934
The code tries to build as "neighbors" an unordered_map from an iterator
of std::tuple, instead of the correct std::pair. Apparently, the tuples
are transparently converted to pairs on the newest compilers and the
whole thing works, but on slightly older compilers (like the one on Fedora 39)
Scylla no longer compiles - the compiler complains it can't convert a
tuple to a pair in this context.
So fix the code to use pairs, not tuples, and it fixes the build on
Fedora 39.
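A small illustration of the portability issue, under the assumption that the map is built from an iterator range (the `neighbors` name is just borrowed from the description above):
```
#include <string>
#include <tuple>
#include <unordered_map>
#include <utility>
#include <vector>

int main() {
    // Building the map from an iterator range of std::pair works everywhere,
    // since unordered_map's value_type is std::pair<const Key, T>.
    std::vector<std::pair<std::string, int>> pairs = {{"a", 1}, {"b", 2}};
    std::unordered_map<std::string, int> neighbors(pairs.begin(), pairs.end());

    // With std::tuple elements the same constructor only happens to compile on
    // newer compilers, which is the portability problem described above:
    // std::vector<std::tuple<std::string, int>> tuples = {{"a", 1}, {"b", 2}};
    // std::unordered_map<std::string, int> broken(tuples.begin(), tuples.end());
    return neighbors.size() == 2 ? 0 : 1;
}
```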
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#20319
as we have an API for restoring a keyspace / table, let's expose this feature
with nodetool, so we can exercise it with a user-friendly interface,
without the help of scylla-manager or 3rd-party tools.
in this change:
* add a new subcommand named "restore" to nodetool
* add test to verify its interaction with the API server
* update the document accordingly.
* the bash completion script is updated accordingly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The test shows how to restore previously backed up table:
- backup
- truncate to get rid of existing sstables
- start restore with the new API method
- wait for the task to finish
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method starts a task that uses sstables_loader load-and-stream
functionality to bring new sstables into the cluster. The existing
load-and-stream picks up sstables from upload/ directory, the newly
introduced task collects them from an S3 bucket and a given prefix (that
corresponds to the path where the backup API method put them).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Gossiper seed host name resolution failures are ignored during restart if
a node is already bootstrapped (i.e. it has successfully joined the cluster).
Fixes scylladb/scylladb#14945
Check whether we can stop a generic server without first asking it to
listen.
The test fails currently; the failure mode is a hang, which triggers the 5
minute timeout set in the test:
> unknown location(0): fatal error: in "stop_without_listening":
> seastar::timed_out_error: timedout
> seastar/src/testing/seastar_test.cc(43): last checkpoint
> test/boost/generic_server_test.cc(34): Leaving test case
> "stop_without_listening"; testing time: 300097447us
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Both lists were obviously meant to be sorted originally, but by today
we've introduced many instances of disorder -- thus, inserting a new test
in the proper place leaves the developer scratching their head. Sort both
lists.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
If we call server::stop() right after "server" construction, it hangs:
With the server never listening (never accepting connections and never
serving connections), nothing ever calls server::maybe_stop().
Consequently,
co_await _all_connections_stopped.get_future();
at the end of server::stop() deadlocks.
Such a server::stop() call does occur in controller::do_start_server()
[transport/controller.cc], when
- cserver->start() (sharded<cql_server>::start()) constructs a
"server"-derived object,
- start_listening_on_tcp_sockets() throws an exception before reaching
listen_on_all_shards() (for example because it fails to set up client
encryption -- certificate file is inaccessible etc.),
- the "deferred_action"
cserver->stop().get();
is invoked during cleanup.
(The cserver->stop() call exposing the connection tracking problem dates
back to commit ae4d5a60ca ("transport::controller: Shut down distributed
object on startup exception", 2020-11-25), and it's been triggerable
through the above code path since commit 6b178f9a4a
("transport/controller: split configuring sockets into separate
functions", 2024-02-05).)
Tracking live connections and connection acceptances seems like a good fit
for "seastar::gate", so rewrite the tracking with that. "seastar::gate"
can be closed (and the returned future can be waited for) without anyone
ever having entered the gate.
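A hedged sketch of why a gate fits here, with illustrative names rather than the real generic_server code:
```
#include <seastar/core/future.hh>
#include <seastar/core/gate.hh>

struct toy_server {
    seastar::gate _connections;

    seastar::future<> serve_one() {
        // Every accepted connection holds the gate while it is being processed.
        return seastar::with_gate(_connections, [] {
            return seastar::make_ready_future<>();
        });
    }

    seastar::future<> stop() {
        // Safe even if listen() was never called: closing a gate that nothing
        // ever entered completes immediately instead of deadlocking.
        return _connections.close();
    }
};
```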
NOTE: this change makes it quite clear that neither server::stop() nor
server::shutdown() may be called multiple times. The permitted sequences
are:
- server::shutdown() + server::stop()
- or just server::stop().
Fixes #10305
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
as part of the efforts to address scylladb/scylladb#2717, we are
switching over to the CMake-based building system, and phasing out the
machinery that creates the rules manually in `configure.py`.
in this change, we add `--no-use-cmake` to `configure.py`, it serves
two purposes:
* prepare for the change which enables cmake by default. by then,
we would set the default value of `use_cmake` to True, and allow
users to keep using the existing machinery in the transition period
using `--no-use-cmake`.
* allows the CI to tell if a tree is able to build with CMake.
the command line option of `--use-cmake` is also used by the CI
workflows, and is passed to `configure.py` if `BUILD_WITH_CMAKE`
jenkins pipeline parameter is set. but not all branches with
`--use-cmake` are ready to build with CMake -- only the latest
master HEAD is ready. so the CI needs to check the capability of
building with CMake by looking at the output of `configure.py --help`,
to see if it includes `--no-use-cmake`.
after this change lands, we will remove the `BUILD_WITH_CMAKE`
parameter, and use cmake as long as `configure.py` supports
the `--no-use-cmake` option.
the existing machinery will stay with us for a short transition
period so that developers can take time to get used to the
new target names and the new directory arrangement.
as a side effect, #20079 will be fixed after switching to CMake.
Refs scylladb/scylladb#2717
Refs scylladb/scylladb#20079
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this allows us to use `configure.py` to tell if a certain argument is supported
without parsing its output. in the next commit, we will add `--no-use-cmake` option,
which will be used to tell if the tree is ready for using CMake for its building
system.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
in `configure.py`, a set of options are specified when configuring
seastar, but not all of them were ported to scylla's CMake building
system.
for instance, `configure.py` explicitly disables io_uring reactor
backend at build time, but the CMake-based system does not.
so, in this change, in order to preserve the existing behavior, let's
port the two previously missing options to the CMake-based building system
as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20288
Attempting to read a partition which the node doesn't own, via `SELECT * FROM MUTATION_FRAGMENTS()` from a table using tablets, causes a crash.
This is because when using tablets, the replica side simply doesn't handle requests for un-owned tokens and this triggers a crash.
We should probably improve how this is handled (an exception is better than a crash), but this is outside the scope of this PR.
This PR fixes this and also adds a reproducer test.
Fixes: https://github.com/scylladb/scylladb/issues/18786
Fixes a regression introduced in 6.0, so needs backport to 6.0 and 6.1
Closes scylladb/scylladb#20109
* github.com:scylladb/scylladb:
test/tablets: Test that reading tablets' mutations from MUTATION_FRAGMENTS works
replica/mutation_dump: enfore pinning of effective replication map
replica/mutation_dump: handle un-owned tokens (with tablets)
When executing reversed queries, a native reversed format shall be used. Therefore, the table schema and the clustering key bounds are reversed before a partition slice and a read command are constructed.
It is, however, possible to run a reversed query passing a table schema but only when there are no restrictions on the clustering keys. In this particular situation, the query returns correct results. Since the current alternator tests in test.py do not apply any restrictions, this situation was not caught during development of https://github.com/scylladb/scylladb/pull/18864.
Hence, additional tests are provided that add clustering keys restrictions when executing reversed queries to capture such errors earlier than in dtests.
Additional manual tests were performed to test a mixed-node cluster (with alternator API enabled in Scylla on each node):
1. 2-node cluster with one node upgraded: reverse read queries performed on an old node
2. 2-node cluster with one node upgraded: reverse read queries performed on a new node
3. 2-node cluster with one node upgraded and all its sstable files deleted to trigger repair: reverse read queries performed on an old node
4. 2-node cluster with one node upgraded and all its sstable files deleted to trigger repair: reverse read queries performed on a new node
All reverse read queries above consist of:
- single-partition reverse reads with no clustering key restrictions, with single column restrictions and multi column restrictions both with and without paging turned on
The exact same tests were also performed on a fully upgraded cluster.
Fixes https://github.com/scylladb/scylladb/issues/20191
No backport is required as this is a complementary patch for the series https://github.com/scylladb/scylladb/pull/18864 that did not require backporting.
Closes scylladb/scylladb#20205
* github.com:scylladb/scylladb:
test_query.py: Test reverse queries with clustering key bounds
alternator::do_query Add additional trace log
alternator::do_query: Use native reversed format
alternator::do_query Rename schema with table_schema
Currently, the `force` property of the `source_dc` rebuild option
is lost and `raft_topology_cmd_handler` has no way to know
if it was given or not.
This in turn can cause rebuild to fail, even when `--force`
is set by the user, where it would succeed with gossip
topology changes, based on the source_dc --force semantics.
Fixes scylladb/scylladb#20242
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#20249
* seastar a7d81328...83e6cdfd (29):
> fair_queue: Export the number of times class was activated
> tests/unit: drop support of C++17
> remove vestigial OSv support
> cmake: undefine _FORTIFY_SOURCE on thread.cc
> container_perf: a benchmark for container perf
> io_sink: use chunked_fifo as _pending_io container
> chunked_fifo: implement clear in terms of pop_n
> chunked_fifo: pop_front_n
> io_sink: use iteration instead of indexing
> json2code_test: choose less popular port number
> ioinfo: add '--max-reqsize' parameter
> treewide: drop the support of fmtlib < 8.0.0
> build: bump up the required fmtlib version to 8.1.1
> conditional-variable: align when() and wait() behaviour in case of a predicate throwing an exception
> stall-analyser: add output support for flamegraph
> reactor: Add --io-completion-notify-ms option
> io_queue: Stall detector
> io_queue: Keep local variable with request execution delay
> io_queue: Rename flow ratio timer to be more generic
> reactor: Export _polls counter (internally)
> dns: de-inline dns_resolver::impl methods
> dns: enter seastar::net namespace
> dnf: drop compatibility for c-ares <= 1.16
> reactor: add missing includes of noncopyable_function.hh
> reactor: Reset one-shot signal to DFL before handling
> future: correctly document nested exception type emitted by finally()
> modules: fix FATAL_ERROR on compiler check
> seastar.cc: include fmt/ranges.h
> pack io_request
Closes scylladb/scylladb#20300
`dist-check` tests the generated rpm packages by installing them in a centos 7 container. but this script is terribly outdated
- centos 7 is deprecated. we should use a new distro's latest stable release.
- cqlsh was added to the family of rpms a while ago. we should test it as well.
- the directory hierarchy has been changed. we should read the artifacts from the new directories.
- cmake uses a different directory hierarchy. we should check the directory used by cmake as well.
to address these breaking changes, the scripts are updated accordingly.
---
this change gives an overhaul to a test, which is not used in production. so no need to backport.
Closes scylladb/scylladb#20267
* github.com:scylladb/scylladb:
tools/testing: add cqlsh rpm
tools/testing: adapt to cmake build directory
tools/testing: test with rockylinux:9 not centos:7
tools/testing: correct the paths to rpm packages and SCYLLA-*-FILE
dist-check: add :z option when mapping volume
The storage_manager maintains a set of clients to the configured object
storage(s). The sstables loader is going to spawn tasks that will talk
to those storages, thus it needs the storage manager to get the
clients from.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This service is going to start tasks managed by task manager. For that,
it should have its module set up and registered.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patches will need this method to initialize sstable_directory
differently and then do its regular processing. For that, split the
method into two, next patch will re-use the common part it needs.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The current S3 storage driver keeps sstables in the bucket in the form of
/bucket/generation/component-name
For sstables that are backed up on S3 this format doesn't apply,
because components are uploaded with their names unmodified. This patch
makes the S3 storage driver account for that and not re-format component
paths for the upload sstable state.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When component lister is created it checks the target storage options
for what kind of lister to create. For local options it creates FS
lister that collects sstables from their component files. For S3
options, it relies on sstables registry.
When collecting sstables from a backup, it's not possible to use the registry,
because those entries are not there. Instead, the lister should pick up
individual components as if they were on a local FS. This patch prepares
the lister for that -- in case S3 options are provided and the sstables'
state is "upload", don't try to read those from registry, but
instantiate the FS lister that will later use s3::bucket_lister.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When the sstable directory collects an entry from storage, it tries to parse
its full path with the help of sstables::parse_path(). There are two
overloads of that function -- one with ks:cf arguments and one without.
The latter tries to "guess" keyspace and table names from the directory
name.
However, ks and table names are already known by the directory, it
doesn't even use the returned ks and cf values, so this parsing is
excessive. Also, future patches will put backup paths here, which might
not match the ks_name/table_name-table_uuid/ pattern that the parser
expects.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Directory lister comes with a filter function that tells lister which
entries to skip by its .get() method. For uniformity, add the same to
S3 bucket_lister.
After this change the lister reports a shorter name in the returned
directory entry (with the prefix cut), so the unit test also needs to be
adjusted accordingly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This patch hides directory_lister and bucket_lister behind a common
facade. The intention is to provide a uniform API for sstable_directory
that it could use to list sstables' components wherever they are.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently, when a view update backlog of one replica is full, the write is still sent by the coordinator to all replicas. Because of the backlog, the write fails on the replica, causing inconsistency that needs to be fixed by repair. To avoid these inconsistencies, this patch adds a check on the coordinator for overloaded replicas. As a result, a write may be rejected before being sent to any replicas and later retried by the user, when the replica is no longer overloaded.
This patch does not remove the replica write failures, because we still may reach a full backlog when more view updates are generated after the coordinator check is performed and before the write reaches the replica.
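A rough sketch of the coordinator-side admission idea; the names and the backlog-limit representation are illustrative, and the actual patch returns an overloaded_exception to the client (see the commit list below) rather than a boolean:
```
#include <cstddef>
#include <vector>

struct replica_backlog {
    std::size_t view_update_backlog;
    std::size_t backlog_limit;
};

// Returns false when any target replica already reports a full MV backlog, so
// the coordinator can reject the write up front instead of sending it to the
// replicas and letting it fail there.
bool admit_base_write(const std::vector<replica_backlog>& targets) {
    for (const auto& r : targets) {
        if (r.view_update_backlog >= r.backlog_limit) {
            return false;
        }
    }
    return true;
}
```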
Fixes scylladb/scylladb#17426
Closes scylladb/scylladb#18334
* github.com:scylladb/scylladb:
mv: test the view update behavior
mv: add test for admission control
storage_proxy: return overloaded_exception instead of throwing
mv: reject user requests by coordinator when a replica is overloaded by MVs
repair_service::repair_flush_hints_batchlog_handler may access batchlog
manager while it is uninitialized.
Batchlog manager cannot be initialized before repair as we have the
dependency chain:
repair_service -> storage_service::join_cluster -> batchlog_manager.
Throw if batchlog manager isn't initialized. That won't cause repair
to fail.
This series fixes an issue where histogram Summaries return an infinite value.
It updates the quantile calculation logic to address cases where values fall into the infinite bucket of a histogram.
Now, instead of returning infinite (max int), the calculation will return the last bucket limit, ensuring finite outputs in all cases.
The series adds a test for summaries with a specific test case for this scenario.
Fixes #20255
Needs backport to 6.0, 6.1 and 2023.1 and above
Closes scylladb/scylladb#20257
* github.com:scylladb/scylladb:
test/estimated_histogram_test Add summary tests
utils/histogram.hh: Make summary support inifinite bucket.
instead of using the default `argparse.HelpFormatter`, let's
use `ArgumentDefaultsHelpFormatter`, so that the default values
of options are displayed in the help messages.
this should help developer understand the behavior of the script
better.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20262
to achieve feature parity with our existing building system, we
need to implement a new build target "dist-check" in the CMake-based
building system.
in this change, "dist-check" is added to CMake-based building system.
unlike the rules generated by `configure.py`, the `dist-check` target
in CMake depends on the dist-*-rpm targets. the goal is to enable the user
to test `dist-check` without explicitly building the artifacts being
tested.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20266
in 57def6f1, we specified "package-mode" for poetry, but this option
was introduced in poetry 1.8.0, as the "non-package" mode support.
see https://github.com/python-poetry/poetry/releases/tag/1.8.0
that change practically bumps up the minimum required poetry version
to 1.8.0. we did update `pyproject.toml` to reflect this change,
but we failed to update the `Makefile`.
in this change, we update the `Makefile` to ensure that a user who happens
to have an older version of poetry can install a version which supports
this option when running `make setupenv`.
Refs scylladb/scylladb#20284
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20286
we switched from `circular_buffer` to `chunked_fifo` to represent
`io_sink::_pending_io` in the latest seastar. to be prepared for
this change, let's
* add `chunked_fifo` class in `scylla-gdb.py`.
* use `circular_buffer` as a fallback of `chunked_fifo`. instead of
doing this the other way around, we try to send the message that
the latest seastar uses `chunked_fifo`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20280
Add a parameter --cluster-pool-size that can control the pool size for all
PythonTestSuite tests. By default, the pool size is set to 10 for most of
the suites, but this is too much for laptops. So this parameter can be
used to lower the pool size and avoid freezing the system. Additionally,
the environment variable CLUSTER_POOL_SIZE was added as a convenient way
to limit the pool size without the need to provide an additional
parameter each time.
Related: https://github.com/scylladb/scylladb/pull/20276
Closes scylladb/scylladb#20289
To prevent stalls due to a large number of tokens.
For example, a large cluster with, say, 70 nodes can have
more than 16K tokens.
Fixes #19757
Closes scylladb/scylladb#19758
* github.com:scylladb/scylladb:
abstract_replication_strategy: make get_ranges async
database: get_keyspace_local_ranges: get vnode_effective_replication_map_ptr param
compaction: task_manager_module: open code maybe_get_keyspace_local_ranges
alternator: ttl: token_ranges_owned_by_this_shard: let caller make the ranges_holder
alternator: ttl: can pass const gms::gossiper& to ranges_holder
alternator: ttl: ranges_holder_primary: unconstify _token_ranges member
alternator: ttl: refactor token_ranges_owned_by_this_shard
Removed people that no longer contribute to scylladb.git and added/substituted reviewers responsible for maintaining the frontend components.
No need to backport, this is just information for the github tool.
Closes scylladb/scylladb#20136
* github.com:scylladb/scylladb:
codeowners: add appropriate reviewers to the cluster components
codeowners: add appropriate reviewers to the frontend components
codeowners: fix codeowner names
codeowners: remove non contributors
table_helper has some quite awkward code, improve it a little.
Code cleanup, so no reason to backport.
Closes scylladb/scylladb#20194
* github.com:scylladb/scylladb:
table_helper: insert(): improve indentation
table_helper: coroutinize insert()
table_helper: coroutinize cache_table_info()
table_helper: extract try_prepare()
Currently "table removal" is logged as a reason of compaction stop for table drop,
tablet cleanup and tablet split. Modify log to reflect the reason.
Closes scylladb/scylladb#20042
* github.com:scylladb/scylladb:
test: add test to check compaction stop log
compaction: fix compaction group stop reason
we need to test the installation of cqlsh rpm. also, we should use the
correct paths of the generated rpm packages.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
cmake uses a different arrangement, so let's check for the existence
of the build directory and fall back to cmake's build directory.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
the centos image repos on docker have been deprecated, and the
repo for centos7 has been removed from the main CentOS servers.
so we are either not able to install packages from its default repo
without using the vault mirror, or no longer able to pull its image
from dockerhub.
so, in this change
* we switch over to rockylinux:9, which is the latest stable release
of rockylinux, and rockylinux is a popular clone of RHEL, so it
matches our expectation of a typical use case of scylla.
* use dnf to manage the packages, as dnf is the standard way to manage
rpm packages in modern RPM-based distributions.
* do not install deltarpm.
delta rpms have not been supported since RHEL 8, and the `deltarpm` package
is no longer available ever since. see
https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/8/html-single/considerations_in_adopting_rhel_8/index#ref_the-deltarpm-functionality-is-no-longer-supported_notable-changes-to-the-yum-stack
as a consequence, this package does not exist in Rockylinux-9.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
when building with the rules generated from `configure.py`,
these files are located under tools' own build directory.
so correct them.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
if SELinux is enabled on the host, we'd have the following failure when
running `dist-check.sh`:
```
+ podman run -i --rm -v /home/kefu/dev/scylladb:/home/kefu/dev/scylladb docker.io/centos:7 /bin/bash -c 'cd /home/kefu/dev/scylladb && /home/kefu/dev/scylladb/tools/testing/dist-check/docker.io/centos-7.sh --mode debug'
/bin/bash: line 0: cd: /home/kefu/dev/scylladb: Permission denied
```
to address the permission issue, we need to instruct podman to
relabel the shared volume, so that the container can access
the shared volume.
see also https://docs.podman.io/en/stable/markdown/podman-pod-create.1.html#volume-v-source-volume-host-dir-container-dir-options
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, none of the targets generated by the CMake-based
building system runs `test.py`. but the `build.ninja` generated directly
by `configure.py` provides a target named `test`, which runs
`test.py` with the options passed to `configure.py`.
to be more compatible with the rules generated by `configure.py`,
in this change
* do not include "CTest" module, as we are not using CTest for
driving tests. we use the homebrew `test.py` for this purpose.
more importantly, the target named "test" is provided by "CTest".
so in order to add our own "test" target, we cannot use "CTest"
module.
* add a target named "test" to run "test.py".
* add two CMake options so we can customize the behavior of "test.py",
this is to be compatible with the existing behavior of `configure.py`.
Refs scylladb/scylladb#2717
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20263
This adds a minimal implementation of the start-backup API call.
The method starts a task that uploads all files from the given keyspace's snapshot to the requested endpoint/bucket. Arguments are:
- endpoint -- the ID in object_store.yaml config file
- bucket -- the target bucket to put objects into
- keyspace -- the keyspace to work on
- snapshot -- the method assumes that the snapshot had been already taken and only copies sstables from it
The task runs in the background, its task_id is returned from the method once it's spawned and it should be used via /task_manager API to track the task execution and completion (hint: it's good to have non-zero TTL value to make sure fast backups don't finish before the caller manages to call wait_task API).
Sstables components are scanned for all tables in the keyspace and are uploaded into the /bucket/${cf_name}/${snapshot_name}/ path.
refs: #18391
Closes scylladb/scylladb#19890
* github.com:scylladb/scylladb:
tools/scylla-nodetool: add backup integration
docs: Document the new backup method
test/object_store: Test that backup task is abortable
test/object_store: Add simple backup test
test/object_store: Move format_tuples()
test/pylib: Add more methods to rest client
backup-task: Make it abortable (almost)
code: Introduce backup API method
database: Export parse_table_directory_name() helper
database: Introduce format_table_directory_name() helper
snapshot-ctl: Add config to snapshot_ctl
snapshot-ctl: Add sstables::storage_manager dependency
snapshot-ctl: Maintain task manager module
snapshot-ctl: Add "snapshots" logger
snapshot-ctl: Outline stop() method and constructor
snapshot-ctl: Inline run_snapshot_list<>
test/cql_test_env: Export task manager from cql test env
task_manager: Print task ttl on start (for debugging)
docs: Update object_storage.md with AWS_ environment
docs: Restructure object_storage.md
since the rules generated by `configure.py` have this target, we need
to have an equivalent target in the CMake-based building system as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20265
The pool size increase changes were recently reverted because of flakiness of the test_gossip_boot test. The test started
to fail on adding a node to the cluster without any issues in the Scylla log file. In the test logs it looked like the
installation process for the new node just hung. After investigating the problem, I found out that the issue is that
test.py was draining the io_executor pool, which was set to eight workers, while cleaning the directory during install. So
to fix the issue, the io_executor pool should be increased to more or less the same ratio as before: double the cluster pool size.
Closes scylladb/scylladb#20276
so that we can use {fmt} with it without the help of fmt::streamed.
also since we have a proper formatter for replication_strategy_type,
let's implement
`formatter<vnode_effective_replication_map::factory_key>`
as well.
since there are no callers of these two operator<<, let's drop
them in this change.
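A hedged sketch of such a formatter specialization; the enum here is a stand-in, not the real replication_strategy_type:
```
#include <fmt/format.h>
#include <string_view>

// stand-in enum; the real type lives in the replication strategy code
enum class replication_strategy_type { simple, network_topology, everywhere };

template <>
struct fmt::formatter<replication_strategy_type> : fmt::formatter<std::string_view> {
    auto format(replication_strategy_type t, fmt::format_context& ctx) const {
        std::string_view name = "unknown";
        switch (t) {
        case replication_strategy_type::simple:           name = "simple"; break;
        case replication_strategy_type::network_topology: name = "network_topology"; break;
        case replication_strategy_type::everywhere:       name = "everywhere"; break;
        }
        // reuse the string_view formatter to emit the chosen name
        return fmt::formatter<std::string_view>::format(name, ctx);
    }
};

// fmt::format("{}", replication_strategy_type::simple) now works without fmt::streamed.
```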
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20248
To prevent stalls due to a large number of tokens.
For example, a large cluster with, say, 70 nodes can have
more than 16K tokens.
Fixes #19757
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Prepare for making the function async.
Then, it will need to hold on to the erm while getting
the token_ranges asynchronously.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It is used only here and can be simplified by
checking if the keyspace replication strategy
is per table by the caller.
Prepare for making get_keyspace_local_ranges async.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Add static `make` methods to ranges_holder_{primary,secondary}
and use them to make the ranges objects and pass them
to `token_ranges_owned_by_this_shard`, rather than letting
token_ranges_owned_by_this_shard invoke the right constructor
of the ranges_holder class.
Prepare for making `make` async.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than holding a variant member (and defining
both ranges_holder_{primary,secondary} in both
specializations of the class), just make the internal
ranges_holder classes first-class citizens
and parameterize the `token_ranges_owned_by_this_shard`
template by this class type.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
table_helper::cache_table_info() is fairly convoluted. It cannot be
easily coroutinized since it invokes asynchronous functions in a
catch block, which isn't supported in coroutines. To start to break it
down, extract a block try_prepare() from code that is called twice. It's
both a simplification and a first step towards coroutinization.
The new try_prepare() can have three outcomes: `true` if it succeeded,
`false` if it failed and there's the possibility of attempting a fallback,
or an exception on error.
The `keyspace_compaction` method incorrectly appends the column family
parameter to the URL using a regular string, `"?cf={table}"`, instead of
an f-string, `f"?cf={table}"`. As a result, the column family name is
sent as `{table}` to the server, causing the compaction request to fail.
Fix this issue by passing the parameter to the POST request using a
dictionary instead of appending it to the URL.
Fixes #20264
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#20243
before this change, we use the default options when creating `test_env`,
and the default options enable `use_uuid`. but the modes of
`perf-sstables` involving reads assume that the identifiers are
deterministic, so that the sstables previously written using the "write"
mode can be read with modes like "index_read", which just uses
`test_env::make_sstable()` in `load_sstables()`, and under the hood,
`test_env::make_sstable()` uses `test_env::new_generation()` for
retrieving the next sstable identifier. when using integer-based
identifiers, this works, as the sstable identifiers are generated
from a monotonically increasing integer sequence, where the identifiers
are deterministic. but this does not apply anymore when the UUID-based
identifiers are used, as the identifiers are generated with a
pseudorandom generator of UUID v1.
in this change, to avoid relying on the determinism of the integer-based
sstable identifier generation, we enumerate sstables by listing the
given directory, and parse the path for their identifier.
after this change, we are able to support the UUID-based sstable
identifier.
another option is to disable the UUID-based sstable identifier when
loading sstables. the upside is that this approach is minimal and
straightforward. but the downside is that it encodes the assumption
in the algorithm implicitly, and could be confusing -- we create
a new generation for loading an existing sstable with this generation.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20183
Now the endpoint handler gets the value from db::config, which is not nice
from several perspectives. First, it gets the config by (ab)using database.
Second, it's the compaction manager that "knows" its throughput; the global
config is only the initial source of that information.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#20173
Currently the major compaction task impl grabs this (non-updateable)
value from db::config. That's not good: all services, including the
compaction manager, have their own configs from which they take options.
That said, this patch puts the said option onto
compaction_manager::config, makes use of it, and configures it from
db::config on start (and in tests).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#20174
This is prerequisite for "restore from object storage" feature. In order to collect the sstables in bucket one would need to list the bucket contents with the given prefix. The ListObjectsV2 provides a way for it and here's the respective s3::client extension.
Closes scylladb/scylladb#20120
* github.com:scylladb/scylladb:
test: Add test for s3::client::bucket_lister
s3_client: Add bucket lister
s3_client: Encode query parameter value for query-string
in 947e2814, we pass `--tty` as long as we are using podman _or_
we are in interactive mode. but if we build the tree with podman
under jenkins, we are seeing that ninja displays the output
as if it's in an interactive mode, and the output includes ASCII
escape codes. this is distracting.
the reason is that we
* are using podman, and
* ninja tells if it should display on a "smart" terminal by
checking istty() and the "TERM" environment variable.
so, in this change, we add --tty only if
* we are in the interactive mode,
* or stdin is associated with a terminal. this is the use case
where a user uses dbuild to interactively build scylla
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20196
in 30e82a81, we add a constraint to the template parameter of
boost_test_print_type() to prevent it from being matched with
types which can be formatted with operator<<. but it failed to
work. we still have test failure reports like:
```
[Exception] - critical check ['s', 's', 't', '_', 'm', 'r', '.', 'i', 's', '_', 'e', 'n', 'd', '_', 'o', 'f', '_', 's', 't', 'r', 'e', 'a', 'm', '(', ')'] has failed
```
this is not what we expect. the reason is that we passed the template
parameters to the `has_left_shift` trait in the wrong order, see
https://live.boost.org/doc/libs/1_83_0/libs/type_traits/doc/html/boost_typetraits/reference/has_left_shift.html.
we should have passed the lhs of operator<< expression as first
parameter, and rhs the second.
so, in this change, we correct the type constraint by passing the
template parameter in the right order, now the error message looks
better, like:
```
test/boost/mutation_query_test.cc(110): error: in "test_partition_query_is_full": check !partition_slice_builder(*s) .with_range({}) .build() .is_full() has failed
```
it turns out boost::transformed_range<> is formattable with operator<<,
as it fulfills the constraints of `boost::has_left_shift<ostream, R>`,
but when printing it, the compiler fails when it tries to insert the
elements in the range to the output stream.
so, in order to work around this issue, we add a specialization for
`boost::transformed_range<F, R>`.
also, to improve the readability, we reimplement `has_left_shift<>`
as a concept, so that it's obvious that we need to put the output
stream as the first parameter.
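A small illustration of the parameter-order pitfall, written as a standalone concept rather than the actual test helper:
```
#include <concepts>
#include <ostream>

// A concept equivalent to boost::has_left_shift must test `stream << value`,
// i.e. the stream goes on the left-hand side.
template <typename Stream, typename T>
concept has_left_shift = requires (Stream& os, const T& v) {
    { os << v } -> std::convertible_to<Stream&>;
};

struct printable {};
inline std::ostream& operator<<(std::ostream& os, const printable&) { return os << "printable"; }
struct not_printable {};

static_assert(has_left_shift<std::ostream, printable>);
static_assert(!has_left_shift<std::ostream, not_printable>);
// Swapping the parameters (value first, stream second) silently tests the
// wrong expression, which is the bug being fixed here.
```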
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20233
This patch adds tests for summary calculation. It adds two tests, the
first is a basic calculation for P50, P95, P99 by adding 100 elements
into 20 buckets.
The second test checks that if elements fall into the infinite bucket,
the result is the lower limit (33s) and not infinite.
Relates to #20255
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This patch handles an edge case related to the infinite bucket
limit.
Summaries are the P50, P95, and P99 quantiles.
The quantiles are calculated from a histogram; we find the bucket and
return its upper limit.
In classic histograms, there is a notion of the infinite bucket:
anything that does not fall into the last finite bucket is considered to be
infinite.
For a quantile, that does not make sense. So instead of reporting infinite
we'll report the bucket's lower limit.
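A rough sketch of the clamping idea (bucket layout and names are illustrative, not the actual utils/histogram.hh code):
```
#include <cstddef>
#include <cstdint>
#include <vector>

// counts: per-bucket sample counts, with the last element being the infinite
// ("overflow") bucket; limits: upper limits of the finite buckets, non-empty.
double quantile_upper_bound(const std::vector<uint64_t>& counts,
                            const std::vector<double>& limits,
                            double q) {
    uint64_t total = 0;
    for (auto c : counts) {
        total += c;
    }
    uint64_t rank = static_cast<uint64_t>(q * total);
    uint64_t seen = 0;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        seen += counts[i];
        if (seen >= rank) {
            // The infinite bucket has no finite upper limit, so clamp to the
            // last finite boundary instead of returning infinity (max int).
            return i < limits.size() ? limits[i] : limits.back();
        }
    }
    return limits.back();
}
```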
Fixes #20255
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
as we have an API for backing up a keyspace, let's expose this feature
with nodetool, so we can exercise it with a user-friendly interface,
without the help of scylla-manager or 3rd-party tools.
in this change:
* add a new subcommand named "backup" to nodetool
* add test to verify its interaction with the API server
* add two more routes to the REST API mock server, as
the test is using /task_manager/wait_task/{task_id} API.
for the sake of completeness, the route for
/task_manager/{part1} is added as well.
* update the document accordingly.
* the bash completion script is updated accordingly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
It starts similarly to the simple backup test, but injects a pause into the
task once a single file is scheduled for upload, then aborts the task,
waits for it to fail, and checks that _not_ all files are uploaded.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test shows how to backup a keyspace:
- flush
- take snapshot
- start backup with the new API method
- wait for the task to finish
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Namely:
- POST /storage_service/snapshots to take snapshot on a ks
- GET /task_manager/get_task_status/{id} to get status of a running task
- GET /task_manager/wait_task/{id} to wait for a task to finish
- POST /task_manager/abort_task/{id} to abort a running task
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Make the impl::is_abortable() return 'yes' and check the impl::_as in
the files listing loop. It's not a real abort, since the files listing loop is
expected to be fast and most of the time will be spent in s3::client
code reading data from disk and sending it to S3, but the client doesn't
support aborting its requests. That's some work yet to be done.
Also add injection for future testing.
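A minimal sketch of an abortable listing loop of this kind; `upload_one` is a placeholder, not the real s3::client call:
```
#include <seastar/core/abort_source.hh>
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <string>
#include <vector>

// placeholder for the real s3::client upload, which itself is not abortable yet
static seastar::future<> upload_one(const std::string&) {
    return seastar::make_ready_future<>();
}

seastar::future<> upload_all(std::vector<std::string> files, seastar::abort_source& as) {
    for (const auto& f : files) {
        as.check();             // throws abort_requested_exception once abort was requested
        co_await upload_one(f); // the bulk of the time is still spent here, un-aborted
    }
}
```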
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method starts a task that uploads all files from the given
keyspace's snapshot to the requested endpoint/bucket. The task runs in
the background, its task_id is returned from the method once it's
spawned and it should be used via /task_manager API to track the task
execution and completion (hint: it's good to have non-zero TTL value to
make sure fast backups don't finish before the caller manages to call
wait_task API).
If the snapshot doesn't exist, nothing happens (FIXME, need to return
an error in that case).
If the endpoint is not configured locally, the API call resolves with
bad-request instantly.
Sstables components are scanned for all tables in the keyspace and are
uploaded into the /bucket/${cf_name}/${snapshot_name}/ path.
Task is not abortable (FIXME -- to be added) and doesn't really report
its progress other than running/done state (FIXME -- to be added too).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a parse_table_directory_name() static helper in database.cc
that is used by methods that parse the table tree layout for snapshots.
Export this helper for external usage and rename it to fit the format_...
one introduced by the previous patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This one makes a table directory (not a full path) out of a table name and
uuid. This is to be symmetrical with yet another helper that converts a
directory name back to a table name and uuid (next patch).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Pretty much all services in Scylla have their own config. Add one to
snapshot-ctl too, it will be populated later.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The storage_manager maintains a set of clients to the configured object
storage(s). The snapshot ctl is going to spawn tasks that will talk to
those storages, thus it needs the storage manager to get the clients
from.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This service is going to start tasks managed by task manager. For that,
it should have its module set up and registered.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This helper will be used by code from another .cc file, so the
template needs to be in a header for smooth instantiation.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently the doc assumes that object storage can only be used to keep
sstables on it. It's going to change, restructure the doc to allow for
more usage scenarios.
Currently it doesn't: one of the nodes crashes with a std::out_of_range
exception and a meaningless calltrace.
[Botond]: this test checks the case of reading a partition via
MUTATION_FRAGMENTS from a node which doesn't own said partition.
refs: #18786
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
By making it a required argument, we make sure the topology version is
pinned for the duration of the query. This is needed because mutation
dump queries bypass the storage proxy, where this pinning usually takes
place. So it has to be enforced here.
When using tablets, the replica-side doesn't handle un-owned tokens.
table::shard_for_reads() will just return 0 for un-owned tokens, and a
later attempt at calling table::storage_group_for_token() with said
un-owned token will cause a crash (std::terminate due to
std::out_of_range thrown in noexcept context).
The replicas rely on the coordinator to not send stray requests, but for
select from mutation_fragments(table) queries, there is no coordinator
side who could do the correct dispatching. So do this in
mutation_dump(), just creating empty readers for un-owned tokens.
Since a native reversed format is used for reversed queries,
additional tests with restrictions on clustering keys are required
to capture possible errors like https://github.com/scylladb/scylladb/issues/20191
earlier than in dtests.
Add parametrization to the following tests:
+ test_query_reverse
+ test_query_reverse_paging
to accept a comparison operator used in selection criteria for a Query
operation.
compaction_manager::remove passes "table removal" as the reason
for stopping ongoing compactions, but currently the remove method
is also called when a tablet is migrated or split.
Pass the actual reason for the compaction stop, so that logs aren't
misleading.
This reverts commit cc428e8a36. It causes
many spurious CI failures while nodes are being torn down. Revert it until
the root cause is fixed, after which it can be reinstated.
Fixes #20116.
Now, when each shard's storage_group_manager keeps
only the storage_groups for the tablet replicas it owns,
we can simply return the storage_group map size
instead of counting the number of tablet replicas
mapped to this shard.
Add a unit test that sums the tablet count
on all shards and tests that the sum is equal
to the configured default `initial_tablets`.
Fixes #18909
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#20223
…utations vector
With a large number of tables the schema mutations
vector might get big enough to cause reactor stalls when freed.
For example, the following stall was hit on
2023.1.0~rc1-20230208.fe3cc281ec73 with 5000 tables:
```
(inlined by) ~vector at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_vector.h:730
(inlined by) db::schema_tables::calculate_schema_digest(seastar::sharded<service::storage_proxy>&, enum_set<super_enum<db::schema_feature, (db::schema_feature)0, (db::schema_feature)1, (db::schema_feature)2, (db::schema_feature)3, (db::schema_feature)4, (db::schema_feature)5, (db::schema_feature)6, (db::schema_feature)7> >, seastar::noncopyable_function<bool (std::basic_string_view<char, std::char_traits<char> >)>) at ./db/schema_tables.cc:799
```
This change returns a mutations generator from
the `map` lambda coroutine so we can process the mutations
one at a time and destroy them one at a time, thereby reducing the memory footprint and preventing reactor stalls.
Fixes #18173
Closes scylladb/scylladb#18174
* github.com:scylladb/scylladb:
schema_tables: calculate_schema_digest: filter the key earlier
schema_tables: calculate_schema_digest: prevent stalls due to large mutations vector
An additional log line prints information about the read query being executed.
It lists details such as whether the query is a reversed one or
not, and the table_schema and query_schema versions.
When executing reversed queries, a native reversed format shall be used.
Therefore the table schema and the clustering key bounds are reversed
before a partition slice and a read command are constructed,
similarly to cql3::statements::select_statement.
In order to increase readability, the schema variable is renamed to
table_schema to emphasize that a table schema is passed to the function
and used across it.
This allows us to introduce a query_schema variable in the next patch.
Currently, database::tables_metadata::add_table needs to hold a write
lock before adding a table. So, if we update other classes keeping
track of tables before calling add_table, and the method yields,
the table's metadata will be inconsistent.
Set all table-related info in tables_metadata::add_table_helper (called
by add_table) so that the operation is atomic.
The same applies to remove_table.
Fixes: #19833.
Closes scylladb/scylladb#20064
io_fiber/store_snapshot_descriptor now gets the actual number of items
preserved when the log is truncated, fixing extra entries remaining after
log snapshot creation. Also removes the incorrect check for the number of
truncated items in raft_sys_table_storage::store_snapshot_descriptor.
Minor change: added an error_injection test API for changing snapshot threshold settings.
Fixes scylladb/scylladb#16817
Fixes scylladb/scylladb#20080
Closes scylladb/scylladb#20095
* github.com:scylladb/scylladb:
raft: Ensure const correctness in applier_fiber.
raft: Invoke store_snapshot_descriptor with actually preserved items.
raft: Use raft_server_set_snapshot_thresholds in tests.
raft: Fix indentation in server.cc
raft: Add a test to check log size after truncation.
raft: Add raft_server_set_snapshot_thresholds injection.
utils: Ensure const correctness of injection_handler::get().
It is unsafe to restrict the sync nodes for repair to the source data center if it has too low a replication factor in network_topology_replication_strategy, or if other nodes in that DC are ignored.
Also, this change restricts the usage of source_dc to `network_topology` and `everywhere_topology`
strategies, as with simple replication strategy
there is no guarantee that there would be any
more replicas in that data center.
Fixes #16826
Reproducer submitted as https://github.com/scylladb/scylla-dtest/pull/3865
It fails without this fix and passes with it.
* Requires backport to live versions. Issue hit in the field with 2022.2.14.
Closes scylladb/scylladb#16827
* github.com:scylladb/scylladb:
repair: do_rebuild_replace_with_repair: use source_dc only when safe
repair: replace_with_repair: pass the replace_node downstream
repair: replace_with_repair: pass ignore_nodes as a set of host_id:s
repair: replace_rebuild_with_repair: pass ks_erms from caller
nodetool: rebuild: add force option
Add and use utils::optional_param to pass source_dc
- raft_sys_table_storage::store_snapshot_descriptor now receives a number of
preserved items in the log, rather than _config.snapshot_trailing value;
- Incorrect check for truncated number of items in store_snapshot_descriptor
was removed.
Fixes scylladb/scylladb#16817
Fixes scylladb/scylladb#20080
Replace raft_server_snapshot_reduce_threshold with raft_server_set_snapshot_thresholds in tests
as raft_server_set_snapshot_thresholds fully covers the functionality of raft_server_snapshot_reduce_threshold.
before this change, `scylla sstable shard-of` didn't support tablets,
because:
- with tablets enabled, data distribution uses the scheduler
- this replaces the previous method of mapping based on vnodes and shard numbers
- as a result, we can no longer deduce sstable mapping from token ranges
in this change, we:
- read `system.tablets` table to retrieve tablet information
- print the tablet's replica set (list of <host, shard> pairs)
- this helps users determine where a given sstable is hosted
This approach provides the closest equivalent functionality of
`shard-of` in the tablet era.
Fixes scylladb/scylladb#16488
---
no need to backport, it's an improvement, not a critical fix.
Closes scylladb/scylladb#20002
* github.com:scylladb/scylladb:
tools: enhance `scylla sstable shard-of` to support tablets
replica/tablets: extract tablet_replica_set_from_cell()
tools: extract get_table_directory() out
tools: extract read_mutation out
build: split the list of source files across multiple lines
tools/scylla-sstable: print warning when running shard-of with tablets
The include flag directive now treats missing content as info logs instead of warnings. This prevents build failures when the enterprise-specific content isn't yet available.
If the enterprise content is undefined, the directive automatically loads the open-source content. This ensures the end user has access to some content.
address comments
Closes scylladb/scylladb#19804
in 5ce07e5d84, the target named "storage_proxy.o" was added for
training the build of clang. but the rule for building this target
has two flaws:
* it was added as a dependency of the "all" target, but we don't need
to build `storage_proxy.cc` twice when building the tree in the
regular build job. we only need to build it when creating the
profile for training the build of clang.
* it misses the include directory of the abseil library. that's why we
have the following build failure when building the default target:
```
[2024-08-18T14:58:37.494Z] /usr/local/bin/clang++ -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/jenkins/workspace/scylla-master/scylla-ci/scylla -I/jenkins/workspace/scylla-master/scylla-ci/scylla/seastar/include -I/jenkins/workspace/scylla-master/scylla-ci/scylla/build/seastar/gen/include -I/jenkins/workspace/scylla-master/scylla-ci/scylla/build/seastar/gen/src -I/jenkins/workspace/scylla-master/scylla-ci/scylla/build/gen -g -Og -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/jenkins/workspace/scylla-master/scylla-ci/scylla=. -march=westmere -Xclang -fexperimental-assignment-tracking=disabled -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT service/CMakeFiles/storage_proxy.o.dir/Debug/storage_proxy.cc.o -MF service/CMakeFiles/storage_proxy.o.dir/Debug/storage_proxy.cc.o.d -o service/CMakeFiles/storage_proxy.o.dir/Debug/storage_proxy.cc.o -c /jenkins/workspace/scylla-master/scylla-ci/scylla/service/storage_proxy.cc
[2024-08-18T14:58:37.495Z] In file included from /jenkins/workspace/scylla-master/scylla-ci/scylla/service/storage_proxy.cc:17:
[2024-08-18T14:58:37.495Z] In file included from /jenkins/workspace/scylla-master/scylla-ci/scylla/db/commitlog/commitlog.hh:19:
[2024-08-18T14:58:37.495Z] In file included from /jenkins/workspace/scylla-master/scylla-ci/scylla/db/commitlog/commitlog_entry.hh:15:
[2024-08-18T14:58:37.495Z] In file included from /jenkins/workspace/scylla-master/scylla-ci/scylla/mutation/frozen_mutation.hh:15:
[2024-08-18T14:58:37.495Z] In file included from /jenkins/workspace/scylla-master/scylla-ci/scylla/mutation/mutation_partition_view.hh:16:
[2024-08-18T14:58:37.495Z] In file included from /jenkins/workspace/scylla-master/scylla-ci/scylla/build/gen/idl/mutation.dist.impl.hh:14:
[2024-08-18T14:58:37.495Z] /jenkins/workspace/scylla-master/scylla-ci/scylla/serializer_impl.hh:20:10: fatal error: 'absl/container/btree_set.h' file not found
[2024-08-18T14:58:37.495Z] 20 | #include <absl/container/btree_set.h>
[2024-08-18T14:58:37.495Z] | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
[2024-08-18T14:58:37.495Z] 1 error generated.
```
* if the user only enables "dev" mode, we'd have:
```
CMake Error at service/CMakeLists.txt:54 (add_library):
No SOURCES given to target: storage_proxy.o
```
so, in this change, we
* exclude this target from "all"
* link this target against the abseil header library, so it has access
to the abseil headers. please note, we don't need to build an
executable in this case, so the headers suffice.
* add a proxy target to conditionally enable/disable this target,
as CMake does not support generator expressions in `add_dependencies()`
yet at the time of writing.
see https://gitlab.kitware.com/cmake/cmake/-/issues/19467
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20195
~~~
utils/tagged_integer: remove conversion to underlying integer
Silently converting a tagged (i.e., "dimension-ful") integer to a naked
("dimensionless") integer defeats the purpose of having tagged integers,
and is a source of practical bugs, such as
<https://github.com/scylladb/scylladb/issues/20080>.
We could make the conversion operator explicit, for enforcing
static_cast<TAGGED_INTEGER_TYPE::value_type>(TAGGED_INTEGER_VALUE)
in every conversion location -- but that's a mouthful to write. Instead,
remove the conversion operator, and let clients call the (identically
behaving) value() member function.
~~~
No backport needed (refactoring).
The series is supposed to solve #20081.
Two patches in the series touch up code that is known to be (orthogonally) buggy; see
- `service/raft_sys_table_storage: tweak dead code` (#20080)
- `test/raft/replication: untag index_t in test_case::get_first_val()` (#20151)
Fixes for those (independent) issues will have to be rebased on this series, or this series will have to be rebased on those (due to context conflicts).
The series builds at every stage. The debug and release unit test suites pass at the end.
Closes scylladb/scylladb#20159
* github.com:scylladb/scylladb:
utils/tagged_integer: remove conversion to underlying integer
test/raft/randomized_nemesis_test: clean up remaining index_t usage
test/raft/randomized_nemesis_test: clean up index_t usage in store_snapshot()
test/raft/replication: clean up remaining index_t usage
test/raft/replication: take an "index_t start_idx" in create_log()
test/raft/replication: untag index_t in test_case::get_first_val()
test/raft/etcd_test: tag index_t and term_t for comparisons and subtractions
test/raft/fsm_test: tag index_t and term_t for comparisons and subtractions
test/raft/helpers: tighten compare_log_entries() param types
service/raft_sys_table_storage: tweak dead code
service/raft_sys_table_storage: simplify (snap.idx - preserve_log_entries)
service/raft_sys_table_storage: untag index_t and term_t for queries
raft/server: clean up index_t usage
raft/tracker: don't drop out of index_t space for subtraction
raft/fsm: clean up index_t and term_t usage
raft/log: clean up index_t usage
db/system_keyspace: promise a tagged integer from increment_and_get_generation()
gms/gossiper: return "strong_ordering" from compare_endpoint_startup()
gms/gossiper: get "int32_t" value of "gms::version_type" explicitly
It is unsafe to restrict the sync nodes for repair to
the source data center if we cannot guarantee a quorum
in the data center with network-topology replication strategy.
This change restricts the usage of source_dc in the following cases:
1. For SimpleStrategy - source_dc is ignored since there is no guarantee
that it contains remaining replicas for all tokens.
2. For EverywhereStrategy - use source_dc if there are remaining
live nodes in the datacenter.
3. For NetworkTopologyStrategy:
a. It is considered unsafe to use source_dc if the number of nodes
lost in that DC (replaced/rebuilt node + additional ignored nodes)
is greater than 1, or if it has 1 lost node and rf <= 1 in the DC.
b. If the source_dc arg is forced, as with the new
`nodetool rebuild --force <source_dc>` option,
we use it anyway, even if it's considered to be unsafe.
A warning is printed in this case.
c. If the source_dc arg is user-provided (using nodetool rebuild),
an error is thrown, advising the user to use an alternative dc
if available, omit source_dc to sync with all nodes, or use the
--force option to use the given source_dc anyhow.
d. Otherwise, we look for an alternative source datacenter
that has not lost any node. If such a datacenter is found,
we use it as the source_dc for the keyspace and log a warning.
e. If no alternative dc is found (and source_dc is implicit), then:
log a warning and fall back to using replicas from all nodes in the cluster.
Fixes #16826
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The callers already pass ignore_nodes as host_id:s
and we translate them into inet_address only for repair,
so delay the translation as much as possible.
Refs scylladb/scylladb#6403
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The keyspaces' replication maps must be in sync with the
token_metadata_ptr already passed to the functions,
so instead of getting them in the callee, let the caller
get the ks_erms along with retrieving the tmptr.
Note that this is already done on the rebuild path
for streaming-based rebuild.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
To be used to force usage of source_dc, even
when it is unsafe for rebuild.
Update docs and add test/nodetool/test_rebuild.py
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Clearly indicate if a source_dc is provided,
and if so, whether it was explicitly given by the user
or implicitly selected by Scylla.
This will become useful in the next patches,
which will use that to either reject the operation
if it's unsafe to use the source_dc and the dc was
explicitly given by the user, or to fall back
to using all nodes otherwise.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This commit extracts the information about the default for tables in keyspace creation
to a separate file in the _common folder. The file is then included using
the scylladb_include_flag directive.
The purpose of this commit is to make it possible to include a different file
in the scylla-enterprise repo - with a different default.
Refs https://github.com/scylladb/scylla-enterprise/issues/4585
Closes scylladb/scylladb#20181
it would be more helpful if the output of the "--help" command line
included the default values of the options.
so, in this change, we include the default values in it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20170
before this change, we check for the existence of a "TestSuite" node
under the root of the XML tree, and then enumerate all "TestSuite" nodes
under this "TestSuite". this approach works, but it
* introduces unnecessary indentation
* is not very readable
in this change, we just use "./TestSuite/TestSuite" for enumerating
all "TestSuite" nodes under "TestSuite". simpler this way.
---
it's a cleanup in the test driver script, hence no need to backport.
Closes scylladb/scylladb#20169
* github.com:scylladb/scylladb:
test.py: fix the indent
test.py: use XPath for iterating in "TestSuite/TestSuite"
Alternator already supports **authentication** - the ability to sign each request as a particular user. The users that can be used are the different "roles" that are created by CQL "CREATE ROLE" commands. This series adds support for **authorization**, i.e., the ability to determine that only some of these roles are allowed to read or write particular tables, to create new tables, and so on.
The way we chose to do this in this series is to support CQL's existing role-based access control (RBAC) commands - GRANT and REVOKE - on Alternator tables. For example, an Alternator table "xyz" is visible to CQL as "alternator_xyz.xyz", so a `GRANT SELECT ON alternator_xyz.xyz TO myrole` will allow read commands (e.g., GetItem) on that table, and without this GRANT, a GetItem will fail with `AccessDeniedException`.
This series adds the necessary checks to all relevant Alternator operations, and also adds extensive functional testing for this feature - i.e., that certain DynamoDB API operations are not allowed without the appropriate GRANTs.
The following permissions are needed for the following Alternator API operations:
* **SELECT**: `GetItem`, `Query`, `Scan`, `BatchGetItem`, `GetRecords`
* **MODIFY**: `PutItem`, `DeleteItem`, `UpdateItem`, `BatchWriteItem`
* **CREATE**: `CreateTable`
* **DROP**: `DeleteTable`
* **ALTER**: `UpdateTable`, `TagResource`, `UntagResource`, `UpdateTimeToLive`
* _none needed_: `ListTables`, `DescribeTable`, `DescribeEndpoints`, `ListTagsOfResource`, `DescribeTimeToLive`, `DescribeContinuousBackups`, `ListStreams`, `DescribeStream`, `GetShardIterator`
Currently, I decided that for consistency each operation requires one permission only. For example, PutItem only requires MODIFY permission. This is despite the fact that in some cases (namely, `ReturnValues=ALL_OLD`) it can also _read_ the item. We should perhaps discuss this decision - and compare how it was done in CQL - e.g., what happens in LWT writes that may return old values?
Different permissions can be granted for a base table, each of its views, and the CDC table (Alternator streams). This adds power - e.g., we can allow a role to read only a view but not the base table, or read the table but not its history. GRANTing permissions on views or CDC logs requires knowing their names, which are somewhat ugly (e.g., the name of GSI "abc" in table "xyz" is `alternator_xyz.xyz:abc`). But usefully, the error message when permissions are denied contains the full name of the table that was lacking permissions and which permissions were lacking, so users can easily add them.
In addition to permissions checking, this series also correctly supports _auto-grant_ (except #19798): When a role has permissions to `CreateTable`, any table it creates will automatically be granted all permissions for this role, so this role will be able to use the new table and eventually delete it. `DeleteTable` does the opposite - it removes permissions from tables being deleted, so that if later a second user re-creates a table with the same name, the first user will not have permissions over the new table.
The already-existing configuration parameter `alternator_enforce_authorization` (off by default), which previously only enabled authentication, now also enables authorization. Users that upgrade to the new version and already had `alternator_enforce_authorization=true` should verify that the users they use to authenticate either have the appropriate permissions or the "superuser" flag. Roles used to authenticate must also have the "login" flag.
Please note that although the new RBAC support implements the access control feature we asked for in #5047, this implementation is _not compatible_ with DynamoDB. In DynamoDB, the access control is configured through IAM operations or through the new `PutResourcePolicy` - operation, not through CQL (obviously!). DynamoDB also offers finer access-control granularity than we support (Scylla's RBAC works on entire tables, DynamoDB allows setting permissions on key prefixes, on individual attributes, and more). Despite this non-compatibility, I believe this feature, as is, will already be useful to Alternator users.
Fixes #5047 (after closing that issue, a new clean issue should be opened about the DynamoDB-compatible APIs that we didn't do - just so we remember this wasn't done yet).
New feature, should not be backported.
Closes scylladb/scylladb#20135
* github.com:scylladb/scylladb:
tests: disable test_alternator_enforce_authorization_true
test, alternator: test for alternator_enforce_authorization config
test/pylib: allow setting driver_connect() options in servers_add()
test: fix test_localnodes_joining_nodes
alternator, RBAC: reproducer for missing CDC auto-grant
alternator: document the new RBAC support
alternator: add RBAC enforcement to GetRecords
test/alternator: additional tests for RBAC
test/alternator: reduce permissions-validity-in-ms
test/alternator: add test for BatchGetItem from multiple tables
alternator: test for operations that do not need any permissions
alternator: add RBAC enforcement to UpdateTimeToLive
alternator: add RBAC enforcement to TagResource and UntagResource
alternator: add RBAC enforcement to BatchGetItem
alternator: add RBAC enforcement to BatchWriteItem
alternator: add RBAC enforcement to UpdateTable
alternator: add RBAC enforcement to Query and Scan
alternator: add RBAC enforcement to CreateTable
alternator: add RBAC enforcement to DeleteTable
alternator: add RBAC enforcement to UpdateItem
alternator: add RBAC enforcement to DeleteItem
alternator: add RBAC enforcement to PutItem
alternator: add RBAC enforcement to GetItem
alternator: stop using an "internal" client_state
Consider the following:
```
T
0 split prepare starts
1 repair starts
2 split prepare finishes
3 repair adds unsplit sstables
4 repair ends
5 split executes
```
If repair produces an sstable after the split prepare phase, the replica will not split that sstable later, as the prepare phase is considered completed already. That causes split execution to fail, as replicas weren't really prepared. This can also be triggered with load-and-stream, which shares the same write (consumer) path.
The approach to fix this is the same one employed to prevent a race between split and migration. If migration happens during the prepare phase, the source may miss the split request, but the tablet will still be split on the destination (if needed). Similarly, the repair writer becomes responsible for splitting the data if the underlying table is in split mode. That's implemented in replica::table for correctness, so if the node crashes, the new sstable missing the split is still split before being added to the set.
Fixes #19378.
Fixes #19416.
Closes scylladb/scylladb#19427
* github.com:scylladb/scylladb:
tablets: Fix race between repair and split
compaction: Allow "offline" sstable to be split
The test is flaky and needs to be fixed in order to not randomly break
our CI; OTOH it can be commented out for the time being, so that we can
merge the feature.
This patch adds tests that demonstrate the current way that Alternator's
authentication and authorization are both enabled or disabled by the
option "alternator_enforce_authorization".
If in the future we decide to change this option or eliminate it (e.g.,
remain just with the "authenticator" and "authorizer" options), we can
easily update these tests to fit the new configuration parameters and
check they work as expected.
Because the new tests want to start Scylla instances with different
configuration parameters, they are written in the "topology"
framework and not in the test/alternator framework. The test/alternator
framework still contains (in test/alternator/test_cql_rbac.py) the vast
majority of the functional testing of the RBAC feature, where all those
tests just assume that RBAC is enabled and needs to be tested.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The manager.driver_connect() function allows passing parameters when
creating the connection (e.g., a special auth_provider), but unfortunately
right now the servers_add() function always calls driver_connect()
without parameters. So in this patch we just add a new optional
parameter to servers_add(), driver_connect_opts, that will be passed to
driver_connect().
In theory, instead of the new option, a caller could
pass start=False to servers_add() and later call driver_connect()
manually with the right arguments. The problem is that start=False
avoids more than just calling driver_connect(), so it doesn't solve
the problem.
An example of using the new option is to run Scylla with authentication
enabled, and then connect to it using the correct default account
("cassandra"/"cassandra"):
```
config = {
    'authenticator': 'PasswordAuthenticator',
    'authorizer': 'CassandraAuthorizer'
}
servers = await manager.servers_add(1, config=config,
    driver_connect_opts={'auth_provider':
        PlainTextAuthProvider(username='cassandra', password='cassandra')})
```
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The existing test
topology_experimental_raft/test_alternator::test_localnodes_joining_nodes
tried to create a second server but *not* wait for it to complete, but
the trick it used (cancelling the task) doesn't work since commit 2ee063c
makes a list of unwaited tasks and waits for them anyway. The test
*appears* to work because it is the last test in the file, but if we
ever add another test in the same file (like I plan to do in the next
patch), that other test will find a "BROKEN" ScyllaClusterManager and
report that it failed :-(
Other tricks I tried to use (like killing the servers) also didn't work
because of various limitations and complications of the test framework
and all its layers.
So not wanting to fight the fragile testing framework any more at this
point, I just gave up and the test will *wait* for the second server
to come up. This adds 120 seconds (!) to the test, but since this whole
test file already takes more than 500 seconds to complete, let's bite
this bullet. Maybe in the future when the test framework improves, we can
avoid this 120 second wait.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a reproducing (xfailing) test for issue #19798, which
shows that if a role is able to create an Alternator table, the role is
able to read the new table (this is known as "auto-grant"), but is NOT
able to read the CDC log (i.e., use Alternator Streams' "GetRecords").
Once we do fix this auto-grant bug, it's also important to implement
auto-revoke - the permissions on a deleted table must be deleted as well
(otherwise the old owner of a deleted table will be able to read a new
table with the same name). This patch also adds a test verifying that
auto-revoke works. This test currently passes (because there is no auto-
grant, so nothing needs to be revoked...) but if we'll implement auto-grant
and forget auto-revoke, the second test will start to fail - so I added
this test as a precaution against a bad fix.
Refs #19798
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In docs/alternator/compatibility.md we said that although Alternator
supports authentication, it doesn't support authorization (access
control). Now it does, so the relevant text needs to be corrected
to fit what we have today.
It's still in the compatibility.md document because it's not the same
API as DynamoDB's, so users with existing applications may need to be
aware of this difference.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a requirement for the "SELECT" permission on a table to
run a GetRecords on it (the DynamoDB Streams API, i.e., CDC).
The grant is checked on the *CDC log table* - not on the base table,
which allows giving a role the ability to read the base but not its
change stream, or vice versa.
The operations ListStreams, DescribeStreams, GetShardIterators do not
require any permissions to run - they do not read any data, and are
(in my opinion) similar in spirit to DescribeTable, so I think it's fine
not to require any permissions for them.
A test is also added.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Additional tests for support for CQL Role-Based Access Control (RBAC)
in Alternator:
1. Check that even in an Alternator table whose name isn't valid as CQL
table names (e.g., uses the dot character) the GRANT/REVOKE commands
work as expected.
2. Check that superuser roles have full permissions, as expected.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In test/cql-pytest/run.py (which also affects test/alternator/run), we set the
configuration permissions_validity_in_ms to 100ms by default. This means
that tests that need to check how GRANT or REVOKE work always need to
sleep for more than 100ms, which can make a test with a lot of these
operations very slow.
So let's just set this configuration value to 5ms. I checked that it
doesn't adversely affect the total running speed of test/alternator/run.
This change only affects running tests through test/alternator/run, which
is expected to be fast. I left the default for test.py as it was, 100ms;
the latency of individual tests is less important there.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
While working on the RBAC on BatchGetItem, I noticed that although
BatchGetItem may ask to read items from several tables, we don't have
a test covering this case! This patch fixes that testing oversight.
Note that for the write-side version of this operation, BatchWriteItem,
we do have tests that write to several tables in the same batch.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Some operations, namely ListTables, DescribeTable, DescribeEndpoints,
ListTagsOfResource, DescribeTimeToLive and DescribeContinuousBackups
do not need any permissions to be GRANTed to a role.
Our rationale for this decision is that in CQL, "describe table" and
friends also do not require any permissions.
This patch includes a test that verifies that they really don't need
permissions.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a requirement for the "ALTER" permission on a table to
run an UpdateTimeToLive on it. UpdateTimeToLive is similar in purpose to
UpdateTable, so it makes sense to use the same permission "ALTER" as we
do for UpdateTable.
A test is also added.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a requirement for the "ALTER" permission on a table to
run the TagResource or UntagResource operations on it. CQL does not
have an exact parallel of DynamoDB's tagging feature, but we usually
use tags as an extension of UpdateTable to change non-standard options
(e.g., write isolation policy or tablets setup), so it makes sense to
require the same permissions we require for UpdateTable - namely "ALTER".
A test for both operations is also added.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a requirement for the "SELECT" permission on a table to
run a BatchGetItem on it. A single batch may ask to read from several
different tables, so we fail the entire batch with AccessDeniedException
if any of the tables mentioned in the batch do not have SELECT permissions
for this role.
A test is also added.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a requirement for the "MODIFY" permission on a table to
run a BatchWriteItem on it. A single batch may ask to write to several
different tables, so we fail the entire batch with AccessDeniedException
if any of the tables mentioned in the batch do not have MODIFY permissions
for this role.
A test is also added.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a requirement for the "ALTER" permission on a table to
run an UpdateTable on it.
A test is also added.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a requirement for the "SELECT" permission on a table to
run a Query or Scan on it.
Both Query and Scan operations call the same do_query() function, so the
permission checks are put there.
Note that Query can read from either the base table or one of its views,
and the permissions on the base and each of the views can be separate
(so we can allow a role to only read one view, for example).
Tests for all of the above are also added.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a requirement for the "CREATE" permission on ALL
KEYSPACES to run a CreateTable operation.
The CreateTable operation also performs so-called "auto-grant": When a
role creates a table, it is automatically granted full permissions to
read, write, change or delete that new table.
A test for all these things is also added.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a requirement for the "DROP" permission on a table to
run a DeleteTable on it.
Moreover, when a table and its views are deleted, any special permissions
previously GRANTed on this table are removed. This is necessary because
if a role creates a table it is automatically granted permissions on this
table (this is known as "auto-grant" - see the CreateTable patch for
details). If this role deletes this table and later a second role creates
a table with the same name, we don't want the first role to have
permissions on this new table.
Tests for permission enforcements and revocation on delete are also added.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a requirement for the "MODIFY" permission on a table to
run an UpdateItem on it.
Only the MODIFY permission is required, even if the operation may also
read the old value of the item, such as a read-modify-write operation
or even using ReturnValues='ALL_OLD'.
A test is also added.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a requirement for the "MODIFY" permission on a table to
run a DeleteItem on it.
Only the MODIFY permission is required, even if the operation may also
read the old value of the item (using ReturnValues='ALL_OLD').
A test is also added.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a requirement for the "MODIFY" permission on a table to
run a PutItem on it.
Only the MODIFY permission is required, even if the operation may also
read the old value of the item (using ReturnValues='ALL_OLD').
A test is also added.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In this patch, we begin to add role-based access control (RBAC)
enforcement to Alternator - in this patch only to GetItem.
After correctly preparing the client_state in the previous patch,
the permission check itself in the get_item() function is very simple.
The bigger part of this patch is a full functional test in
test/alternator/test_cql_rbac.py. The test is quite self-explanatory
and heavily commented. Basically we check that a new role cannot
read with GetItem a pre-existing table, and we can add that ability
by GRANTing (in CQL) the new role the ability to SELECT the table,
the keyspace, all keyspaces, or add that ability to some other role
that this role inherits.
In the following patches, we will add role-based access control to
the Alternator operations, but the functional tests will be shorter -
we don't need to check the role inheritance, "all keyspaces" feature,
and so on, for every operation separately, since they all use the
same underlying checking functions, which handle these role inheritance
issues in exactly the same way.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Scylla uses a "client_state" object to encapsulate the information of
who the client is - its IP address, which user was authenticated, and so on.
For an unknown reason, Alternator created for each request an "internal"
client_state, meaning that supposedly the client for each request was
some sort of internal process (e.g., repair) rather than a real client.
This was wrong, and we even had a FIXME about not putting the client's
IP address in client_state.
So in this patch, we start using a normal "external" client_state
instead of an "internal" one. The client_state constructors are very
different in the two cases, so a few lines of code had to change.
I hope that this change will cause no functional changes. For example,
Alternator was already setting its own timeouts explicitly and not
relying on the default ones for external clients. However, we need to
fix this for the following patches which introduce permissions checks
(Role-Based Access Control - RBAC) - the client_state methods for
checking permissions become no-ops for *internal* clients (even if the
client_state contains an authenticated users). We need these functions
to do their job - so we need an *external* variant of client_state.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Commit 9f93dd9fa3 changed
`tablet_sstable_set::_sstable_sets` to be an `absl::flat_hash_map` and
in addition, `std::set<size_t> _sstable_set_ids` was
added. `_sstable_set_ids` is set up in the
`tablet_sstable_set(schema_ptr s, const storage_group_manager& sgm,
const locator::tablet_map& tmap)` constructor, but it is not copied in
`tablet_sstable_set(const tablet_sstable_set& o)`.
This affects the `tablet_sstable_set::tablet_sstable_set` method as it
depends on the copy constructor. Since the sstable set can be cloned when
a new sstable set is added, the issue causes the ids not to be copied
into the new sstable set. It's healed only after compaction, since the
sstable set is rebuilt from scratch there.
This PR fixes this issue by removing the existing copy constructor of
`tablet_sstable_set` to enable the implicit default copy constructor.
Fixes #19519
Closes scylladb/scylladb#20115
* github.com:scylladb/scylladb:
boost/sstable_set_test: add testcase to test tablet_sstable_set copy constructor
replica: fix copy constructor of tablet_sstable_set
This patch adds metrics for batch get_item and batch write_item.
The new metrics record a summary and a histogram for latencies and batch size.
Batch sizes are implemented as ever-growing counters. To get the average batch size, divide the rate of
the batch-size counter by the rate of the batch-count counter:
```rate(batch_get_item_batch_size)/rate(batch_get_item)```
Relates to #17615
New code, No need to backport
Closes scylladb/scylladb#20190
* github.com:scylladb/scylladb:
Add tests for Alternator batch operation metrics
alternator/executor: support batch latency and size metrics
Add metrics for Alternator get and write batch operations
This patch adds unit tests to verify the correctness of the newly
introduced histogram metrics for get and write batch operation
latencies.
The test uses the existing latency test with the added metrics.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This patch updates the get and write batch operations in Alternator to
record latency using the newly added histogram metrics.
It adds logic to increment the counters with the number of items
processed in each batch.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Introduced histogram metrics to track latency for Alternator's get and
write batch operations.
Added counters to record the number of items processed in each batch
operation.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Remove the existing copy constructor to enable the use of the implicit
copy constructor. This fixes the issue of `_sstable_set_ids` not being
copied in the current copy constructor.
Fixes #19519
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
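The bug and the fix can be sketched in isolation (illustrative stand-in type, not the real tablet_sstable_set):
```
#include <cstddef>
#include <set>

// Illustrative stand-in only -- the real tablet_sstable_set has many more
// members. A hand-written copy constructor that forgets a member silently
// drops that member's contents on every clone.
struct sstable_set_like {
    std::set<size_t> _sstable_set_ids;

    sstable_set_like() = default;

    // Buggy variant (roughly what the removed constructor did):
    //   sstable_set_like(const sstable_set_like& o) { /* ids not copied */ }

    // Fix: the implicitly generated (here explicitly defaulted) copy
    // constructor copies every member, including _sstable_set_ids.
    sstable_set_like(const sstable_set_like&) = default;
};
```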
this change is a follow-up of 06c60f6ab, which updated the
2nd step of the test to use switch-case, but missed the 1st step.
so this change updates the first step of the test as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we check for the existence of a "TestSuite" node
under the root of the XML tree, and then enumerate all "TestSuite" nodes
under this "TestSuite". this approach works, but it
* introduces unnecessary indentation
* is not very readable
in this change, we just use "./TestSuite/TestSuite" for enumerating
all "TestSuite" nodes under "TestSuite". simpler this way.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The method cannot find the TestSuite in the XML file and fails the whole job, even though the tests pass. The issue was an incorrect understanding of the boost summarization method: it creates one file for all modes, so there is no need to go through all modes to convert the XML file for Allure.
Closes: https://github.com/scylladb/scylladb/issues/20161
Closes scylladb/scylladb#20165
Currently, each frozen mutation we get from
system_keyspace::query_mutations is unfrozen in whole
into a mutation, and only then do we check its key with
the provided `accept_keyspace` function.
This is wasteful, since the key can be processed
directly from the frozen mutation, before taking
the toll of unfreezing it.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
With a large number of tables the schema mutations
vector might get big enough to cause reactor stalls
when freed.
For example, the following stall was hit on
2023.1.0~rc1-20230208.fe3cc281ec73 with 5000 tables:
```
(inlined by) ~vector at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_vector.h:730
(inlined by) db::schema_tables::calculate_schema_digest(seastar::sharded<service::storage_proxy>&, enum_set<super_enum<db::schema_feature, (db::schema_feature)0, (db::schema_feature)1, (db::schema_feature)2, (db::schema_feature)3, (db::schema_feature)4, (db::schema_feature)5, (db::schema_feature)6, (db::schema_feature)7> >, seastar::noncopyable_function<bool (std::basic_string_view<char, std::char_traits<char> >)>) at ./db/schema_tables.cc:799
```
This change returns a mutations generator from
the `map` lambda coroutine so we can process the mutations
one at a time and destroy them one at a time,
thereby reducing the memory footprint and preventing
reactor stalls.
Fixes #18173
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
before this change, `scylla sstable shard-of` didn't support tablets,
because:
- with tablets enabled, data distribution uses the scheduler
- this replaces the previous method of mapping based on vnodes and shard numbers
- as a result, we can no longer deduce sstable mapping from token ranges
in this change, we:
- read `system.tablets` table to retrieve tablet information
- print the tablet's replica set (list of <host, shard> pairs)
- this helps users determine where a given sstable is hosted
This approach provides the closest equivalent functionality of
`shard-of` in the tablet era.
Fixes scylladb/scylladb#16488
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
the `get_table_directory()` function will have applications
beyond its current use in `schema_loader.cc`. its ability to locate
the directory storing the sstables of a given table could be valuable
in other subcommands' implementations.
so, in this change we extract it out into a dedicated source file,
so that it accepts the primary_key and an optional clustering_key.
Refs scylladb/scylladb#16488
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
the `read_mutation_from_table_offline()` function will have applications
beyond its current use in `schema_loader.cc`. its ability to parse
mutation data from sstables could be valuable in other subcommands'
implementations.
so, in this change we extract it out into a dedicated source file,
so that it accepts the primary_key and an optional clustering_key.
Refs scylladb/scylladb#16488
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Split the extended list of source files across multiple lines.
This improves readability and makes future additions easier to
review in diffs.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
the subcommand of "shard-of" does not support tablets yet. so let's
print out an error message, instead of printing the mapping assuming
that the sstables are distributed based on token only.
this commit also adds two more command line options to this subcommand,
so that the user is required to specify either "--vnodes" or "--tablets"
to instruct the tool how the cluster distributes the tokens across nodes
and their shards. this helps to minimize surprise for the user.
this change prepares for the succeeding changes to implement the tablets
support.
the corresponding test is updated accordingly so that it only exercises
the "shard-of" subcommand without tablets. we will test it with tablets
enabled in a succeeding change.
Refs scylladb/scylladb#16488
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Silently converting a tagged (i.e., "dimension-ful") integer to a naked
("dimensionless") integer defeats the purpose of having tagged integers,
and is a source of practical bugs, such as
<https://github.com/scylladb/scylladb/issues/20080>.
We could make the conversion operator explicit, for enforcing
static_cast<TAGGED_INTEGER_TYPE::value_type>(TAGGED_INTEGER_VALUE)
in every conversion location -- but that's a mouthful to write. Instead,
remove the conversion operator, and let clients call the (identically
behaving) value() member function.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
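A minimal sketch of the idea, assuming a simplified tagged integer (the real utils/tagged_integer differs in detail):
```
#include <compare>
#include <cstdint>

// Hypothetical sketch only. The point is the absence of "operator
// value_type()": callers must write .value() to obtain the raw integer.
template <typename Tag, typename Value = uint64_t>
class tagged_integer {
    Value _value = 0;
public:
    using value_type = Value;
    tagged_integer() = default;
    explicit tagged_integer(Value v) : _value(v) {}
    value_type value() const { return _value; }   // explicit untagging
    auto operator<=>(const tagged_integer&) const = default;
};

struct index_tag;
using index_t = tagged_integer<index_tag>;

// usage: int64_t raw = idx.value();   // instead of a silent conversion
```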
With implicit conversion of tagged integers to untagged ones going away,
explicitly tag (or untag, as necessary) the operands of the following
operations, in "test/raft/randomized_nemesis_test.cc":
- addition of tagged and untagged (both should be tagged)
- taking the minimum of an index difference and a container size (both
should be untagged)
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
With implicit conversion of tagged integers to untagged ones going away,
unpack and clean up the relatively complex
first_to_remain = max(snap.idx + 1 - preserve_log_entries, 0)
calculation in persistence::store_snapshot().
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
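For illustration, the untangled calculation with plain integers might look like this (a sketch, not the actual persistence::store_snapshot() code):
```
#include <cstddef>
#include <cstdint>

// Hedged sketch of
//   first_to_remain = max(snap.idx + 1 - preserve_log_entries, 0)
// written so the unsigned subtraction can never wrap around; the real code
// keeps the result in raft::index_t space.
uint64_t first_to_remain(uint64_t snap_idx, size_t preserve_log_entries) {
    uint64_t one_past_snapshot = snap_idx + 1;
    return one_past_snapshot > preserve_log_entries
         ? one_past_snapshot - preserve_log_entries
         : 0;
}
```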
With implicit conversion of tagged integers to untagged ones going away,
explicitly untag the operands / arguments of the following operations, in
"test/raft/replication.hh":
- assignment to raft_cluster::_seen
- call to hasher_int::hash_range()
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
raft_cluster::get_states() passes a "start_idx" to create_log(), and
create_log() uses it as an "index_t" object. Match the type of "start_idx"
to its name.
This patch is best viewed with "git show -W".
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
In test_case::get_first_val(), the assignment
first_val = initial_snapshots[initial_leader].snap.idx;
*both* relies on implicit conversion of the tagged integer type "index_t"
to the underlying "uint64_t", *and* is a logic bug, as reported at
<https://github.com/scylladb/scylladb/issues/20151>.
For now, wean the buggy assignment off the disappearing
tagged-to-untagged conversion.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Properly annotate index_t and term_t constants for use in
BOOST_CHECK_EQUAL() and BOOST_CHECK().
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Properly annotate index_t and term_t constants for use in
BOOST_CHECK_EQUAL(), BOOST_CHECK(). Clean up the first args of
read_quorum() calls -- stay in term_t space.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
The "from" and "to" parameters of compare_log_entries() are raft log
indices; change them to raft::index_t, and update the callers.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
In raft_sys_table_storage::store_snapshot_descriptor(), the condition
preserve_log_entries > snap.idx
*both* relies on implicit conversion of the tagged integer type "index_t"
to the underlying "uint64_t", *and* is a logic bug, as reported at
<https://github.com/scylladb/scylladb/issues/20080>.
Ticket #20080 explains that this condition always evaluates to false in
practice, and that the "else" branch handles all cases correctly anyway.
For now, wean the buggy expression off the disappearing
tagged-to-untagged conversion.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Currently, boost tests aren't using JUnit. Enable JUnit report output and clean them of skipped tests, since boost tests are executed by function name rather than filename. This allows including boost test results in the Allure report.
Related: https://github.com/scylladb/qa-tasks/issues/1665
Closes scylladb/scylladb#19925
before this change, we assume the user runs nodetool tests right under the root source directory. if the user runs them under `test/nodetool`, the suppression rules are not applied, as the path is incorrect in that case.
after this change, the suppression rules' path is deduced from the top src directory, so we can now run the nodetool tests under `test/nodetool`.
---
no need to backport, this change improves developer's experience.
Closes scylladb/scylladb#20119
* github.com:scylladb/scylladb:
test/nodetool: deduce suppression path from top srcdir
test/nodetool: deduce path from top srcdir
Tenant names starting with `$` are reserved for internal ones.
Forbid creating a new service level whose name starts with `$`
and log a warning for existing service levels with a `$` prefix.
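A minimal sketch of such a check (the helper name is hypothetical, not the actual service-level code):
```
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper: reject names beginning with '$', which is reserved
// for internal tenants.
void validate_service_level_name(std::string_view name) {
    if (!name.empty() && name.front() == '$') {
        throw std::invalid_argument(
            "service level names starting with '$' are reserved: " + std::string(name));
    }
}
```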
Closes scylladb/scylladb#20122
`tools/toolchain/optimized_clang.sh` builds this target for creating
the profile in order to build clang optimized with this profile data.
so let's be compatible with `configure.py`, and add this target to
the CMake build system as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20105
The lock_table() method needs the database, ks and cf to find the table on
all shards. The same can be achieved with the help of the global_table_ptr
that all the core callers already have at hand.
There's a test that doesn't have a global table, but it can get one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#20139
Typically the sstable_directory is constructed out of a table object.
Some code, namely tests and the schema-loader, doesn't have a table at hand and
constructs the directory out of schema, sharder, path-to-sstables, etc. This
code doesn't work with any storage options other than local ones, so
there's no need (yet) to carry this argument over.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#20138
before this change, we look up the mode using the command line
option as the key, but that's incorrect if the command line option does
not match any of the known names. in that case, `test_mode` just
creates another pair of <sstring, test_modes> and returns the second
component of this pair, and the second component is not what we expect.
we should have thrown an exception.
in this change
* the test_mode map is marked const.
* the overloads for parsing / formatting the `test_modes` type are
added, so that boost::program_options can parse and format it.
after this change, we print a more user-friendly error, like
```
/scylla perf-sstable --mode index-foo
error: the argument ('index-foo') for option '--mode' is invalid
Try --help.
```
instead of a bunch of output which is printed as if we had passed the correct option as the argument of the `--mode` option.
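A minimal sketch of the lookup pattern being fixed (mode names are hypothetical, not the actual perf_sstable code):
```
#include <map>
#include <stdexcept>
#include <string>

// A const map plus find() makes an unknown --mode value an error, whereas
// operator[] on a non-const map would silently insert and return a
// default-constructed entry.
enum class test_modes { index, query, write };

const std::map<std::string, test_modes> test_mode_names = {
    {"index", test_modes::index},
    {"query", test_modes::query},
    {"write", test_modes::write},
};

test_modes parse_test_mode(const std::string& name) {
    auto it = test_mode_names.find(name);
    if (it == test_mode_names.end()) {
        throw std::invalid_argument("unknown --mode value: " + name);
    }
    return it->second;
}
```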
---
it's an improvement of developer experience, hence no need to backport.
Closes scylladb/scylladb#20140
* github.com:scylladb/scylladb:
test/perf/perf_sstable: use switch-case when appropriate
test/perf/perf_sstables: use test_modes as the type of its option
This change fixes #17237, fixes #5361 and fixes #5362 by passing the limit value down the call chain in cql3. A test is also added.
fixes #17237
fixes #5361
fixes #5362
The regression happened in 5.4 as we changed the way GROUP BY is processed in 432cb02 - to force aggregation when it is used. The LIMIT value was not passed to aggregations and thus we failed to adhere to it.
We want to backport this fix to 5.4 and 6.0 to have continuously correct results for the test case from #17237
This patch consists of 4 commits:
- fa4225ea0fac2057b7a9976f57dc06bcbd900cd4 - cql3: respect the user-defined page size in aggregate queries - a precondition for this patch to be implementable
- 8fbe69e74dca16ed8832d9a90489ca47ba271d0b - cql3/select_statement: simplify the get_limit function - the `do_get_limit()` function did a lot of legwork that should not be associated with it. This change makes it trivial and makes its callers do additional checks (for unset guards, or for an aggregate query)
- 162828194a2b88c22fbee335894ff045dcc943c9 - cql3: process LIMIT for GROUP BY queries - pass the limit value down the chain and make use of it. This is the actual fix to #17237
- b3dc6de6d6cda8f5c09b01463bb52f827a6a00b4 - test/cql-pytest: Add test for GROUP BY queries with LIMIT - tests
Closes scylladb/scylladb#18842
* github.com:scylladb/scylladb:
test/cql-pytest: Add test for GROUP BY queries with LIMIT
cql3: process LIMIT for GROUP BY queries
cql3/select_statement: simplify the get_limit function
cql3: respect the user-defined page size in aggregate queries
Drop half-reversed (legacy) format of query::partition_slice.
The select query builds a fully reversed (native) slice for reversed queries and uses it together with a reversed
schema to construct a query::read_command that is further propagated to the database.
A cluster feature is added to support nodes that still operate on half-reversed slices. When the feature is turned off:
- query::read_command is transformed (to have table schema and half-reversed slices) before sending to other nodes
- query::read_command is transformed (to have query schema (reversed) and reversed slices) after receiving it from other nodes
- Similarly, mutations are transformed. They are reversed before being sent to other nodes or after receiving them from other nodes.
Additional manual tests were performed to test a mixed-node cluster:
1. 3-node cluster with one node upgraded: reverse read queries performed on an old node
2. 3-node cluster with one node upgraded: reverse read queries performed on a new node
3. 3-node cluster with one node upgraded and all its sstable files deleted to trigger repair: reverse read queries performed on an old node
4. 3-node cluster with one node upgraded and all its sstable files deleted to trigger repair: reverse read queries performed on a new node
All reverse read queries above consist of:
- single-partition reverse reads with no clustering key restrictions, with single column restrictions and multi column restrictions both with and without paging turned on
- multi-partition reverse reads with range restrictions with optional partition limit and partial ordering
The exact same tests were also performed on a fully upgraded cluster.
Fixes https://github.com/scylladb/scylladb/issues/12557
Closes scylladb/scylladb#18864
* github.com:scylladb/scylladb:
mutation_partition: drop reverse parameter in compact_for_query
clustering_key_filter: unify get_ranges and get_native_ranges
streamed_mutation_freezer: drop the reverse parameter
reverse-reads.md: Drop legacy reverse format information
Fix comments referring to half-reversed (legacy) slices
select_statement::do_execute: Add tracing information
query::trim_clustering_row_ranges_to: require reversed schema for native reversed ranges
query-request: Drop half_reverse_slice as it is no longer used anywhere
readers: Use reversed schema and native reversed slices
database: accept reversed schema for reversed queries
storage_proxy: Support reverse queries in native format
query_pagers: Replace _schema with _query_schema
query_pagers: Support reverse queries in native format
select_statement: Execute reversed query in native format
storage_proxy::remote: Add support for mixed-node clusters
mutation_query: Add reversed function to reverse reconcilable_result
query-request: Add reversed function to reverse read_command
features: add native_reverse_queries
kl::reader::make_reader: Unify interface with mx::reader::make_reader
config: drop reversed_reads_auto_bypass_cache
config: drop enable_optimized_reversed_reads
This commit updates the configuration for ScyllaDB documentation so that:
- 6.1 is the latest version.
- 6.1 is removed from the list of unstable versions.
It must be merged when ScyllaDB 6.1 is released.
No backport is required.
Closes scylladb/scylladb#20041
With conversion of tagged integers to untagged ones going away, replace
static_cast<uint64_t>(snap.idx)
with
snap.idx.value()
Furthermore, casting "preserve_log_entries" (of type "size_t") to
"uint64_t" is redundant (both "snap.idx" and "preserve_log_entries" carry
nonnegative values, and the mathematical difference is expected to be
nonnegative); remove the cast.
Finally, simplify the initialization syntax.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
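For illustration only, a minimal standalone sketch of what such a tagged integer wrapper might look like; the names and the uint64_t payload here are assumptions, not the actual raft/ types:
```c++
#include <compare>
#include <cstdint>

// Hypothetical sketch: the tag type makes index_t a distinct type even though
// it wraps a plain integer, and .value() is the explicit way to untag it.
template <typename Tag, typename T = uint64_t>
struct tagged_integer {
    T _value{};
    constexpr T value() const { return _value; }              // explicit untag
    auto operator<=>(const tagged_integer&) const = default;  // same-tag comparisons only
};

struct index_tag {};
using index_t = tagged_integer<index_tag>;

int main() {
    index_t idx{42};
    uint64_t raw = idx.value();   // instead of static_cast<uint64_t>(idx)
    return raw == 42 ? 0 : 1;
}
```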
With implicit conversion of tagged integers to untagged ones going away,
explicitly untag index_t and term_t values in the following two contexts:
- when they are passed to CQL queries as int64_t,
- when they are default-constructed as fallbacks for int64_t fields
missing from CQL result sets.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
With implicit conversion of tagged integers to untagged ones going away,
explicitly tag (or untag, as necessary) the operands of the following
operations, in "raft/server.cc":
- addition of tagged and untagged (both should be tagged)
- subscripting an array by tagged (should be untagged)
- comparing a size-like threshold against tagged (should be untagged)
- exposing tagged via gauges (should be untagged)
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
With implicit conversion of tagged integers to untagged ones going away,
explicitly tag (or untag, as necessary) the operands of the following
operations, in "raft/fsm.cc":
- addition of tagged and untagged (both should be tagged)
- comparison (relop) between tagged and untagged (both should be tagged)
- subscripting or sizing an array by tagged (should be untagged)
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
With implicit conversion of tagged integers to untagged ones going away,
explicitly tag (or untag, as necessary) the operands of the following
operations, in raft/log.{cc,h}:
- addition of tagged and untagged (both should be tagged)
- comparison (relop) between tagged and untagged (both should be tagged)
- subscripting an array, or offsetting an iterator, by tagged (should be
untagged)
- comparing an array bound against tagged (should be untagged)
- subtracting tagged from an array bound (should be untagged)
Note: these files mix uniform initialization syntax (index_t{...}) with
constructor call syntax (index_t()), with the former being more frequent.
Stick with the former here too, for consistency.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Internally, increment_and_get_generation() produces a
"gms::generation_type" value.
In turn, all callers of increment_and_get_generation() -- namely
scylla_main() [main.cc] and single_node_cql_env::run_in_thread()
[test/lib/cql_test_env.cc] -- pass the resolved value to
storage_service::init_address_map() and storage_service::join_cluster(),
both of which take a "gms::generation_type".
Therefore it is pointless to "untag" the generation value temporarily
between the producer and the consumers. Correct the return type of
increment_and_get_generation().
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
The callers of gossiper::compare_endpoint_startup() need not (should not)
learn of any particular (tagged or untagged) difference of generations;
they only care about the ordering of generations. Change the return type
of compare_endpoint_startup() to "std::strong_ordering", and delegate the
comparison to tagged_tagged_integer::operator<=>.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
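For illustration, a minimal sketch of such a comparison delegating to the defaulted three-way operator; generation_type here is a simplified stand-in, not the real gms type:
```c++
#include <compare>
#include <cstdint>

// Illustrative only: callers see the ordering, never the raw generation difference.
struct generation_type {
    int64_t _value{};
    auto operator<=>(const generation_type&) const = default;
};

std::strong_ordering compare_endpoint_startup(generation_type a, generation_type b) {
    return a <=> b;   // delegate to the wrapper's operator<=>
}

int main() {
    return compare_endpoint_startup({1}, {2}) == std::strong_ordering::less ? 0 : 1;
}
```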
In do_sort(), we need to drop to "int32_t" temporarily, so that we can
call ::abs() on the version difference. Do that explicitly.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
If system_keyspace::stop() is called before system_keyspace::shutdown(),
it will never finish, because the uncleared shared pointers will keep
it alive indefinitely.
Currently this can happen if an exception is thrown before the construction
of the shutdown() defer. This patch moves the shutdown() call to immediately
before stop(). I see no reason why it should be elsewhere.
Fixes scylladb/scylla-enterprise#4380
Closes scylladb/scylladb#20089
Instead of using a chain of `if`-`else` blocks, use a switch-case: it's
visually easier to follow. And since we never need to handle the `else`
case, the `throw` statement is removed.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Before this change, we looked up the mode using the command line option
as the key, but that's incorrect if the command line option does not
match any of the known names. In that case, `test_mode` just creates
another <sstring, test_modes> pair and returns the second component of
that pair, which is not what we expect; we should have thrown an
exception.
In this change:
* the test_mode map is marked const.
* the overloads for parsing / formatting the `test_modes` type are
added, so that boost::program_options can parse and format it.
After this change,
* we can print a more user-friendly error, like
```
/scylla perf-sstable --mode index-foo
error: the argument ('index-foo') for option '--mode' is invalid
Try --help.
```
instead of a bunch of output printed as if we had passed a correct
value as the argument of the `--mode` option.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
ALTER tablets KS executes in 2 steps:
1. ALTER KS's cql handler forms a global topo req, and saves data required to execute this req,
2. global topo req is executed by topo coordinator, which reads data attached to the req.
The KS name is among the data attached to the req. There's a time window between these steps where a to-be-altered KS could have been DROPped, which results in topo coordinator forever trying to ALTER a non-existing KS. In order to avoid it, the code has been changed to first check if a to-be-altered KS exists, and if it's not the case, it doesn't perform any schema/tablets mutations, but just removes the global topo req from the coordinator's queue.
BTW. just adding this extra check resulted in broader than expected changes, which is due to the fact that the code is written badly and needs to be refactored - an effort that's already planned under #19126
(I suggest to disable displaying whitespace differences when reviewing this PR).
Fixes: scylladb/scylladb#19576
Closes scylladb/scylladb#19666
* github.com:scylladb/scylladb:
tests: ensure ALTER tablets KS doesn't crash if KS doesn't exist
cql: refactor rf_change indentation
Prevent ALTERing non-existing KS with tablets
Using the error injection framework, we inject a sleep into the
processing path of ALTER tablets KS, so that the topology coordinator of
the leader node
sleeps after the rf_change event has been scheduled, but before it is
started to be executed. During that time the second node executes a DROP
KS statement, which is propagated to the leader node. Once leader node
wakes up and resumes processing of ALTER tablets KS, the KS won't exist
and the node must not crash, as it did before this fix.
The lister resembles the directory_lister from util -- it returns
entries upon its .get() invocation, and should be .close()d at the end.
Internally the lister issues a ListObjectsV2 request with the provided prefix
and limits the number of entries the server returns, so as not to consume
too much local memory (we don't have a streaming XML parser for the response).
If the result is indeed truncated, the subsequent calls include the
continuation token as per [1]
[1] https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
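For illustration, a minimal standalone sketch of the continuation-token loop described above; fake_s3_client, page_result and list_page() are hypothetical stand-ins for the actual client, which parses real ListObjectsV2 XML responses:
```c++
#include <iostream>
#include <optional>
#include <string>
#include <vector>

struct page_result {
    std::vector<std::string> keys;
    bool truncated = false;
    std::string next_token;          // echoed back as ContinuationToken
};

struct fake_s3_client {
    // Pretends the bucket holds 5 keys and the server caps each page at max_keys.
    page_result list_page(const std::string& prefix, unsigned max_keys,
                          const std::optional<std::string>& token) {
        unsigned start = token ? std::stoul(*token) : 0;
        page_result r;
        for (unsigned i = start; i < 5 && r.keys.size() < max_keys; ++i) {
            r.keys.push_back(prefix + std::to_string(i));
        }
        unsigned next = start + r.keys.size();
        r.truncated = next < 5;
        r.next_token = std::to_string(next);
        return r;
    }
};

int main() {
    fake_s3_client cl;
    std::optional<std::string> token;
    do {
        // Cap the page size so a single response stays small enough to hold in memory.
        auto page = cl.list_page("sstables/", 2, token);
        for (auto& k : page.keys) {
            std::cout << k << '\n';
        }
        token = page.truncated ? std::optional(page.next_token) : std::nullopt;
    } while (token);
}
```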
The method logic is clean and simple -- load the sstable from the descriptor and sort it into one of the collections (local, shared, remote, unsorted). To achieve that there's a bunch of helper methods, but they duplicate each other's functionality. Squashing most of this code into process_descriptor() makes it easier to read and keeps the sstable_directory private API much shorter.
Closes scylladb/scylladb#20126
* github.com:scylladb/scylladb:
sstable_directory: Open-code load_sstable() into process_descriptor()
sstable_directory: Squash sort_sstable() with process_descriptor()
sstable_directory: Remove unused sstable_filename(desc) helper
sstable_directory: Log sst->get_filename(), not sstable_filename(desc)
sstable_directory: Keep loaded sst in local var
sstable_directory: Remove unused helpers
sstable_directory: Load sstable once when sorting
There are two load_sstable() overloads, and one of them is only used
inside process_descriptor(). What this loading helper does is, in fact,
process the given descriptor, so it's worth open-coding it into its
caller.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The latter (the caller) loads the sstable, and so does the former, so load it once
and then put it in either list/set, depending on flags and shard info.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are some places that log the sstable Data file name via the sstable
descriptor. After the previous patches all those loggers have the sstable at
hand and can use sstable::get_filename() instead.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In order to decide which list to put the sstable into, sort_sstable()
first calls get_shards_for_this_sstable(), which loads the sstable
anyway. If the loaded shards contain only the current one (which is the
common case), the sstable is loaded again. In fact, if the sstable happens to
be remote, it's loaded anyway to get its open info.
Fix that by loading the sstable, then getting the shards directly from it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The reverse parameter is no longer used with native reverse reads.
The row ranges are provided in native reverse order together with
a reversed schema, thus the reverse parameter remains false all the
time and can be dropped.
When a reverse slice is provided, it is given in the native reverse
format. Thus the ranges will be returned in the same order as stored
in the slice.
Therefore there is no need to distinguish between get_ranges and
get_native_ranges. The latter one gets dropped and get_ranges returns
ranges in the same order as stored in the slice.
The reverse parameter is no longer used with native reverse reads.
A reversed schema is provided and thus the reverse parameter shall
remain false all the time.
Simplify the implementation and, for clustering key ranges in native
reversed format, require a reversed table schema.
Trimming native reversed clustering key ranges requires a reversed
schema to be passed in. Thus, the reverse flag is no longer required
as it would always be set to false.
The reconcilable_result is built as it would be constructed for
forward read queries for tables with reversed order.
Mutations constructed for reversed queries are consumed forward.
Drop overloaded reversed functions that reverse read_command and
reconcilable_result directly and keep only those requiring smart
pointers. They are not used any more.
Remove schema reversing in query() and query_mutations() methods.
Instead, a reversed schema shall be passed for reversed queries.
Rename a schema variable from s into query_schema for readability.
For reversed queries, query_result() method accepts a reversed table
schema and read_command with a query schema version and a slice in
native reversed format.
Support mixed-node clusters. In such a case, the feature flag
native_reverse_queries is disabled and the read_command is sent
to replicas in the old legacy format (it stores the table schema version
and a slice in the legacy reverse format).
After the reconciliation, for the read+repair case, un-reversed
mutations are sent to replicas, i.e. forward ones.
Use a reversed schema and a native reversed slice when constructing
a read_command and executing a reversed select statement.
The read_command created this way is passed further down to query_pagers::pager
and storage_proxy::query_result, which transform it to the format
they accept/know, i.e. legacy.
In handle_read, detect whether an incoming read_command is in the
legacy reversed format or the native reversed format.
The result is used to transform the read_command between formats,
as well as to transform the results before they are sent back to
the coordinator.
The reconcilable_result is reversed by reversing mutations for all
partitions it holds. Reversing is asynchronous to avoid potential
stalls.
Used for transitions between the legacy and native formats and in order
to support mixed-node clusters.
The read_command is reversed by reversing the schema version it
holds and transforming a slice from the legacy reversed format to
the native reversed format.
Used for transitions between formats and to support mixed-node clusters.
Reverse reads have already been with us for a while, thus this back
door option to read entire partitions forward and reverse them afterwards
can be retired.
When signing an AWS query one needs to prepare a "query string", which is a
line looking like `encode(query_param)=encode(query_value)&...`. Only
the query parameter names and values are encoded. This encoding was missing
in the current code and so far worked because no encodable characters were used.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
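For illustration, a minimal sketch of the canonical query-string construction; aws_encode() and the parameter names are illustrative assumptions, not the actual code in s3/client.cc:
```c++
#include <cctype>
#include <iostream>
#include <map>
#include <string>

// Percent-encode everything except RFC 3986 unreserved characters.
static std::string aws_encode(const std::string& s) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : s) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += c;                     // unreserved characters stay as-is
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0xf];
        }
    }
    return out;
}

int main() {
    // std::map keeps the parameters sorted by name, as the canonical form expects.
    std::map<std::string, std::string> params = {
        {"list-type", "2"},
        {"prefix", "ks/table with space"},
    };
    std::string query;
    for (auto& [name, value] : params) {
        if (!query.empty()) {
            query += '&';
        }
        query += aws_encode(name) + "=" + aws_encode(value);
    }
    std::cout << query << '\n';   // list-type=2&prefix=ks%2Ftable%20with%20space
}
```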
Consider the following:
T
0 split prepare starts
1 repair starts
2 split prepare finishes
3 repair adds unsplit sstables
4 repair ends
5 split executes
If repair produces sstable after split prepare phase, the replica
will not split that sstable later, as prepare phase is considered
completed already. That causes split execution to fail as replicas
weren't really prepared. This also can be triggered with
load-and-stream which shares the same write (consumer) path.
The approach to fix this is the same employed to prevent a race
between split and migration. If migration happens during the prepare
phase, it can happen that the source misses the split request, but the
tablet will still be split on the destination (if needed).
Similarly, the repair writer becomes responsible for splitting
the data if underlying table is in split mode. That's implemented
in replica::table for correctness, so if node crashes, the new
sstable missing split is still split before added to the set.
Fixes #19378.
Fixes #19416.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
In order to fix the race between split and repair, we must introduce
the ability to split an "offline" sstable, one that wasn't added
to any of the table's sstable set yet.
It's not safe to split a sstable after adding it to the set, because
a failure to split can result in unsplit data left in the set, causing
split to fail down the road, since the coordinator thinks this replica
has only split data in the set.
Refs #19378.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
All lambdas passed to test_using_reusable_sst() conform to the prototype
void (test_env&, sstable_ptr)
All lambdas passed to test_using_reusable_sst_returning() conform to the
prototype
NON_VOID (test_env&, sstable_ptr)
The common parameter list of both prototypes can be expressed with the
concept
std::invocable<test_env&, sstable_ptr>
Once a "Func" template parameter (i.e., function type) satisfying this
concept is taken, then "Func"'s void or non-void return type can be
commonly expressed with
std::invoke_result_t<Func, test_env&, sstable_ptr>
In turn, test_env::do_with_async_returning<...> can be instantiated with
this return type, even if it happens to be "void".
([stmt.return] specifies, "[a] return statement with an operand of type
void shall be used only in a function that has a cv void return type",
meaning that
return func(env)
will do the right thing in the body of
test_env::do_with_async_returning<void>().)
Merge test_using_reusable_sst() and test_using_reusable_sst_returning()
into one. Preserve the function name from the former, and the
test_env::do_with_async_returning<...>() call from the latter.
Suggested-by: Avi Kivity <avi@scylladb.com>
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Closes scylladb/scylladb#20090
There are chances that a developer launches `pytest` right under
`test/nodetool`; in that case the current working directory is not
the root directory of the project, so the path to the suppression rules
does not point to a file.
To cater to the need to run the test under `test/nodetool`, let's
use the path deduced from top_srcdir.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Unit testing for the SSTable validation API happens in
`sstable_validate_test`. Currently, this test checks the API against
some invalid SSTables with out-of-order clustering rows and out-of-order
partitions. However, both are types of content-level corruption that do
not trigger `malformed_sstable_exception` errors.
Extend the test to cover cases of file-level corruption as well, i.e.,
cases that would raise a `malformed_sstable_exception`. Construct an
SSTable with an invalid checksum to trigger this.
This is part of the effort to improve scrub to handle all kinds of
corruption.
Fixes scylladb/scylladb#19057
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Closes scylladb/scylladb#20096
`maybe_rehash` is opportunistic and is not strictly required to succeed. If it fails, it will retry on the next call, so there's no reason to throw an exception that would fail its caller, since `maybe_rehash` is called as the final step after the caller has already succeeded with its action. (A minimal sketch of this pattern follows the commit list below.)
Minor enhancement for the error path, no backport required.
Closes scylladb/scylladb#19910
* github.com:scylladb/scylladb:
cell_locker: maybe_rehash: reindent
cell_locker: maybe_rehash: ignore allocation failures
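For illustration, a minimal sketch of the "best-effort rehash" pattern described above, assuming a plain unordered_map; this is not the actual cell_locker code:
```c++
#include <new>
#include <unordered_map>

// Rehashing is an opportunistic optimization, so an allocation failure is
// swallowed and the rehash simply retried on a later call instead of
// failing the caller, whose own work has already succeeded.
template <typename Map>
void maybe_rehash(Map& m, size_t target_buckets) {
    if (m.bucket_count() >= target_buckets) {
        return;
    }
    try {
        m.rehash(target_buckets);
    } catch (const std::bad_alloc&) {
        // Not fatal: try again on the next call.
    }
}

int main() {
    std::unordered_map<int, int> m;
    maybe_rehash(m, 1024);
}
```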
Currently, each change to tablet metadata triggers a full metadata reload from disk. This is very wasteful, especially if the metadata change affects only a single row in the `system.tablets` table. This is the case when the tablet load balancer triggers a migration: it affects a single row in the table, but today it triggers a full reload.
We expect tablet count to potentially grow to thousands and beyond and the overhead of this full reload can become significant.
This PR makes tablet metadata reload partial, instead of reloading all metadata on topology or schema changes, reload only the partitions that are affected by the change. Copy the rest from the in-memory state.
This is done with two passes: first the change mutations are scanned and a hint is produced. This hint is then passed down to the reload code, which will use it to only reload the parts (rows/partitions) of the metadata that have actually changed.
The performance difference between full reload and partial reload is quite drastic:
```
INFO 2024-07-25 05:06:27,347 [shard 0:stat] testlog - Tablet metadata reload:
full 616.39ms
partial 0.18ms
```
This was measured with the modified (by this PR) `perf_tablets`, which creates 100 tables, each with 2K tablets. The test was modified to change a single tablet, then do a full and a partial reload respectively, measuring the time it takes for each.
Fixes: #15294
New feature, no backport needed.
Closes scylladb/scylladb#15541
* github.com:scylladb/scylladb:
test/perf/perf_tablets: add tablet metadata reload perf measurement
test/boost/tablets_test: add test for partial tablet metadata updates
db/schema_tables: pass tablet hint to update_tablet_metadata()
service/storage_service: load_tablet_metadata(): add hint parameter
service/migration_listener: update_tablet_metadata(): add hint parameter
service/raft/group0_state_machine: provide tablet change hint on topology change
service/storage_service: topology_state_load(): allow providing change hint
replica/tablets: add update_tablet_metadata()
replica/tablets: fix indentation
replica/tablets: extract tablet_metadata builder logic
replica/tablets: add get_tablet_metadata_change_hint() and update_tablet_metadata_change_hint()
locator/tablets: add tablet_map::clear_tablet_transition_info()
locator/tablets: make tablet_metadata cheap to copy
mutation/canonical_mutation: add key()
Measure reload perf of full reload vs. partial reload, after changing a
single tablet.
While at it, modify the `--tablets-per-table` parameter, so that it has
a default value which works OOTB. The previous default was both too
large (causing oversized commitlog entry errors) and not a power of two.
Replace the has_tablet_mutations in `merge_tables_and_views()` with a
hint parameter, which is calculated in the caller, from the original
schema change mutations. This hint is then forwarded to the notifier's
`update_tablet_metadata()` so that subscribers can refresh only the
tablet partitions that changed.
The hint contains information related to what exactly changed, allowing
listeners to do partial updates, instead of reloading all metadata on
each notification.
Extract a hint of what a tablet mutation changed. The hint can be later
used to selectively reload only the changed parts from disk.
Two variants are added:
* get_tablet_metadata_change_hint() - extracts a hint from a list of
tablet mutations
* update_tablet_metadata_change_hint() - updates an existing hint based
on a single mutation, allowing for incremental hint extraction
Keep lw_shared_ptr<tablet_map> in the tablet map and use COW semantics.
To prevent accidental changes to shared tablet_map instances, all
modifications to a tablet_map have to go through a new
`mutate_tablet_map()` method, which implements the copy-modify-swap
idiom.
Fixes #19960
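For illustration, a minimal sketch of the copy-modify-swap idiom with a cheaply copyable metadata object; the types are simplified stand-ins for locator::tablet_metadata/tablet_map, and std::shared_ptr is used here where the real code uses lw_shared_ptr:
```c++
#include <map>
#include <memory>
#include <utility>

struct tablet_map {
    std::map<int, int> replicas;   // placeholder for the per-tablet state
};

class tablet_metadata {
    // Shared, immutable-by-convention map: copying tablet_metadata is cheap
    // because it only bumps the reference count.
    std::shared_ptr<const tablet_map> _tablets = std::make_shared<tablet_map>();
public:
    const tablet_map& tablets() const { return *_tablets; }

    // All modifications go through here: copy, mutate the copy, swap it in.
    template <typename Func>
    void mutate_tablet_map(Func&& f) {
        auto copy = std::make_shared<tablet_map>(*_tablets);
        f(*copy);                       // readers of the old map are unaffected
        _tablets = std::move(copy);
    }
};

int main() {
    tablet_metadata md;
    md.mutate_tablet_map([](tablet_map& m) { m.replicas[0] = 1; });
    return md.tablets().replicas.at(0) == 1 ? 0 : 1;
}
```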
Write path for sstables/commitlog need to handle the fact that IO extensions can
generate errors, some of which should be considered retry-able, and some that should,
similar to system IO errors, cause the node to go into isolate mode.
One option would of course be for extensions to simply generate std::system_errors,
with system_category and appropriate codes. But this is probably a bad idea, since
it makes it more muddy at which level an error happened, as well as limits the
expressibility of the error.
This adds three distinct types (sharing a base) distinguishing permission, availability
and configuration errors. These are treated akin to EACCES, ENOENT and EINVAL in the
disk error handler and memtable write loop.
Tests updated to use and verify behaviour.
Closes scylladb/scylladb#19961
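For illustration, a minimal sketch of such a three-way error classification; the class names here are hypothetical, not the actual extension error types:
```c++
#include <iostream>
#include <stdexcept>

struct storage_io_error_base : std::runtime_error {
    using std::runtime_error::runtime_error;
};
// Akin to EACCES: typically causes the node to isolate itself.
struct storage_permission_error : storage_io_error_base {
    using storage_io_error_base::storage_io_error_base;
};
// Akin to ENOENT: the backing resource is (perhaps temporarily) unavailable.
struct storage_availability_error : storage_io_error_base {
    using storage_io_error_base::storage_io_error_base;
};
// Akin to EINVAL: the extension is misconfigured.
struct storage_configuration_error : storage_io_error_base {
    using storage_io_error_base::storage_io_error_base;
};

int main() {
    try {
        throw storage_availability_error("remote object store unreachable");
    } catch (const storage_availability_error&) {
        std::cout << "retryable: back off and retry the write\n";
    } catch (const storage_io_error_base&) {
        std::cout << "non-retryable: isolate the node\n";
    }
}
```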
After removal of rwlock (53a6ec05ed), the race was introduced because the order in which
compaction groups of a tablet are closed is no longer deterministic.
Some background first:
Split compaction runs in main (unsplit) group, and adds sstable to left and right groups
on completion.
The race works as follows:
1) split compaction starts on main group of tablet X
2) tablet X reaches cleanup stage, so its compaction groups are closed in parallel
3) left or right group are closed before main (more likely when only main has flush work to do)
4) split compaction completes, and adds sstable to left and right
5) if e.g. left is closed, adjusting the backlog tracker will trigger an exception, and since that
happens in row cache update's execute(), the node crashes.
The problem manifested as follows:
[shard 0: gms] raft_topology - Initiating tablet cleanup of 5739b9b0-49d4-11ef-828f-770894013415:15 on 102a904a-0b15-4661-ba3f-f9085a5ad03c:0
...
[shard 0:strm] compaction - [Split keyspace1.standard1 009e2f80-49e5-11ef-85e3-7161200fb137] Splitting [/var/lib/scylla/data/keyspace1/...]
...
[shard 0:strm] cache - Fatal error during cache update: std::out_of_range (Compaction state for table [0x600007772740] not found),
at: ...
--------
seastar::continuation<seastar::internal::promise_base_with_type<void>, row_cache::do_update(...
--------
seastar::internal::do_with_state<std::tuple<row_cache::external_updater, std::function<seastar::future<void> ()> >, seastar::future<void> >
--------
seastar::internal::coroutine_traits_base<void>::promise_type
--------
seastar::internal::coroutine_traits_base<void>::promise_type
--------
seastar::(anonymous namespace)::thread_wake_task
--------
seastar::continuation<seastar::internal::promise_base_with_type<sstables::compaction_result>, seastar::async<sstables::compaction::run(...
seastar::continuation<seastar::internal::promise_base_with_type<sstables::compaction_result>, seastar::future<sstables::compaction_resu...
From the log above, it can be seen cache update failure happens under streaming sched group and
during compaction completion, which was good evidence to the cause.
Problem was reproduced locally with the help of tablet shuffling.
Fixes: #19873.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#19987
If the parent_info argument of compaction_manager::perform_compaction
is std::nullopt, then the created compaction executor isn't tracked by the task
manager. However, all compaction operations should be visible in the task
manager.
Modify the split methods to keep the split executor in the task manager. Get rid of
the option to bypass the task manager.
Closes scylladb/scylladb#19995
* github.com:scylladb/scylladb:
compaction: replace optional<task_info> with task_info param
compaction: keep split executor in task manager
There are some bugs missed in task handler:
- wait_for_task does not wait until virtual tasks are done, but returns the status immediately;
- wait_for_task suffers from use after return;
- get_status_recursively does not set the kind of task essentials.
Fix the aforementioned.
Closes scylladb/scylladb#19930
* github.com:scylladb/scylladb:
test: add test to check that task handler is fixed
tasks: fix task handler
Remove xfail from all tests for #5361, as the issue is fixed.
Remove xfail from test_group_by_clustering_prefix_with_limit.
It references #5362, but is fixed by #17237.
Refs #17237
Currently LIMIT is not passed to the query executor at all, and it was just
an accident that it worked for the case referenced in #17237. This
change passes the limit value down the chain.
The get_limit() function performed tasks outside of its scope - for
example, it checked whether the statement was an aggregate. This change moves the
onus of the check to the caller.
The comment in the code already states that we should use the
user-defined page size if it's provided. To avoid OOM conditions we'll
use the internally defined limit as the upper bound, or as the page size
when none is provided.
This change lays ground work for fixing #5362 and is necessary to pass
the test introduced in #19392 once it is implemented.
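For illustration, a minimal sketch of the intended page-size selection; the function name and the internal limit value are assumptions, not the actual statement code:
```c++
#include <algorithm>
#include <cstdint>

// Honor the user-defined page size when one is given, but never exceed the
// internal limit; fall back to the internal limit when no page size is set.
constexpr int32_t internal_aggregation_page_size = 10000;   // illustrative value

int32_t effective_page_size(int32_t user_page_size) {
    if (user_page_size <= 0) {            // no page size provided by the driver
        return internal_aggregation_page_size;
    }
    return std::min(user_page_size, internal_aggregation_page_size);
}

int main() {
    return effective_page_size(500) == 500 && effective_page_size(-1) == 10000 ? 0 : 1;
}
```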
This patch adds a `suppress_features` error injection. It allows revoking
support for some features and can be used to simulate the upgrade process
in test.py.
Features to suppress are passed as the injection's value, separated by `;`.
Example: `PARALLELIZED_AGGREGATION;UDA_NATIVE_PARALLELIZED_AGGREGATION`
Fixes scylladb/scylladb#20034
Closes scylladb/scylladb#20055
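For illustration, a minimal sketch of parsing such a `;`-separated injection value; this mirrors the documented format but is not the actual error-injection code:
```c++
#include <iostream>
#include <set>
#include <sstream>
#include <string>

// Split the injection's value on ';' into the set of features to suppress.
std::set<std::string> parse_suppressed_features(const std::string& value) {
    std::set<std::string> suppressed;
    std::istringstream in(value);
    std::string feature;
    while (std::getline(in, feature, ';')) {
        if (!feature.empty()) {
            suppressed.insert(feature);
        }
    }
    return suppressed;
}

int main() {
    auto s = parse_suppressed_features("PARALLELIZED_AGGREGATION;UDA_NATIVE_PARALLELIZED_AGGREGATION");
    std::cout << s.size() << " features suppressed\n";   // prints: 2 features suppressed
}
```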
Before this change, we used the default options for
performing reads on the input, and the default options
are:
```c++
struct file_input_stream_options {
size_t buffer_size = 8192; ///< I/O buffer size
unsigned read_ahead = 0; ///< Maximum number of extra read-ahead operations
};
```
These options are not able to offer good throughput when
reading from disk while streaming to S3.
So, in this change, we use options which allow better throughput.
Refs 061def001d
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20074
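For illustration, a sketch of picking larger values based on the struct quoted above; the exact numbers here are assumptions, not the ones chosen by the patch:
```c++
#include <cstddef>

struct file_input_stream_options {
    size_t buffer_size = 8192;   ///< I/O buffer size
    unsigned read_ahead = 0;     ///< Maximum number of extra read-ahead operations
};

file_input_stream_options make_upload_options() {
    file_input_stream_options opts;
    opts.buffer_size = 128 * 1024;   // larger sequential reads from disk
    opts.read_ahead = 4;             // keep a few reads in flight to hide latency
    return opts;
}

int main() {
    return make_upload_options().read_ahead == 4 ? 0 : 1;
}
```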
Before these changes, we didn't specify which I/O scheduling
group commitlog instances in hinted handoff should use.
In this commit, we set it explicitly to the commitlog
scheduling group. The rationale for this choice is the fact
we don't want to cause a bottleneck on the write path
-- if hints are written too slowly, new incoming mutations
(NOT hints) might be rejected due to too many hints
currently being written to disk; see
`storage_proxy::create_write_response_handler_helper()`
for more context.
Fixes scylladb/scylladb#18654
Closes scylladb/scylladb#19170
This patch makes all cql connections update their service level parameters automatically when:
- any service level is created or changed
- one role is granted to another
- any service level is attached to/detached from a role
First of all, the patch defines what a service level and an effective service level are (938aa10509). No new types of service levels are introduced; the commit only clarifies definitions and names what an effective service level is.
(Effective service level is created by merging all service levels which are attached to all roles granted to the user. It represents exact values of connection's parameters.)
Previously, to find an effective service level of a user, it required O(n) internal queries: O(n) queries to recursively find all granted roles (`standard_role_manager::query_granted()`) and a query for each role to get its service level (`standard_role_manager::get_attribute()`, which sums to O(n) queries).
Because we want to reload SL parameters for all opened cql connections, we don't want to do O(n) queries for every connection, every time we create or change any service level/grant one role to another/attach or detach a service level to/from a role.
To speed it up, the patch adds another layer of service level controller cache, which stores a `role_name -> effective_service_level` mapping. This way, finding an effective service level for a role is just a map lookup.
Building the new cache requires only 2 queries: one to obtain the whole role hierarchy and one to get all roles' service levels.
Fixes scylladb/scylladb#12923
Closes scylladb/scylladb#19085
* github.com:scylladb/scylladb:
test/auth_cluster/test_raft_service_levels: add test for automatic connection update
api/cql_server_test: add CQL server testing API
transport/cql_server: subscribe to sl effective cache reloaded
transport/controller: coroutinize `subscribe_server` and `unsubscribe_server`
transport/cql_server: add method to update service level params on all connections
generic_server: use async function in `for_each_gently()`
service/qos/sl_controller: use effective service levels cache
service/qos/service_level_controller: notify subscribers on effective cache reloaded
service/raft/group0_state_machine: update effective service levels cache
service/topology_coordinator: migrate service levels before auth
service/qos/service_level_controller: effective service levels cache
utils/sorting: allow passing any container as vertices
service/qos/service_level_controller: replace shard check to assert
service/qos: define effective service level
service/qos/qos_common: use const reference in `init_effective_names()`
service/qos/service_level_controller: remove unused field
auth: return map of directly granted roles
test/auth/test_auth_v2_migration: create sl1 in the test
We have two mechanisms to give visibility into reads having to process many tombstones:
* a warning in the logs, triggered if a read processed more than `tombstone_warn_threshold` dead rows/tombstones
* a trace message, which includes stats on the number of rows in the page, including the number of live and dead rows as well as tombstones
This series extends this to also include information on cells, so we have visibility into the case where a read has to process an excessive amount of cell tombstones (mainly because of collections).
A log line is now also logged if the number of dead cells/tombstones in the page exceeds `tombstone_warn_threshold`. The trace message is also extended to contain cell stats.
The `tombstone_warn_threshold` log lines now receive a 10s rate-limit to avoid excessive log spamming. The rate-limit is separate for the row and cell logs.
Example of the new log line (`tombstone_warn_threshold=10` ):
```
WARN 2024-05-30 07:56:44,979 [shard 0:stmt] querier - Read 98 live cells and 126 dead cells/tombstones for system_schema.scylla_tables <partition-range-scan> (-inf, +inf) (see tombstone_warn_threshold)
```
Example of the new tracing message:
```
Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead), 0 range tombstone(s) and 13 cell(s) (1 live, 12 dead) [shard 0] | 2024-05-30 08:13:19.690803 | 127.0.0.1 | 6114 | 127.0.0.1
```
Fixes: https://github.com/scylladb/scylladb/issues/18996
Improvement, not a backport candidate.
Closes scylladb/scylladb#18997
* github.com:scylladb/scylladb:
test/boost: mutation_test: add test for cell compaction stats
mutation/compact_and_expire_result: drop operator bool()
querier: consume_page(): add rate-limiting to tombstone warnings
querier: consume_page(): add cell stats to page stats trace message
querier: consume_page(): add tombstone warning for cell tombstones
querier: consume_page(): extract code which logs tombstone warning
mutation/mutation_compactor: collect and aggregate cell compaction stats
mutation: row::compact_and_expire(): use compact_and_expire_result
collection_mutation: compact_and_expire(): use compact_and_expire_result
mutation: introduce compact_and_expire_result
Fixes #13334
All required code paths (see enterprise) now use
extensions::is_extension_internal_keyspace.
The old mechanism can be removed. One less global var.
Closes scylladb/scylladb#20047
This is a followup to #19937, for #19803. See in particular [this comment](https://github.com/scylladb/scylladb/issues/19803#issuecomment-2258371923).
The primary conversion target is coroutines. However, while coroutines are the most convenient style, they are only infrequently usable in this case, for the following reasons:
- Wherever we have a `future::finally()` that calls a cleanup function that returns a future (which must be awaited), we cannot use `co_await`. We can only use `seastar::async()` with `deferred_close` or `defer()` (a minimal sketch of this defer pattern follows the diffstat below).
- The code passes lots of lambdas, and `co_await` cannot be used in lambdas. First, I tried, and the compiler rejects it; second, a capturing lambda that is a coroutine is a trap [[1]](https://devblogs.microsoft.com/oldnewthing/20211103-00/?p=105870) [[2]](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rcoro-capture).
In most cases, I didn't have to use naked `seastar::async()`; there were specialized wrappers in place already. Thus, most of the changes target `seastar::thread` context under existent `seastar::async()` wrappers, and only a few functions end up as coroutines.
The last patch in the series (`test/sstable: remove useless variable from promoted_index_read()`) is an independent micro-cleanup, the opportunity for which I thought to have noticed while reading the code.
The tail of `test/boost/sstable_test.cc` (the stuff following `promoted_index_read()`) is already written as `seastar::thread`. That's already better (for readability) than future chaining; but could have I perhaps further converted those functions to coroutines? My answer was "no":
- Some of the candidate functions relied on deferred cleanups that might need to yield (all three variants of `count_rows()`).
- Some had been implemented by passing lambdas to wrappers of `seastar::async()` (`sub_partition_read()`, `sub_partitions_read()`).
- The test case `test_skipping_in_compressed_stream()` initially looked promising for co-routinization (from its starting point `seastar::async()`), because it seemed to employ no deferred cleanup (that might need to yield). However, the function uses three lambdas that must be able to yield internally, and one of those (`make_is()`) is even capturing.
- The rest (`test_empty_key_view_comparison()`, `test_parse_path_good()`, `test_parse_path_bad()`) was synchronous code to begin with.
```
test/boost/sstable_test.cc | 188 +++++++++-----------
1 file changed, 83 insertions(+), 105 deletions(-)
```
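For illustration, a standard-C++ sketch of the defer pattern referenced above; scope_defer is an illustrative stand-in, not the seastar::defer / deferred_close API:
```c++
#include <iostream>
#include <utility>

// Cleanup that must run on every exit path is attached to an RAII guard
// instead of a .finally() continuation.
template <typename Func>
class scope_defer {
    Func _f;
public:
    explicit scope_defer(Func f) : _f(std::move(f)) {}
    ~scope_defer() { _f(); }
    scope_defer(const scope_defer&) = delete;
    scope_defer& operator=(const scope_defer&) = delete;
};

int main() {
    scope_defer cleanup([] { std::cout << "reader closed\n"; });
    std::cout << "reading sstable\n";
    // leaving the scope (normally or via an exception) runs the cleanup
}
```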
Refactoring; no backport needed.
Closes scylladb/scylladb#20011
* github.com:scylladb/scylladb:
test/sstable: remove useless variable from promoted_index_read()
test/sstable: rewrite promoted_index_read() with async()
test/sstable: unfuturize lambda invocation in test_using_reusable_sst*()
test/sstable: rewrite wrong_range() with async()
test/sstable: simplify not_find_key_composite_bucket0() under test_using_reusable_sst()
test/sstable: rewrite full_index_search() with async()
test/sstable: simplify find_key*(), all_in_place() under test_using_reusable_sst()
test/sstable: rewrite (un)compressed_random_access_read() with async()
test/sstable: simplify write_and_validate_sst()
test/sstable: simplify check_toc_func() under async()
test/sstable: simplify check_statistics_func() under async()
test/sstable: simplify check_summary_func() under async()
test/sstable: coroutinize check_component_integrity()
test/sstable: rewrite write_sst_info() with async()
test/sstable: simplify missing_summary_first_last_sane()
test/sstable: coroutinize summary_query_fail()
test/sstable: rewrite summary_query() with async()
test/sstable: coroutinize (simple/composite)_index_read()
test/sstable: rewrite index_read() with async()
test/sstable: rewrite test_using_reusable_sst() with async()
test/sstable: rewrite test_using_working_sst() with async()
Add a CQL server testing API with an endpoint to dump
service level parameters of all CQL connections.
This endpoint will be later used to test functionality of
automated updating CQL connections parameters.
Make the cql server (but not the maintenance server) subscribe to qos
configuration changes.
Trigger update of connections' service level params on effective cache
reloaded event.
It's not done on the maintenance server because it supports neither role
hierarchy nor attaching service levels.
Trigger update of service level param on every cql connection.
In enterprise, the method needs also to update connections' scheduling
group.
In the following patch, we will add a method to update service level
parameters for each cql connection.
To support this, this patch allows passing an async function as a parameter
to the `for_each_gently()` method.
Updates to `system.role_members` and `system.role_attributes` affect
effective service levels cache, so applying mutations to those tables
should reload the effective SL cache.
Effective service level cache will be updated when mutations are applied to
some of the auth tables.
But the effective cache depends on first-level service levels cache, so
service levels data should be migrated before auth data.
Add a second layer of service_level_controller cache which contains
a role name -> effective service level mapping.
To build the mapping, the controller uses the first cache layer (service level
name -> service level) and 2 queries to the auth tables (one to `roles` and
one to `role_members`).
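For illustration, a minimal sketch of building such a role -> effective service level map from a role graph and a per-role service level cache; the types and the merge rule (most restrictive timeout wins) are assumptions, not the actual qos code:
```c++
#include <map>
#include <set>
#include <string>

struct service_level {
    int timeout_ms = 0;              // 0 means "unset"
};

// Merge two service levels; here the most restrictive (smallest) timeout wins.
service_level merge(service_level a, const service_level& b) {
    if (a.timeout_ms == 0 || (b.timeout_ms != 0 && b.timeout_ms < a.timeout_ms)) {
        a.timeout_ms = b.timeout_ms;
    }
    return a;
}

using role_graph = std::map<std::string, std::set<std::string>>;  // role -> all granted roles
using sl_cache = std::map<std::string, service_level>;            // role -> its own service level

sl_cache build_effective_cache(const role_graph& granted, const sl_cache& per_role_sl) {
    sl_cache effective;
    for (const auto& [role, roles] : granted) {
        service_level eff{};
        for (const auto& r : roles) {
            if (auto it = per_role_sl.find(r); it != per_role_sl.end()) {
                eff = merge(eff, it->second);
            }
        }
        effective[role] = eff;       // afterwards a lookup needs no queries at all
    }
    return effective;
}

int main() {
    role_graph granted = {{"alice", {"alice", "analysts"}}};
    sl_cache sls = {{"analysts", {500}}, {"alice", {1000}}};
    return build_effective_cache(granted, sls)["alice"].timeout_ms == 500 ? 0 : 1;
}
```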
The container holding all vertices doesn't have to be a vector.
Allowing any container that meets the conditions to be passed makes the
function more flexible.
Write down definitions of `service level` and `effective service level`
in service/qos/service_level_controller.hh.
Until now, effective service level was only used as result of
`LIST EFFECTIVE SERVICE LEVEL OF <role>`.
Now we want to have quick access to the effective service level of
each role and introduce a cache of effective SLs to do it.
New definitions clarify things.
The commit also renames:
- `update_service_levels_from_distributed_data` -> `update_service_levels_cache`
Later we will introduce the effective_service_level_cache, so this change
standardizes the names.
- `find_service_level` -> `find_effective_service_level`
The function actually returns the effective service level.
Returns a multimap of directly granted roles for each role. Uses
only one query to create the map, instead of doing recursive queries
for each individual role.
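For illustration, a minimal sketch of turning a single scan of the membership rows into a multimap of directly granted roles; the row type and contents are hypothetical:
```c++
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct role_member_row {
    std::string role;      // the granting role
    std::string member;    // the role/user it is granted to
};

// One pass over the rows instead of one recursive query per role.
std::multimap<std::string, std::string>
directly_granted(const std::vector<role_member_row>& rows) {
    std::multimap<std::string, std::string> granted;
    for (const auto& r : rows) {
        granted.emplace(r.member, r.role);   // member -> role granted to it
    }
    return granted;
}

int main() {
    auto granted = directly_granted({{"analysts", "alice"}, {"admins", "alice"}});
    for (auto& [member, role] : granted) {
        std::cout << member << " <- " << role << '\n';
    }
}
```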
Test `test_auth_v2_migration` creates auth data where role `users`
has assigned service level `sl:fefe` but the service level isn't
actually created.
In following patches, we are going to introduce effective service levels
cache which depends on auth and is refreshed when mutations are applied
to v2 auth tables.
Without these changes, this test will fail because the service level
doesn't exist.
Also the name `sl:fefe` is changed to `sl1`.
It runs in the background and consists of two parts -- an async() lambda and the following .then()-s. This PR moves the background-running code into its own method and coroutinizes it in parts. With #19954 merged it finally looks really nice.
Closes scylladb/scylladb#20058
* github.com:scylladb/scylladb:
view_builder: Restore indentation after previous patches
view_builder: Coroutinize inner start_in_background() calls
view_builder: Coroutinize outer start_in_background() calls
view_builder: Add helper method for background start
Currently we print an ERROR on all exceptions in
`raft_topology_cmd_handler`. This log level is too high, in some cases
exceptions are expected -- like during shutdown. And it causes dtest
failures.
Turn exceptions from aborts into WARN level.
Also improve logging by printing the command that failed.
Fixes scylladb/scylladb#19754
Closes scylladb/scylladb#19935
If tablet-based table is created concurrently with node being
decommissioned after tablets are already drained, the new table may be
permanently left with replicas on the node which is no longer in the
topology. That creates an immediate availability risk because we are
running with one replica down.
This also violates invariants about replica placement and this state
cannot be fixed by topology operations.
One effect is that this will lead to load balancer failure which will
inhibit progress of any topology operations:
load_balancer - Replica 154b0380-1dd2-11b2-9fdd-7156aa720e1a:0 of tablet 7e03dd40-537b-11ef-9fdd-7156aa720e1a:1 not found in topology, at: ...
Fixes #20032
Closes scylladb/scylladb#20053
Sync points are created, via POST HTTP requests, for a subset of nodes
in the cluster. Those nodes are specified in a request's parameter
`target_hosts`. When the parameter is empty, Scylla should assume
the user wants to create a sync point for ALL nodes.
Before these changes, sync points were created only for LIVE nodes.
If a node was dead but still part of the cluster and the user
requested creating a sync point leaving the parameter `target_hosts`
empty, the dead node was skipped during the creation of the sync point.
That was inconsistent with the guarantees the sync point API provides.
In this commit, we fix that issue and add a test verifying that
the changes have made the implementation compliant with the design
of the sync point API -- the test only passes after this commit.
Fixes scylladb/scylladb#9413
Closes scylladb/scylladb#19750
One of the co_await-ed parts of this method is an async() lambda. It can be
coroutinized too. One thing to take care of is the semaphore units -- their scope
should (?) end earlier than the whole start_in_background(), so
release them explicitly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method consists of two parts -- one running in async() thread and
continuations to it. This patch turns the latter chain into co_await-s.
The mentioned chain is "guarded" by then_wrapped() catch of any
exception, which is turned into a plain try-catch block.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The view_builder::start() happens in the background. It's good to have
explicit start_in_background() method and coroutinize it next.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In this commit, we describe the mechanism of sync point
in Hinted Handoff in the user documentation. We explain
the motivation for it and how to use it, as well as list
and describe all of the parameters involved in the process.
Errors that may appear and be experienced by the user
are addressed in the article.
Fixes scylladb/scylladb#18500
Closes scylladb/scylladb#19686
The get_all_data_file_locations and get_saved_caches_location endpoints get the
returned data from db::config and should live next to the other endpoints working
with config data.
refs: #2737
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#19958
When starting, the view builder wants all shards to synchronize with each other in the middle of initialization. For that, they all synchronize via shard-0's instance counter and a shared future. There's a cross-shard barrier in utils/ that provides the same facility (a minimal sketch of the barrier idea follows the commit list below).
Closes scylladb/scylladb#19954
* github.com:scylladb/scylladb:
view_builder: Drop unused members
view_builder: Use cross-shard barrier on start
view_builder: Add cross-shard barrier to its .start() method
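For illustration, a plain std::barrier sketch of the synchronization point described above; the real code uses the cross-shard barrier from utils/ and seastar shards rather than OS threads:
```c++
#include <barrier>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    constexpr unsigned shards = 4;
    std::barrier sync_point(shards, []() noexcept {
        std::cout << "all shards finished phase one\n";
    });
    std::vector<std::jthread> workers;
    for (unsigned shard = 0; shard < shards; ++shard) {
        workers.emplace_back([&sync_point] {
            // per-shard initialization work happens here
            sync_point.arrive_and_wait();   // block until every shard arrives
            // from here on, all shards have completed their initialization
        });
    }
}
```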
Having an operator bool() on this struct is counter-intuitive, so this
commit drops it and migrates any remaining users to bool is_live().
The purpose of this operator bool() was to help in incrementally replacing
the previous bool return type with compact_and_expire_result in the
compact_and_expire() call stack. Now that this is done, it has served
its purpose.
Soon, we want to log a warning on too many cell tombstones as well.
Extract the logging code to allow reuse between the row and cell
tombstone warnings.
There are some bugs missed in task handler:
- wait_for_task does not wait until virtual tasks are done, but
returns the status immediately;
- wait_for_task suffers from use after return;
- get_status_recursively does not set the kind of task essentials.
Fix the aforementioned.
This commit extracts the information about the configuration the user should do
right after installation (especially running scylla_setup) to a separate file.
The file is included in the relevant pages, i.e., installing with packages
and installing with Web Installer.
In addition, the examples on the Web Installer page are updated
with supported versions of ScyllaDB.
Fixes https://github.com/scylladb/scylladb/issues/19908
Closes scylladb/scylladb#20035
Add more logging for raft-based topology operations in INFO and DEBUG
levels.
Improve the existing logging, adding more details.
Fix a FIXME in test_coordinator_queue_management (by re-adding a log
message that was removed in the past -- probably by accident -- and
properly waiting for it to appear in the test).
Enable group0_state_machine logging at TRACE level in tests. These logs
are relatively rare (group 0 commands are used for metadata operations)
and relatively small, mostly consist of printing `system.group0_history`
mutation in the applied command, for example:
```
TRACE 2024-08-02 18:47:12,238 [shard 0: gms] group0_raft_sm - apply() is called with 1 commands
TRACE 2024-08-02 18:47:12,238 [shard 0: gms] group0_raft_sm - cmd: prev_state_id: optional(dd9d47c6-50ee-11ef-d77f-500b8e1edde3), new_state_id: dd9ea5c6-50ee-11ef-ae64-dfbcd08d72c3, creator_addr: 127.219.233.1, creator_id: 02679305-b9d1-41ef-866d-d69be156c981
TRACE 2024-08-02 18:47:12,238 [shard 0: gms] group0_raft_sm - cmd.history_append: {canonical_mutation: table_id 027e42f5-683a-3ed7-b404-a0100762063c schema_version c9c345e1-428f-36e0-b7d5-9af5f985021e partition_key pk{0007686973746f7279} partition_tombstone {tombstone: none}, row tombstone {range_tombstone: start={position: clustered, ckp{0010b4ba65c64b6e11ef8080808080808080}, 1}, end={position: clustered, ckp{}, 1}, {tombstone: timestamp=1722617232237511, deletion_time=1722617232}}{row {position: clustered, ckp{0010dd9ea5c650ee11efae64dfbcd08d72c3}, 0} tombstone {row_tombstone: none} marker {row_marker: 1722617232237511 0 0}, column description atomic_cell{ create system_distributed keyspace; create system_distributed_everywhere keyspace; create and update system_distributed(_everywhere) tables,ts=1722617232237511,expiry=-1,ttl=0}}}
```
note that the mutation contains a human-readable description of the
command -- like "create system_distributed keyspace" above.
These logs might help debugging various issues (e.g. when `apply` hangs
waiting for read_apply mutex, or takes too long to apply a command).
Ref: scylladb/scylladb#19105
Ref: scylladb/scylladb#19945
Closes scylladb/scylladb#19998
This PR adds the 6.0-to-6.1 upgrade guide (including metrics) and removes the 5.4-to-6.0 upgrade guide.
Compared to the 5.4-to-6.0 guide, the 6.0-to-6.1 guide:
- Added the "Ensure Consistent Topology Changes Are Enabled" prerequisite.
- Removed the "After Upgrading Every Node" section. Both Raft-based schema changes and topology updates
are mandatory in 6.1 and don't require any user action after upgrading to 6.1.
- Removed the "Validate Raft Setup" section. Raft was enabled in all 6.0 clusters (for schema management),
so now there's no scenario that would require the user to follow the validation procedure.
- Removed the references to the Enable Consistent Topology Updates page (which was in version 6.0 and is removed with this PR) across the docs.
See the individual commits for more details.
Fixes https://github.com/scylladb/scylladb/issues/19853
Fixes https://github.com/scylladb/scylladb/issues/19933
This PR must be backported to branch-6.1 as it is critical in version 6.1.
Closes scylladb/scylladb#19983
* github.com:scylladb/scylladb:
doc: remove the 5.4-to-6.0 upgrade guide
doc: add the 6.0-to-6.1 upgrade guide
Currently, the resource utilization in CI is low. Increasing the number of clusters will increase how many tests are executed simultaneously. This will decrease the time it takes to execute and improve resource utilization.
Related: https://github.com/scylladb/qa-tasks/issues/1667
Closes scylladb/scylladb#19832
The command has a single check for the missing keyspace and/or table
parameters, and if the check fails, there is a combined error message.
Apparently this is confusing, so split the check so that missing
keyspace and missing table args each have their own check and error message.
Fixes: scylladb/scylladb#19984
Closes scylladb/scylladb#20005
This commit removes the 5.4-to-6.0 upgrade guide and all references to it.
It mainly removes references to the Enable Consistent Topology Updates page,
which was added as enabling the feature was optional.
In rare cases, when a reference to that page is necessary,
the internal link is replaced with an external link to version 6.0.
Especially the Handling Cluster Membership Change Failures page was modified
for troubleshooting purposes rather than removed.
Instead of reinventing the wheel, let's use the existing one.
In this change, we trade the `div_ceil()` implemented in s3/client.cc
for the existing one in utils/div_ceil.hh. Because we are not using
`std::lldiv()` anymore, the corresponding `#include <cstdlib>` is dropped.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20000
Instead of using `std::rethrow_exception()`, use
`coroutine::return_exception_ptr()`, which is a little bit more
efficient.
See also 6cafd83e1c
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20001
fmt 11 enforces the constness of the `format()` member function; if it
is not marked `const`, the tree fails to build with fmt 11, like:
```
/usr/include/fmt/base.h:1393:23: error: no matching member function for call to 'format'
1393 | ctx.advance_to(cf.format(*static_cast<qualified_type*>(arg), ctx));
| ~~~^~~~~~
/usr/include/fmt/base.h:1374:21: note: in instantiation of function template specialization 'fmt::detail::value<fmt::context>::format_custom_arg<service::migration_badness, fmt::formatter<service::migration_badness>>' requested here
1374 | custom.format = format_custom_arg<
| ^
/home/kefu/dev/scylladb/service/tablet_allocator.cc:170:14: note: in instantiation of function template specialization 'fmt::format_to<fmt::basic_appender<char>, const locator::global_tablet_id &, const locator::tablet_replica &, const locator::tablet_replica &, const service::migration_badness &, 0>' requested here
170 | fmt::format_to(ctx.out(), "{{tablet: {}, {} -> {}, badness: {}", candidate.tablet, candidate.src,
| ^
/home/kefu/dev/scylladb/service/tablet_allocator.cc:161:10: note: candidate function template not viable: 'this' argument has type 'const fmt::formatter<service::migration_badness>', but method is not marked const
161 | auto format(const service::migration_badness& badness, FormatContext& ctx) {
| ^
```
so, in this change, we mark these two `format()` member functions const.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#20013
Rewrite the function as a coroutine to make it easier to read and maintain, following lifetime issues we had and fixed in this function.
The second commit adds a test that drops a table while there is a counter update operation ongoing in the table.
The test reproduces issue https://github.com/scylladb/scylla-enterprise/issues/4475 and verifies it is fixed.
Follow-up to https://github.com/scylladb/scylladb/pull/19948
Doesn't require backport because the fix to the issue was already done and backported. This is just cleanup and a test.
Closes scylladb/scylladb#19982
* github.com:scylladb/scylladb:
db: test counter update while table is dropped
db: coroutinize do_apply_counter_update
Recently, some users have seen "Key size too large" errors in various
places. Cassandra and Scylla impose a 64KB length limit on keys, and
we have known about bugs in this area for a long time - and even had
some translated Cassandra unit tests that cover some of them. But these
tests did not cover all the corner cases and left us with partial and
fragmented knowledge of this problem, spread over many test files and
many issues.
In this patch, we add a single test file, test/cql-pytest/test_key_length.py
which attempts to rigorously explore the various bugs we have with
CQL key length limits. These test aim to reproduce all known bugs in
this area:
* Refs #3017 - CQL layer accepts set values too large to be written to
an sstable
* Refs #10366 - Enforce Key-length limits during SELECT
* Refs #12247 - Better error reporting for oversized keys during INSERT
* Refs #16772 - Key length should be limited to exactly 65535, not less
The following less interesting bug is already covered by many tests so
I decided not to test it again:
* Refs #7745 - Length of map keys and set items are incorrectly limited
to 64K in unprepared CQL
There's also a situation in materialized views and secondary indexes,
where a column that was _not_ a key, now becomes a key, and a length
limit needs to be enforced on it. We already have good test coverage
for this (in test/cql-pytest/test_secondary_index.py and in
test/cql-pytest/test_materialized_view.py), and we have an issue:
* Refs #8627 - Cleanly reject updates with indexed values where value > 64k
All 16 tests added here pass on Cassandra 5 except one that fails on
https://issues.apache.org/jira/browse/CASSANDRA-19270, but 11 of the
tests currently fail on Scylla (6 on #12247, 2 on #10366, 3 on #16772).
It is possible that our decision in #16772 will not be to fix Scylla
to match Cassandra but rather to declare that strict compatibility isn't
needed in this case or even that Cassandra is wrong. But even then,
having these tests which demonstrate the behavior of both Cassandra
and Scylla will be important.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16779
When debugging coredumps some (small, but useful) information is hidden
in the notes of the core ELF file. Add some words about the fact that it exists, what
it includes, and the thing that is always forgotten -- the way to get it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#19962
Commit ad0e6b79 (replica: Remove all_datadir from keyspace config) removed all_datadirs from keyspace config, now it's datadir turn. After this change keyspace no longer references any on-disk directories, only the sstables's storage driver attached to keyspace's tables does.
Refs #12707
Closes scylladb/scylladb#19866
* github.com:scylladb/scylladb:
replica: Remove keyspace::config::datadir
sstables/storage: Evaluate path for keyspace directory in storage
sstables/storage: Add sstables_manager arg to init_keyspace_storage()
This commit adds OS support for version 6.1 and removes OS support for 5.4
(according to our support policy for versions).
Closes scylladb/scylladb#19992
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.
Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.
To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
[1] 66ef711d68
Closes scylladb/scylladb#20006
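For illustration, a minimal sketch of an always-enabled assert macro in this spirit; the real SCYLLA_ASSERT in utils/assert.hh may print more context:
```c++
#include <cstdio>
#include <cstdlib>

// Checked even when NDEBUG is defined, unlike the standard assert().
#define SCYLLA_ASSERT(cond)                                                  \
    do {                                                                     \
        if (!(cond)) {                                                       \
            std::fprintf(stderr, "%s:%d: assertion '%s' failed\n",           \
                         __FILE__, __LINE__, #cond);                         \
            std::abort();                                                    \
        }                                                                    \
    } while (0)

int main() {
    int x = 1;
    SCYLLA_ASSERT(x == 1);
}
```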
The for_tests constructor has a metrics parameter defaulted to
register_metrics::no, but when delegating to the other constructor, a
hard-coded register_metrics::no is passed. This makes no difference
currently, because all callers use the default and the hard-coded value
corresponds to it. Let's fix it nevertheless to avoid any future
surprises.
Closes scylladb/scylladb#20007
The large_partition_schema() call returns a copy of the "schema_ptr"
object that points to an effectively statically initialized thread_local
"schema" object. The large_partition_schema() call has no bearing on
whether, or when, the "schema" object is constructed, and has no side
effects (other than copying an "lw_shared_ptr" object). Furthermore, the
return value of large_partition_schema() is not used for anything in
promoted_index_read().
This redundant call seems to date back to original commit 3dd079fb7a
("tests: add test for reading parts of a large partition", 2016-08-07).
Remove the call and the variable.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
All lambdas passed to test_using_reusable_sst() and
test_using_reusable_sst_returning() have been converted to future::get()
calls (according to the seastar::thread context that they are now executed
in). None of the lambdas return futures anymore; they all directly return
void or non-void. Therefore, drop futurize_invoke(...).get() around the
lambda invocations in test_using_reusable_sst*().
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
For better readability, replace the future::then() chaining (and the
associated manual fiddling with object lifecycles) with future::get() (and
rely on seastar::thread's stack). We're already in seastar::thread
context.
Similarly, replace the future::finally() underlying with_closeable() with
deferred_close(), under the assumption that mutation_reader::close() never
fails (and is therefore safe to call in the "deferred_close" destructor).
This is actually guaranteed, as mutation_reader::close() is marked
"noexcept".
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
According to the earlier patch "test/sstable: rewrite test_using_reusable_sst()
with async" in this series, lambdas passed to test_using_reusable_sst()
are invoked:
(a) less importantly here, in seastar::thread context,
(b) more importantly here, futurized (temporarily so).
The test case not_find_key_composite_bucket0() doesn't chain futures;
therefore it needs no conversion to future::get() for purpose (a);
however, we can eliminate its empty future return. Fact (b) will cover for
that, until all such lambdas are converted to direct "void" returns (at
which point we can remove the futurization from
test_using_reusable_sst()).
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
For better readability, replace future::then() chaining with
future::get(). (We're already in seastar::thread context.)
This patch is best viewed with "git show -b".
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
According to the earlier patch "test/sstable: rewrite test_using_reusable_sst()
with async" in this series, lambdas passed to test_using_reusable_sst()
are invoked:
(a) less importantly here, in seastar::thread context,
(b) more importantly here, futurized (temporarily so).
The test cases find_key_map(), find_key_set(), find_key_list(),
find_key_composite(), all_in_place() don't chain futures; therefore they
need no conversion to future::get() for purpose (a); however, we can
eliminate their empty future returns. Fact (b) will cover for that, until
all such lambdas are converted to direct "void" returns (at which point we
can remove the futurization from test_using_reusable_sst()).
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
All three lambdas passed to write_and_validate_sst() now use future::get()
rather than future::then() chaining; in other words, the future::get()
calls inside all these seastar::thread contexts have been pushed down to
the lambdas. Change all these lambdas' return types from future<> to void.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
The lambda passed to write_and_validate_sst() already runs in
seastar::thread context; replace future::then() chaining with
future::get() calls.
We're going to eliminate the trailing "return make_ready_future<>()"
later.
This patch is best viewed with "git show -W -b".
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
The lambda passed to write_and_validate_sst() already runs in
seastar::thread context; replace future::then() chaining with
future::get() calls.
We're going to eliminate the trailing "return make_ready_future<>()"
later.
This patch is best viewed with "git show -W -b".
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
The lambda passed to write_and_validate_sst() already runs in
seastar::thread context; replace future::then() chaining with
future::get() calls.
We're going to eliminate the trailing "return make_ready_future<>()"
later.
This patch is best viewed with "git show -W -b".
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
check_component_integrity() does not rely on any deferred close or stop
operations; turn it into a coroutine therefore, for best readability.
This conversion demonstrates particularly well how much the stack eases
coding. We no longer need to artificially extend the lifetime of "tmp"
with a final
.then([tmp] {})
future. Consequently, "tmp" no longer needs to be a shared pointer to an
on-heap "tmpdir" object; "tmp" can just be a "tmpdir" object on the stack.
While at it, eliminate the single-use local objects "s" and "gen", for
movability's sake. (We could use std::move() on these variables, but it
seems easier to just flatten the function calls that produce the
corresponding rvalues into the write_sst_info() argument list.)
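A hedged before/after sketch of the lifetime pattern, using stand-in helpers (the tmpdir stub and do_work_in below are illustrative, not the actual test code):
```cpp
#include <seastar/core/future.hh>
#include <seastar/core/coroutine.hh>
#include <seastar/core/shared_ptr.hh>
#include <string>

// Illustrative stand-ins for the test helpers.
struct tmpdir { std::string path() const { return "/tmp/x"; } };
seastar::future<> do_work_in(std::string) { return seastar::make_ready_future<>(); }

// Before: continuation chaining forces tmp onto the heap plus an artificial
// final continuation whose only job is to extend tmp's lifetime.
seastar::future<> check_before() {
    auto tmp = seastar::make_lw_shared<tmpdir>();
    return do_work_in(tmp->path()).then([tmp] {});
}

// After: the coroutine frame keeps a plain tmpdir alive across co_await.
seastar::future<> check_after() {
    tmpdir tmp;
    co_await do_work_in(tmp.path());
}
```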
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
The lambda passed to test_using_reusable_sst() is now invoked --
futurized, transitorily -- in seastar::thread context; stop returning an
explicit make_ready_future<>() from the lambda.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
summary_query_fail() does not rely on any deferred close or stop
operations; turn it into a coroutine therefore, for best readability.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
simple_index_read() and composite_index_read() do not rely on any deferred
close or stop operations; turn them into coroutines therefore, for best
readability.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Improve the readability of test_using_reusable_sst() by replacing
future::then() chaining with test_env::do_with_async() and future::get().
Unlike seastar::async(), test_env::do_with_async() restricts its input
lambda to returning "void". Because of this, introduce the variant
test_using_reusable_sst_returning(), based on
test_env::do_with_async_returning(), for lambdas returning non-void. Put
the latter to use in index_read() at once.
Subsequently, we'll gradually convert the lambdas passed to
test_using_reusable_sst() and test_using_reusable_sst_returning() from
returning futures to returning direct values. In order for
test_using_reusable_sst() and test_using_reusable_sst_returning() to cope
with both types of lambdas, wrap the lambdas into futurize_invoke().get().
In the seastar::thread context, future::get() will gracefully block on
genuine futures, and return immediately on direct values that were
futurized on the spot.
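A rough sketch of the wrapping trick, with simplified names rather than the actual test_env code:
```cpp
// Hedged sketch: futurize_invoke() turns both a future-returning lambda and a
// plain-value lambda into a future, and get() waits on it; this only works
// when invoked from seastar::thread context.
#include <seastar/core/future.hh>
#include <utility>

template <typename Func>
void run_from_thread(Func&& func) {
    seastar::futurize_invoke(std::forward<Func>(func)).get();
}

// Both of these are accepted:
//   run_from_thread([] { /* returns void */ });
//   run_from_thread([] { return seastar::make_ready_future<>(); });
```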
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Make test_using_working_sst() easier to read by:
(1) replacing test_env::do_with() with seastar::async(),
seastar::defer(), and future::get();
(2) replacing seastar::async() and seastar::defer() with
test_env::do_with_async().
Technically speaking, this change does not perfectly preserve exceptional
behavior. Namely, test_env::do_with() uses future::finally() to link
test_env::stop() to the chain of futures, and future::finally() permits
test_env::stop() itself to throw an exception -- potentially leading to a
seastar::nested_exception being thrown, which would carry both the
original exception and the one thrown by test_env::stop().
By contrast, the test_env::stop() deferred with seastar::defer() runs in a
destructor, and therefore test_env::stop() had better not throw there.
However, we will assume that test_env::stop() does not throw, albeit not
marked "noexcept". Prior commits 8d704f2532 ("sstable_test_env:
Coroutinize and move to .cc test_env::stop()", 2023-10-31) and
2c78b46c78 ("sstables::test_env: Carry compaction manager on board",
2023-10-31) show that we've considered individual actions in
test_env::stop() not to throw before.
The 128KB stack of seastar::thread (which underlies seastar::async())
should be a tolerable cost in a test case, in exchange for the improved
readability.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
repair_service::insert_repair_meta gets a reference to a table
and passes it to continuations. If the table is dropped in the meantime,
the reference becomes invalid.
Use find_column_family at each table occurrence in insert_repair_meta
instead.
Closes scylladb/scylladb#19953
In fabab2f4, we introduced preemption_source, and added the
`SCYLLA_ENABLE_PREEMPTION_SOURCE` preprocessor macro to allow
opting in to the pluggable preemption check.
But the CMake build system was not updated accordingly.
So, in this change, let's sync the CMake build system with
`configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19951
This reverts commit c3bea539b6.
Since it breaks the offline-installer artifact-tests. Also, it seems that we should not have merged it in the first place, since we don't need scylla-housekeeping checks for the offline installer.
Closes scylladb/scylladb#19976
compaction_manager::perform_compaction does not create a task manager
task for the compaction if parent_info is set to std::nullopt. Currently,
we always want to create a task manager task for compaction.
Remove optional from task info parameters which start compaction.
Track all compactions with task manager.
If perform_compaction gets std::nullopt as a parent info then
the executor won't be tracked by task manager.
Modify storage_group::split call so that it passes empty task_info
instead of nullopt to track split.
With the recently added mv admission control, we can now
test how the view update backlogs are updated and propagated
without relying just on the response delays that they were causing
until now.
This patch adds a test for it, replicating issues scylladb/scylladb#18461
and scylladb/scylladb#18783.
In the test, we start with an empty view update backlog, then perform
a write, increasing the backlog and saving the updated backlog
on the coordinator. The backlog then drops back to 0; we wait 1s for the
backlog to be gossiped, and then perform another write, which should
succeed.
Due to scylladb/scylladb#18461, the test would fail because
in both gossip rounds before and after the write, the backlog was empty,
causing the write to be blocked by admission control indefinitely.
Due to scylladb/scylladb#18783, the test would fail because when
the backlog drops back to 0 after the write, the change is never
registered, causing all writes to be blocked as well.
In this patch we add 2 tests for checking that the mv admission control works.
The first one simply checks whether, after increasing the backlog on one node
over the admission control threshold, the following request is rejected with
the error message corresponding to the admission control.
The second one checks whether, after triggering admission control, the entire
user request fails instead of just failing a replica write. This is done by
performing a number of writes, some of which trigger the admission control
and cause retries, then checking if the node that had a large view update backlog
received all the writes. Before, the writes would succeed on enough replicas,
reaching QUORUM, and allowing the user write to succeed and cause no retries,
even though on the replica with a high backlog the write got rejected due to
the backlog size.
To avoid an expensive stack unwind, instead of throwing an error,
we can just return it thanks to the boost::result type that the
affected methods use. The result with an exception needs to be
constructed not implicitly, but with boost::outcome_v2::failure,
because the exception, once converted into coordinator_exception_container,
can then be converted both into a successful response_id_type
and into a failure.
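To illustrate the general idea only, using plain boost::outcome types instead of Scylla's result and exception-container types:
```cpp
// Hedged sketch: return the failure instead of throwing, constructing it
// explicitly with failure() because, as described above, the error payload
// could otherwise also be convertible to the success branch.
#include <boost/outcome.hpp>
#include <exception>
#include <stdexcept>

namespace bo = boost::outcome_v2;

using write_result = bo::result<int, std::exception_ptr>;  // success: a response id

write_result admit_write(bool overloaded) {
    if (overloaded) {
        return bo::failure(std::make_exception_ptr(
                std::runtime_error("view update backlog is full")));
    }
    return 42;  // normal path: return the response id
}
```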
Currently, when a replica's view update backlog is full, the write is still
sent by the coordinator to all replicas. Because of the backlog, the write
fails on the replica, causing inconsistency that needs to be fixed by repair.
To avoid these inconsistencies, this patch adds a check on the coordinator
for overloaded replicas. As a result, a write may be rejected before being
sent to any replicas and later retried by the user, when the replica is no
longer overloaded.
Fixes scylladb/scylladb#17426
Currently when a partition is deleted from the base table, we generate a
row tombstone update for each one of the view rows in the partition.
When the partition key in the view is the same as the base, maybe in a
different order, this can be done more efficiently - The whole corresponding
view partition can be deleted with one partition tombstone update.
With this commit, when generating view updates, if the update mutation has a
partition tombstone then for the views which have the same partition key
we will generate a partition tombstone update, and skip the individual
row tombstone updates.
Fixes scylladb/scylladb#8199
Closes scylladb/scylladb#19338
* github.com:scylladb/scylladb:
mv: skip reading rows when generating partition tombstone update
mv: delete a partition in a single operation when applicable
cql-pytest: move ScyllaMetrics to util file to allow reuse
Add a test that drops a table while there is a counter update operation
ongoing in the table.
The test reproduces issue scylladb/scylla-enterprise#4475 and verifies
it is fixed.
Tablet load balancer tries to equalize tablet load between shards by
moving tablets. Currently, the tablet load balancer assumes that each
tablet has the same hotness. This may not be true, and some tables may
be hotter than others. If some nodes end up getting more tablets of
the hot table, we can end up with request load imbalance and reduced
performance.
In 79d0711c7e we implemented a
mitigation for the problem by randomly choosing the table whose tablet
replica should be moved. This should improve fairness of
movement. However, this proved to not be enough to get a good
distribution of tablets.
This change improves candidate selection to not rely on randomness
but rather evaluate candidates with respect to their impact on load
imbalance. Also, if there is no good candidate, we consider picking
other source shards, not just the most-loaded one. This is helpful because
when finishing a node drain we get just a few candidates per shard, all
of which may belong to a single table, and the destination may already
be overloaded with that table. Another shard may contain tablets of
another table which is not yet overloaded on the destination. And
shards may be of similar load, so it doesn't matter much which shard
we choose to unload.
We also consider other destinations, not just the least-loaded one. This
helps when draining nodes and the source node has few shard
candidates. Shards on the destination may have similar load, so there
is more than one good destination candidate. By limiting ourselves to a
single shard, we increase the chance that we'll overload the table on
that shard.
The algorithm was evaluated using "scylla perf-load-balancing", which
simulates a sequence of 8 node bootstraps and decommissions for
different node and shard counts, RF, and tablet counts.
For example, for the following parameters:
params: {iterations=8, nodes=5, tablets1=128 (2.4/sh), tablets2=512 (9.6/sh), rf1=3, rf2=3, shards=32}
The results are:
Before:
Overcommit (old) : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}
Overcommit (old) : worst: {table1={shard=4.00 (best=1.25), node=1.81}, table2={shard=1.25 (best=1.04), node=1.11}}
Overcommit (old) : last : {table1={shard=2.50 (best=1.25), node=1.41}, table2={shard=1.25 (best=1.04), node=1.05}}
After:
Overcommit : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}
Overcommit : worst: {table1={shard=1.50 (best=1.25), node=1.02}, table2={shard=1.12 (best=1.04), node=1.01}}
Overcommit : last : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}
So worst shard overcommit for table1 was reduced from 4 to 1.5. Overcommit
of 4 means that the most-loaded shard has 4 times more tablets than
the average per-shard load in the cluster.
Also, node overcommit for table1 was reduced from 1.81 to 1.02.
The magnitude of improvement depends greatly on the test configuration, i.e. on topology and tablet distribution.
The algorithm is not perfect: it finds a local optimum. In the above
test, overcommit of 1.5 is not the best possible (1.25).
One of the reasons why the current algorithm doesn't achieve the best
distribution is that it works with a single movement at a time and
replication constraints limit the choice of destinations. Viable
destinations for remaining candidates may be only on nodes which are
not least-loaded, and we won't be able to fill the least loaded
node. Doing so would require more complex movement involving moving a
tablet from one of the destination nodes which doesn't have a replica
on the least loaded node and then replacing it with the candidate from
the source node.
Another limitation is that the algorithm can only fix balance by
moving tablets away from most loaded nodes, and it does so due to
imbalance between nodes. So it cannot fix the imbalance which is
already present on the nodes if there is not much to move due to
similar load between nodes. It is designed to not make the imbalance
worse, so it works well if we started in good shape.
Fixes https://github.com/scylladb/scylladb/issues/16824
Closes scylladb/scylladb#19779
* github.com:scylladb/scylladb:
test: perf: tablet_load_balancing: Test with higher shard and tablet counts
tablets: load_balancer: Avoid quadratic complexity when finding best candidate
tablets: load_balancer: Maintain load sketch properly during intra-node migration
tablets: load_balancer: Use "drained" flag
test: perf: tablet_load_balancing: Report load balancer stats
tablets: load_balancer: Move load_balancer_stats_manager to header file
tablets: load_balancer: Split evaluate_candidate() into src and dst part
tablets: load_balancer: Optimize evaluate_candidate()
tablets: load_balancer: Add more statistics
tablets: load_balancer: Track load per table on cluster level
tablets: load_balancer: Track load per table on node level
tablets: load_balancer: Use a single load sketch for tracking all nodes
locator: load_sketch: Introduce populate_dc()
tablets: load_balancer: Modify target load sketch only when emitting migration
locator: load_sketch: Introduce get_most_loaded_shard()
locator: load_sketch: Introduce get_least_loaded_shard()
locator: load_sketch: Optimize pick()/unload()
locator: load_sketch: Introduce load_type
test: perf: tablet_load_balancing: Report total tablet counts
test: perf: tablet_load_balancing: Print run parameters in the single simulation case too
test: perf: tablet_load_balancing: Report time it took to schedule migrations
tablets: load_balancer: Log table load stats after each migration
tablets: load_balancer: Log per-shard load distribution in debug level
tablets: load_balancer: Improve per-table balance
tablets: load_balancer: Extract check_convergence()
tablets: load_balancer: Extract nodes_by_load_cmp
tablets: load_balancer: Maintain tablet count per table
tablets: load_balancer: Reuse src_node_info
test: perf: tablet_load_balancing: Print warnings about bad overcommit
test: perf: tablet_load_balancing: Allow running a single simulation
test: perf: tablet_load_balancing: Report best possible shard overcommit
test: perf: tablet_load_balancing: Report global shard overcommit
This commit adds the 6.0-to-6.1 upgrade guide.
Compared to the previous upgrade guide:
- Added the "Ensure Consistent Topology Changes Are Enabled" prerequisite.
- Removed the "After Upgrading Every Node" section. Both Raft-based schema changes and topology updates
are mandatory in 6.1 and don't require any user action after upgrading to 6.1.
- Removed the "Validate Raft Setup" section. Raft was enabled in all 6.0 clusters (for schema management),
so now there's no scenario that would require the user to follow the validation procedure.
ALTER tablets KS executes in 2 steps:
1. ALTER KS's cql handler forms a global topo req, and saves data required
to execute this req,
2. global topo req is executed by topo coordinator, which reads data
attached to the req.
The KS name is among the data attached to the req.
There's a time window between these steps where a to-be-altered KS could
have been DROPped, which results in topo coordinator forever trying to
ALTER a non-existing KS. In order to avoid it, the code has been changed
to first check if a to-be-altered KS exists, and if it's not the case,
it doesn't perform any schema/tablets mutations, but just removes the
global topo req from the coordinator's queue.
BTW, just adding this extra check resulted in broader-than-expected
changes, which is due to the fact that the code is written badly and
needs to be refactored - an effort that's already planned under #19126.
Fixes: #19576
It's only needed to start hints via the proxy, but the proxy can do it without the gossiper argument.
Closes scylladb/scylladb#19894
* github.com:scylladb/scylladb:
storage_service: Remove gossiper argument from join_cluster()
proxy: Use remote gossiper to start hints resource manager
hints: Const-ify gossiper references and anchor pointers
When a table is dropped it should wait for all pending operations in the
table before the table is destroyed, because the operations may use the
table's resources.
With counter update operations, currently this is not the case. The
table may be destroyed while there is a counter update operation in
progress, causing an assert to be triggered due to a resource being
destroyed while it's in use.
The reason the operation is not waited for is a mistake in the lifetime
management of the object representing the write in progress. The commit
fixes it so the object lives for the duration of the entire counter
update operation, by moving it to the `do_with` list.
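A minimal sketch of that lifetime pattern, with illustrative type names rather than the actual counter-write code:
```cpp
// Hedged sketch: objects placed in do_with() stay alive until the returned
// future resolves, so the write-in-progress state cannot be destroyed while
// the counter update is still running.
#include <seastar/core/future.hh>
#include <seastar/core/do_with.hh>

struct write_in_progress { /* locks, response bookkeeping, ... */ };

seastar::future<> apply_counter_update(write_in_progress wip) {
    return seastar::do_with(std::move(wip), [] (write_in_progress& wip) {
        // Everything chained here may safely reference wip.
        return seastar::make_ready_future<>();
    });
}
```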
Fixes scylladb/scylla-enterprise#4475
Closes scylladb/scylladb#19948
A user complained that ScyllaDB is incompatible with Cassandra when it
requires ALLOW FILTERING on a restriction like WHERE x=1 AND y=1 where
x and y are two columns with secondary indexes.
In the tests added in this patch we show that:
1. Scylla *is* compatible with Cassandra when the traditional "CREATE
INDEX" is used - ALLOW FILTERING *is* required in this case in both
Cassandra and Scylla.
2. If SAI is used in Cassandra (CREATE CUSTOM INDEX USING 'SAI'),
indeed ALLOW FILTERING becomes optional. I believe this is incorrect
so I opened CASSANDRA-19795.
These two tests combined show that we're not incompatible with Cassandra,
rather Cassandra's two index implementations are incompatible between
themselves, and Scylla is in fact compatible in this case with Cassandra's
traditional index and not with SAI.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#19909
If the source and destination shards picked for migration based on
global tablet balance do not have a good candidate in terms of effect
on per-table balance, the algorithm explores other source shards and
destinations. This has quadratic complexity in terms of shard count in
the worst case, when there are no good candidates.
Since we can have up to ~200 shards, this can slow down scheduling
significantly. I saw total scheduling time of 5 min in the following run:
scylla perf-load-balancing -c1 -m1G --iterations=8 \
--nodes=4 --tablets1=1024 --tablets2=8096 \
--rf1=2 --rf2=3 --shards=256
To improve, change the approach to first find the best source shard and
then best target shard, sequentially. So it's now linear in terms of
shard count.
After the change, the total scheduling time in that run is down to 4s.
Minimizing source and destination metrics piece-wise minimizes the
combined metric, so badness of the best candidate doesn't suffer after
this change.
Affects only intra-node migration. The code was recording destination
shard as taken and did not un-take it in case we skipped the migration
due to lack of candidates.
Noticed during code review. Impact is minor, since even if this leads
to suboptimal balance, the next scheduling round should fix it.
Also, the source shard was not unloaded, but that should have no
impact on decisions. But to be future-proof, better to maintain the
load accurately in case the algorithm is extended with more steps.
Some of the calls inside the `raft_group0_client::start_operation()` method were missing the abort source parameter. This caused the repair test to be stuck in the shutdown phase - the abort source has been triggered, but the operations were not checking it.
This was in particular the case of operations that try to take the ownership of the raft group semaphore (`get_units(semaphore)`) - these waits should be cancelled when the abort source is triggered.
This should fix the following tests that were failing in some percentage of dtest runs (about 1-3 of 100):
* TestRepairAdditional::test_repair_kill_1
* TestRepairAdditional::test_repair_kill_3
Fixes scylladb/scylladb#19223
Closes scylladb/scylladb#19860
* github.com:scylladb/scylladb:
raft: fix the shutdown phase being stuck
raft: use the abort source reference in raft group0 client interface
There are a counter and a shared future on board that used to facilitate
start-time barrier synchronization. Now they are not needed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When starting, the view builder spawns an async background fiber, and upon
its completion each shard needs to wait for other shards to do the same.
This is exactly what the cross-shard barrier is about, so instead of
synchronizing via v.b.'s shard-0 instance, use the barrier. This makes
view_builder::start() shorter and easier to read.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The barrier will be used by the next patch to synchronize shards with each
other. When passed to an invoke_on_all() lambda like this, each lambda gets
its own copy of the barrier "handler" that maintains shared state across
shards.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
They are executed frequently during tablet scheduling. Currently, they
have time complexity of O(N*log(N)) in terms of shard count. With
large shard counts, that has significant overhead.
This patch optimizes them down to O(log(N)).
Tablet load balancer tries to equalize tablet load between shards by
moving tablets. Currently, the tablet load balancer assumes that each
tablet has the same hotness. This may not be true, and some tables may
be hotter than others. If some nodes end up getting more tablets of
the hot table, we can end up with request load imbalance and reduced
performance.
In 79d0711c7e we implemented a
mitigation for the problem by randomly choosing the table whose tablet
replica should be moved. This should improve fairness of
movement. However, this proved to not be enough to get a good
distribution of tablets.
This change improves candidate selection to not rely on randomness
but rather evaluate candidates with respect to their impact on load
imbalance. Also, if there is no good candidate, we consider picking
other source shards, not just the most-loaded one. This is helpful because
when finishing a node drain we get just a few candidates per shard, all
of which may belong to a single table, and the destination may already
be overloaded with that table. Another shard may contain tablets of
another table which is not yet overloaded on the destination. And
shards may be of similar load, so it doesn't matter much which shard
we choose to unload.
We also consider other destinations, not just the least-loaded one. This
helps when draining nodes and the source node has few shard
candidates. Shards on the destination may have similar load, so there
is more than one good destination candidate. By limiting ourselves to a
single shard, we increase the chance that we'll overload the table on
that shard.
The algorithm was evaluated using "scylla perf-load-balancing", which
simulates a sequence of 8 node bootstraps and decommissions for
different node and shard counts, RF, and tablet counts.
For example, for the following parameters:
params: {iterations=8, nodes=5, tablets1=128 (2.4/sh), tablets2=512 (9.6/sh), rf1=3, rf2=3, shards=32}
The results are:
After:
Overcommit : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}
Overcommit : worst: {table1={shard=1.50 (best=1.25), node=1.02}, table2={shard=1.12 (best=1.04), node=1.01}}
Overcommit : last : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}
Before:
Overcommit (old) : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}}
Overcommit (old) : worst: {table1={shard=4.00 (best=1.25), node=1.81}, table2={shard=1.25 (best=1.04), node=1.11}}
Overcommit (old) : last : {table1={shard=2.50 (best=1.25), node=1.41}, table2={shard=1.25 (best=1.04), node=1.05}}
So shard overcommit for table1 was reduced from 4 to 1.5. Overcommit
of 4 means that the most-loaded shard has 4 times more tablets than
the average per-shard load in the cluster.
Also, node overcommit for table1 was reduced from 1.81 to 1.02.
The magnitude of improvement depends greatly on the test configuration, i.e. on topology and tablet distribution.
The algorithm is not perfect: it finds a local optimum. In the above
test, overcommit of 1.5 is not the best possible (1.25).
One of the reasons why the current algorithm doesn't achieve the best
distribution is that it works with a single movement at a time and
replication constraints limit the choice of destinations. Viable
destinations for remaining candidates may be only on nodes which are
not least-loaded, and we won't be able to fill the least loaded
node. Doing so would require more complex movement involving moving a
tablet from one of the destination nodes which doesn't have a replica
on the least loaded node and then replacing it with the candidate from
the source node.
Another limitation is that the algorithm can only fix balance by
moving tablets away from most loaded nodes, and it does so due to
imbalance between nodes. So it cannot fix the imbalance which is
already present on the nodes if there is not much to move due to
similar load between nodes. It is designed to not make the imbalance
worse, so it works well if we started in good shape.
Fixes #16824
Will be reused when evaluating different targets for migration in later
stages.
The refactoring drops updating of _stats.for_dc(dc).stop_no_candidates
and we update _stats.for_dc(dc).stop_load_inversion in both cases
where convergence check may fail. The reason is that stat updates must
be outside check_convergence(), since the new use case should not
update those stats (it doesn't stop balancing, just drops
candidates). Propagating the information for distinguishing the two
cases would be a burden. But it's not necessary, since both cases are
actually load inversion cases, one pre-migration, the other
post-migration, so we don't need the distinction.
It's actually wrong to increment stop_no_candidates, since there may
still be candidates; it's the load which is inverted.
Rather than maximum per-node shard overcommit. Global shard overcommit
is a better metric since we want to equalize global load not just
per-node load.
Some of the calls inside the `raft_group0_client::start_operation()`
method were missing the abort source parameter. This caused the repair
test to be stuck in the shutdown phase - the abort source has been
triggered, but the operations were not checking it.
This was in particular the case of operations that try to take the
ownership of the raft group semaphore (`get_units(semaphore)`) - these
waits should be cancelled when the abort source is triggered.
This should fix the following tests that were failing in some percentage
of dtest runs (about 1-3 of 100):
* TestRepairAdditional::test_repair_kill_1
* TestRepairAdditional::test_repair_kill_3
Fixes scylladb/scylladb#19223
Most callers of the raft group0 client interface are passing a real
source instance, so we can use the abort source reference in the client
interface. This change makes the code simpler and more consistent.
`maybe_rehash` is complementary and is not strictly required to succeed.
If it fails, it will retry on the next call, but there's no reason
to throw a bad_alloc exception that will fail its caller, since `maybe_rehash`
is called as the final step after the caller has already succeeded with
its action.
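For illustration only, with hypothetical names rather than the actual hash-table code, the idea is simply to swallow the allocation failure:
```cpp
// Hedged sketch: rehashing is best-effort; on allocation failure we give up
// quietly and let a later call retry, instead of failing the caller whose own
// work has already succeeded.
#include <new>

struct table_stub {
    void rehash_now() { /* may throw std::bad_alloc */ }

    void maybe_rehash() noexcept {
        try {
            rehash_now();
        } catch (const std::bad_alloc&) {
            // Not fatal: we'll try again on the next call.
        }
    }
};
```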
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We want to call `service_level_controller::do_abort()` in all cases.
The current code (introduced in
535e5f4ae7)
calls do_abort if abort was not requested; however, since
it does so by checking the subscription bool operator,
it would miss the case where abort was already requested
before the subscription took place (in the service_level_controller
ctor).
With scylladb/seastar@470b539b1c and
scylladb/seastar@8ecce18c51
we can just unconditionally call the subscription's `on_abort`
method, which ensures only-once semantics, even if abort
was already requested at subscription time.
Fixes scylladb/scylladb#19075
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#19929
mapreduce_server was previously coroutinized, but only partially. This
series completes coroutinization and eliminates remaining continuation chains.
None of this code is performance sensitive as it runs at the super-coordinator level
and is amortized over a full scan of the entire table.
No backport needed as this is a cleanup.
Closes scylladb/scylladb#19913
* github.com:scylladb/scylladb:
mapreduce_service: reindent
mapreduce_service: coroutinize retrying_dispatcher::dispatch_to_node()
mapreduce_service: coroutinize dispatch() inner lambda
The Alternator command ListTables is supposed to list actual tables
created with CreateTable, and should not list things like materialized views
(created for GSI or LSI) or CDC log tables.
We already properly excluded materialized views from the list - and
had the tests to prove it - but forgot both the exclusion and the testing
for CDC log tables - so creating a table xyz with streams enabled would
cause ListTables to also list "xyz_scylla_cdc_log".
This patch fixes both oversights: it adds the code to exclude CDC logs
from the output of ListTables, and adds a test which reproduces the bug
before this fix and verifies the fix works.
Fixes #19911.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#19914
In commit bac7c33313 we introduced a new
test for the Alternator "/localnodes" request, checking that a node
that is still joining does not get returned. The tests used what I
thought were "very high" timeouts - we had a timeout of 10 seconds
for starting a single node, and injected a 20 second sleep to leave
us 10 seconds after the first sleep.
But the test failed in one extremely slow run (a debug build on
aarch64), where starting just a single node took more than 15 seconds!
So in this patch I increase the timeouts significantly: We increase
the wait for the node to 60 seconds, and the sleeping injection to
120 seconds. These should definitely be enough for anyone (famous
last words...).
The test doesn't actually wait for these timeouts, so the ridiculously
high timeouts shouldn't affect the normal runtime of this test.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#19916
There's a test in the object_store suite that verifies the contents of a bucket. It does so with a plain HTTP request, but unfortunately this doesn't work -- even local minio uses a restricted bucket, and a plain HTTP request results in a 403 (Forbidden) error code. The test doesn't check this and continues working with an empty list of objects, which, in turn, is what it expects to see.
The fix is to use boto3. With it, the access/secret key pair is picked up and listing the bucket finally works.
Closes scylladb/scylladb#19889
* github.com:scylladb/scylladb:
test/object_store: Use boto3.resource to list bucket
test/object_store: Add get_s3_resource() helper
Instead of a plain HTTP request, use the power of the boto3 package. The
recently added get_s3_resource() facilitates creating one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It creates a boto3.resource object that points to the endpoint maintained
by the s3_server argument (which tests obtain via a fixture). This allows
using boto3 to access the S3 bucket from the local minio server.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Instead of using fmt::runtime, use compile-time format strings in
order to detect bad format strings, missing format arguments,
or arguments which are not formattable, at compile time.
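A small illustration of the difference; the strings below are toy examples, not taken from the actual change:
```cpp
// Hedged sketch: with a compile-time format string, libfmt validates the
// string against the arguments during compilation; fmt::runtime() defers all
// checks to run time.
#include <fmt/core.h>
#include <string>

std::string examples() {
    auto ok = fmt::format("{} tablets on shard {}", 128, 3);   // compile-time checked
    // fmt::format("{} {}", 128);            // would not compile: missing argument
    auto dyn = fmt::format(fmt::runtime("{} tablets"), 128);   // checked only at run time
    return ok + " / " + dyn;
}
```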
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19901
* seastar 67065040...a7d81328 (30):
> reactor: Initialize _aio_pollfd later
> abortable_fifo: fix a typo in comment
> net: Expose DNS error category
> pollable_fd_state: use default-generated dtor
> perftune: tune tcp_mem
> scripts/perftune.py: clock source tweaking: special case Amazon and Google KVM virtualizations
> abort_source: subscription: keep callback function alive after abort
> github: disable ccache when building with C++ modules
> github: add enable-ccache input to test.yaml
> pollable_fd_state: Mark destructor protected and make non-virtual
> reactor: Mark .configure() private
> reactor: Set aio_nowait_supported once
> reactor: Add .no_poll_aio to reactor_config
> reactor: Move .max_poll_time on reactor_config
> reactor: Move .task_quota on reactor_config
> reactor: Move .strict_o_direct on reactor_config
> reactor: Move .bypass_fsync on reactor_config
> reactor: Move .max_task_backlog on reactor_config
> reactor: Move .force_io_getevents_syscall on reactor_config
> reactor: Move .have_aio_fsync on reactor_config
> reactor: Move .kernel_page_cache on reactor_config
> reactor: Move .handle_sigint on reactor_config
> reactor_backend: Construct _polling_io from reactor config
> reactor: Move config when constructing
> reactor: Use designated initializers to set up reactor_config
> native-stack: use queue::pop_eventually() in listener::accept()
> abort_source: subscription: allow calling on_abort explicitly
> file: document that close() returns the file object to uninitialized state
> code-cleanup: do not include 'smp.hh' in 'reactor.hh'
> code-cleanup: remove redundant includes of smp.hh
Closes scylladb/scylladb#19912
The parameter names do not match the ones we are using.
These comments were inherited from Origin, but we failed to update
them accordingly.
In this change, the comments are updated to reflect the function
signatures.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19900
The build_unified.sh script accepts a --build-dir option, which
specifies the directory used for storing temporary files extracted
from tarballs defined by the --pkgs option. When performing parallel
builds of multiple modes, it's crucial that each build uses a unique
build directory. Reusing the same build directory for different modes
can lead to conflicts, resulting in build failures or, more seriously,
the creation of tarballs containing corrupted files.
So, in this change, we specify a different directory for each mode,
so that they don't share the same one.
Refs scylladb/scylladb#2717
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19905
Simplify the function by converting it to a coroutine.
Note that while the final co_return co_await looks like a loop (and
therefore an await would introduce an O(n) allocation), it really isn't -
we retry at most once.
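For illustration, the shape being described is roughly the following; the function and helper are hypothetical, not the real code:
```cpp
// Hedged sketch: the trailing "co_return co_await" re-invokes the coroutine
// once, so despite looking like a loop there is at most one retry.
#include <seastar/core/future.hh>
#include <seastar/core/coroutine.hh>

seastar::future<int> do_read() {             // stand-in for the real operation
    return seastar::make_ready_future<int>(1);
}

seastar::future<int> read_value(bool retried = false) {
    auto v = co_await do_read();             // first attempt
    if (v < 0 && !retried) {
        co_return co_await read_value(true); // retry exactly once
    }
    co_return v;
}
```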
Currently, delete_atomically can be called with
a list of sstables from mixed prefixes in two cases:
1. truncate: where we delete all the sstables in the table directory
2. tablet cleanup: similar to truncate but restricted to sstables in a
single tablet replica
In both cases, it is possible that sstables in staging (or quarantine)
are mixed with sstables in the base directory.
Until a more comprehensive fix is in place
(see https://github.com/scylladb/scylladb/pull/19555),
this change just lifts the ban on atomic deletion
of sstables from different prefixes, acknowledging
that the implementation is not atomic across
prefixes. This is better than crashing for now,
and can be backported more easily to branches
that support tablets, so tablet migration can
be done safely in the presence of repair of
tables with views.
Refs scylladb/scylladb#18862
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#19816
This pointer was only needed so it could be passed all the way down to the hints
resource manager's start() method. It's no longer needed for that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
By the time the hints resource manager is started, the proxy already has its
remote part initialized. Remote returns a const gossiper pointer, but
after the previous change the hints code can live with it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are two places in the hints code that need the gossiper: the hints sender
calling gossiper::is_alive() and the endpoint_downtime_not_bigger_than()
helper in the manager. Both can live with a const gossiper, so the dependency
references and anchor pointers can be restricted to const too.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The testcase `test_bloom_filter_reclaim_during_reload` checks the
SSTable manager's `_total_memory_reclaimed` against an expected value to
verify that a Bloom filter was reloaded. However, it does not wait for
the manager to update the variable, causing the check to fail if the
update has not occurred yet. Fix it by making the testcase wait until
the variable is updated to the expected value.
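A hedged sketch of the kind of wait loop this implies; the names, and the omission of a deadline, are illustrative only:
```cpp
// Waits until the observed counter reaches the expected value instead of
// asserting on it immediately; the real test presumably also bounds the wait.
#include <seastar/core/sleep.hh>
#include <seastar/core/coroutine.hh>
#include <chrono>
#include <functional>

seastar::future<> wait_for_reclaimed(std::function<size_t()> reclaimed, size_t expected) {
    while (reclaimed() != expected) {
        co_await seastar::sleep(std::chrono::milliseconds(10));
    }
}
```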
Fixes #19879
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#19883
When a node that is permanently down is replaced, it is marked as "left" but it still can be a replica of some tablets. We also don't keep IPs of nodes that have left and the `node` structure for such node returns an empty IP (all zeros) as the address.
This interacts badly with the view update logic. The base replica paired with the left node might decide to generate a view update. Because storage proxy still uses IPs and not host IDs, it needs to obtain the view replica's IP and tell the storage proxy to write a view update to that node - so, it chooses 0.0.0.0. Apparently, storage proxy decides to write a hint towards this address - hinted handoff on the other hand operates on host IDs and not IPs, so it attempts to translate the IP back, which triggers an assertion as there is no replica with IP 0.0.0.0.
As a quick workaround for this issue just drop view updates towards nodes which seem to have IPs that are all zeros. It would be more proper to keep the view updates as hints and replay them later to the new paired replica, but achieving this right now would require much more significant changes. For now, fixing a crash is more important than keeping views consistent with base replicas.
In addition to the fix, this PR also includes a regression test heavily based on the test that @kbr-scylla prepared during his investigation of the issue.
Fixes: scylladb/scylladb#19439
This issue can cause multiple nodes to crash at once and the fix is quite small, so I think this justifies backporting it to all affected versions. 6.0 and 6.1 are affected. No need to backport to 5.4 as this issue only happens with tablets, and tablets are experimental there.
Closes scylladb/scylladb#19765
* github.com:scylladb/scylladb:
test: regression test for MV crash with tablets during decommission
db/view: drop view updates to replaced node marked as left
When deleting a base partition, in some cases we can update the view by
generating a single partition deletion update, instead of generating a
row deletion update for each of the partition rows.
If this is the case for all the affected views, and there are no other
updates besides deleting the partition, then we can skip reading and
iterating over all the rows, since this won't generate any additional
updates that are not covered already.
Currently when a partition is deleted from the base table, we generate a
row tombstone update for each one of the view rows in the partition.
When the partition key in the view is the same as the base, maybe in a
different order, this can be done more efficiently - The whole corresponding
view partition can be deleted with one partition tombstone update.
With this commit, when generating view updates, if the update mutation has a
partition tombstone then for the views which have the same partition key
we will generate a partition tombstone update, and skip the individual
row tombstone updates.
Fixes scylladb/scylladb#8199
ScyllaMetrics is a useful generic component for retrieving metrics in a
pytest.
The commit moves the implementation from test_shedding.py to util.py to
make it reusable in other tests in cql-pytest.
There are a few api::set_foo()-s left in main that are placed in ~~random~~ legacy order. This PR fixes that and makes a few more associated cleanups.
Refs: #2737
Closes scylladb/scylladb#19682
* github.com:scylladb/scylladb:
api: Unset cache_service endpoints on stop
main: Don't ignore set_cache_service() future
api: Move storage API few steps above
api: Register token-metadata API next to token-metadata itself
api: Do not return zero local host-id
api: Move snitch API registration next to snitch itself
They currently stay registered long after the dependent services get
stopped. There's a need for batch unsetting (scylladb/seastar#1620), so
currently only this explicit listing :(
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The call itself seems to be in the wrong place -- there's no "cache service";
also, the API uses the database and snapshot_ctl to do its work. So it deserves
more cleanup, but at least don't throw the returned future<> away.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The sequence currently is
sharded<storage_service>.start()
sharded<query_processor>.invoke_on_all(start_remote)
api::set_server_storage_service()
The last two steps can be safely swapped to keep storage service API
next to its service.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Right now API registration happens quite late because it waits for the storage
service to register its "function" first. This can be done beforehand
and the t.m. API can be moved to where it should be.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The local host id is read from the local token metadata and returned to the
caller as a string. The t.m. itself starts with a default-constructed host
id value which is updated later. However, even such an "unset" host id
value can be rendered as a string without errors. This makes the correct
operation of the API endpoint depend on the initialization sequence, which may
(spoiler: it will) change in the future.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's finally no longer used. Now only the sstables storage code "knows" that
a keyspace may have an on-disk directory.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently init_keyspace_storage() expects the caller to
tell it where the ks directory is, but that's not nice, as a keyspace may
not necessarily keep its sstables in any directory.
This patch moves the directory path evaluation into storage code,
specifically to the lambda that is called for on-disk sstables. The
way directory is evaluated mirrors the one from make_keyspace_config()
that will be removed by next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The admission test has a section which tests admission when the
semaphore has inactive reads. This section (and therefore the entire
test) became flaky lately, after a seemingly unrelated seastar upgrade,
which improved timers.
The cause of the flakiness is the permit which is made inactive later:
this permit is created with 0 timeout (times out immediately). For some
time now, when the timeout timer of a permit fires, if the permit is
inactive, it is evicted. This is what makes the test fail: the inactive
read times out and ends up evicting this permit, which is not expected
for the test. The reason this was not a problem before is that the test
finishes very quickly, usually before the timer could even be polled by
the reactor. The recent seastar changes changed this, and now the timer
sometimes gets polled and fires, failing the test.
Fixes: #19801
Closes scylladb/scylladb#19859
Alternator allows authentication into the existing CQL roles, but
roles which have the flag "login=false" should be refused in
authentication, and this patch adds the missing check.
The patch also adds a regression test for this feature in the
test/alternator test framework, in a new test file
test/alternator/cql_rbac.py. This test file will later include more
tests of how the CQL RBAC commands (CREATE ROLE, GRANT, REVOKE)
affect authentication and authorization in Alternator.
In particular, these tests need to use not just the DynamoDB API but
also CQL, so this new test file includes the "cql" fixture that allows
us to run CQL commands, to create roles, to retrieve their secret keys,
and so on.
Fixes scylladb/scylladb#19735
Closes scylladb/scylladb#19740
Introduce virtual tasks - task manager tasks which cover
cluster-wide operations.
Virtual tasks aren't kept in memory; instead, their statuses
are retrieved from the associated service when users request
them with the task manager API. From API users' perspective,
virtual tasks behave similarly to regular tasks, but they can
be queried from any node in a cluster.
Virtual tasks cannot have a parent task. They can have
children on each node in a cluster, but do not keep references
to them. So, if a direct child of a virtual task is unregistered
from task manager, it will no longer be shown in parent's
children vector.
The virtual_task class corresponds to all virtual tasks in one
group. If users want to list all tasks in a module, a virtual_task
returns all recent supported operations; if they request a virtual
task's status, info about the one specified operation is
presented. Time to live, number of tracked operations etc.
depend on the implementation of the individual virtual_task.
All virtual_tasks are kept only on shard 0.
Refs: https://github.com/scylladb/scylladb/issues/15852
New feature, no backport needed.
Closes scylladb/scylladb#16374
* github.com:scylladb/scylladb:
docs: describe virtual tasks
db: node_ops: filter topology request entries
test: add a topology suite for testing tasks
node_ops: service: create streaming tasks
node_ops: register node_ops_virtual_task in task manager
service: node_ops: keep node ops module in storage service
node_ops: implement node_ops_virtual_task methods
db: service: modify methods to get topology_requests data
db: service: add request type column to topology_requests
node_ops: add task manager module and node_ops_virtual_task
tasks: api: add virtual task support to get_task_status_recursively
tasks: api: add virtual task support
tasks: api: add virtual tasks support to get_tasks
tasks: add task_handler to hide task and virtual_task differences from user
tasks: modify invoke_on_task
tasks: implement task_manager::virtual_task::impl::get_children
tasks: keep virtual tasks in task manager
tasks: introduce task_manager::virtual_task
These workflows are scheduled periodically, and if they fail, notifications are sent to the repo's owner. To minimize surprises to contributors using GitHub, let's disable these workflows on fork repos.
Closes scylladb/scylladb#19736
* github.com:scylladb/scylladb:
github: do not run clang-tidy as a cron job
github: disable scheduled workflow on forks
We modify `ScyllaCluster.server_start` so that it changes seeds of the
starting node to all currently running nodes. This allows writing tests like
```python
s1 = await manager.server_add(start=False)
await manager.server_add()
await manager.server_start(s1.server_id)
```
However, it disallows writing tests that start multiple clusters. To fix this,
we add the `seeds` parameter to `server_start`.
We also improve the logic in `ScyllaCluster.add_server` to allow writing
tests like
```python
await manager.server_add(expected_error="...")
await manager.server_add()
```
This PR only adds improvements to the `test.py` framework, no need
to backport it.
Closes scylladb/scylladb#19847
* github.com:scylladb/scylladb:
test: scylla_cluster: improve expected_error in add_server
test: scylla_cluster: support more test scenarios
test: scylla_cluster: correctly change seeds in server_start
We make two changes:
- we release the IP address of a node that failed to boot because of
an expected error,
- we don't log "Cluster ... added ..." when a node fails to boot
because of an expected error.
Here are some examples of tests that don't work with no initial
nodes, but they should work:
1.
```
await manager.server_add(expected_error="...")
await manager.server_add()
```
2.
```
await manager.servers_add(2, expected_error="...")
await manager.servers_add(2)
```
3.
```
s1 = await manager.server_add(start=False)
await manager.server_start(s1.server_id, expected_error="...")
await manager.server_add()
```
4.
```
[s1, s2] = await manager.servers_add(2, start=False)
await manager.server_start(s1.server_id, expected_error="...")
await manager.server_start(s2.server_id, expected_error="...")
await manager.servers_add(2)
```
5.
```
s1 = await manager.server_add(start=False)
await manager.server_add()
await manager.server_start(s1.server_id)
```
6.
```
[s1, s2] = await manager.servers_add(2, start=False)
await manager.servers_add(2)
await manager.server_start(s1.server_id)
await manager.server_start(s2.server_id)
```
In this patch, we make a few improvements to make tests like the ones
presented above work. I tested all the examples above manually.
From now on, servers receive correct seeds if the first servers added
in the test didn't start or failed to boot.
Also, we remove the assertion preventing the creation of a second
cluster. This assertion failed the tests presented above. We could
weaken it to make these tests pass, but it would require some work.
Moreover, we have tests that intentionally create two clusters.
Therefore, we go for the easiest solution and accept that a single
`ScyllaCluster` may not correspond to a single Scylla cluster.
We change seeds in `ScyllaCluster.server_start` to all currently
running nodes. The previous code only pretended that it did it.
After doing this change, writing tests that create multiple clusters
is impossible. To allow it, we add the `seeds` parameter to
`ManagerClient.server_start`. We use it to fix and simplify the only
test that creates two clusters - `test_different_group0_ids`.
system_keyspace::get_topology_request_entries returns entries for
requests which are running or have finished after a specified time.
In the task manager node ops task, set the time so that they are shown
for task_ttl seconds after they have finished.
Add topology_tasks test suite for testing task manager's node ops
tasks. Add TaskManagerClient to topology_tasks for an easy usage
of task manager rest api.
Write a test for bootstrap, replace, rebuild, decommission and remove
top level tasks using the above.
Keep task manager node ops module in storage service. It will be
used to create and manage tasks related to topology changes.
The module is created and registered in storage service constructor.
In storage_service::stop() the module is stopped and so all the remaining
tasks would be unregistered immediately after they are finished.
Modify get_topology_request_state (and wait_for_topology_request_completion),
so that it doesn't call on_internal_error when request_id isn't
in the topology_requests table if require_entry == false.
Add other methods to get topology request entry.
topology_requests table will be used by task manager node ops tasks,
but it loses info about request type, which is required by tasks.
Add request_type column to topology_requests.
Virtual tasks are supported by get_task_status, abort_task and
wait_task.
Task status returned by get_task_status and wait_task:
- contains task_kind to indicate whether it's virtual (cluster) or
regular (node) task;
- children list apart from task_id contains node address of the task.
task_manager/list_module_tasks/{module} starts supporting virtual tasks,
which means that their stats will also be shown for users.
Additional task_kind param is added to indicate whether the task is
virtual (cluster-wide) or regular (node-wide).
Support in other paths will be added in following patches.
Contrary to regular tasks, which are per-operation, virtual tasks
are associated with the whole group of operations. There may be many
operations of each group performed at the same time. Info about each
running operation will be shown to a user through the API.
For virtual tasks, task manager imitates a regular task covering
each operation, but task_manager::tasks aren't actually created
in the memory. Instead, information (e.g. status) about the operation
is retrieved from associated service and passed to a user.
To hide most of the differences from user, task_handler class is created.
Task handler performs appropriate actions depending on task's kind.
However, users need to stay aware of the kind of task, because:
- get_task_status and wait_task do not unregister virtual tasks;
- the time for which a virtual task stays in task manager depends
on the associated service and the tasks' implementation;
- the number of a virtual task's children shown by get_tasks doesn't have
to be monotonic.
API is modified to use task_handler.
API-specific classes are moved to task_handler.{cc,hh}.
Virtual tasks are kept in task manager together with regular tasks.
All virtual tasks are stored on shard 0.
task_manager::module::make_task is modified to consider virtual
tasks as possible parents.
A virtual task is a new kind of task supported by task manager,
which covers cluster-wide operations.
From users' perspective virtual tasks behave similarly
to task_manager::tasks. The API side of virtual tasks will be
covered in the following patches.
Contrary to task_manager::task, a virtual task does not update
its fields proactively. Moreover, no object is kept in memory
for each individual virtual task's operation. Instead, a service
(or services) is queried on an API user's demand to learn about
the status of a running operation. Hence the name.
task_manager::virtual_task is responsible for a whole group
of virtual tasks, i.e. for tracking and generating statuses
of all operations of similar type.
To enable tracking of some kind of operations, one needs to
override task_manager::virtual_task::impl and provide implementations
of the methods returning appropriate information about the operations.
task_manager::virtual_task must be kept on shard 0.
Similarly to task_manager::tasks, virtual tasks can have child tasks,
responsible for tracking suboperations' progress. But virtual tasks
cannot have parents - they are always roots in task trees.
Some methods and structs will be implemented in later patches.
Alternator's "/localnodes" HTTP request is supposed to return the list of
nodes in the local DC to which the user can send requests.
The existing implementation incorrectly used gossiper::is_alive() to check
for which nodes to return - but "alive" nodes include nodes which are still
joining the cluster and not really usable. These nodes can remain in the
JOINING state for a long time while they are copying data, and an attempt
to send requests to them will fail.
The fix for this bug is trivial: change the call to is_alive() to a call
to is_normal().
But the hard part of this fix is the testing:
1. An existing multi-node test for "/localnodes" assumed that right after
a new node was created, it appears on "/localnodes". But after this
patch, it may take a bit more time for the bootstrapping to complete
and the new node to appear in /localnodes - so I had to add a retry loop.
2. I added a test that reproduces the bug fixed here, and verifies its
fix. The test is in the multi-node topology framework. It adds an
injection which delays the bootstrap, which leaves a new node in JOINING
state for a long time. The test then verifies that the new node is
alive (as checked by the REST API), but is not returned by "/localnodes".
3. The new injection for delaying the bootstrap is unfortunately not
very pretty - I had to do it in three places because we have several
code paths of how bootstrap works without repair, with repair, without
Raft and with Raft - and I wanted to delay all of them.
Fixes #19694.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#19725
this member function prepares for the backup feature, where the
object to be stored in the object storage is already persisted as a
file on local filesystem. this brings us two benefits:
- with the file, we don't need to accumulate the payloads in memory
and send them in batch, as we do in upload_sink and in
upload_jumbo_sink. this puts less pressure on the memory subsystem.
- with the file, we can read multiple parts in parallel if multipart
upload applies to it, which helps to improve the throughput.
so, this new helper is introduced to help upload an sstable from local
filesystem to the object storage.
Fixes https://github.com/scylladb/scylladb/issues/16287
Closes scylladb/scylladb#16387
* github.com:scylladb/scylladb:
s3/client: add client::upload_file()
s3/client: move constants related to aws constraints out
this member function prepares for the backup feature, where the
object to be stored in the object storage is already persisted as a
file on local filesystem. this brings us two benefits:
- with the file, we don't need to accumulate the payloads in memory
and send them in batch, as we do in upload_sink and in
upload_jumbo_sink. this puts less pressure on the memory subsystem.
- with the file, we can read multiple parts in parallel if multipart
upload applies to it, which helps to improve the throughput.
so, this new helper is introduced to help upload an sstable from local
filesystem to the object storage.
Fixes #16287
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
minimum_part_size and aws_maximum_parts_in_piece are
AWS S3 related constraints; they can be reused outside of
client::upload_sink and client::upload_jumbo_sink, so
in this change
* extract them out.
* use a user-defined literal with an IEC prefix for
better readability when defining minimum_part_size (see the sketch below)
* add an "aws_" prefix to `minimum_part_size` to be
more consistent.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
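For illustration, a minimal sketch of what "a user-defined literal with an IEC prefix" means here; the `_MiB` literal and the constant name/value are assumptions for the example, not the actual definitions in the S3 client.
```
#include <cstddef>

// Hypothetical user-defined literal with an IEC prefix ("mebibytes").
constexpr std::size_t operator""_MiB(unsigned long long v) {
    return static_cast<std::size_t>(v) * 1024 * 1024;
}

// AWS S3 constraint on multipart uploads: every part except the last
// must be at least 5 MiB. The "aws_" prefix marks it as an AWS limit,
// and the literal keeps the intent readable.
constexpr std::size_t aws_minimum_part_size = 5_MiB;

static_assert(aws_minimum_part_size == 5ull * 1024 * 1024);
```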
After c1b2b8cb2c, /task_manager/wait_task/
does not unregister tasks anymore.
Delete the check that the task was unregistered from test_task_manager_wait.
Check task status in drain_module_tasks to ensure that the task
is removed from task manager.
Fixes: #19351.
Closes scylladb/scylladb#19834
When you SELECT a boolean from system.config, it reads as true/false, but this isn't accepted
on UPDATE (instead, we accept 1/0). This is surprising and annoying, so accept true/false in
both directions.
Not a regression, so a backport isn't strictly necessary.
Closes scylladb/scylladb#19792
* github.com:scylladb/scylladb:
config: specialize from-string conversion for bool
config: wrap boost::lexical_cast<> when converting from strings
If set, any remaining segment that has data older than this threshold will request flushing, regardless of data pressure. I.e. even a system where nothing happens will, after X seconds, flush data to free up the commit log.
Related to #15820
The functionality here is to prevent pathological/test cases where an idle system cannot fully process things like compaction, GC etc. due to things like the CL forcing smaller GC windows.
Closes scylladb/scylladb#15971
* github.com:scylladb/scylladb:
commitlog: Make max data lifetime runtime-configurable
db::config: Expose commitlog_max_data_lifetime_in_s parameter
commitlog: Add optional max lifetime parameter to cl instance
The SSTable is removed from the reclaimed memory tracking logic only
when its object is deleted. However, there is a risk that the Bloom
filter reloader may attempt to reload the SSTable after it has been
unlinked but before the SSTable object is destroyed. Prevent this by
removing the SSTable from the reclaimed list maintained by the manager
as soon as it is unlinked.
The original logic that updated the memory tracking in
`sstables_manager::deactivate()` is left in place as (a) the variables
have to be updated only when the SSTable object is actually deleted, as
the memory used by the filter is not freed as long as the SSTable is
alive, and (b) the `_reclaimed.erase(*sst)` is still useful during
shutdown, for example, when the SSTable is not unlinked but just
destroyed.
Fixes https://github.com/scylladb/scylladb/issues/19722
Closes scylladb/scylladb#19717
* github.com:scylladb/scylladb:
boost/bloom_filter_test: add testcase to verify unlinked sstables are not reloaded
sstables: do not reload components of unlinked sstables
sstables/sstables_manager: introduce on_unlink method
`filesystem_storage` methods frequently call `sync_directory()`, for the sake of flushing (sync'ing) a directory. `sync_directory()` always brackets the sync with open and close, and given that most `sync_directory()` calls target the sstable base directory, those repeated opens and closes are considered wasteful. Rework the `filesystem_storage::_dir` member (from a mere pathname) so that it stands for an `opened_directory` object, which keeps the sstable base directory open, for the purpose of repeated sync'ing.
Resolves #2399.
Closes scylladb/scylladb#19624
* github.com:scylladb/scylladb:
sstables/storage: synch "dst_dir" more leanly in create_links_common()
sstables/storage: close previous directory asynchronously upon dir change
sstables/storage: futurize change_dir_for_test()
sstables/storage: sync through "opened_directory" in filesystem...::move()
sstables/storage: sync through "opened_directory" in the "easy" cases
sstables/storage: introduce "opened_directory" class
Current upgrade dtests rely on a ccm node function,
get_highest_supported_sstable_version(), that looks for
r'Feature (.*)_SSTABLE_FORMAT is enabled' in the log files.
Starting from scylla-6.0, ME_SSTABLE_FORMAT is enabled by default
and there is no cluster feature for it. Thus get_highest_supported_sstable_version()
returns an empty list, resulting in the upgrade test failures.
This change introduces a separate API path that returns the highest
supported sstable format (one of la, mc, md, me) by a scylla node.
Fixes scylladb/scylladb#19772
Backports to 6.0 and 6.1 required. The current upgrade test in dtest
checks scylla upgrades up to version 5.4 only. This patch is a
prerequisite to backport the upgrade tests fix in dtest.
Closes scylladb/scylladb#19787
We already have code to return min() for
the minimum and maximum tokens in long_token()
and raw(), so instead of using code to return
it, just make sure to set it in the _data member.
Note that although this change affects serialization,
the existing codebase ignores the deserialized bytes
and places a constant (0 before this patch, or min()
with it) in _data for non-key (minimum or maximum) tokens.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Users outside of the token module don't
need to mess with token::kind.
They can only create key tokens with a particular
data value, never minimum or maximum tokens.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
sizeof(dht::token) is only 16 bytes and therefore
it can be passed with 2 registers.
There is no sense in defining minimum_token
and maximum_token out of line, returning a token&
to statically allocated values that require memory
access/copy, while the only call sites that need
to point to the static min/max tokens are in
dht::ring_position_view.
Instead, they can be defined inline as constexpr
functions and return their const values.
Correspondingly, define token ctors and methods
as constexpr where applicable (and, while at it,
noexcept where applicable).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
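As a rough sketch of the shape of this change (the member and enum names below are simplified stand-ins, not the real `dht::token` layout):
```
#include <cstdint>

// Simplified stand-in for dht::token: small (16 bytes or less), so it can
// be passed in registers and returned by value cheaply.
struct token_sketch {
    enum class kind : std::uint8_t { before_all_keys, key, after_all_keys };
    kind _kind;
    std::int64_t _data;

    constexpr token_sketch(kind k, std::int64_t d) noexcept : _kind(k), _data(d) {}
    constexpr bool is_minimum() const noexcept { return _kind == kind::before_all_keys; }
    constexpr bool is_maximum() const noexcept { return _kind == kind::after_all_keys; }
};

// Defined inline as constexpr functions returning by value, instead of
// out-of-line functions returning references to statically allocated tokens.
constexpr token_sketch minimum_token() noexcept {
    return token_sketch(token_sketch::kind::before_all_keys, 0);
}
constexpr token_sketch maximum_token() noexcept {
    return token_sketch(token_sketch::kind::after_all_keys, 0);
}

static_assert(minimum_token().is_minimum());
static_assert(maximum_token().is_maximum());
```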
Make sure to always initialize the _data member
to 0 for non-key (minimum or maximum) tokens.
This allows simplifying the equality operator,
which now doesn't need to rely on `operator<=>`
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The is_minimum/is_maximum predicates are more
efficient than comparing against the {minimum,maximum}_token
values, respectively, since the is_* functions
need to check only the token kind.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Token comparisons are abundant.
The equality operator is defined inline
in dht/token.hh by calling `t1 <=> t2`,
and so is `tri_compare_raw`, which `operator<=>`
calls in the common path, but `operator<=>` itself
is defined out of line, losing the benefits of inlining.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Use the rolling restart to avoid spurious driver reconnects.
This can be eventually reverted once the scylladb/python-driver#295 is fixed.
Fixes scylladb/scylladb#19154
Closes scylladb/scylladb#19771
* github.com:scylladb/scylladb:
test: raft: fix the flaky `test_raft_recovery_stuck`
test: raft: code cleanup in `test_raft_recovery_stuck`
In v4 of scylladb/scylladb#19598 the last commit of the patch was replaced, but this change missed the merge, so it is submitted in a separate patch.
In the current patch, the original functions class correctly marks methods as const where appropriate, and the instance() method now returns a const object. This ensures protection against accidental modifications, as all changes must go through the change_batch object.
Since the functions_changer class was intended to serve the same purpose, it is now redundant. Therefore, we are reverting the commit that introduced it.
Relates scylladb/scylladb#19153
Closes scylladb/scylladb#19647
* github.com:scylladb/scylladb:
cql3: functions: replace template with std::function in with_udf_iter()
cql3: functions: improve functions class constness handling
Revert "cql3: functions: make modification functions accessible only via batch class"
Replaced the old `read_barrier` helper from "test/pylib/util.py"
with the new helper from "test/pylib/rest_client.py" that calls
the newly introduced direct REST API.
Replaced it in all relevant tests and decommissioned the old helper.
Introduced a new helper `get_host_api_address` to retrieve the host API
address - which in some cases can be different from the host address
(e.g. if the RPC address is changed).
Fixes: scylladb/scylladb#19662
Closes scylladb/scylladb#19739
filesystem_storage::create_links_common() runs on directories that
generally differ from "_dir", thus, we can't replace its sync_directory()
calls with _dir.sync(). We can still use a common (temporary)
"opened_directory" object for synching "dst_dir" three times, saving two
open and two close operations.
This patch is best viewed with "git show -W".
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
In "filesystem_storage", change_dir_for_test() and move() replace "_dir"
with "opened_directory(new_dir)" using the move assignment operator.
Consequently, the file descriptor underlying "_dir" is closed
synchronously as a part of object destruction.
Expose the async file::close() function through "opened_directory".
Introduce filesystem_storage::change_dir() as a common async workhorse for
both change_dir_for_test() and move(). In change_dir(), close the old
directory asynchronously.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Currently change_dir_for_test() is synchronous. Make it return a future,
so that we can use async operations in change_dir_for_test() overrides.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Near the end of filesystem_storage::move(), we sync both the old
directory, and the new directory, if "delay_commit" is null. At that
point, the new directory is just "_dir"; call _dir.sync() instead of
sync_directory().
This patch is best viewed with "git show -W".
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Replace
sst.sstable_write_io_check(sync_directory, _dir.native())
with
_dir.sync(sst._write_error_handler)
Also replace the explicit (but still relatively "easy")
open_checked_directory() + flush() + flush() operations in
filesystem_storage::seal() with two _dir.sync() calls.
Because filesystem_storage::create_links_common() is marked "const", we
need to declare "_dir" mutable.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
"filesystem_storage::_dir" is currently of type "std::filesystem::path".
Introduce a new class called "opened_directory", and change the type of
"_dir" to the new class "opened_directory".
"opened_directory" keeps the directory open, and offers synchronization on
that open directory (i.e., without having to reopen the directory every
time). In subsequent patches, that will be put to use.
The opening and closing of the wrapped directory cannot easily be handled
explicitly in the "filesystem_storage" member functions.
(
Namely, test::store() and test::rewrite_toc_without_scylla_component()
-- both in "test/lib/sstable_utils.hh" -- perform "open -> ... -> seal"
sequences, and such a sequence may be executed repeatedly. For example,
sstable_directory_shared_sstables_reshard_correctly()
[test/boost/sstable_directory_test.cc] does just that; it "reopens" the
"filesystem_storage" object repeatedly.
)
Rather than trying to restrict the order of "filesystem_storage" member
function calls, replace the "opened_directory" object with a new one
whenever the directory pathname is re-set; namely in
filesystem_storage::change_dir_for_test() and filesystem_storage::move().
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
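A minimal sketch of the idea behind `opened_directory`, using plain Seastar file primitives; the member names and the lazy-open behaviour are illustrative assumptions, not the actual implementation.
```
#include <seastar/core/coroutine.hh>
#include <seastar/core/file.hh>
#include <seastar/core/seastar.hh>
#include <seastar/core/sstring.hh>
#include <utility>

// Keeps an sstable base directory open so that repeated syncs avoid the
// open/close pair that sync_directory() performs on every call.
class opened_directory_sketch {
    seastar::sstring _pathname;
    seastar::file _file;   // default-constructed: not open yet
public:
    explicit opened_directory_sketch(seastar::sstring pathname)
        : _pathname(std::move(pathname)) {}

    // Open lazily on the first sync, then just flush on later calls.
    seastar::future<> sync() {
        if (!_file) {
            _file = co_await seastar::open_directory(_pathname);
        }
        co_await _file.flush();
    }

    // Close asynchronously, e.g. when the directory pathname is re-set.
    seastar::future<> close() {
        if (_file) {
            co_await _file.close();
            _file = seastar::file();
        }
    }
};
```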
In 6e79d64, the behavior of `manager::too_many_in_flight_hints_for()`
was accidentally modified. It remained unnoticed for some time
and was then fixed. In this commit, we add a test verifying that
the concurrency of hints being written to disk is indeed limited
and the limitations are imposed properly.
Refs scylladb/scylladb#17636
Fixes scylladb/scylladb#17660
Closes scylladb/scylladb#19741
* github.com:scylladb/scylladb:
db/hints: Verify that Scylla limits the concurrency of written hints
db/hints: Coroutinize `hint_endpoint_manager::store_hint()`
db/hints: Move a constant value to the TU it's used in
The SSTable is removed from the reclaimed memory tracking logic only
when its object is deleted. However, there is a risk that the Bloom
filter reloader may attempt to reload the SSTable after it has been
unlinked but before the SSTable object is destroyed. Prevent this by
removing the SSTable from the reclaimed list maintained by the manager
as soon as it is unlinked.
The original logic that updated the memory tracking in
`sstables_manager::deactivate()` is left in place as (a) the variables
have to be updated only when the SSTable object is actually deleted, as
the memory used by the filter is not freed as long as the SSTable is
alive, and (b) the `_reclaimed.erase(*sst)` is still useful during
shutdown, for example, when the SSTable is not unlinked but just
destroyed.
Fixes #19722
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Added a new method, on_unlink() to the sstable_manager. This method is
now used by the sstable to notify the manager when it has been unlinked,
enabling the manager to update its bookkeeping as required. The
on_unlink method doesn't do anything yet but will be updated by the next
patch.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
in 3c7af287, cqlsh's reloc package was marked as "noarch", and its
filename was updated accordingly in `configure.py`, so let's update
the CMake build system as well.
this change should address the build failure of
```
08:48:14 [3325/4124] Generating ../Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz
08:48:14 FAILED: Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz /jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz
08:48:14 cd /jenkins/workspace/scylla-master/scylla-ci/scylla/build/dist && /usr/bin/cmake -E copy /jenkins/workspace/scylla-master/scylla-ci/scylla/tools/cqlsh/build/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz /jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz
08:48:14 Error copying file "/jenkins/workspace/scylla-master/scylla-ci/scylla/tools/cqlsh/build/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz" to "/jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz".
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19710
In some cases, the S3 server will not know about a certain build and
any attempt to open a coredump which was generated by this build will
fail, because the S3 server returns an empty/illegal response.
There is already a bypass for missing package-url in the S3 server
response, but this doesn't help in the case when the response is also
missing other metadata, like build-id and version info.
Extend this existing mechanism with a new --scylla-package-url flag,
which provides a complete bypass. When provided, the S3 server will not be
queried at all, instead the package is downloaded from the link and
version metadata is extracted from the package itself.
Closes scylladb/scylladb#19769
* seastar 908ccd93...67065040 (44):
> metrics: Use this_shard_id unconditionally
> sstring: prevent fmt from formatting sstring as a sequence
> coding style: allow lines up to 160 chars in length
> src/core: remove unnecessary includes
> when_all: stop using deprecated std::aligned_union_t
> reactor: respect preempt requests in debug mode
> core: fix -Wunused-but-set-variable
> gate: add try_hold
> sstring: declare nested type with typename
> rpc: pass start time to `wait_for_reply()` which accepts `no_wait_type`
> scripts/perftune.py: get rid of "SyntaxWarning: invalid escape sequence"
> scripts/perftune.py: add support for tweaking VLAN interfaces
> scripts/perftune.py: improve discovery of bond device slaves
> scripts/perftune.py: refactor __learn_slaves() function
> code-cleanup: add missing header guards
> code-cleanup: remove redundant includes of 'reactor.hh'
> code-cleanup: explicitly depend on io_desc.hh
> scripts/perftune.py: aRFS should be disabled by default in non-MQ mode
> code-cleanup: remove unneeded includes of fair_queue.hh
> docker: fix mount of install-dependencies
> code-cleanup: remove redundant includes of linux-aio.hh
> fstream: reformat the doxygen comment of make_file_input_stream()
> iostream: use new-style consumer to implement copy()
> stall-analyser: use 0 for default value of --minimum
> reactor: fix crash during metrics gathering
> build: run socket test with linux-aio reactor backend
> test: Add testing of connect()-ion abort ability
> linux_perf_event: exclude_idle only on x86_64
> linux_perf_event: add make_linux_perf_event
> stall-analyser: gracefully handle empty input
> shared_token_bucket: resolve FIXME
> io_tester: ensure that file object is valid when closing it
> tutorial.md: fix typo in Dan Kegel's name
> test,rpc: Extend simple ping-pong case
> rpc: Calculate delay and export it via metrics
> rpc: Exchange handler duration with server responses
> rpc: Track handler execution time
> rpc: Fix hard-coded constants when sending unknown verb reply
> reactor: Unfriend alien and smp queues
> reactor: Add and use stopped() getter
> reactor: Generalize wakeup() callers
> file: Use lighter access to map of fs-info-s
> file: Fix indentation after previous patch
> file: Don't return chain of ready futures from make_file_impl
Closes scylladb/scylladb#19780
Pass origin when opening the sstable from the writer and store it in the
sstable object. This will make the origin available for the entire write
path.
Closes scylladb/scylladb#19721
* github.com:scylladb/scylladb:
sstables: use _origin in write path
sstable::open_sstable: pass and store origin
This PR adds support for aborting index reads from within `index_consume_entry_context::consume_input` when the server is being stopped. The abort source is now propagated down to the `index_consume_entry_context`, making it available for `consume_input` to check if an abort has been requested. If an abort is detected, `consume_input` will throw an exception to stop the index read operation.
Closes scylladb/scylladb#19453
* github.com:scylladb/scylladb:
test/boost: test abort behaviour during index read
sstables/index_reader: stop consuming index when abort has been requested
sstables::index_consume_entry_context: store abort_source
sstable: drop old filter only after the new filter is built during rebuild
sstables/sstables_manager: store abort_source in sstable_manager
replica/database: pass abort_source to database constructor
The yaml/json representation for bool is true/false, but boost::lexical_cast
is 1/0. Specialize bool conversion to accept true/false (for yaml/json
compatibility) and 1/0 (for backward compatibility). This provides
round-trip conversion for bool configs in system.config.
Configuration uses boost::lexical_cast to convert strings to native
values (e.g. bools/ints). However, boost::lexical_cast doesn't
recognize true/false for bool. Since we can't change boost::lexical_cast,
replace it with a wrapper that forwards directly to boost::lexical_cast.
In the next step, we'll specialize it for bool.
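A minimal sketch of the approach, assuming a hypothetical wrapper name (the real wrapper lives in the config code): forward to boost::lexical_cast by default, and specialize the bool case to accept "true"/"false" alongside "1"/"0".
```
#include <boost/lexical_cast.hpp>
#include <stdexcept>
#include <string>

// Generic case: defer to boost::lexical_cast, as before.
template <typename T>
T value_from_string(const std::string& s) {
    return boost::lexical_cast<T>(s);
}

// bool specialization: accept true/false (yaml/json style) in addition
// to 1/0 (backward compatibility), giving round-trip conversion for
// bool configs in system.config.
template <>
bool value_from_string<bool>(const std::string& s) {
    if (s == "true" || s == "1") {
        return true;
    }
    if (s == "false" || s == "0") {
        return false;
    }
    throw std::invalid_argument("cannot parse bool from: " + s);
}
```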
Currently the guard does not account correctly for an ongoing operation if semaphore acquisition fails. It may signal a semaphore when it is not held.
Should be backported to all supported versions.
Closes scylladb/scylladb#19699
* github.com:scylladb/scylladb:
test: add test to check that coordinator lwt semaphore continues functioning after locking failures
paxos: do not signal semaphore if it was not acquired
In 6e79d64, the behavior of `manager::too_many_in_flight_hints_for()`
was accidentally modified. It remained unnoticed for some time
and was then fixed. In this commit, we add a test verifying that
the concurrency of hints being written to disk is indeed limited
and the limitations are imposed properly.
since fmt 11, format() is required to be const, otherwise
its caller in the fmt library would not be able to call it, and compilation
would fail like:
```
/home/kefu/.local/bin/clang++ -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -I/home/kefu/dev/scylladb/build/gen -isystem /home/kefu/dev/scylladb/abseil -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -Xclang -fexperimental-assignment-tracking=disabled -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT locator/CMakeFiles/scylla_locator.dir/RelWithDebInfo/abstract_replication_strategy.cc.o -MF locator/CMakeFiles/scylla_locator.dir/RelWithDebInfo/abstract_replication_strategy.cc.o.d -o locator/CMakeFiles/scylla_locator.dir/RelWithDebInfo/abstract_replication_strategy.cc.o -c /home/kefu/dev/scylladb/locator/abstract_replication_strategy.cc
In file included from /home/kefu/dev/scylladb/locator/abstract_replication_strategy.cc:9:
In file included from /home/kefu/dev/scylladb/locator/abstract_replication_strategy.hh:16:
In file included from /home/kefu/dev/scylladb/gms/inet_address.hh:11:
In file included from /usr/include/fmt/ostream.h:23:
In file included from /usr/include/fmt/chrono.h:23:
In file included from /usr/include/fmt/format.h:41:
/usr/include/fmt/base.h:1393:23: error: no matching member function for call to 'format'
1393 | ctx.advance_to(cf.format(*static_cast<qualified_type*>(arg), ctx));
| ~~~^~~~~~
/usr/include/fmt/base.h:1374:21: note: in instantiation of function template specialization 'fmt::detail::value<fmt::context>::format_custom_arg<locator::vnode_effective_replication_map::factory_key, fmt::formatter<locator::vnode_effective_replication_map::factory_key>>' requested here
1374 | custom.format = format_custom_arg<
| ^
/home/kefu/dev/scylladb/seastar/include/seastar/util/log.hh:299:33: note: in instantiation of function template specialization 'fmt::format_to<seastar::internal::log_buf::inserter_iterator &, locator::vnode_effective_replication_map::factory_key &, const void *, 0>' requested here
299 | return fmt::format_to(it, fmt.format, std::forward<Args>(args)...);
| ^
/home/kefu/dev/scylladb/seastar/include/seastar/util/log.hh:428:9: note: in instantiation of function template specialization 'seastar::logger::log<locator::vnode_effective_replication_map::factory_key &, const void *>' requested here
428 | log(log_level::debug, std::move(fmt), std::forward<Args>(args)...);
| ^
/home/kefu/dev/scylladb/locator/abstract_replication_strategy.cc:561:18: note: in instantiation of function template specialization 'seastar::logger::debug<locator::vnode_effective_replication_map::factory_key &, const void *>' requested here
561 | rslogger.debug("create_effective_replication_map: found {} [{}]", key, fmt::ptr(erm.get()));
| ^
/home/kefu/dev/scylladb/locator/abstract_replication_strategy.hh:471:10: note: candidate function template not viable: 'this' argument has type 'const fmt::formatter<locator::vnode_effective_replication_map::factory_key>', but method is not marked const
471 | auto format(const locator::vnode_effective_replication_map::factory_key& key, FormatContext& ctx) {
| ^
1 error generated.
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19768
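A minimal example of the required change - a fmt::formatter specialization whose format() is marked const, as fmt 11 demands; the formatted struct here is just a placeholder, not the real factory_key.
```
#include <fmt/format.h>

struct key_example {
    int id;
};

template <>
struct fmt::formatter<key_example> {
    constexpr auto parse(fmt::format_parse_context& ctx) { return ctx.begin(); }

    // Since fmt 11 the library calls format() on a const formatter object,
    // so the member function must be const-qualified.
    auto format(const key_example& key, fmt::format_context& ctx) const
        -> fmt::format_context::iterator {
        return fmt::format_to(ctx.out(), "key{{id={}}}", key.id);
    }
};

// fmt::format("{}", key_example{42}) yields "key{id=42}".
```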
Fixes #19753
SSTable file open provides an `io_error_handler` instance which is applied to a file-wrapper to process any IO errors happening during read/write via the handler in `storage_service`, which in turn will effectively disable the node. However, this is not applied to the actual open operation itself, i.e. any exception generated by the file open call itself will instead just escape to the caller.
This PR adds filtering via the `error_handler` to sstable open + makes the `storage_service` "isolate" mechanism non-module-static (thus making it testable) and adds tests to check we exhibit the same behaviour in both cases.
The main motivation for this issue is discussions that secondary-level IO issues (i.e. caused by extensions) should trigger the same behaviour as, for example, running out of disk space.
Closes scylladb/scylladb#19766
* github.com:scylladb/scylladb:
memtable_test: Add test for isolate behaviour on exceptions during flush
cql_test_env: Expose storage service
storage_service: Make isolate guard non-static and add test accessor
sstable: apply error_handler on open exceptions
cql-pytest's config_value_context is used to run a code sequence with
different ScyllaDB configuration applied for a while. When it reads
the original value (in order to restore it later), it applies
ast.literal_eval() to it. This is strange, since the config variable isn't
a Python literal.
It was added in 8c464b2ddb ("guardrails: restrict replication
strategy (RS)"). Presumably, as a workaround for #19604 - it sufficiently
massaged the input we read via SELECT to be acceptable later via UPDATE.
Now that #19604 is fixed, we can remove the call to ast.literal_eval,
but have to fix up the parameters to config_value_context to something
that will be accepted without further massaging.
This is a step towards fixing #15559, where we want to run some tests
with a boolean configuration variable changed, and literal_eval is
transforming the string representation of integers to integers and
confusing the driver.
Closes scylladb/scylladb#19696
`GitHub Actions / Analyze #includes in source files` keeps reporting
that the include shouldn't be present in the file. The reason is
that we use FMT with version >10, so the fragment of the code that
uses the include is not compiled. We move the include to a place
where it's used, which should fix the warnings.
Closes scylladb/scylladb#19776
Makes the storage service isolate repeatable in the same process and more testable.
Note, since the test var is now shard-local, we need to check twice: once
on error, once on reaching shard zero for the actual shutdown.
Now that the origin is available inside the sstable object, no need to
pass it to the methods called in the write path.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Pass origin when opening the sstable from the writer and store it in the
sstable object. This will make the origin available for the entire write
path.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Added a new boost test, index_reader_test, with a testcase to verify
the abort behaviour during an index read using
index_consume_entry_context.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
When an abort is requested, stop further reading of the index file and
throw an exception from index_consume_entry_context::process_state().
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Store abort source inside sstables::index_consume_entry_context, so that
the next patch can implement cancelling the index read when abort is
requested.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
sstable::maybe_rebuild_filter_from_index drops the existing filter first
and then rebuilds the new filter, as the method is only called before the
sstable is sealed. But to make the index read abortable, the old filter
can be dropped only after the new filter is built, so that in case the
index consumer gets aborted, we still have the old filter to write to
disk.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Add a new member that stores the abort_source. This can later be used by
the sstables to check if an abort has been requested. Also implement
sstables_manager::get_abort_source() that returns a const reference to
the abort source.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
This is in preparation for the following patch that adds abort_source
variable to the sstables_manager.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
When a node that is permanently down is replaced, it is marked as "left"
but it still can be a replica of some tablets. We also don't keep IPs of
nodes that have left and the `node` structure for such node returns an
empty IP (all zeros) as the address.
This interacts badly with the view update logic. The base replica paired
with the left node might decide to generate a view update. Because
storage proxy still uses IPs and not host IDs, it needs to obtain the
view replica's IP and tell the storage proxy to write a view update to
that node - so, it chooses 0.0.0.0. Apparently, storage proxy decides to
write a hint towards this address - hinted handoff on the other hand
operates on host IDs and not IPs, so it attempts to translate the IP
back, which triggers an assertion as there is no replica with IP
0.0.0.0.
As a quick workaround for this issue just drop view updates towards
nodes which seem to have IPs that are all zeros. It would be more proper
to keep the view updates as hints and replay them later to the new
paired replica, but achieving this right now would require much more
significant changes. For now, fixing a crash is more important than
keeping views consistent with base replicas.
Fixes: scylladb/scylladb#19439
The python driver might currently trigger spurious reconnects that cause
`NoHostAvailable` to be thrown, which is not expected.
This patch adds a retry mechanism to the test to skip this failure
if it occurs, as a work-around.
The proper fix is expected to be done in scylladb/python-driver#295;
once fixed there, this work-around can be reverted.
Fixes: scylladb/scylla#18547
Closes scylladb/scylladb#19759
The guard signals a semaphore during destruction if it is marked as
locked, but currently it may be marked as locked even if locking failed.
Fix this by using semaphore_units instead of managing the locked flag
manually.
Fixes: https://github.com/scylladb/scylladb/issues/19698
There are two schemas associated with a sstable writer:
the sstable's schema (i.e. the schema of the table at the time when the
sstable object was created), and the writer's schema (equal to the schema
of the reader which is feeding into the writer).
It's easy to mix up the two and break something as a result.
The writer's schema is needed to correctly interpret and serialize the data
passing through the writer, and to populate the on-disk metadata about the
on-disk schema.
The sstables's schema is used to configure some parameters for newly created
sstable, such as bloom filter false positive ratio, or compression.
This series fixes the known mixups between the two — when setting up compression,
and when setting up the bloom filters.
Fixes #16065
The bug is present in all supported versions, so the patch has to be backported to all of them.
Closes scylladb/scylladb#19695
* github.com:scylladb/scylladb:
sstables/mx/writer: when creating local_compression, use the sstables's schema, not the writer's
sstables/mx/writer: when creating filter, use the sstables's schema, not the writer's
sstables: for i_filter downcasts, use dynamic_cast instead of static_cast
```
DEBUG 2024-07-03 00:59:58,291 [shard 0:main] compaction_manager - Compaction task 0x51800002a480 for table tests.3 compaction_group=0 [0x503000062050]: switch_state: none -> pending: pending=2 active=0 done=0 errors=0
DEBUG 2024-07-03 01:00:02,868 [shard 0:main] compaction - Checking droppable sstables in tests.3, candidates=0
DEBUG 2024-07-03 01:00:02,868 [shard 0:main] compaction - time_window_compaction_strategy::newest_bucket:
now 1720314000000000
buckets = {
key=1720314000000000, size=2
key=1720310400000000, size=2
1720314000000000: GMT: Sunday, July 7, 2024 1:00:00 AM
1720310400000000: GMT: Sunday, July 7, 2024 12:00:00 AM
```
the test failed to complete when run across different clock hours, as it
expected all sstables produced to belong to the same window of 1h size.
let's fix it by reusing timestamps, so it's always consistent.
Fixes #13280.
Fixes #18564.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#19749
as these workflows are scheduled periodically, and if they fail,
notifications are sent to the repo's owner. to minimize the surprises
to the contributors using github, let's disable these workflows on
fork repos.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The leader_host API handler was eventually using the `req` unique_ptr
after it had already been destroyed (it was passed down to the future lambda
by reference). This was causing an occasional crash in some tests.
Reworked the leader_host handler to use the req only outside of the
future lambda.
Also updated the code to handle the possibility that the non-default
leader group (other than Group 0) might reside on a different shard
than shard 0 - using the same concept of calling on all shards via
`invoke_on_all()` as done for the other requests.
Fixes scylladb/scylladb#19714
Closes scylladb/scylladb#19715
there's not a 1:1 relationship between compaction group count and
tablet count. a tablet replica has a storage group instance, which
may map to multiple compaction groups during split mode.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Until now, the constant `HINT_FILE_WRITE_TIMEOUT` was
declared as a static member of `db::hints::manager`.
However, the constant is only ever used in one
translation unit, so it makes more sense to move it
there and not include boilerplate in a header.
utils::in uses std::aligned_storage, which is deprecated. Rather than fixing it, replace its only
user with simpler code and remove it.
No backport needed as this isn't fixing a bug.
Closes scylladb/scylladb#19683
* github.com:scylladb/scylladb:
utils: remove utils/in.hh
gossiper: remove initializer-list overload of add_local_application_state()
Since Python 3.12, version parsing has become strict; parse_version() does
not accept a version string like '6.1.0~dev'.
To fix this, we need to pass an acceptable version string to parse_version(), like
'6.1.0.dev0', which is allowed by the Python version scheme.
Also, a release candidate version like '6.0.0~rc3' has the same issue; it
should be replaced with '6.0.0rc3' to compare in parse_version().
reference: https://packaging.python.org/en/latest/specifications/version-specifiers/
Fixes #19564
Closes scylladb/scylladb#19572
This reverts commit 65fbf72ed0, since
it breaks scylla-housekeeping and SCT because the patch modified
the version string.
We shouldn't modify the version string directly; we need to pass the
modified string just to parse_version() instead.
Setting the error condition for all nodes in the cluster to avoid
having to check which one is the coordinator. This should make the test
more stable and avoid the flakiness observed when the coordinator node
is the one that got the error condition injected.
Randomizing the retrieved running servers to reproduce the issue more
frequently and to avoid making any assumptions about the order of the
servers.
Note that only the "raft_topology_barrier_fail" needs to run
on a non-coordinator node, the other error "stream_ranges_fail" can be
injected on any node (including the coordinator).
Fixes: scylladb/scylladb#18614
Closes scylladb/scylladb#19663
This patch is a follow-up to scylladb/scylladb#16585.
Once we have service levels on raft, we can get rid of the update loop, which updates the configuration at a configured interval (default is 10s).
Instead, this PR introduces methods to `group0_state_machine` which look through table ids in mutations in `write_mutation` and update submodules based on those ids.
Fixes: scylladb/scylladb#18060
Closes scylladb/scylladb#18758
* github.com:scylladb/scylladb:
test: remove `sleep()`s which were required to reload service levels configuration
test/cql_test_env: remove unit test service levels data accessors
service/storage_service: reload SL cache on topology_state_load()
service/qos/service_level_controller: move semaphore breaking to stop
service/qos/service_level_controller: maybe start and stop legacy update loop
service/qos/service_level_controller: make update loop legacy
raft/group0_state_machine: update submodules based on table_id
service/storage_service: add a proxy method to reload sl cache
There are two schemas associated with a sstable writer:
the sstable's schema (i.e. the schema of the table at the time when the
sstable object was created), and the writer's schema (equal to the schema
of the reader which is feeding into the writer).
It's easy to mix up the two and break something as a result.
The writer's schema is needed to correctly interpret and serialize the data
passing through the writer, and to populate the on-disk metadata about the
on-disk schema.
The sstables's schema is used to configure some parameters for newly created
sstable, such as bloom filter false positive ratio, or compression.
The problem fixed by this patch is that the writer was wrongly creating
the compressor objects based on its own schema, but using them based
on the sstable's schema.
This patch forces the writer to use the sstable's schema for both.
There are two schemas associated with a sstable writer:
the sstable's schema (i.e. the schema of the table at the time when the
sstable object was created), and the writer's schema (equal to the schema
of the reader which is feeding into the writer).
It's easy to mix up the two and break something as a result.
The writer's schema is needed to correctly interpret and serialize the data
passing through the writer, and to populate the on-disk metadata about the
on-disk schema.
The sstables's schema is used to configure some parameters for newly created
sstable, such as bloom filter false positive ratio, or compression.
The problem fixed by this patch is that the writer was wrongly creating
the filter based on its own schema, while the layer outside the writer
was interpreting it as if it was created with the sstable's schema.
This patch forces the writer to pick the filter's parameters based on the
sstable's schema instead.
As of this patch, those static_casts are actually invalid in some cases
(they cast to the wrong type) because of an oversight.
A later patch will fix that. But to even write a reliable reproducer
for the problem, we must force the invalid casts to manifest as a crash
(instead of weird results).
This patch both allows writing a reproducer for the bug and serves
as a bit of defensive programming for the future.
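A generic illustration of the defensive-programming point (not the actual i_filter code): a dynamic_cast makes a wrong-type downcast fail loudly instead of silently reinterpreting the object.
```
#include <cassert>

struct filter_base_example {
    virtual ~filter_base_example() = default;
};
struct bloom_filter_example : filter_base_example {
    int hash_count = 3;
};
struct always_present_filter_example : filter_base_example {};

int hash_count_of(filter_base_example& f) {
    // A static_cast would "succeed" even if f is not a bloom_filter_example,
    // yielding garbage; dynamic_cast on the wrong type returns nullptr here,
    // so the bug manifests as a clear assertion failure (a crash) instead of
    // weird results.
    auto* bf = dynamic_cast<bloom_filter_example*>(&f);
    assert(bf && "expected a bloom filter here");
    return bf->hash_count;
}
```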
If the index was created on a collection (frozen or not), its description wasn't a correct create statement.
This patch fixes the bug and includes functions like `full()`, `keys()`, `values()`, ... used to create indexes on collections.
Fixes scylladb/scylladb#19278
Closes scylladb/scylladb#19381
* github.com:scylladb/scylladb:
cql-pytest/test_describe: add a test for describe indexes
schema/schema: fix column names in index description
instead of hardwiring the toolchain image in github workflows, read it
from `tools/toolchain/image`. a dedicated reusable workflow is added to
read from this file, and expose its content with an output parameter.
also, switch the iwyu.yaml workflow to this image; it's more maintainable this
way. please note, before this change, we were also using the latest
stable build of clang, and since fedora 40 is also using clang 18,
the behavior does not change. but with this change, we don't have
the flexibility of using other clang versions provided by
https://apt.llvm.org in the future.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19655
this changeset includes two changes:
- service: move storage_service::~storage_service() into .cc
- transport: move the cql_server::~cql_server() into .cc
they intend to address the compile failures when building scylladb with clang-19. clang-19 is more picky when generating the defaulted destructors with incomplete types, but its behavior makes sense with regard to standards compliance. so let's update accordingly.
---
it's a cleanup, hence no need to backport.
Closes scylladb/scylladb#19668
* github.com:scylladb/scylladb:
transport: move the cql_server::~cql_server() into .cc
service: move storage_service::~storage_service() into .cc
`sstables::write()` has multiple overloads, which are defined in
`sstables/writer.hh`. two of these overloads are template functions,
which have a template parameter named `W`, which has a type constraint
requiring it to fulfill the `Writer` concept. but in `types.hh`, when
the compiler tries to instantiate the template function with signature
of `write(sstable_version_types v, W& out, const T& t)` with
`file_writer` as the template parameter of `w`, `file_writer` is only
forward-declared using `class file_writer` in the same header file, so
this type is still an incomplete type at that moment. that's why the
compiler is not able to determine if `file_writer` fulfills the
constraint or not. actually, the declaration of `file_writer` is located
in `sstables/writer.hh`, which in turn includes `types.hh`. so they
form a cyclic dependency.
in this change, in order to break this cycle, we extract file_writer out
into a separate header file, so that both `sstables/writer.hh` and
`sstables/types.hh` can include it. this addresses the build failure.
Fixes scylladb/scylladb#19667
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19669
Adds a convenience function for inspecting the coroutine frame of a given
seastar task.
Short example of extracting a coroutine argument:
```
(gdb) p *$coro_frame(seastar::local_engine->_current_task)
$1 = {
__resume_fn = 0x2485f80 <sstables::parse(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::statistics&)>,
...
PointerType_7 = 0x601008e67880,
...
__coro_index = 0 '\000'
...
(gdb) p $downcast_vptr($->PointerType_7)
$2 = (schema *) 0x601008e67880
```
Closes scylladb/scylladb#19479
The default configuration for replication_strategy_warn_list is
["SimpleStrategy"], but one cannot set this via CQL:
cqlsh> select * from system.config where name = 'replication_strategy_warn_list';
name | source | type | value
--------------------------------+---------+---------------------------+--------------------
replication_strategy_warn_list | default | replication strategy list | ["SimpleStrategy"]
(1 rows)
cqlsh> update system.config set value = '[NetworkTopologyStrategy]' where name = 'replication_strategy_warn_list';
cqlsh> select * from system.config where name = 'replication_strategy_warn_list';
name | source | type | value
--------------------------------+--------+---------------------------+-----------------------------
replication_strategy_warn_list | cql | replication strategy list | ["NetworkTopologyStrategy"]
(1 rows)
cqlsh> update system.config set value = '["NetworkTopologyStrategy"]' where name = 'replication_strategy_warn_list';
WriteFailure: Error from server: code=1500 [Replica(s) failed to execute write] message="Operation failed for system.config - received 0 responses and 1 failures from 1 CL=ONE." info={'consistency': 'ONE', 'required_responses': 1, 'received_responses': 0, 'failures': 1}
Fix by allowing quotes in enum_set parsing.
Bug present since 8c464b2ddb ("guardrails: restrict
replication strategy (RS)", 6.0).
Fixes #19604.
Closes scylladb/scylladb#19605
we got a failure during check-commit action:
```
Run python .github/scripts/label_promoted_commits.py --commit_before_merge 30e82a81e8 --commit_after_merge f31d5e3204 --repository scylladb/scylladb --ref refs/heads/master
Commit sha is: d5a149fc01
Commit sha is: 415457be2b
Commit sha is: d3b1ccd03a
Commit sha is: 1fca341514
Commit sha is: f784be6a7e
Commit sha is: 80986c17c3
Commit sha is: 492d0a5c86
Commit sha is: 7b3f55a65f
Commit sha is: 78d6471ce4
Commit sha is: 7a69d9070f
Commit sha is: a9e985fcc9
master branch, pr number is: 19213
Traceback (most recent call last):
File "/home/runner/work/scylladb/scylladb/.github/scripts/label_promoted_commits.py", line 87, in <module>
main()
File "/home/runner/work/scylladb/scylladb/.github/scripts/label_promoted_commits.py", line 81, in main
pr = repo.get_pull(pr_number)
File "/usr/lib/python3/dist-packages/github/Repository.py", line 2746, in get_pull
headers, data = self._requester.requestJsonAndCheck(
File "/usr/lib/python3/dist-packages/github/Requester.py", line 353, in requestJsonAndCheck
return self.__check(
File "/usr/lib/python3/dist-packages/github/Requester.py", line 378, in __check
raise self.__createException(status, responseHeaders, output)
github.GithubException.UnknownObjectException: 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/pulls/pulls#get-a-pull-request", "status": "404"}
Error: Process completed with exit code 1.
```
The reason for this failure is that one of the promoted commits
(a9e985fcc9) had a `Closes` reference
to an issue.
Fixes: https://github.com/scylladb/scylladb/issues/19677
Closes scylladb/scylladb#19678
Tablets are no longer in experimental_features since 83d491a, so remove them from the experimental_features section documentation.
Also, expand the documentation for the `enable_tablets` option.
Fixes #19456
Needs backport to 6.0
Closes scylladb/scylladb#19516
* github.com:scylladb/scylladb:
conf: scylla.yaml: enable_tablets: expand documentation
conf: scylla.yaml: remove tablets from experimental_features doc comment
The initializer_list overload uses a too-clever technique to avoid copies.
While copies here are unlikely to pose any real problem (we're allocating
map nodes anyway), it's simple enough to provide a copy-less replacement
that doesn't require questionable tricks.
We replace the initializer_list<..., in<>> overload with a variadic
template that constructs a temporary map.
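A sketch of the replacement technique with illustrative type names (not the actual gossiper signatures): a variadic template moves each key/value pair into a temporary map, whereas an initializer_list would force copies of its elements.
```
#include <map>
#include <string>
#include <utility>

// Illustrative stand-ins for the application-state key and value types.
using app_state = int;
using versioned_value = std::string;

// Variadic overload: each (key, value) pair is perfect-forwarded and
// moved into a temporary map, so no copies of the values are needed.
template <typename... KV>
std::map<app_state, versioned_value> make_state_map(KV&&... kv) {
    std::map<app_state, versioned_value> m;
    (m.insert(std::forward<KV>(kv)), ...);
    return m;
}

// Usage: pairs are moved, not copied, unlike with an initializer_list.
auto example() {
    return make_state_map(std::pair{1, versioned_value("up")},
                          std::pair{2, versioned_value("normal")});
}
```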
Previously, some service levels tests required sleeping in order to
ensure the in-memory configuration of service levels was updated.
Now, when we are updating the configuration as the raft log is applied,
doing a read barrier (for instance to execute `DROP TABLE IF EXISTS
non_existing_table`) is enough and the sleeps are not needed.
Unit test data accessors were created to avoid starting the update loop in
unit tests and to update the controller's configuration directly.
With the raft data accessor and configuration updates on applying the raft log,
we can get rid of the unit test data accessors and use the raft one.
This also makes the unit test env a bit more like the real Scylla environment.
Since SL cache is no longer updated in a loop, it needs to be
initialized on startup and because we are updating the cache while
applying raft commands, we can initialize it on topology_state_load().
Before this, the notification semaphore was broken() in do_abort(),
which was triggered by early abort source.
However we are going to reload sl cache on topology state reload
and it can happen after the early abort source is triggered, so
it may throw broken_semaphore exception.
We can move semaphore breaking to stop() method. Legacy update loop
is still stopped in do_abort(), so it doesn't change the order of
service level controller shutdown.
In the previous commit, we marked the update loop as legacy.
For compatibility reasons, we need to start the legacy update loop
when the cluster is in recovery mode or it hasn't been upgraded to raft topology.
Then, in the update loop, we check if all conditions are met and stop the
loop.
This commit also moves the start of the update loop later (after topology state is loaded) in main.cc.
There is no risk in doing it later.
Rename the method which started the update loop to better reflect
what it does.
Previously the method was named `update_from_distributed_data`,
however it doesn't update anything; it only starts the update loop,
which we are making legacy.
We want to update the service levels cache when any new mutations are
applied to the service levels table.
To not create a new raft command type, this commit changes the design of
`write_mutations` to update in-memory structures based on the
mutations' table_id.
In this series of patches, we want to reload the service levels cache
when any changes to the SL table are applied.
First, we need to have a way to trigger a reload of the cache from
`group0_state_machines`.
To not introduce another dependency, we can use `storage_service` (which
has access to SL controller) and add a proxy method to it.
Counter updates break under tablet migration (#18180), and for this reason counters need to be disabled until the problem is fixed. It's enough to forbid creating a table with counters, as altering a table without counters already cannot result in the table having counters:
1) Adding a counter column to a table without counters:
```
cqlsh> ALTER TABLE temp.cf ADD (col_name counter);
ConfigurationException: Cannot add a counter column (col_name) in a non counter column family
```
2) Altering a column to be of the counter type:
```
cqlsh> ALTER TABLE temp.cf ALTER col_name TYPE counter;
ConfigurationException: Cannot change col_name from type int to type counter: types are incompatible.
```
Fixes: #19449
Fixes: https://github.com/scylladb/scylladb/issues/18876
Need to backport to 6.0, as this is broken there.
Closes scylladb/scylladb#19518
* github.com:scylladb/scylladb:
doc: add notes to feature pages which don't support tablets
cql: adjust warning about tablets
cql: forbid having counter columns in tablets tables
because transport/server.cc has the complete definition of event_notifier, the
compiler can default-generate the destructor of `cql_server` with the necessary
information. otherwise, clang-19 would fail to build, like:
```
FAILED: CMakeFiles/scylla.dir/Dev/main.cc.o
/home/kefu/.local/bin/clang++ -DBOOST_PROGRAM_OPTIONS_DYN_LINK -DBOOST_PROGRAM_OPTIONS_NO_LIB -DDEVEL -DFMT_SHARED -DSCYLLA_BUILD_MODE=dev -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Dev\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -I/home/kefu/dev/scylladb/build -isystem /home/kefu/dev/scylladb/build/rust -isystem /home/kefu/dev/scylladb/abseil -O2 -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -MD -MT CMakeFiles/scylla.dir/Dev/main.cc.o -MF CMakeFiles/scylla.dir/Dev/main.cc.o.d -o CMakeFiles/scylla.dir/Dev/main.cc.o -c /home/kefu/dev/scylladb/main.cc
In file included from /home/kefu/dev/scylladb/main.cc:11:
In file included from /usr/include/yaml-cpp/yaml.h:10:
In file included from /usr/include/yaml-cpp/parser.h:11:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/memory:78:
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unique_ptr.h:91:16: error: invalid application of 'sizeof' to an incomplete type 'cql_transport::cql_server::event_notifier'
91 | static_assert(sizeof(_Tp)>0,
| ^~~~~~~~~~~
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unique_ptr.h:398:4: note: in instantiation of member function 'std::default_delete<cql_transport::cql_server::event_notifier>::operator()' requested here
398 | get_deleter()(std::move(__ptr));
| ^
/home/kefu/dev/scylladb/transport/server.hh:135:7: note: in instantiation of member function 'std::unique_ptr<cql_transport::cql_server::event_notifier>::~unique_ptr' requested here
135 | class cql_server : public seastar::peering_sharded_service<cql_server>, public generic_server::server {
| ^
/home/kefu/dev/scylladb/transport/server.hh:135:7: note: in implicit destructor for 'cql_transport::cql_server' first required here
/home/kefu/dev/scylladb/transport/server.hh:149:11: note: forward declaration of 'cql_transport::cql_server::event_notifier'
149 | class event_notifier;
| ^
1 error generated.
```
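The underlying C++ issue is independent of the Scylla classes involved; a minimal sketch of the usual remedy (declare the destructor in the header, define it where the complete type is visible) looks roughly like this:
```
// widget.hh
#include <memory>

class impl;                           // forward declaration only

class widget {
    std::unique_ptr<impl> _impl;
public:
    widget();
    ~widget();                        // declared here, defined in the .cc file
};

// widget.cc (shown in the same snippet for brevity)
class impl { /* complete definition */ };

widget::widget() : _impl(std::make_unique<impl>()) {}
widget::~widget() = default;          // impl is complete here, so unique_ptr's
                                      // deleter can be instantiated

int main() {
    widget w;                         // no "incomplete type" error
}
```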
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
as repair/repair.cc has the complete definition of node_ops_meta_data, the
compiler can default-generate the destructor of `storage_service` with the necessary
information. otherwise, clang-19 would fail to build, like:
```
FAILED: repair/CMakeFiles/repair.dir/Dev/repair.cc.o
/home/kefu/.local/bin/clang++ -DDEVEL -DFMT_SHARED -DSCYLLA_BUILD_MODE=dev -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Dev\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -O2 -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -MD -MT repair/CMakeFiles/repair.dir/Dev/repair.cc.o -MF repair/CMakeFiles/repair.dir/Dev/repair.cc.o.d -o repair/CMakeFiles/repair.dir/Dev/repair.cc.o -c /home/kefu/dev/scylladb/repair/repair.cc
In file included from /home/kefu/dev/scylladb/repair/repair.cc:9:
In file included from /home/kefu/dev/scylladb/repair/repair.hh:11:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/unordered_map:41:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unordered_map.h:33:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable.h:35:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable_policy.h:34:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/tuple:38:
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_pair.h:291:11: error: field has incomplete type 'service::node_ops_meta_data'
291 | _T2 second; ///< The second member
| ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/ext/aligned_buffer.h:93:28: note: in instantiation of template class 'std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>' requested here
93 | : std::aligned_storage<sizeof(_Tp), __alignof__(_Tp)>
| ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable_policy.h:334:43: note: in instantiation of template class '__gnu_cxx::__aligned_buffer<std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>>' requested here
334 | __gnu_cxx::__aligned_buffer<_Value> _M_storage;
| ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable_policy.h:373:7: note: in instantiation of template class 'std::__detail::_Hash_node_value_base<std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>>' requested here
373 | : _Hash_node_value_base<_Value>
| ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable.h:1662:21: note: in instantiation of template class 'std::__detail::_Hash_node_value<std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>, false>' requested here
1662 | ._M_bucket_index(declval<const __node_value_type&>(),
| ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unordered_map.h:109:11: note: in instantiation of member function 'std::_Hashtable<utils::tagged_uuid<node_ops_id_tag>, std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>, std::allocator<std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>>, std::__detail::_Select1st, std::equal_to<utils::tagged_uuid<node_ops_id_tag>>, std::hash<utils::tagged_uuid<node_ops_id_tag>>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>>::~_Hashtable' requested here
109 | class unordered_map
| ^
/home/kefu/dev/scylladb/service/storage_service.hh:109:7: note: forward declaration of 'service::node_ops_meta_data'
109 | class node_ops_meta_data;
| ^
In file included from /home/kefu/dev/scylladb/repair/repair.cc:9:
In file included from /home/kefu/dev/scylladb/repair/repair.hh:11:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/unordered_map:41:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unordered_map.h:33:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable.h:35:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable_policy.h:34:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/tuple:38:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_pair.h:60:
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
rwlock was added to protect iterations against concurrent updates to the map.
the updates can happen when allocating a new tablet replica or removing an old one (tablet cleanup).
the rwlock is very problematic because it can result in topology changes being blocked, as updating
token metadata takes the exclusive lock, which is serialized with table-wide ops like
split / major / explicit flush (and those can take a long time).
to get rid of the lock, we can copy the storage group map and guard individual groups with a gate
(not a problem since the map is expected to have a maximum of ~100 elements).
so cleanup can close that gate (carefully closed after stopping individual groups, so that
migrations aren't blocked by long-running ops like major), and ongoing iterations (e.g. triggered
by nodetool flush) can skip a group that was closed, as such a group is being migrated out.
Check the documentation added to compaction_group.hh to understand how
concurrent iterations and updates to the map work without the rwlock.
Yielding variants that iterate over groups no longer return the group
id, since id stability can no longer be guaranteed without serializing split
finalization and iteration.
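A standalone sketch of the pattern (plain C++ with made-up names, not the actual Seastar/ScyllaDB types): iterations work on a snapshot of the map and skip groups whose gate has been closed by a concurrent migration:
```
#include <atomic>
#include <map>
#include <memory>

struct group {
    std::atomic<bool> closed{false};          // stand-in for a real gate
    bool try_enter() const { return !closed.load(std::memory_order_acquire); }
    void leave() const {}                     // a real gate would track entrants
};

using group_map = std::map<int, std::shared_ptr<group>>;

// iteration: take a snapshot of the map and skip groups whose gate is closed
template <typename Func>
void for_each_group(const group_map& groups, Func f) {
    group_map snapshot = groups;              // cheap: the map holds ~100 entries at most
    for (auto& [id, g] : snapshot) {
        if (!g->try_enter()) {
            continue;                         // the group is being migrated out
        }
        f(id, *g);
        g->leave();
    }
}

// cleanup/migration: close the gate so ongoing and new iterations skip this group
void close_group(group_map& groups, int id) {
    if (auto it = groups.find(id); it != groups.end()) {
        it->second->closed.store(true, std::memory_order_release);
        groups.erase(it);
    }
}

int main() {
    group_map groups;
    groups.emplace(0, std::make_shared<group>());
    for_each_group(groups, [](int, const group&) { /* e.g. flush */ });
    close_group(groups, 0);
}
```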
Fixes #18821.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
It was added to make integration of storage groups easier, but it's
complicated since it's another source of truth and we could have
problems if it becomes inconsistent with the group map.
Fixes#18506.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
There's already a page which lists which features are not working with
tablets: architecture/tablets.html#limitations-and-unsupported-features,
but it's also helpful for users to be warned about this when visiting a
specific feature doc page.
We currently disable tombstone GC for compaction done on the read path of streaming and repair, because those expired tombstones can still prevent data resurrection. With time-based tombstone GC, missing a repair for long enough can cause data resurrection because a tombstone is potentially GC'd before it could be spread to every node by repair. So repair disseminating these expired tombstones helps clusters which missed repair for long enough. It is not a guarantee because compaction could have done the GC itself, but it is better than nothing.
This last resort is getting less important with repair-based tombstone GC. Furthermore, we have seen this cause huge repair amplification in a cluster, where expired tombstones triggered repair replicating otherwise identical rows.
This series makes tombstone GC on the streaming/repair compaction path configurable with a config item. The new config item defaults to `false` (current behaviour); setting it to `true` enables tombstone GC.
Fixes: https://github.com/scylladb/scylladb/issues/19015
Not a regression, no backport needed
Closes scylladb/scylladb#19016
* github.com:scylladb/scylladb:
test/topology_custom/test_repair: add test for enable_tombstone_gc_for_streaming_and_repair
replica/table: maybe_compact_for_streaming(): toggle tombstone GC based on the control flag
replica: propagate enable_tombstone_gc_for_streaming_and_repair to maybe_compact_for_streaming()
db/config: introduce enable_tombstone_gc_for_streaming_and_repair
Counter updates break under tablet migration (#18180), and for this
reason they need to be disabled until the problem is fixed.
It's enough to forbid creating a table with counters, as altering a
table without counters already cannot result in the table having
counters:
1) Adding a counter column to a table without counters:
```
cqlsh> ALTER TABLE temp.cf ADD (col_name counter);
ConfigurationException: Cannot add a counter column (col_name) in a non counter column family
```
2) Altering a column to be of the counter type:
```
cqlsh> ALTER TABLE temp.cf ALTER col_name TYPE counter;
ConfigurationException: Cannot change col_name from type int to type counter: types are incompatible.
```
Fixes: #19449
The following command had been executed to get the
list of headers that did not contain '#pragma once':
'grep -rnw . -e "#pragma once" --include *.hh -L'
This change adds missing include guards to headers
that did not contain any guard.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#19626
If set, any remaining segment that has data older than this threshold
will request flushing, regardless of data pressure. I.e. even a system
where nothing happens will, after X seconds, flush data to free up the
commit log.
Previously the optimized clang installation did not use the standard build
script; it overwrote Fedora's preinstalled clang binaries instead.
However, this breaks on clang-18.1.8 due to the libLTO versioning convention.
To avoid such problems, let's switch to the standard installation method and
switch the install prefix to /usr/local.
Fixes #19203
Closes scylladb/scylladb#19505
apply_monotonically() is run with reclaim disabled. So with some bad luck,
sentinel insertion might fail with bad_alloc even on a perfectly healthy node.
We can't deal with the failure of sentinel insertion, so this will result in a
crash.
This patch prevents the spurious OOM by reserving some memory (1 LSA segment)
and only making it available right before the critical allocations.
Fixes https://github.com/scylladb/scylladb/issues/19552
Closes scylladb/scylladb#19617
* github.com:scylladb/scylladb:
mutation_partition_v2: in apply_monotonically(), avoid bad_alloc on sentinel insertion
logalloc: add hold_reserve
logalloc: generalize refill_emergency_reserve()
When writing a mutation, it might happen that there are no live targets
to send the mutation to, yet the request can be satisfied. For example,
when writing with CL=ANY to a dead node, the request is completed by
storing a local hint.
Currently, in that case, a write response handler is created for the
request and it remains active until it times out because it is not
removed anywhere, even though the write is completed successfully after
storing the hint. The response handler is usually removed when
receiving responses from all targets, but in this case there are no
targets to trigger the removal.
In this commit we check if we don't have live targets to send the
mutation to. If so, we remove the response handler immediately.
Fixes scylladb/scylladb#19529
Closes scylladb/scylladb#19586
Introduce REST API for triggering a read barrier.
This is to make sure the database schema is up to date on the node where
the read barrier is triggered. One of the use cases is the database
backup via the Scylla Manager, which requires that the schema backed up
is matching the data or newer (data can be migrated, but an older schema
would cause issues).
Fixes scylladb/scylladb#19213
Closes scylladb/scylladb#19597
* github.com:scylladb/scylladb:
raft: add the read barrier REST API
raft: use `raft_timeout` in trigger_snapshot
raft: use bad_param_exception for consistency
test: raft: verify schema updated after read barrier
for better debugging experience.
before this change, we have
```
fatal error: in "sstable_directory_test_generation_sanity": critical check sst->generation() == sst1->generation() has failed
```
after this change, we have
```
fatal error: in "sstable_directory_test_generation_sanity": critical
check sst->generation() == sst1->generation() has failed [3ghm_0ntw_29vj625yegw7jodysc != 3ghm_0ntw_29vj625yegw7jodysd]
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19639
before this change, we provide `boost_test_print_type()` for all types
which can be formatted using {fmt}. these types include those which
fulfill the concept of range and whose elements can be formatted using
{fmt}. if the compilation unit happens to include `fmt/ranges.h`,
the ranges are formatted with `boost_test_print_type()` as well. this
is what we expect. in other words, we use {fmt} to format types which
do not natively support {fmt}, but fulfill the range concept.
but `boost::unit_test::basic_cstring` is one of them
- it can be formatted using operator<<, but it does not provide
fmt::format specialization
- it fulfills the concept of range
- and its element type is `char const`, which can be formatted using
{fmt}
that's why it's formatted like:
```
test/boost/sstable_directory_test.cc(317): fatal error: in "sstable_directory_test_generation_sanity": critical check ['s', 's', 't', '-', '>', 'g', 'e', 'n', 'e', 'r', 'a', 't', 'i', 'o', 'n', '(', ')', ' ', '=', '=', ' ', 's', 's', 't', '1', '-', '>', 'g', 'e', 'n', 'e', 'r', 'a', 't', 'i', 'o', 'n', '(', ')'] has failed`
```
where the string is formatted as a sequence-like container. this
is far from readable.
so, in this change, we do not define `boost_test_print_type()` for
the types which natively support `operator<<` anymore, so they can
be printed with `operator<<` when boost::test prints them.
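a minimal sketch of the constraint (assumed glue code, not the actual Boost.Test integration): the {fmt}-based fallback is only offered for types that cannot already be streamed with `operator<<`:
```
#include <fmt/format.h>
#include <fmt/ranges.h>
#include <iostream>
#include <vector>

template <typename T>
concept streamable = requires(std::ostream& os, const T& v) { os << v; };

// only types without a native operator<< get the {fmt}-based printer
template <typename T>
requires (!streamable<T>) && fmt::is_formattable<T>::value
std::ostream& boost_test_print_type(std::ostream& os, const T& v) {
    return os << fmt::format("{}", v);
}

int main() {
    std::vector<int> v{1, 2, 3};              // no operator<<, but {fmt}-formattable
    boost_test_print_type(std::cout, v);      // prints "[1, 2, 3]"
    std::cout << '\n';
}
```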
Fixes scylladb/scylladb#19637
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19638
The equivalent of small-objects, but for large objects (spans).
Allows listing objects of a large class, and therefore investigating a
runaway class, by attempting to identify the owners of the objects in
it.
Written to investigate #16493
Closes scylladb/scylladb#16711
This will allow triggering the read barrier directly via the API,
instead of doing work-arounds (like dropping a non-existent table).
The intended use-case is in the Scylla Manager, to make sure that
the database schema is up to date after the data has been backed up
and before attempting to back up the database schema.
The database schema in particular is backed up on just a single
node, which might not yet have a schema at least as new as the data
(data can be migrated to a newer schema, but not vice versa).
The read barrier issued on the node ensures that the node has a
schema at least as new as the data.
Closes #19213
apply_monotonically() is run with reclaim disabled. So with some bad luck,
sentinel insertion might fail with bad_alloc even on a perfectly healthy node.
We can't deal with the failure of sentinel insertion, so this will result in a
crash.
This patch prevents the spurious OOM by reserving some memory (1 LSA segment)
and only making it available right before the critical allocations.
Fixes scylladb/scylladb#19552
mutation_partition_v2::apply_monotonically() needs to perform some allocations
in a destructor, to ensure that the invariants of the data structure are
restored before returning. But it is usually called with reclaiming disabled,
so the allocations might fail even in a perfectly healthy node with plenty of
reclaimable memory.
This patch adds a mechanism which allows reserving some LSA memory (by
asking the allocator to keep it unused) and making it available for allocation
right when we need to guarantee allocation success.
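Stripped of the LSA specifics, the idea looks roughly like this (an illustrative sketch only; the real mechanism asks the LSA allocator to keep a segment unused rather than holding heap memory):
```
#include <cstddef>
#include <memory>
#include <vector>

// hold some memory aside while failure is still recoverable...
class hold_reserve {
    std::unique_ptr<std::byte[]> _reserved;
public:
    explicit hold_reserve(std::size_t bytes)
        : _reserved(std::make_unique<std::byte[]>(bytes)) {}
    // ...and give it back right before the allocations that must not fail
    void release() noexcept { _reserved.reset(); }
};

void apply_monotonically_like(std::vector<int>& dst) {
    hold_reserve reserve(128 * 1024);   // taken up front, e.g. one LSA segment's worth
    // ... fallible preparatory work: a bad_alloc here is still recoverable ...
    reserve.release();                  // the memory goes back to the allocator
    dst.push_back(42);                  // the critical insertion is now much less
                                        // likely to hit bad_alloc
}

int main() {
    std::vector<int> v;
    apply_monotonically_like(v);
}
```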
When debugging the issue of high LWT contention metric, we (the
drivers team) discovered that at least 3 drivers (Go, Java, Rust)
cause high numbers in that metrics in LWT workloads - we doubted that
all those drivers route LWT queries badly. We tried to understand that
metric and its semantics. It took 3 people over 10 hours to figure out
what it is supposed to count.
People from core team suspected that it was the drivers sending
requests to different shards, causing contention. Then we ran the
workload against a single node single shard cluster... and observed
contention. Finally, we looked into the Scylla code and saw it.
**Uninitialized stack value.**
The core member was shocked. But we, the drivers people, felt we always
knew it. It's yet another time that we are blamed for a server-side
issue. We rebuilt scylla with the variable initialized to 0 and the
metric kept being 0.
To prevent such errors in the future, let's consider some lints that
warn against uninitialized variables. This is such an obvious feature
of e.g. Rust, and yet this has been shown to cause a painful bug in 2024.
Closes scylladb/scylladb#19625
On Ubuntu/Debian, we have to install systemd-coredump before
running has_ztd(), since it detects ZSTD support by running coredumpctl.
Move pkg_install('systemd-coredump') to the head of the script.
Fixes #19643
Closes scylladb/scylladb#19648
in cccec07581, we started using a feature introduced by {fmt} v10.
but we are still using the {fmt} cooked using seastar, and it is
9.1.0, so this breaks the build when running the clang-tidy workflow.
in this change, instead of building on ubuntu jammy, we use the
scylladb/scylla-toolchain image based on fedora 40, which provides
{fmt} v10.2.1. since we have clang 18 in fedora 40, this change
does not sacrifice anything.
after this change, clang-tidy workflow should be back to normal.
Fixes scylladb/scylladb#19621
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19628
Although Scylla already exposes metrics keeping track of various information related to hinted handoff, all of them correspond to either storing or sending hints. However, when debugging, it's also crucial to be aware of how many hints are coming to a given node and what their size is. Unfortunately, the existing metrics are not enough to obtain that information.
This PR introduces the following new metrics:
* `sent_bytes_total` – the total size of the hints that have been sent from a given shard,
* `received_hints_total` – the total number of hints that a given shard has received,
* `received_hints_bytes_total` – the total size of the hints a given shard has received.
It also renames `hints_manager_sent` to `hints_manager_sent_total` to avoid conflicts of prefixes between that metric and `sent_bytes_total` in tests.
Fixes scylladb/scylladb#10987
Closes scylladb/scylladb#18976
* github.com:scylladb/scylladb:
db/hints: Add a metric for the size of sent hints
service/storage_proxy: Add metrics for received hints
The view builder is doing write operations to the database.
In order for the view builder to shutdown gracefully without errors, we
need to ensure the database can handle writes while it is drained.
The commit changes the drain order, so that view builder is drained
before the database shuts down.
Fixes scylladb/scylladb#18929
Closes scylladb/scylladb#19609
Recently, the code in paxos_state::prepare(), paxos_state::accept() and
paxos_state::learn() was coroutinized by 58912c2cc1, 887a5a8f62 and
2b7acdb32c respectively. This introduced a regression: the latency
histogram updater code was moved from a finally() to a defer(). Unlike
the former, the latter runs in a noexcept context so the possible
replica::no_such_column_family raised from the latency update code now
crashes the node, instead of failing just the paxos operation as before.
Fix by only updating the latency histogram if the table still exists.
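A standalone sketch of the difference (simplified stand-ins, not the Seastar API): a scope-exit guard runs from a destructor, which is implicitly noexcept, so its body must not throw - hence the check that the table still exists:
```
template <typename Func>
struct scope_exit {                  // simplified stand-in for seastar::defer()
    Func f;
    ~scope_exit() { f(); }           // destructors are noexcept: a throw here
                                     // terminates the process
};

bool table_exists() { return true; } // illustrative lookup
void record_latency() {}

void paxos_operation() {
    auto update_histogram = scope_exit{[] {
        if (table_exists()) {        // the fix: only touch the histogram if the
            record_latency();        // table hasn't been dropped concurrently
        }
    }};
    // ... the actual paxos work, which may yield and race with a DROP TABLE ...
}

int main() {
    paxos_operation();
}
```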
Fixes: scylladb/scylladb#19620
Closes scylladb/scylladb#19623
The page was missing from the docs. I created the page based on
the information in the download center (which will be closed down soon)
and other ScyllaDB resources.
Closes scylladb/scylladb#19577
In 2446cce, we stopped attempting to create
endpoint managers for invalid hint directories
even when their names represented IP addresses or
host IDs. In this commit, we add logging informing
the user about it.
Refs scylladb/scylladb#19173
Closes scylladb/scylladb#19618
Now that the CPU concurrency limit is configurable, new reads might be
ready to execute right after the current one was executed. So move the
poll for admitting new reads into the inner loop, to prevent the
situation where the inner loop yields and a concurrent
do_wait_admission() finds that there are waiters (queued because at the
time they arrived at the semaphore, the _ready_list was not empty) but it
is possible to admit a new read. When this happens the semaphore will
dump diagnostics to help debug the apparent contradiction, which can
generate a lot of log spam. Moving the poll into the inner loop prevents
the false-positive contradiction detection from firing.
Refs: scylladb/scylladb#19017
Closes scylladb/scylladb#19600
This is the first patch from a series which would allow us to unify raft command code. The property we want to achieve is that all modifications performed by a single raft command can be made visible atomically. This helps to exclude accidental dependencies across subsystem updates and makes it easier to reason about the state.
Here we alter the functions schema code so that changes are first applied to a copy of the declared functions and then made visible atomically. Later work will apply a similar strategy to the whole schema.
Relates scylladb/scylladb#19153
Closes scylladb/scylladb#19598
* github.com:scylladb/scylladb:
cql3: functions: make modification functions accessible only via batch class
db: replica: batch functions schema modifications
cql3: functions: introduce class for batching functions modifications
cql3: functions: make functions class non-static
cql3: functions: remove redundant class access specifiers
cql3: functions: remove unused java snippet
Before, each function change was immediately visible because
the event notification logic yielded.
Now we first gather the modifications and then commit them.
Further work will broaden the scope of atomicity to the whole
schema and even across other subsystems.
In the next patch, we will want to do the same thing as
refill_emergency_reserve() does, just with a quantity different
from _emergency_reserve_max. So we split off the shareable part
to a new function, and use it to implement refill_emergency_reserve().
When the balancer fails to find a node to balance drained tablets into, it
throws an exception with the tablet id and node id, but it's also good to
know more details about the balancing state that led to the failure.
refs: #19504
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#19588
It will hold a temporary shallow copy of the declared functions.
Then each modification adds/removes/replaces a stored function object.
At the end, the change is committed by moving the temporary copy to the
main functions class instance.
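A self-contained sketch of the batching pattern (illustrative types only, not the actual cql3::functions code): mutate a shallow copy, then publish it in one step:
```
#include <map>
#include <string>
#include <utility>

struct function_object { std::string body; };
using functions_map = std::map<std::string, function_object>;

class functions_change_batch {
    functions_map& _live;      // the instance readers see
    functions_map _staged;     // temporary shallow copy the batch mutates
public:
    explicit functions_change_batch(functions_map& live)
        : _live(live), _staged(live) {}
    void add(std::string name, function_object f) {
        _staged.insert_or_assign(std::move(name), std::move(f));
    }
    void remove(const std::string& name) { _staged.erase(name); }
    void commit() { _live = std::move(_staged); }   // all changes appear at once
};

int main() {
    functions_map live;
    functions_change_batch batch(live);
    batch.add("my_udf", {"return 1;"});
    batch.remove("obsolete_udf");   // no-op here, shown for completeness
    batch.commit();                 // readers now observe both modifications
}
```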
This series is another approach to https://github.com/scylladb/scylladb/pull/18646 and https://github.com/scylladb/scylladb/pull/19181. In this series we only change where the view backlog gets
updated - we do not ensure that the view update backlog returned in a response is necessarily the backlog
that increased due to the corresponding write; the returned backlog may be outdated by up to 10ms. Because
this series does not include this change, it's considerably less complex and it doesn't modify the common
write path, so no particular performance considerations were needed in that context. The issue being fixed
is still the same, the full description can be seen below.
When a replica applies a write on a table which has a materialized view
it generates view updates. These updates take memory which is tracked
by `database::_view_update_concurrency_sem`, separate on each shard.
The fraction of units taken from the semaphore to the semaphore limit
is the shard's view update backlog. Based on these backlogs, we want
to estimate how busy a node is with its view updates work. We do that
by taking the max backlog across all shards.
To avoid excessive cross-shard operations, the node's (max) backlog isn't
calculated each time we need it, but at most once per 10ms (the `_interval`), with an optimization where the backlog of the calculating shard is immediately up-to-date (we don't need cross-shard operations for it):
```
update_backlog node_update_backlog::fetch() {
auto now = clock::now();
if (now >= _last_update.load(std::memory_order_relaxed) + _interval) {
_last_update.store(now, std::memory_order_relaxed);
auto new_max = boost::accumulate(
_backlogs,
update_backlog::no_backlog(),
[] (const update_backlog& lhs, const per_shard_backlog& rhs) {
return std::max(lhs, rhs.load());
});
_max.store(new_max, std::memory_order_relaxed);
return new_max;
}
return std::max(fetch_shard(this_shard_id()), _max.load(std::memory_order_relaxed));
}
```
For the same reason, even when we do calculate the new node's backlog,
we don't read from the `_view_update_concurrency_sem`. Instead, for
each shard we also store an update_backlog atomic which we use for
calculation:
```
struct per_shard_backlog {
// Multiply by 2 to defeat the prefetcher
alignas(seastar::cache_line_size * 2) std::atomic<update_backlog> backlog = update_backlog::no_backlog();
need_publishing need_publishing = need_publishing::no;
update_backlog load() const {
return backlog.load(std::memory_order_relaxed);
}
};
std::vector<per_shard_backlog> _backlogs;
```
Due to this distinction, the update_backlog atomic needs to be updated
separately, when the `_view_update_concurrency_sem` changes.
This is done by calling `storage_proxy::update_view_update_backlog`, which reads the `_view_update_concurrency_sem` of the shard (in `database::get_view_update_backlog`)
and then calls `node_update_backlog::add`, where the read backlog
is stored in the atomic:
```
void storage_proxy::update_view_update_backlog() {
_max_view_update_backlog.add(get_db().local().get_view_update_backlog());
}
void node_update_backlog::add(update_backlog backlog) {
_backlogs[this_shard_id()].backlog.store(backlog, std::memory_order_relaxed);
_backlogs[this_shard_id()].need_publishing = need_publishing::yes;
}
```
For this implementation of calculating the node's view update backlog to work,
we need the atomics to be updated correctly when the semaphores of corresponding
shards change.
The main event where the view update backlog changes is an incoming write
request. That's why when handling the request and preparing a response
we update the backlog by calling `storage_proxy::get_view_update_backlog` (also
because we want to read the backlog and send it in the response):
backlog update after local view updates (`storage_proxy::send_to_live_endpoints` in `mutate_begin`):
```
auto lmutate = [handler_ptr, response_id, this, my_address, timeout] () mutable {
return handler_ptr->apply_locally(timeout, handler_ptr->get_trace_state())
.then([response_id, this, my_address, h = std::move(handler_ptr), p = shared_from_this()] {
// make mutation alive until it is processed locally, otherwise it
// may disappear if write timeouts before this future is ready
got_response(response_id, my_address, get_view_update_backlog());
});
};
```
backlog update after remote view updates (`storage_proxy::remote::handle_write`):
```
auto f = co_await coroutine::as_future(send_mutation_done(netw::messaging_service::msg_addr{reply_to, shard}, trace_state_ptr,
shard, response_id, p->get_view_update_backlog()));
```
Now assume that on a certain node we have a write request received on shard A,
which updates a row on shard B (A!=B). As a result, shard B will generate view
updates and consume units from its `_view_update_concurrency_sem`, but will
not update its atomic in `_backlogs` yet. Because both shards in the example
are on the same node, shard A will perform a local write calling `lmutate` shown
above. In the `lmutate` call, the `apply_locally` will initiate the actual write on
shard B and the `storage_proxy::update_view_update_backlog` will be called back
on shard A. In no place will the backlog atomic on shard B get updated even
though it increased in size due to the view updates generated there.
Currently, what we calculate there doesn't really matter - it's only used for the
MV flow control delays, so currently, in this scenario, we may only overload
a replica causing failed replica writes which will be later retried as hints. However,
when we add MV admission control, the calculated backlog will be the difference
between an accepted and a rejected request.
Fixes: https://github.com/scylladb/scylladb/issues/18542
Without admission control (https://github.com/scylladb/scylladb/pull/18334), this patch doesn't affect much, so I'm marking it as backport/none
Closes scylladb/scylladb#19341
* github.com:scylladb/scylladb:
test: add test for view backlog not being updated on correct shard
test: move auxiliary methods for waiting until a view is built to util
mv: update view update backlog when it increases on correct shard
This is done to ease code reuse in the following commit.
It would also help should we ever want to properly attach the functions
class to the schema object instead of static storage.
since fedora 38 is EOL and fedora 39 comes with fmt v10.0.0, and
we've switched to the build image based on fedora 40, which ships
fmt-devel v10.2.1, there is no need to use fmt::streamed() when
the corresponding format_as() is available.
simpler this way.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19594
to avoid warning like
```
DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
```
and to be future-proof, let's use the offset-aware timestamp.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19536
zstd.h is a header provided by libzstd, so let's include it with
brackets, more consistent this way.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19538
We disabled coredump compression by default because it was too slow,
but recent versions of systemd-coredump support faster zstd-based compression,
so let's enable compression by default when zstd support is detected.
Related scylladb/scylla-machine-image#462
Closes scylladb/scylladb#18854
This short series fixes Alternator's "/localnodes" request to allow a node's external IP address - configured with `broadcast_rpc_address` - to be listed instead of its usual, internal, IP address.
The first patch fixes a bug in gossiper::get_rpc_address(), which the second patch needs to implement the feature. The second patch also contains regression tests.
Fixes #18711.
Closes scylladb/scylladb#18828
* github.com:scylladb/scylladb:
alternator: fix "/localnodes" to use broadcast_rpc_address
gossiper: fix get_rpc_address() for this node
They don't need to modify the captured objects. In fact, they must not
do it in the first place, because the request can be called more than
once and the buffers must not change between those invocations.
For the memory_sink_buffers there must be a const method to get the vector
of temporary_buffers themselves.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#19599
Log the sstable origin when its bloom filter is being rebuilt. The
origin has to be passed to the method by the caller as it is not
available in the sstable object when the filter is rebuilt.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#19601
This patch adds a test for reproducing issue https://github.com/scylladb/scylladb/issues/18542
The test performs writes on a table with a materialized view and
checks that the view backlog increases. To get the current view
update backlog, a new metric "view_update_backlog" is added to
the `storage_proxy` metrics. The metric differs from the `database`
metric with the same name by taking the backlog
from the max_view_update_backlog, which keeps view update backlogs
from all shards and may be a bit outdated, instead of taking
the backlog by checking the view_update_semaphore on which the backlog
is directly based.
In many materialized view tests we need to wait until a view is built before
actually working on it, future tests will also need it. In existing tests
we use the same, duplicated method for achieving that.
In this patch the method is deduplicated and moved to pylib/util.py
and existing tests are modified to use it instead.
When performing a write, we should update the view update backlog
on the shard where the mutation is actually applied. Instead,
currently we only update it on the shard that initially received
the write request (which didn't change at all) and as a result,
the backlog on the correct shard and the aggregated max view update
backlog are not updated at all.
This patch enables updating the backlog on the correct shard. The
update is now performed just after the view generation and propagation
finishes, so that all backlog increases are noted and the backlog is
ready to be used in the write response.
Additionally, after this patch, we no longer (falsely) assume that
the backlog is modified on the same shard as where we later read it
to attach to a response. However, we still compare the aggregated
backlog from all shards and the backlog from the shard retrieving
the max, as with a shard-aware driver, it's likely the exact shard
whose backlog changed.
forward_service is nondescriptive and misnamed, as it does more than
forward requests. It's a classic map/reduce algorithm (and in fact one
of its parameters is "reducer"), so name it accordingly.
The name "forward" leaked into the wire protocol for the messaging
service RPC isolation cookie, so it's kept there. It's also maintained
in the name of the logger (for "nodetool setlogginglevel") for
compatibility with tests.
Closes scylladb/scylladb#19444
before this change, when linking scylla-main, the linker discards
the unreferenced symbols defined by zstd.cc. but we use constructor
of static variable `registerator` to register the zstd compressor,
this variable is not used from the linker's point of view. but we
do rely on the side effect of its constructor.
that's why the rules generated by CMake fail to build tests and
scylla executables with zstd support. that's why we have the following
test failure:
```
boost.sstable_3_x_test.test_uncompressed_collections_read
...
[Exception] - no_such_class: unable to find class 'org.apache.cassandra.io.compress.ZstdCompressor'
== [File] - seastar/src/testing/seastar_test.cc
== [Line] - 43
```
in this change, we single out zstd.cc and build it as an archive,
so that scylla-main can include it as a whole. an alternative is to
link scylla-main as a whole archive, but that might increase the disk
footprint when building lots of tests -- some of them do not use all
symbols exposed by scylla-main, and can potentially have a smaller
size if the linker can discard the unused symbols.
Refs https://github.com/scylladb/scylladb/issues/2717
---
cmake related change, hence no need to backport.
Closes scylladb/scylladb#19539
* github.com:scylladb/scylladb:
build: cmake: include the whole archive of zstd.a
build: cmake: find libzstd before using it
If an httpd body writer is called with output_stream<>, it must close the stream on its own regardless of any exceptions it may generate while working; otherwise the stream destructor may trip the not-closed assertion. Stepped on with a different handler, see #19541
Coroutinize the handler as the first step while at it (though the fix would have been notably shorter if done with a .finally() lambda)
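A minimal sketch of the resulting shape, assuming Seastar's output_stream and coroutine support (the function and body are illustrative): flush explicitly, capture any exception, always close, then rethrow:
```
#include <seastar/core/coroutine.hh>
#include <seastar/core/iostream.hh>
#include <exception>
#include <string>

seastar::future<> write_body(seastar::output_stream<char> os) {
    std::exception_ptr ex;
    try {
        std::string body = "{\"status\": \"ok\"}";
        co_await os.write(body.data(), body.size());
        co_await os.flush();       // flush up front so close() has nothing left to throw
    } catch (...) {
        ex = std::current_exception();
    }
    co_await os.close();           // the handler owns the stream: always close it
    if (ex) {
        std::rethrow_exception(ex);
    }
}
```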
Closes scylladb/scylladb#19543
* github.com:scylladb/scylladb:
api: Close response stream of get_compaction_history()
api: Flush output stream in get_compaction_history() call
api: Coroutinize get_compaction_history inner function
when running `make setup`, we could have following failure:
```
Installing the current project: scylla (4.3.0)
The current project could not be installed: No file/folder found for package scylla
If you do not want to install the current project use --no-root
```
because docs is not a proper python project named "scylla",
and does not have the directory structure expected by poetry. what we
expect from poetry is to manage the dependencies for building
the document.
so, in this change, we install in the `non-package` mode when running
`poetry install`; this skips the root package, which does not exist.
as an alternative, we could put an empty `scylla.py` under the `docs`
directory, but that'd be overkill. or we could pass `--no-root`
to `poetry install`, but it would be ideal if we can keep the settings
in a single place.
see also https://python-poetry.org/docs/basic-usage/#operating-modes,
and https://python-poetry.org/docs/cli/#options-2, for more
details on the settings and command line options of poetry.
please note this setting was added to poetry 1.8, so the required
poetry version is updated. we might need to upgrade poetry in existing
installation.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19498
Currently, a pending replica that applies a write on a table that has
materialized views, will build all the view updates as a normal replica,
only to realize at a late point, in db::view::get_view_natural_endpoint(),
that it doesn't have a paired view replica to send the updates to. It will
then either drop the view updates, or send them to a pending view
replica, if such exists.
This work is unnecessary since it may be dropped, and even if there is a
pending view replica to send the updates to, the updates that are built
by the pending replica may be wrong since it may have incomplete
information.
This commit fixes the inefficiency by skipping the view update building
step when applying an update on a pending replica.
The metric total_view_updates_on_wrong_node is added to count the cases
that a view update is determined to be unnecessary.
The test reproduces the scenario of writing to a table and applying
the update on a pending replica, and verifies that the pending replica
doesn't try to build view updates.
Fixes scylladb/scylladb#19152
Closes scylladb/scylladb#19488
The reader concurrency semaphore restricts the concurrency of reads that require CPU (intention: they read from the cache) to 1, meaning that if there is even a single active read which declares that it needs just CPU to proceed, no new read is admitted. This is meant to keep the concurrency of reads in the cache at 1. The idea is that concurrency in the cache is not useful: it just leads to the reactor rotating between these reads, all of them finishing later than they could if they were the only active read in the cache.
This was observed to backfire in the case where reads from a single table are mostly very fast, but on some keys are very slow (hint: a collection full of tombstones). In this case the slow read holds up the fast reads in the queue, increasing the 99th percentile latencies significantly.
This series proposes to fix this by making the CPU concurrency configurable. We don't like tunables like this and this is not a proper fix, but a workaround. The proper fix would be to allow cutting any page early, but we cannot cut a page in the middle of a row. We could maybe have a way of detecting slow reads and excluding them from the CPU concurrency. This would be a heuristic and it would be hard to get right. So in this series a robust and simple configurable is offered, which can be used on those few clusters which do suffer from the too-strict concurrency limit. We have seen it in very few cases so far, so this doesn't seem to be widespread.
Fixes: https://github.com/scylladb/scylladb/issues/19017
This fixes a regression introduced in 5.0, so we have to backport to all currently supported releases
Closes scylladb/scylladb#19018
* github.com:scylladb/scylladb:
test/boost/reader_concurrency_semaphore_test: add test for live-configurable cpu concurrency
test/boost/reader_concurrency_semaphore_test: hoist require_can_admit
reader_concurrency_semaphore: wire in the configurable cpu concurrency
reader_concurrency_semaphore: add cpu_concurrency constructor parameter
db/config: introduce reader_concurrency_semaphore_cpu_concurrency
Co-routinize paxos_state functions to make them more readable.
* 'gleb/coroutineze-paxos-state' of github.com:scylladb/scylla-dev:
paxos: simplify paxos_state::prepare code to not work with raw futures
paxos: co-routinize paxos_state::learn function
paxos: remove no longer used with_locked_key functions
paxos: co-routinize paxos_state::accept function
paxos: co-routinize paxos_state::prepare function
paxos: introduce get_replica_lock() function to take RAII guard for local paxos table access
this changeset adds a filter to customize the rendering of default
values, and enables the `scylladb_cc_properties` extension to display
the logging message related options. it prepares for the further
improvements in
https://opensource.docs.scylladb.com/master/reference/configuration-parameters.html.
this changeset also prepares for the improvements requested by #19463
---
it's an improvement in the document, hence no need to backport.
Closes scylladb/scylladb#19483
* github.com:scylladb/scylladb:
config: add descriptions for default_log_level and friends
config: define log_to_syslog in a different line
docs: parse log_legacy_value as declarations of config option
A node may wait in the topology coordinator queue for a while before being
joined. Since the local address is added as an expiring entry to the raft
address map it may expire meanwhile and the bootstrap will fail. The
series makes the entry non-expiring.
Fixes scylladb/scylladb#19523
Needs to be backported to 6.0 since the bug may cause bootstrap to fail.
Closes scylladb/scylladb#19557
* github.com:scylladb/scylladb:
test: add test that checks that local address cannot expire between join request placement and its processing
storage_service: make node's entry non expiring in raft address map
before this change, when linking scylla-main, the linker discards
the unreferenced symbols defined by zstd.cc. but we use constructor
of static variable `registerator` to register the zstd compressor,
this variable is not used from the linker's point of view. but we
do rely on the side effect of its constructor.
that's why the rules generated by CMake fail to build tests and
scylla executables with zstd support. that's why we have the following
test failure:
```
boost.sstable_3_x_test.test_uncompressed_collections_read
...
[Exception] - no_such_class: unable to find class 'org.apache.cassandra.io.compress.ZstdCompressor'
== [File] - seastar/src/testing/seastar_test.cc
== [Line] - 43
```
in this change, we single out zstd.cc and build it as an archive,
so that scylla-main can include it as a whole. an alternative is to
link scylla-main as a whole archive, but that might increase the disk
footprint when building lots of tests -- some of them do not use all
symbols exposed by scylla-main, and can potentially have a smaller
size if the linker can discard the unused symbols.
Refs #2717
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
we use libzstd in zstd.cc. so let's find this library before using
it. this helps users identify problems when preparing the build
environment, instead of being greeted by a compile-time failure.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, docs/_ext/scylladb_cc_properties.py parses
the options line by line. because `log_to_stdout` and `log_to_syslog`
are defined on a single line, this script is not able to parse them,
and hence fails to display them on the `reference/configuration-parameters/`
web page.
after this change, these two member variables are defined on different
lines. both of them can be displayed.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we only consider "named_value<type>" as the
declaration of an option, and the "Type" field of the corresponding
option is displayed if its declaration is found. otherwise, the "Type"
field is not rendered. but some logging-related options are declared
using `log_legacy_value`, so they are missing.
after this change, they are displayed as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the compiler to use the
definition of `cql3::attributes` to generate the defaulted
destructor in .cc file. but with clang-19, it insists that
we should have a complete definition available for defining
the defaulted destructor, otherwise it fails the build:
```
/home/kefu/.local/bin/clang++ -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT CMakeFiles/scylla-main.dir/RelWithDebInfo/table_helper.cc.o -MF CMakeFiles/scylla-main.dir/RelWithDebInfo/table_helper.cc.o.d -o CMakeFiles/scylla-main.dir/RelWithDebInfo/table_helper.cc.o -c /home/kefu/dev/scylladb/table_helper.cc
In file included from /home/kefu/dev/scylladb/table_helper.cc:10:
In file included from /home/kefu/dev/scylladb/seastar/include/seastar/core/coroutine.hh:25:
In file included from /home/kefu/dev/scylladb/seastar/include/seastar/core/future.hh:30:
In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/memory:78:
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unique_ptr.h:91:16: error: invalid application of 'sizeof' to an incomplete type 'cql3::attributes'
91 | static_assert(sizeof(_Tp)>0,
| ^~~~~~~~~~~
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unique_ptr.h:398:4: note: in instantiation of member function 'std::default_delete<cql3::attributes>::operator()' requested here
398 | get_deleter()(std::move(__ptr));
| ^
/home/kefu/dev/scylladb/cql3/statements/modification_statement.hh:40:7: note: in instantiation of member function 'std::unique_ptr<cql3::attributes>::~unique_ptr' requested here
40 | class modification_statement : public cql_statement_opt_metadata {
| ^
/home/kefu/dev/scylladb/cql3/statements/modification_statement.hh:40:7: note: in implicit destructor for 'cql3::statements::modification_statement' first required here
/home/kefu/dev/scylladb/cql3/statements/modification_statement.hh:28:7: note: forward declaration of 'cql3::attributes'
28 | class attributes;
| ^
```
so, in this change, we define the destructor in .cc file, where
the complete definition of `cql3::attributes` is available.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19545
If the client stops reading the response early, the server-side stream throws but must be closed anyway. Seen in another endpoint and fixed by #19541
Closes scylladb/scylladb#19542
* github.com:scylladb/scylladb:
api: Fix indentation after previous patch
api: Close response stream on error
api: Flush response output stream before closing
All streams used by httpd handlers are to be closed by the handler itself; the caller doesn't take care of that.
fixes: #19494
Closes scylladb/scylladb#19541
* github.com:scylladb/scylladb:
api: Fix indentation after previous patch
api: Close output_stream on error
api: Flush response output stream before closing
in 3c7af287, cqlsh's reloc package was marked as "noarch", and its
filename was updated accordingly in `configure.py`, so let's update
the CMake building system accordingly.
this change should address the build failure of
```
08:48:14 [3325/4124] Generating ../Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz
08:48:14 FAILED: Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz /jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz
08:48:14 cd /jenkins/workspace/scylla-master/scylla-ci/scylla/build/dist && /usr/bin/cmake -E copy /jenkins/workspace/scylla-master/scylla-ci/scylla/tools/cqlsh/build/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz /jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz
08:48:14 Error copying file "/jenkins/workspace/scylla-master/scylla-ci/scylla/tools/cqlsh/build/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz" to "/jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz".
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19544
Alternator's non-standard "/localnodes" HTTP request returns a list of
live nodes on this DC, to consider for load balancing. The returned
node addresses should be external IP addresses usable by the clients.
Scylla has a configuration parameter - broadcast_rpc_address - which
defines for a node an external IP address. If such a configuration
exists, we need to use those external IP addresses, not the internal
ones.
Finding these broadcast_rpc_address of all nodes is easy, because the
gossiper already gossips them.
This patch also tests the new feature:
1. The existing single-node test is extended to verify that without
broadcast_rpc_address we get the usual IP address.
2. A new two-node test is added to check that when broadcast_rpc_address
is configured, we get that address and not the usual internal IP
addresses.
Fixes #18711.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Commit dd46a92e23 introduced a function gossiper::get_rpc_address()
as a shortcut for get_application_state_ptr(endpoint, RPC_ADDRESS) -
i.e., it fetches the endpoint's configured broadcast_rpc_address
(despite its confusing name, this is the endpoint's external IP address
that clients can use to make CQL connections).
But strangely, the implementation of get_rpc_address() made an exception
when asked about the *current* host - instead of getting this
node's broadcast_rpc_address, it returned its internal address, which
is not what this function was supposed to do - it's not useful for
it to do one thing for this node, and a different thing for other
nodes. When I wrote code that uses this function (see the next
patch), this resulted in wrong results for the current node.
The fix is simple - drop the wrong if(), and get the
broadcast_rpc_address stored by the gossiper unconditionally - the
gossiper knows it for this node just like for other nodes.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
podman does not allocate a tty by default, so without `-t` or `--tty`,
one cannot use a functional terminal when interacting with the
container. that is what one expects when running `dbuild -i --`, yet
we are greeted with:
```
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
```
after this change, one can enjoy the good-old terminal as usual
after being dropped to the container provided by `dbuild -i --`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19550
Switch the C++ standard from C++20 to C++23. This is straightforward, but there are a few
fallouts (mostly due to std::unique_ptr that became constexpr) that need to be fixed first.
Internal enhancement - no backport required
Closes scylladb/scylladb#19528
* github.com:scylladb/scylladb:
build: switch to C++23
config: avoid binding an lvalue reference to an rvalue reference
readers: define query::partition_slice before using it in default argument
test: define table_for_tests earlier
compaction: define compaction_group::table_state earlier
compaction: compaction_group: define destructor out-of-line
compaction_manager: define compaction_manager::strategy_control earlier
In 90a6c3bd7a ("build: reduce release mode inline tuning on aarch64") we
reduced inlining on aarch64, due to miscompiles.
In 224a2877b9 ("build: disable -Og in debug mode to avoid coroutine
asan breakage") we disabled optimization in debug mode, due to miscompiles.
With clang 18.1, it appears the miscompiles are gone, and we can remove
the two workarounds.
Closes scylladb/scylladb#19531
It's currently implicitly flushed on its close, but in that case close
can throw while flushing. The next patch wants close not to throw, and that's
possible if the stream is flushed in advance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The handler returns a function which is then invoked with output_stream
argument to render the json into. This function is converted into
coroutine. It has yet another inner lambda that's passed into
compaction_manager::get_compaction_history() as consumer lambda. It's
coroutinized too.
The indentation looks weird as preparation for future patching.
Hopefully it's still possible to understand what's going on.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The handler's lambda is called with && stream object and must close the
stream on its own regardless of what.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The .close() method flushes the stream, but it may throw doing it. Next
patch will want .close() not to throw, for that stream must be flushed
explicitly before closing.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
If the get_snapshot_details() lambda throws, the output stream remains
non-closed which is bad. Close it regardless of what.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
this PR has 2 commits
- [test: pass Scylla extra CMD args from test.py args](6b367a04b5)
- [test: adjust scylla_cluster.merge_cmdline_options behavior](c60b36090a)
the main goal is to solve the [test.py: provide an easy-to-remember, universal way to run scylla with trace level logging](https://github.com/scylladb/scylladb/issues/14960) issue
but also can be used to easily apply additional arguments for all UnitTests and PythonTests on the fly from the test.py CMD
Closes scylladb/scylladb#19509
* github.com:scylladb/scylladb:
test: adjust scylla_cluster.merge_cmdline_options behavior
test: pass scylla extra CMD args from test.py args
This short series removed some ancient legacy code from
migration_manager and schema_tables, before I make further changes in this area.
We have more such code under the cql3 hierarchy but it can be dealt with as a follow up.
No backport required
Closes scylladb/scylladb#19530
* github.com:scylladb/scylladb:
schema_tables: remove dead code
migration_manager: remove dead code
This check is already in place, but isn't fully working, i.e.
switching from a vnode KS to a tablets KS is not allowed, but
this check doesn't work in the other direction. To fix the
latter, `ks_prop_defs::get_initial_tablets()` has been changed
to handle 3 states: (1) init_tablets is set, (2) it was skipped,
(3) tablets are disabled. These couldn't fit into std::optional,
so a new local struct to hold these states has been introduced.
Callers of this function have been adjusted to set init_tablets
to an appropriate value according to the circumstances, i.e. if
tablets are globally enabled, but have been skipped in the CQL,
init_tablets is automatically set to 0, but if someone executes
ALTER KS and doesn't provide tablets options, they're inherited
from the old KS.
I tried various approaches and this one resulted in the fewest
lines of code changed. I also provided test cases to explain how
the code behaves.
Fixes: #18795
Closes scylladb/scylladb#19368
Well, even after 10 years, the C++ compilers still
do not compile Java...
And having that legacy code lying around
not only does it not help anyone understand what's
going on, but on the contrary, it's confusing and distracting.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Well, even after 10 years, the C++ compilers still
do not compile Java...
And having that legacy code lying around
not only does it not help anyone understand what's
going on, but on the contrary, it's confusing and distracting.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
config_file::add_deprecated_options() returns an lvalue reference
to a parameter which itself is an rvalue reference. In C++20 this
is bad practice (but not a bug in this case) as rvalue references
are not expected to live past the call. In C++23, it fails to compile.
Fix by accepting an lvalue reference for the parameter, and adjust the
caller.
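A minimal sketch of the signature change described above; `options_list` and the option name are illustrative stand-ins, not the actual config_file types:
```cpp
#include <string>
#include <vector>

struct options_list { std::vector<std::string> names; };

// Before (problematic shape): the parameter is an rvalue reference, yet an
// lvalue reference to it is handed back to the caller:
//   options_list& add_deprecated_options(options_list&& opts);
//
// After: take an lvalue reference, so the returned reference clearly refers
// to an object owned by the caller.
options_list& add_deprecated_options(options_list& opts) {
    opts.names.push_back("deprecated_option"); // illustrative placeholder
    return opts;
}

int main() {
    options_list opts;
    add_deprecated_options(opts); // caller adjusted to pass a named lvalue
}
```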
C++23 made std::unique_ptr constexpr. A side effect of this (presumably)
is that the compiler compiles it more eagerly, requiring the full definition
of the class in std::make_unique, while it previously was content with
finding the definition later.
One victim of this change is the default argument of make_reversing_reader;
define it earlier (by including its header) to build with C++23.
Those tests are sometimes failing on CI and we have two hypotheses:
1. Something wrong with consistency of statements
2. Interruption from another test run (e.g. same queries performed concurrently or data remained after a previous run)
To exclude or confirm hypothesis 2, we add a random marker to avoid potential collisions; in such a case it will be clearly visible that the wrong data comes from a different run.
Related scylladb/scylladb#18931
Related scylladb/scylladb#18319
backport: no, just a test fix
Closes scylladb/scylladb#19484
* github.com:scylladb/scylladb:
test: auth: add random tag to resources in test_auth_v2_migration
test: extend unique_name with random sufix
C++23 made std::unique_ptr constexpr. A side effect of this (presumably)
is that the compiler compiles it more eagerly, requiring the full definition
of the class in std::make_unique, while it previously was content with
finding the definition later.
One victim of this change is table_for_tests; define it earlier to
build with C++23.
C++23 made std::unique_ptr constexpr. A side effect of this (presumably)
is that the compiler compiles it more eagerly, requiring the full definition
of the class in std::make_unique, while it previously was content with
finding the definition later.
One victim of this change is compaction_group::table_state; define
it earlier to build with C++23.
Define compaction_group::~compaction_group() out-of-line to prevent
problems instantiating compaction_group::_table_state, which is an
std::unique_ptr. In C++23, std::unique_ptr is constexpr, which means
its methods (in this case the destructor) require seeing the definition
of the class at the point of instantiation.
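A minimal sketch of the out-of-line destructor idiom described above; the names mirror the commit but the bodies are illustrative:
```cpp
#include <memory>

class table_state;  // incomplete type at this point, as in a header

class compaction_group {
    std::unique_ptr<table_state> _table_state;
public:
    compaction_group();
    ~compaction_group();   // declared only: defining it here would instantiate
                           // unique_ptr's destructor, which needs a complete type
};

// Later, where the full definition is visible (as in the .cc file):
class table_state { /* ... */ };

compaction_group::compaction_group()
    : _table_state(std::make_unique<table_state>()) {
}
compaction_group::~compaction_group() = default;  // instantiated with a complete type

int main() { compaction_group cg; }
```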
C++23 made std::unique_ptr constexpr. A side effect of this (presumably)
is that the compiler compiles it more eagerly, requiring the full definition
of the class in std::make_unique, while it previously was content with
finding the definition later.
One victim of this change is compaction_manager::strategy_control; define
it earlier to build with C++23.
Fixes: https://github.com/scylladb/scylladb/issues/19489
There is already a check that the Scylla binary is executable, but it's done at a later stage. So the log for a specific test file will contain a message about something being wrong with the binary, but the console will show no sign of that. Moreover, there will be an error that completely misleads about what actually happened and why the test run failed. With this check, the test will fail earlier, providing the correct reason for the failure.
Closes scylladb/scylladb#19491
Before this patch, the semaphore was hard-wired to stop admission if
there was even a single permit in the need_cpu state, thereby keeping
the CPU concurrency at 1.
This patch makes use of the new cpu_concurrency parameter, which was
wired in by the previous patches, allowing for a configurable number of
concurrent need_cpu permits. This is to address workloads where a
small subset of reads is expected to be slow and can hold up faster
reads behind them in the semaphore queue.
The existing documentation comment for `enable_tablets`
is very terse and lacks details about the effect of enabling
or disabling tablets.
This change adds more details about the impact of `enable_tablets`
on newly created keyspaces, and how to disable tablets when
keyspaces are created.
Also, a note was added to warn about the irreversibility
of the tablets enablement per keyspace.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
`prs = response.json().get("items", [])` will return an empty list when there are no merged PRs, and this will just skip the whole label replacement process.
This is a regression following the work done in #19442
Adding another part to handle closed PRs (which are the majority of the cases we have in Scylla core)
Fixes: https://github.com/scylladb/scylladb/issues/19441
Closes scylladb/scylladb#19497
Currently proxy initialization is pretty dispersed; in particular it's
stopped in several steps -- first drain_on_shutdown(), then
stop_remote(). In between there's nothing that needs the proxy in any
particular state, so those two steps can be merged into one.
refs: scylladb/scylladb#2737
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#19344
Real S3 server is known to actively close connections, thus breaking S3
storage backend at random places. The recent http client update is more
robust against that, but the needed feature is OFF by default.
refs: scylladb/seastar#1883
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#19461
adjust merge_cmdline_options behaviour to
append the --logger-log-level option instead of merging it.
this behaviour can be changed (if needed)
to the previous version (merge all):
merge_cmdline_options(list1, list2, appending_options=[])
or, to append different cmd options:
merge_cmdline_options(list1, list2, appending_options=[option1,option2])
this commit introduces a test.py option --extra-scylla-cmdline-options
to pass extra scylla cmdline options for all tests.
Options should be space-separated:
'--logger-log-level raft=trace --default-log-level error'
The nodetool tests do not set the asan/ubsan options
to abort on error and create core dumps.
Fix by setting the environment variables in nodetool tests.
Closes scylladb/scylladb#19503
Those tests are sometimes failing on CI and we have two hypotheses:
1. Something wrong with consistency of statements
2. Interruption from another test run (e.g. same queries performed
concurrently or data remained after a previous run)
To exclude or confirm hypothesis 2, we add a random marker to avoid
potential collisions; in such a case it will be clearly visible that
the wrong data comes from a different run.
Related scylladb/scylladb#18931
Related scylladb/scylladb#18319
This commit updates the instructions on how to download and run Scylla Doctor,
following the changes in how Scylla Doctor is released.
Closes scylladb/scylladb#19510
This short series enhances utils::chunked_vector so it could be used more easily to convert dht::partition_range_vector to chunked_vector, for example.
- utils: chunked_vector: document invalidation of iterators on move
- utils: chunked_vector: add ctor from std::initializer_list
- utils: chunked_vector: add ctor from a single value
No backport required
Closes scylladb/scylladb#19462
* github.com:scylladb/scylladb:
chunked_vector_test: add tests for value-initialization constructor
utils: chunked_vector: add ctor from std::initializer_list
utils: chunked_vector: document invalidation of iterators on move
This commit adds a page listing the ScyllaDB limits
we know today.
The page can and should be extended when other limits
are confirmed.
Closes scylladb/scylladb#19399
since we've switched almost all callers of the operator<< to {fmt},
let's drop the unused operator<<:s.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19508
this change updates the cqlsh submodule:
* tools/cqlsh/ ba83aea3...73bdbeb0 (4):
> install.sh: replace tab with spaces
> define the the debug packge is empty
> tests: switch from using cqlsh bash to the test the python file
> package python driver as wheels
it also includes a follow-up change to package cqlsh as a regular
rpm instead of as a "noarch" rpm:
so far cqlsh bundled the python-driver in, but only as source,
meaning the package wasn't architecture-specific, and also didn't
have the libev eventloop compiled in.
From Python 3.12 and up, that would mean we would
fall back to the asyncio eventloop (which is still experimental)
or into an error (once we sync with the driver upstream).
so to avoid those, we are changing the packaging of cqlsh
to be architecture specific, getting cqlsh compiled, and bundling
all of its requirements as a per-architecture installed bundle of wheels,
using `shiv`, i.e. a one-file virtualenv that we'll be packing
into our artifacts.
Ref: https://github.com/scylladb/scylla-cqlsh/issues/90
Ref: https://github.com/scylladb/scylla-cqlsh/pull/91
Ref: https://github.com/linkedin/shiv
Closes scylladb/scylladb#19385
* tools/cqlsh ba83aea...242876c (1):
> Merge 'package python driver as wheels' from Israel Fruchter
Update tools/cqlsh/ submodule
in which, the change of `define the the debug packge is empty`
should address the build failure like
```
Processing files: scylla-cqlsh-debugsource-6.1.0~dev-0.20240624.c7748f60c0bc.aarch64
error: Empty %files file /jenkins/workspace/scylla-master/next/scylla/tools/cqlsh/build/redhat/BUILD/scylla-cqlsh/debugsourcefiles.list
RPM build errors:
Empty %files file /jenkins/workspace/scylla-master/next/scylla/tools/cqlsh/build/redhat/BUILD/scylla-cqlsh/debugsourcefiles.list
```
Closes scylladb/scylladb#19473
Now enable_tombstone_gc_for_streaming_and_repair is wired in all the way
to maybe_compact_for_streaming(), so we can implement the toggling of
tombstone GC based on it.
To control whether the compacting reader (if enabled) for streaming and
repair can garbage-collect tombstones.
Default is false (previous behaviour).
Not wired yet.
When preparing a statement, the server code first does it on non-local
shards, then on the local one. The former call is done the hard way, even
though there's a short sugared sharded<> class method that does it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#19485
The series adds a step during node's boot process, just before completing
the initialization, in which the node sends a notification to all other
normal nodes in the cluster that it is UP now. Other nodes wait for this
node to be UP and in normal state before replying. This ensures that,
in a healthy cluster, when a node starts serving queries the entire
cluster knows its up-to-date state. The notification is a best effort
though. If some nodes are down or do not reply in time the boot process
continues. It is somewhat similar to shutdown notification in this regard.
* 'gleb/notify-up-v2' of github.com:scylladb/scylla-dev:
gossiper: wait for a bootstrapping node to be seen as normal on all nodes before completing initialization
Wait for booting node to be marked UP before complete booting.
gossiper: move gossip verbs to the idl
Delete 10s timeout from read barrier in table_sync_and_check,
so that the function always considers all previous group0 changes.
Fixes: #18490.
Closes scylladb/scylladb#18752
The `system.batchlog` table has a partition for each batch that failed to complete. After finally applying the batch, the partition is deleted. Although the table has gc_grace_seconds = 0, tombstones can still accumulate in memory, because we don't purge partition tombstones from either the memtable or the cache. This can lead to the cache and memtable of this table accumulating many thousands or even millions of tombstones, making batchlog replay very slow. We didn't notice this before, because we would only replay all failed batches on unbootstrap, which is rare and already a heavy and slow operation in its own right.
With repair-based tombstone-gc however, we do a full batchlog replay at the beginning of each repair, and now this extra delay is noticeable.
Fix this by making sure batchlog replays don't have to scan through all the tombstones generated by previous replays:
* flush the `system.batchlog` memtable at the end of each batchlog replay, so it is cleared of tombstones
* bypass the cache
Fixes: https://github.com/scylladb/scylladb/issues/19376
Although this is not a regression -- replay was like this since forever -- now that repair calls into batchlog replay, every release which uses repair-based tombstone-gc should get this fix
Closes scylladb/scylladb#19377
* github.com:scylladb/scylladb:
db/batchlog_manager: bypass cache when scanning batchlog table
db/batchlog_manager: replace open-coded paging with internal one
db/batchlog_manager: implement cleanup after all batchlog replay
cql3/query_processor: for_each_cql_result(): move func to the coro frame
The bloom filters are built with partition estimates because the actual
partition count might not be available in all cases. If the estimate is
inaccurate, the bloom filters might end up being too large or too small
compared to their optimal sizes. This PR rebuilds bloom filters with
inaccurate partition estimates using the actual partition count before
the filter is written to disk. A bloom filter is considered to have an
inaccurate estimate if its false positive rate based on the current
bitmap size is either less than 75% or more than 125% of the configured
false positive rate.
Fixes #19049
A manual test was run to check the impact of rebuild on compaction.
Table definition used : CREATE TABLE scylla_bench.simple_table (id int PRIMARY KEY);
Setup : 3 billion random rows with id in the range [0, 1e8) were inserted as batches of 5 rows into scylla_bench.simple_table via 80 threads.
Compaction statistics :
scylla_bench.simple_table :
(a) Total number of compactions : `1501`
(b) Total time spent in compaction : `9h58m47.269s`
(c) Number of compactions which rebuilt bloom filters : `16`
(d) Total time taken by these 16 compactions which rebuilt bloom filters : `2h55m11.89s`
(e) Total time spent by these 16 compactions to rebuild bloom filters : `8m6.221s` which is
- `4.63%` of the total time taken by the compactions which rebuilt filters (d)
- `1.35%` of the total compaction time (b).
(f) Total bytes saved by rebuilding filters : `388 MB`
system.compaction_history :
(a) Total number of compactions : `77`
(b) Total time spent in compaction : `21.24s`
(c) Number of compactions which rebuilt bloom filters : `74`
(d) Time taken by these 74 compactions which rebuilt bloom filters : `20.48s`
(e) Time spent by these 74 compactions to rebuild bloom filters : `377ms` which is
- `1.84%` of the total time taken by the compactions which rebuilt filters (d)
- `1.77%` of the total compaction time (b).
(f) Total bytes saved by rebuilding filters : `20 kB`
The following tables also had compactions and the bloom filter was rebuilt in all those compactions.
However, the time taken for every rebuild was observed as 0ms from the logs as it completed within a microsecond :
system.raft :
(a) Total number of compactions : `2`
(b) Total time spent in compaction : `106ms`
(c) Total bytes saved by rebuilding filters : `960 B`
system_schema.tables :
(a) Total number of compactions : `1`
(b) Total time spent in compaction : `25ms`
(c) Total bytes saved by rebuilding filter : `312 B`
system.topology :
(a) Total number of compactions : `1`
(b) Total time spent in compaction : `25ms`
(c) Total bytes saved by rebuilding filter : `320 B`
Closes scylladb/scylladb#19190
* github.com:scylladb/scylladb:
bloom_filter_test: add testcase to verify filter rebuilds
test/boost: move bloom filter tests from sstable_datafile_test into a new file
sstables/mx/writer: rebuild bloom filters with bad partition estimates
sstables/mx/writer: add variable to track number of partitions consumed
sstable: introduce sstable::maybe_rebuild_filter_from_index()
sstable: add method to return filter format for the given sstable version
utils/i_filter: introduce get_filter_size()
API endpoints that need a particular service to get data from are registered next to this service (#2737). In /storage_proxy function there live some endpoints that work with config, so this PR moves them to the existing config.cc with config-related endpoints. The path these endpoints are registered with remains intact, so some tweak in proxy API registration is also here.
Closes scylladb/scylladb#19417
* github.com:scylladb/scylladb:
api: Use provided db::config, not the one from ctx
api: Move some config endpoints from proxy to config
api: Split storage_proxy api registration
api: Unset config endpoints
Scans should not pollute the cache with cold data, in general. In the
case of the batchlog table, there is another reason to bypass the cache:
this table can have a lot of partition tombstones, which currently are
not purged from the cache. So in certain cases, using the cache can make
batch replay very slow, because it has to scan past tombstones of
already replayed batches.
We have a commented code snippet from Origin with cleanup and a FIXME to
implement it. Origin flushes the memtables and kicks a compaction. We
only implement the flush here -- the flush will trigger a compaction
check and we leave it up to the compaction manager to decide when a
compaction is worthwhile.
This method used to be called only from unbootstrap, so a cleanup was
not really needed. Now it is also called at the end of repair, if the
table is using repair-based tombstone-gc. If the memtable is filled with
tombstones, this can add a lot of time to the runtime of each repair. So
flush the memtable at the end, so the tombstones can be purged (they
aren't purged from memtables yet).
Said method has a func parameter (called just f), which it receives as an
rvalue ref and just uses as a reference. This means that if the caller
doesn't keep the func alive, for_each_cql_result() will run into a
use-after-free after the first suspension point. This is unexpected for
callers, who don't expect to have to keep alive something they
passed in with std::move().
Adjust the signature to take a value instead; value parameters are moved
to the coro frame and survive suspension points.
Adjust internal callers (query_internal()) the same way.
There are no known vulnerable external callers.
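A minimal sketch of the lifetime rule at play, using Seastar coroutines; the function and callback type below are simplified stand-ins for for_each_cql_result(), not the real signature:
```cpp
#include <functional>
#include <seastar/core/future.hh>
#include <seastar/core/coroutine.hh>
#include <seastar/coroutine/maybe_yield.hh>

// Dangerous shape: the coroutine frame only stores the reference, so if the
// caller's temporary dies before the first resumption, using `f` afterwards
// is a use-after-free.
//   seastar::future<> for_each_result(std::function<void(int)>&& f);

// Safe shape: a by-value parameter is moved into the coroutine frame, so it
// survives every suspension point.
seastar::future<> for_each_result(std::function<void(int)> f) {
    for (int row = 0; row < 3; ++row) {
        f(row);
        co_await seastar::coroutine::maybe_yield(); // suspension point
    }
}
```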
Today, after Mergify opens a backport PR, it stays open until someone manually closes it. Also, we can't track via labels which backports were done, since there is no indication for that except digging into the PR and looking for a comment or a commit ref.
The following changes were made in this PR:
* trigger add-label-when-promoted.yaml also when the push was made to `branch-x.y`
* Replace label `backport/x.y` with `backport/x.y-done` in the original PR (this will automatically update the original Issue as well)
* Add a comment on the backport PR and close it
Fixes: https://github.com/scylladb/scylladb/issues/19441
Closes scylladb/scylladb#19442
Currently, the returned `ranges` vector is first initialized
to `product_size` and then the returned partition ranges are
copied into it.
Instead, we can simply reserve the vector capacity,
without initializing it, and then emplace all partition ranges
onto it using std::back_inserter.
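An illustrative sketch of the pattern change described above; `partition_range` is a stand-in type, while the real code works on dht::partition_range_vector:
```cpp
#include <algorithm>
#include <iterator>
#include <vector>

struct partition_range { int lo, hi; };

std::vector<partition_range> collect(const std::vector<partition_range>& src) {
    std::vector<partition_range> ranges;
    // Before: ranges was sized up front (value-initializing every slot) and
    // then elements were copied into it.
    // After: reserve only, then append -- no redundant initialization.
    ranges.reserve(src.size());
    std::copy(src.begin(), src.end(), std::back_inserter(ranges));
    return ranges;
}
```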
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#19457
since we've switched almost all callers of the operator<< to {fmt},
let's drop the unused operator<<:s.
the callers in alternator/streams.cc are updated to use `fmt::print()`
to format the `bytes` instances.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19448
chunked_vector differs from std::vector in that
the latter's move constructor is required to preserve
any iterators into the moved-from vector.
In contrast, chunked_vector::iterator keeps a pointer
to the chunked_vector::_chunks data, which is
a utils::small_vector, and when moved, it might
invalidate the iterator since the moved-to _chunks
might copy the contents of the internal capacity
rather than moving the allocated capacity.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The Ninja makefile (build.ninja) generated by the ./configure.py script
is smart enough to notice when the configure.py script is modified and
re-runs the script in order to regenerate itself. However, this
operation is currently not idempotent and quickly breaks because
information about the Ninja makefile's name is not passed properly.
This is the rule used for makefile's regeneration:
```
rule configure
command = {python} configure.py --out={buildfile}.new $configure_args && mv {buildfile}.new {buildfile}
generator = 1
description = CONFIGURE $configure_args
```
The `buildfile` variable holds the value of the `--out` option which is
set to `build.ninja` if not provided explicitly.
Note that regenerating the makefile passes a name with the `.new` suffix
added to the end; we want to first write the file in full and then
overwrite the old file via a rename. However, notice that the script was
called with `--out=build.ninja.new`; the `configure` rule in the
regenerated file will have `configure.py --out=build.ninja.new.new` and
then `mv build.ninja.new.new build.ninja.new`. So, second regeneration
will just leave a build.ninja.new file which is not useful.
Fix this by introducing an additional parameter `--out-final-name`.
This parameter is only supposed to be used in the regeneration rule and
its purpose is to preserve information about the original file name.
After this change I no longer see `build.ninja.new` being created after
a sequence of `touch configure.py && ninja` calls.
Closes scylladb/scylladb#19428
The docs [1] clearly say "install-dependencies.sh" should be run as
"root"; however, the script silently assumes that the umask inherited from
the calling environment is 0022. That's not necessarily the case, and
there's an argument to be made for "root" setting umask 0077 by default.
The script behaves unexpectedly under such circumstances; files and
directories it creates under /opt and /usr/local are then not accessible
to unprivileged users, leading to compilation failures later on.
Set the creation mask explicitly to 0022.
[1] https://github.com/scylladb/scylladb/blob/master/HACKING.md#dependencies
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Closes scylladb/scylladb#19464
For some time now, executing our ninja build
targets also generates a build.ninja.new file.
Add it to .gitignore for convenience, as we
won't commit this file.
Closes scylladb/scylladb#19367
Since 3afbd21f, we are able to selectively choose a single test
in a boost test executable which represents a test suite, and to
choose a single test in a pytest script with the syntax of
"test_suite::test_case". it's very handy for manual testing.
so let's document in the command line help message as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19454
before this change, when linking an executable referencing `marker`,
we could get the following error:
```
13:58:02 ld.lld: error: undefined symbol: alternator::event_id::marker
13:58:02 >>> referenced by streams.cc
13:58:02 >>> build/dev/alternator/streams.o:(from_string_helper<rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>, alternator::event_id>::Set(rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>&, alternator::event_id, rjson::internal::throwing_allocator&))
13:58:02 clang-16: error: linker command failed with exit code 1 (use -v to see invocation)
```
it turns out `event_id::marker` is only declared, but never defined.
please note, the non-inline static member variable in its class
definition is not considered as a definition, see
[class.static.data](https://eel.is/c++draft/class.static.data#3)
> The declaration of a non-inline static data member in its class
> definition is not a definition and may be of an incomplete type
> other than cv void.
so, let's declare it as a `constexpr` instead. it implies `inline`.
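A minimal sketch of the linker issue and the fix; `event_id` here is a stand-in and the value of the marker is illustrative, not the real alternator code:
```cpp
#include <string_view>

struct event_id {
    // Before: a declaration only -- odr-using it without an out-of-class
    // definition leads to an "undefined symbol" at link time.
    //   static const std::string_view marker;

    // After: constexpr implies inline (since C++17), so the in-class
    // initializer is also a definition.
    static constexpr std::string_view marker{"eventID"};
};

int main() {
    const auto& m = event_id::marker;  // odr-use is fine with the inline definition
    return m.empty() ? 1 : 0;
}
```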
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19452
before this change, when running a test like:
```console
./test.py --mode release topology_experimental_raft/test_tablets
/home/kefu/dev/scylladb/test/pylib/scylla_cluster.py:333: SyntaxWarning: invalid escape sequence '\('
deleted_sstable_re = f"^.*/{keyspace}/{table}-[0-9a-f]{{32}}/.* \(deleted\)$"
```
we could get the warning above, because `\(` is not a valid escape
sequence; the Python interpreter accepts it as the two separate
characters `\` and `(` after complaining, but it's still annoying.
so, let's use a raw string here, as we want to match "(deleted)".
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19451
The bloom filters are built with partition estimates, as the actual
partition count might not be available in all cases. If the estimate
was bad, the bloom filters might end up larger or smaller than
their optimal sizes. Rebuild such bloom filters with the actual partition
count before the filter is written to disk and the sstable is sealed.
Fixes #19049
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Add method sstable::maybe_rebuild_filter_from_index() that rebuilds
bloom filters which had bad partition estimates when they were built.
The method checks the false positive rate based on the current bitset
size against the configured false positive rate to decide whether a
filter needs to be rebuilt. If the current false positive rate is within
75% to 125% of the configured false positive rate, the bloom filter will
not be rebuilt. Otherwise, the filter will be rebuilt from the index
entries. This method should only be called before an SSTable is sealed
as the bloom filter is updated in-place.
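An illustrative sketch of the rebuild decision described above; the names and the false-positive formula below are assumptions for the example, not the actual sstables code:
```cpp
#include <cmath>
#include <cstdint>

// Classic bloom-filter estimate: fp ≈ (1 - e^(-k*n/m))^k, with k hash
// functions, n keys and m bits.
double estimated_false_positive(uint64_t bits, uint64_t keys, unsigned hashes) {
    double exponent = -double(hashes) * double(keys) / double(bits);
    return std::pow(1.0 - std::exp(exponent), hashes);
}

bool needs_rebuild(uint64_t current_bits, uint64_t actual_partitions,
                   unsigned hashes, double configured_fp) {
    double current_fp = estimated_false_positive(current_bits, actual_partitions, hashes);
    // Keep the filter if the achieved rate is within 75%..125% of the target;
    // otherwise rebuild it from the index entries.
    return current_fp < 0.75 * configured_fp || current_fp > 1.25 * configured_fp;
}
```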
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Extract out the filter format computing logic from sstable::read_filter
into a separate function. This is done so that the subsequent patches
can make use of this function.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Currently, the only way to get the size of a filter, for certain
parameters, is to actually create one. This requires a seastar thread
context and potentially also allocates a huge amount of memory.
Provide a method which just calculates the size, without any of the
above-mentioned baggage.
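A sketch of how such a size-only computation can look, using the standard bloom-filter sizing formula; the function name and exact rounding are illustrative, not necessarily the real i_filter API:
```cpp
#include <cmath>
#include <cstdint>

// m = -n * ln(p) / (ln 2)^2 bits for n elements and target false-positive
// rate p, rounded up -- no allocation and no seastar thread required.
uint64_t filter_size_bits(uint64_t num_elements, double false_positive_rate) {
    double bits = -double(num_elements) * std::log(false_positive_rate)
                  / (std::log(2.0) * std::log(2.0));
    return uint64_t(std::ceil(bits));
}
```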
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
since we are now able to use C++20, there is no need to use the
homebrew rotl64(). so in this change, we replace rotl64() with
std::rotl(), and remove the former from the source tree.
the underlying implementations of these two solutions are equivalent,
so no performance changes are expected. all call sites have been
audited: all of them pass a `uint64` as the first parameter.
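A minimal sketch of the replacement; the homebrew helper shown in the comment is an illustrative reconstruction:
```cpp
#include <bit>
#include <cstdint>

// Before: a homebrew helper along these lines (illustrative):
//   inline uint64_t rotl64(uint64_t v, int shift) {
//       return (v << shift) | (v >> (64 - shift));
//   }

uint64_t mix(uint64_t v) {
    // After: the standard library rotate from <bit>, available since C++20.
    return std::rotl(v, 31);
}

int main() { return mix(1) != 0 ? 0 : 1; }
```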
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19447
Default role creation in auth-v1 is asynchronous and all nodes race to
create it so we'd need to delay the test and wait. Checking this particular
role doesn't bring much value to the test as we check other roles
to demonstrate correctness.
Fixes scylladb/scylladb#19039
Closes scylladb/scylladb#19424
Gossiper has two blocks of endpoints, both registered in a legacy/random place in main. This PR moves them next to gossiper start and adds unregistration for both.
refs: #2737
Closes scylladb/scylladb#19425
* github.com:scylladb/scylladb:
api: Remove dedicated failure_detector registration method
api: Move failure_detector endpoints set/unset to gossiper
api: Unset failure detector endpoints method
api: (Un)Register gossiper API in correct place
api: Unset gossiper endpoints on stop
asi: Coroutinize set_server_gossip()
these jobs are scheduled to verify the builds of scylla, like
whether it builds with the latest Seastar, whether scylla can generate
reproducible builds, and whether it builds with the nightly build of
clang. the failures of these workflows are not very visible without
clicking into the corresponding workflow in
https://github.com/scylladb/scylladb/actions.
in this change, we add their badges in the testing section of README.md,
so one can identify their test failures, if any.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19430
since we've switched almost all callers of the operator<< to {fmt},
let's drop the unused operator<<:s.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19432
The sharded<database> is used as a map_reduce0() method provider;
there's no real need for the database itself. A simple smp::map_reduce()
would work just as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#19364
SELECT's "LIMIT" feature is tested in combination with other features
in different test/cql-pytest/*.py source files - for example, the
combination of LIMIT and GROUP BY is tested in test_group_by.py.
This patch adds a new test file, test_limit.py, for testing aspects of
basic LIMIT usage that weren't already tested in other files.
The new file also has a comment saying where we have other tests
for LIMIT combined with other features.
All the new tests pass (on both Scylla and Cassandra). But they can
be useful as regression tests to test patches which modify the
behavior of LIMIT - e.g., pull request #18842.
This patch also adds another test in test_group_by.py. This adds to
one of the tests for the combination of LIMIT and GROUP BY (in this
case, GROUP BY of clustering prefix, no aggregation) also a check
for paging, that was previously missing.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#19392
These two api functions both need gossiper service and only it, and thus
should have set/unset calls next to each other. It's worth putting them
into a single place
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's one more set of endpoints that need gossiper -- the
failure_detector ones. They are registered, but not unregistered, so
here's the method to do it. It's not called by any code yet, because
next patch would need to rework the caller anyway.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
One of the next patches will add more async calls here, so, to avoid
creating then-chains, convert it into a coroutine.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
`absl::headers` is a library, not the path to its headers.
before this change, the command lines of the generated build rules looked
like:
```
-I/home/kefu/dev/scylladb/repair/absl::headers
```
this does not hurt, as other libraries might add the intended include
dir to the compiler command line, but this is just wrong.
so let's remove it. please note, `repair` target already links against
`absl::headers`. so we don't need to add `absl::headers` to its linkage
again.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19384
this change was created in the same spirit as ebff5f5d.
even though we include Seastar as a submodule, Seastar is not
part of the scylla project. so we'd better include its headers using
brackets.
ebff5f5d addressed this cosmetic issue a while back. but probably
clangd's header insertion helped some contributors to insert
the missing headers with `"`. so this style of `include` returned
to the tree with these new changes.
unfortunately, clangd does not allow us to configure the style
of `include` at the time of writing.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19406
Before these changes, it could happen that Scylla initialized
endpoint managers for hint directories representing
* host IDs before migrating hinted handoff to using host IDs,
* IP addresses after the migration.
One scenario looked like this:
1. Start Scylla and upgrade the cluster to using host IDs.
2. Create, by hand, a hint directory representing an IP address.
3. Trigger changing the host filter in hinted handoff; it could
be achieved by, for example, restricting the set of data
centers Scylla is allowed to save hints for.
When changing the host filter, we browse the hint directories
and create endpoint managers if we can send hints towards
the node corresponding to a given hint directory. We only
accepted hint directories representing IP addresses
and host IDs. However, we didn't check whether the local node
has already been upgraded to host-ID-based hinted handoff
or not. As a result, endpoint managers were created for
both IP addresses and host IDs, no matter whether we were
before or after the migration.
These changes make sure that any time we browse the hint
directories, we take that into account.
Fixes scylladb/scylladb#19172
Closes scylladb/scylladb#19173
also add `auth` and `cdc` to iwyu's `CLEANER_DIR` setting.
---
it's a cleanup, hence no need to backport.
Closes scylladb/scylladb#19410
* github.com:scylladb/scylladb:
.github: add auth and cdc to iwyu's CLEANER_DIR
cdc: do not include unused headers
The set_server_config() already has the db::config reference for
endpoints to work with, there's no need to obtain one via ctx and
database.
This change kills two birds with one stone -- fewer users of database as
a config provider, and fewer places that need the http context -> database
dependency.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The endpoints for getting (and setting, though those are not implemented)
various timeouts work on config, yet register themselves in the
storage_proxy function. Since the "service" they need to work with is
config, move the endpoints to the config endpoints code.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The set_server_storage_proxy() does two things -- registers
storage_proxy "function" and sets proxy routes, that depend on it. Next
patches will move some /storage_proxy/... endpoints registration to
earlier stage, so the function should be ready in advance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
in 7952200c, we changed the `selected_format` from `mc` to `me`,
but to be backward compatible the cluster starts with "md", so
when the nodes in the cluster agree on the "ME_SSTABLE_FORMAT" feature,
the format selector believes that the node is already using "ME",
which is specified by `_selected_format`, even though it is actually still
using "md", which is specified by `sstable_manager::_format`, as
changed by 54d49c04. as explained above, it was set to "md"
in the hope of being backward compatible when upgrading from an existing
installation which might still be using "md". but on second
thought, since we are able to read sstables persisted with older
formats, this concern is not valid.
in other words, 7952200c introduced a regression which changed the
"default" sstable format from `me` to `md`.
to address this, we just change `sstable_manager::_format` to "me",
so that all sstables are created using "me" format.
a test is added accordingly.
Fixes #18995
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19293
in
b8c705bc54
I modified the event name to `pull_request_target`.
This caused the sync process to be skipped when a PR label was added/removed.
Fixing it.
Closes scylladb/scylladb#19408
The node booting in gossip topology waits until all NORMAL
nodes are UP. If we removed a different node just before,
the booting node could still see it as NORMAL and wait for
it to be UP, which would time out and fail the bootstrap.
This issue caused scylladb/scylladb#17526.
Fix it by recalculating the nodes to wait for in every step of
the `wait_alive` loop.
Although the issue fixed by this PR caused only test flakiness,
it could also manifest in real clusters. It's best to backport this
PR to 5.4 and 6.0.
Fixes scylladb/scylladb#17526
Closes scylladb/scylladb#19387
* github.com:scylladb/scylladb:
join_token_ring, gossip topology: update obsolete comment
join_token_ring, gossip topology: fix indendation after previous patch
join_token_ring, gossip topology: recalculate sync nodes in wait_alive
This patch adds a check whether an aggregation query is doing a single-partition read and, if so, makes the query not use forward_service and not parallelize the request.
Fixes scylladb/scylladb#19349
Closes scylladb/scylladb#19350
* github.com:scylladb/scylladb:
test/boost/cql_query_test: add test for single-partition aggregation
cql3/select_statement: do not parallelize single-partition aggregations
flat_mutation_reader_v2 was introduced in a pair of commits in 2021:
e3309322c3 "Clone flat_mutation_reader related classes into v2 variants"
08b5773c12 "Adapt flat_mutation_reader_v2 to the new version of the API"
as a replacement for flat_mutation_reader, using range_tombstone_change
instead of range_tombstone to represent range tombstones. See
those commits for more information.
The transition was incremental; the last use of the original
flat_mutation_reader was removed in 2022 in commit
026f8cc1e7 "db: Use mutation_partition_v2 in mvcc"
In turn, flat_mutation_reader was introduced in 2017 in commit
748205ca75 "Introduce flat_mutation_reader"
To transition from a mutation_reader that nested rows within
a partition in a separate stream, to a flat reader that streamed
partitions and rows in the same stream.
Here, we reclaim the original name and rename the awkward
flat_mutation_reader_v2 to mutation_reader.
Note that mutation_fragment_v2 remains since we still use the original
for compatibility, sometimes.
Some notes about the transition:
- files were also renamed. In one case (flat_mutation_reader_test.cc), the
rename target already existed, so we rename to
mutation_reader_another_test.cc.
- a namespace 'mutation_reader' with two definitions existed (in
mutation_reader_fwd.hh). Its contents were folded into the mutation_reader
class. As a result, a few #includes had to be adjusted.
Closes scylladb/scylladb#19356
Normally, the space overhead for TWCS is 1/N, where N is the number of windows. But during off-strategy, the overhead is 100% because input sstables cannot be released earlier.
Reshaping a TWCS table that takes ~50% of the available space can result in the system running out of space.
That's fixed by restricting every TWCS off-strategy job to 10% of the free disk space. Tables that aren't big will not be penalized with increased write amplification, as all input (disjoint) sstables can still be compacted in a single round.
Fixes#16514.
Closes scylladb/scylladb#18137
* github.com:scylladb/scylladb:
compaction: Reduce twcs off-strategy space overhead to 10% of free space
compaction: wire storage free space into reshape procedure
sstables: Allow to get free space from underlying storage
replica: don't expose compaction_group to reshape task
so that "wasm" target is built. "wasm" generates the text format
of wasm code. and these wasm applications are used by the test_wasm
tests.
the rules generated by `configure.py` adds these .wat files as a
dependency of `{mode}-build`, which is in turn a dependency of `{mode}`.
in this change, let's mirror this behavior by making `wasm` ALL,
so it is built by the default target.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19391
The service in question is a pretty small one, but it has API endpoints that live in the /storage_service group. Currently, when a service starts and has any endpoints that depend on it, the endpoint registration should follow it (#2737). Here's the PR that does it for the load meter. Another goal of this change is that the http context now has one less dependency on board.
Closes scylladb/scylladb#19390
* github.com:scylladb/scylladb:
api: Remove ctx->load_meter dependency
api: Use local load_meter reference in handlers
api: Fix indentation after previous patch
api: Coroutinize load_meter::get_load_map handler
api: Move load meter handlers
api: Add set/unset methods for load_meter
When a node bootstraps it may happen that some nodes still see it as
bootstrapping while the node itself is already in normal state and ready
to serve queries. We want to delay the bootstrap completion until all
nodes see the new node as normal. Piggy-back on the UP notification to do so,
and wait for the node that sent the notification to see the booting node as normal.
Fixes #18678
It seems that we skip the label sync process between a PR and its linked
issues.
Adding these debug prints will allow us to understand why.
Closes scylladb/scylladb#19393
Currently a node does not wait to be marked UP by other nodes before
completing its boot, which creates a usability issue: during a rolling restart
it is not enough to wait for the local CQL port to be opened before
restarting the next node; it is also needed to check that all other
nodes already see this node as alive, otherwise if the next node is restarted
some nodes may see two nodes as dead instead of one.
This patch improves the situation by making sure that the boot process does
not complete before all other nodes see the booting one as alive.
This is still a best-effort thing: if some nodes are unreachable or
gossiper propagation takes too much time, the boot process continues
anyway.
Fixes scylladb/scylladb#19206
since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s.
there are more occurrences of unused operator<< in the tree, but let's do the cleanup piecemeal.
---
this is a cleanup, so no need to backport
Closes scylladb/scylladb#19346
* github.com:scylladb/scylladb:
types: remove unused operator<<
node_ops: remove unused operator<<
lang: remove unused operator<<
gms: remove unused operator<<
dht: remove unused operator<<
test: do not use operator<< for std::optional
Now they are in the storage service set/unset helper, but there's a
dedicated set/unset pair for the meter's endpoints.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The meter is a pretty small service and its API is also tiny. Still, it's a
standalone top-level service, and its API should come next to it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently, if task_manager::task::impl::abort preempts before children are recursively aborted and then the task gets unregistered, we hit a use-after-free, since abort uses the children vector which is no longer alive.
Modify the abort method so that it goes over all tasks in the task manager and aborts those with the given parent.
Fixes: #19304.
Requires backport to all versions containing task manager
Closes scylladb/scylladb#19305
* github.com:scylladb/scylladb:
test: add test for abort while a task is being unregistered
tasks: fix tasks abort
before this change, clang-tidy warned here because it looks as if we move
away from `p2` in each iteration, so the succeeding iterations would be
moving from an empty `p2`.
but we only move from `p2._static_row` in the first iteration when the
dest `mutation_partition` instance's static row is empty. and in the
succeeding iterations, the dest `mutation_partition` instance's static
row is not empty anymore if it is set. so, this is a false alarm.
in this change, we silence this warning. another option is to extract
the single-shot mutation out of the loop, and pass the `std::move(p2)`
only for the single-shot mutation, but that'd be a much more intrusive
change. we can revisit this later.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19331
Before this patch, if we booted a node just after removing
a different node, the booting node may still see the removed node
as NORMAL and wait for it to be UP, which would time out and fail
the bootstrap.
This issue caused scylladb/scylladb#17526.
Fix it by recalculating the nodes to wait for in every step of
the `wait_alive` loop.
This commit adds files that contain Open Source-specific information
and includes these files with the .. scylladb_include_flag:: directive.
The files include a) a link and b) Table of Contents.
The purpose of this update is to enable adding
Open Source/Enterprise-specific information in the Reference section.
Closes scylladb/scylladb#19362
Currently, when calculating the view update backlog for gossip,
we start with `db::view::update_backlog()` and compare it to backlogs
from all shards. However, this backlog can't be compared to other
backlogs - it has size 0 and we compare the fraction current/size
when comparing backlogs, causing us to compare with `NaN`.
This patch fixes it by starting the comparisons with an empty backlog.
The patch introducing this issue (f70f774e40) wasn't backported, so this one doesn't need to be either
Closes scylladb/scylladb#19247
* github.com:scylladb/scylladb:
mv: make the view update backlog unmofidiable
mv: fix value of the gossiped view update backlog
Such field is no longer needed as the information comes
directly from group0_batch.
Fixes scylladb/scylladb#19365
Backport: no, we don't backport code cleanups
Closes scylladb/scylladb#19366
* github.com:scylladb/scylladb:
cql: remove global_req_id from schema_altering_statement
cql: switch alter keyspace prepare_schema_mutations to use group0_batch
Before these changes, compilation was failing with the following
error:
In file included from test/boost/hint_test.cc:12:
/usr/include/fmt/ranges.h:298:7: error: no member named 'parse' in 'fmt::formatter<db::hints::sync_point::host_id_or_addr>'
298 | f.parse(ctx);
| ~ ^
We add the missing callback.
Closes scylladb/scylladb#19375
Currently, a view update backlog may reach an invalid state, when
its max is 0 and its relative_size() is NaN as a result. This can
be achieved either by constructing the backlog with a 0 max or by
modifying the max of an existing backlog. In particular, this
happens when creating the backlog using the default constructor.
In this patch the default constructor is deleted, and a check is added
to its constructor to make sure that the max is different from 0 -
if the check fails, we construct an empty backlog instead, to handle
the possibility of getting an invalid backlog sent from a node with
a version that's missing this check.
Additionally, we make the backlogs members private, exposing them
only through const getters.
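A hedged sketch of the guarded constructor described above; the member names, the "empty backlog" fallback values and relative_size() shown here are simplifications, not the real db::view::update_backlog:
```cpp
#include <cstddef>

class update_backlog {
    size_t _current;
    size_t _max;
public:
    update_backlog() = delete;                 // no more size-0 default backlog
    update_backlog(size_t current, size_t max)
        // A zero max would make relative_size() produce NaN; fall back to an
        // explicitly empty backlog instead (assumed fallback for the sketch).
        : _current(max == 0 ? 0 : current)
        , _max(max == 0 ? 1 : max) {
    }
    // Members stay private and are exposed only through const getters.
    float relative_size() const { return float(_current) / float(_max); }
    size_t current() const { return _current; }
    size_t max_size() const { return _max; }
};

int main() { return update_backlog(0, 1).relative_size() == 0.0f ? 0 : 1; }
```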
The goal is the same as in 29768a2d02 (gitattributes: Mark *.svg as
binary) -- prevent grep from searching patterns in those files.
Although those files are, in fact, JavaScript code, the way they are
formatted is not suitable for human reading, so it's unlikely that anyone
would be interested in grep-ing patterns in them. At the same time, those
files consist of very long lines, so if a grep finds a pattern in one
of those, the output is spoiled.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#19357
Replace the reserve_partial loop in the large_bitset constructor with a new function, reserve_gently(), that can reserve memory without stalling by repeatedly calling the reserve_partial() method of the passed container.
Closes scylladb/scylladb#19361
* github.com:scylladb/scylladb:
utils/large_bitset: replace reserve_partial with utils::reserve_gently
utils/stall_free: introduce reserve_gently
With a big number of shards in the cluster (e.g. 500+), the periodic
cache refresh causes high load on the role_permissions table
(e.g. 1k op/s). The load on the roles table is amplified because, to populate
a single entry in the cache, we do several selects on the roles table. Some
of this can't be avoided because roles are arranged in a tree-like
structure where permissions can be inherited.
This patch tries to reuse queries which are simply duplicated. It should
reduce the load on the roles table by up to 50%.
Fixes scylladb/scylladb#19299
Closes scylladb/scylladb#19300
* github.com:scylladb/scylladb:
auth: reuse roles select query during cache population
auth: coroutinize service::get_uncached_permissions
auth: coroutinize service::has_superuser
Add reserve_gently(), which can reserve memory without stalling by
repeatedly calling the reserve_partial() method of the passed container.
Update the comments of the existing reserve_partial() methods to mention
this newly introduced reserve_gently() wrapper.
Also, add a test to verify the functionality.
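A hedged sketch of how such a wrapper could look; the exact reserve_partial() contract (how much it reserves per call) and the capacity() progress check are assumptions for illustration, not the real utils API:
```cpp
#include <cstddef>
#include <seastar/core/future.hh>
#include <seastar/core/coroutine.hh>
#include <seastar/coroutine/maybe_yield.hh>

// Assumed contract for the sketch: Container::reserve_partial(n) performs a
// bounded chunk of the reservation work towards capacity n, and
// Container::capacity() reports how much has been reserved so far.
template <typename Container>
seastar::future<> reserve_gently(Container& c, size_t n) {
    while (c.capacity() < n) {
        c.reserve_partial(n);                       // bounded, non-stalling step
        co_await seastar::coroutine::maybe_yield(); // let other tasks run
    }
}
```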
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Currently, reads with a WHERE clause that limits them to
single-partition reads are unnecessarily parallelized.
This commit checks this condition so that the query doesn't use
forward_service for single-partition reads.
Related: https://github.com/scylladb/scylladb/issues/17851
Fix the issue that test logs were not deleted
Fix the issue that the URL to the failed test directory was incorrectly shown even when artifacts_dir_url option was not provided
Fix the issue that there were no node logs when it failed to join the cluster
Closes scylladb/scylladb#19115
* github.com:scylladb/scylladb:
[test.py] Fix logs had multiplication of lines
[test.py] Fix log not deleted
[test.py] Fix log for failed node was nod added to failed directory
[test.py] Fix URl for failed logs directory in CI
Currently, if task_manager::task::impl::abort preempts before children
are recursively aborted and then the task gets unregistered, we hit
a use-after-free, since abort uses the children vector which is no
longer alive.
Modify abort method so that it goes over all tasks in task manager
and aborts those with the given parent.
Fixes: #19304.
This PR removes the 5.x.y to 5.x.z upgrade guide and adds the 6.x.y to 6.x.z upgrade guide.
The previous maintenance upgrade guides, such as from 5.x.y to 5.x.z, consisted of several documents - separate for each platform.
The new 6.x.y to 6.x.z upgrade guide is one document - there are tabs to include platform-specific information (we've already done it for other upgrade guides as one generic document is more convenient to use and maintain).
I did not modify the procedures. At some point, they have been reviewed for previous upgrade guides.
Fixes https://github.com/scylladb/scylladb/issues/19322
- This PR must be backported to branch-6.0, as it adds 6.x specific content.
Closes scylladb/scylladb#19340
* github.com:scylladb/scylladb:
doc: remove the 5.x.y to 5.x.z upgrade guide
doc: add the 6.x.y to 6.x.z upgrade guide-6
Currently, when calculating the view update backlog for gossip,
we start with `db::view::update_backlog()` and compare it to backlogs
from all shards. However, this backlog can't be compared to other
backlogs - it has size 0 and we compare the fraction current/size
when comparing backlogs, causing us to compare with `NaN`.
This patch fixes it by starting the comparisons with an empty backlog.
so we can be aware of whether scylla builds with seastar's master HEAD,
and be prepared if a build failure is found.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19135
Since the test name was not unique across the run, when we were using the --repeat option there were several handlers for the same file. With this change, the test name, and accordingly the log name, will be different for the same test in different repeat cases. Remove the mode from the test name since it's already in the mode directory.
One of the created log files was not deleted at all because there was no delete command. The unlink was explicitly moved to a later stage, after removing the handler that writes to this file, to avoid the possibility that something is added after the file is removed.
in 94cdfcaa94, we added commitlog_cleanup_test to `configure.py`,
but didn't add it to the CMake building system.
in this change, let's add it to the CMake building system.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19314
seastar::format() does not use operator<< under the hood, it uses
{fmt}, so update the comment accordingly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19315
In issue #15561 some doubts were raised regarding the way ScyllaDB sorts
UUID values. This patch adds a heavily-commented cql-pytest test that
helps understand - and verify that understanding of - the way Scylla sorts
UUIDs, and shows there is some reason in the madness (in particular,
Version 1 UUIDs (time uuids) are sorted like timeuuids, and not as byte
arrays).
The new tests check the different cases (see the comments in the test),
and as usual for cql-pytest tests - they pass also on Cassandra, which
allows us to confirm that the sort order we used is identical to the one
used by Cassandra and not something that Scylla mis-implemented.
Having this test in our suite will also ensure that the UUID ordering
never changes accidentally in the future. If it ever changes, it can
break access to existing tables that use UUID clustering keys, so
it shouldn't change.
Fixes #15561
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#19343
Making the count resource limit on the maintenance (streaming) semaphore live-updatable via config. This will allow us to improve repair speed on mixed-shard clusters, where we suspect that reader thrashing -- due to the combination of a high number of readers on each shard and a very conservative reader count limit (10) -- is the main cause of the slowness.
Making this count limit configurable allows us to start experimenting with this fix, without committing to a count limit increase (or removal), addressing the pain in the field.
Refs: #18269
No OSS backport needed.
Closes scylladb/scylladb#19248
* github.com:scylladb/scylladb:
replica/database: wire in maintenance_reader_concurrency_semaphore_count_limit
db/config: introduce maintenance_reader_concurrency_semaphore_count_limit
reader_concurrency_semaphore: make count parameter live-update
Fixes #19334
The current impl uses hardcoded printing of a few extensions.
Instead, use the extensions' options-to-string method and print them all.
Note: required to make enterprise CI happy again.
Closes scylladb/scylladb#19337
* github.com:scylladb/scylladb:
schema: Make "describe" use extensions to string
schema_extensions: Add an option to string method
since we've switched almost all callers of the operator<< to {fmt},
let's drop the unused operator<<:s.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
since we've switched almost all callers of the operator<< to {fmt},
let's drop the unused operator<<:s.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
since we've switched almost all callers of the operator<< to {fmt},
let's drop the unused operator<<:s.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
since we've switched almost all callers of the operator<< to {fmt},
let's drop the unused operator<<:s.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
since we've switched almost all callers of the operator<< to {fmt},
let's drop the unused operator<<:s.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
we don't provide it anymore, and even if an existing type provides a
constructor accepting an `optional<>`, and hence can be formatted
using operator<< after converting it, we should not rely on this
behavior, as it is fragile.
so, in this change, we switch to `fmt::print()` to use {fmt} to
print `optional<>`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Add allure-pytest pip dependency to be able to use it for generating the allure report later.
Main benefits of the allure report:
1. Group test failures
2. Possibility to attach log files to the test itself
3. Timeline of test run
4. Test description on the report
5. Search by test name or tag
[avi: regenerate toolchain]
Closes scylladb/scylladb#19335
thrift support was deprecated since ScyllaDB 5.2
> Thrift API - legacy ScyllaDB (and Apache Cassandra) API is
> deprecated and will be removed in followup release. Thrift has
> been disabled by default.
so let's drop it. in this change,
* thrift protocol support is dropped
* all references to thrift support in the documentation are dropped
* the "thrift_version" column in the system.local table is preserved for backward compatibility, as we could load from an existing system.local table which still contains this column, so we need to write this column as well.
* "/storage_service/rpc_server" is only preserved for backward compatibility with java-based nodetool.
Fixes #3811
Fixes #18416
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
- [x] not a fix, no need to backport
Closes scylladb/scylladb#18453
* github.com:scylladb/scylladb:
config: expand on rpc_keepalive's description
api: s/rpc/thrift/
db/system_keyspace: drop thrift_version from system.local table
transport: do not return client_type from cql_server::connection::make_client_key()
treewide: drop thrift support
cql3: always return created event in create ks/table/type/view statement
In case multiple clients concurrently issue CREATE KEYSPACE IF NOT EXISTS
and later USE KEYSPACE, it can happen that the schema in the driver's session is
out of sync, because the driver syncs it when it receives a special message in the
CREATE KEYSPACE response.
Similar situation occurs with other schema change statements.
In this patch we fix only create keyspace/table/type/view statements
by always sending created event. Behavior of any other schema altering
statements remains unchanged.
Fixes https://github.com/scylladb/scylladb/issues/16909
**backport: no, it's not a regression**
Closes scylladb/scylladb#18819
* github.com:scylladb/scylladb:
cql3: always return created event in create ks/table/type/view statement
cql3: auth: move auto-grant closer to resource creation code
cql3: extract create ks/table/type/view event code
With a big number of shards in the cluster (e.g. 500+), the periodic cache
refresh causes a high load on the role_permissions table
(e.g. 1k op/s). The load on the roles table is amplified because populating a
single entry in the cache requires several selects on the roles table. Some
of this can't be avoided because roles are arranged in a tree-like
structure where permissions can be inherited.
This patch tries to reuse queries which are simply duplicated. It should
reduce the load on the roles table by up to 50%.
Fixes scylladb/scylladb#19299
since we've switched almost all callers of the operator<< to {fmt},
let's drop the unused operator<<:s.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19313
Allow an extension to describe itself as the CQL property
string that created it (and is serialized to schema tables)
Only paxos extension requires override.
in 51c53d8db6, we check `self.old_env[env]` for None, but there
is a chance that `self.old_env` does not contain a value for `env`.
in that case, we'd see the following failure:
```
Traceback (most recent call last):
File "/home/kefu/dev/scylladb/test/pylib/minio_server.py", line 307, in <module>
asyncio.run(main())
File "/usr/lib64/python3.12/asyncio/runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.12/asyncio/base_events.py", line 687, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/home/kefu/dev/scylladb/test/pylib/minio_server.py", line 304, in main
await server.stop()
File "/home/kefu/dev/scylladb/test/pylib/minio_server.py", line 274, in stop
self._unset_environ()
File "/home/kefu/dev/scylladb/test/pylib/minio_server.py", line 211, in _unset_environ
if self.old_env[env] is not None:
~~~~~~~~~~~~^^^^^
KeyError: 'S3_CONFFILE_FOR_TEST'
```
this happens if we run `pylib/minio_server.py` as a standalone
application.
in this change, instead of getting the value with index, we use
`dict.get()`, so that it does not throw when the dict does not
have the given key.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19291
If something goes wrong while a node is being added to the cluster, it will not be registered as part of the cluster. This leads to situations during log gathering where the logs for such a node are missing.
Incorrect passing of the artifacts_dir_url parameter from test.py to pytest leads to a situation where None is passed as a string and pytest generates an incorrect URL.
This test in topology_experimental_raft/test_alternator.py wants to
check that during Alternator TTL's expiration scans, ALL of the CPU was
used in the "streaming" scheduling group and not in the "statement"
scheduling group. But to allow for some fluke requests (e.g., from the
driver), the test actually allows work in the statement group to be
up to 1% of the work.
Unfortunately, in one test run - a very slow debug+aarch64 run - we
saw the work on the statement group reach 1.4%, failing the test.
I don't know exactly where this work comes from, perhaps the driver,
but before this bug was fixed we saw more than 58% of the work in the
wrong scheduling group, so neither 1% nor 1.4% is a sign that the bug
came back. In fact, let's just change the threshold in the test to 10%,
which is also much lower than the pre-fix value of 58%, so is still a
valid regression test.
Fixes #19307
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#19323
Seastar has functions implementing locking a `seastar::shared_mutex`.
We should use those now instead of reimplementing them in Scylla.
Closes scylladb/scylladb#19253
In CI, tests are always executed with the option --repeat=3, which generates 3 test results with the same name. The Junit plugin in CI cannot correctly distinguish between these results. When there are two passes and one failure, the link to the test result will sometimes be redirected to the incorrect one because the test name is the same. To fix this, a ReportPlugin is added that modifies the test case name during junit report generation, appending the mode and run id to the test name.
Fixes: https://github.com/scylladb/scylladb/issues/17851
Fixes: https://github.com/scylladb/scylladb/issues/15973
Closes scylladb/scylladb#19235
* github.com:scylladb/scylladb:
[test.py] Add uniqueness to the test name
[test.py] Refactor alternator, nodetool, rest_api
these two arguments are critical when tablets are enabled.
Fixes https://github.com/scylladb/scylladb/issues/19296
---
6.0 is the first release with tablets support, and `nodetool ring` is an important tool for understanding the data distribution, so we need to backport this documentation change to 6.0.
Closes scylladb/scylladb#19297
* github.com:scylladb/scylladb:
doc: document `keyspace` and `table` for `nodetool ring`
doc: replace tab with space
this change is created in the same spirit as 1186ddef16, which updated the rule for generating the stripped dist pkg but failed to update the one for generating the unstripped dist pkg. that's why we have a build failure when the workflow is looking for the unstripped tar.gz:
```
08:02:47 ++ ls /jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/dist/tar/scylla-unstripped-6.1.0~dev-0.20240613.d5bdddaeb40b.x86_64.tar.gz
08:02:47 ls: cannot access '/jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/dist/tar/scylla-unstripped-6.1.0~dev-0.20240613.d5bdddaeb40b.x86_64.tar.gz': No such file or directory
```
so, in this change, we fix the path.
Refs #2717
---
* cmake related change, hence no need to backport.
Closes scylladb/scylladb#19290
* github.com:scylladb/scylladb:
build: cmake: use per-mode path for building unstripped_dist_pkg
build: cmake: use path to be compatible with CI
when `--use-cmake` option is passed to `configure.py`,
- before this change, all modes are selected if no
`--mode` options are passed to `configure.py`.
- after this change, only the modes whose `build_by_default` is
`True` are selected, if no `--mode` options are specified.
the new behavior matches the existing behavior; otherwise,
`ninja -C build mode_list` would list modes which
are not built by default.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19292
utils/chunked_vector::reserve_partial: fix usage in callers
The method reserve_partial(), when used as documented, quits before the
intended capacity can be reserved fully. This can lead to overallocation
of memory in the last chunk when data is inserted to the chunked vector.
The method itself doesn't have any bug but the way it is being used by
the callers needs to be updated to get the desired behaviour.
Instead of calling it repeatedly with the value returned from the
previous call until it returns zero, it should be repeatedly called with
the intended size until the vector's capacity reaches that size.
This PR updates the method comment and all the callers to use the
right way.
Fixes #19254
Closes scylladb/scylladb#19279
* github.com:scylladb/scylladb:
utils/large_bitset: remove unused includes identified by clangd
utils/large_bitset: use thread::maybe_yield()
test/boost/chunked_managed_vector_test: fix testcase tests_reserve_partial
utils/lsa/chunked_managed_vector: fix reserve_partial()
utils/chunked_vector: return void from reserve_partial and make_room
test/boost/chunked_vector_test: fix testcase tests_reserve_partial
utils/chunked_vector::reserve_partial: fix usage in callers
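A minimal sketch of the corrected calling convention described above; `utils::chunked_vector` is Scylla's container, and the yield assumes a seastar::thread context:
```
#include "utils/chunked_vector.hh"     // Scylla-internal header
#include <seastar/core/thread.hh>

void reserve_incrementally(utils::chunked_vector<int>& v, size_t target_size) {
    // reserve_partial() now returns void and reserves at most one chunk per
    // call, so the caller loops until the capacity reaches the intended size
    // (instead of looping on the old return value until it became zero).
    while (v.capacity() < target_size) {
        v.reserve_partial(target_size);
        seastar::thread::maybe_yield();    // keep the reactor responsive
    }
}
```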
before this change, we used runtime format strings to format error
messages, but these do not get the compile-time format check:
if we pass arguments which are not formattable, {fmt} throws at
runtime instead of erroring out at compile time. this can be very
annoying, because we format error messages on the error handling
path, and if the user ends up seeing an exception from {fmt} instead
of a nice error message, it is far from helpful.
in this change, we
- use compile-time format string
- fix two caller sites, where we pass `std::exception_ptr` to
{fmt}, but `std::exception_ptr` is not formattable by {fmt} at
the time of writing. we do have operator<< based formatter for
it though. so we delegate to `fmt::streamed` to format it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19294
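A small sketch of the two points above, assuming {fmt} 9+ for `fmt::streamed` and an `operator<<` for `std::exception_ptr` being visible (Seastar provides one):
```
#include <exception>
#include <string>
#include <fmt/format.h>
#include <fmt/ostream.h>   // fmt::streamed

std::string describe_failure(std::exception_ptr ep) {
    // The literal format string is checked at compile time in C++20 builds of
    // {fmt}, so a non-formattable argument becomes a build error rather than
    // a runtime exception on the error-handling path. std::exception_ptr is
    // not formattable by {fmt} directly, so delegate to its operator<<.
    return fmt::format("request failed: {}", fmt::streamed(ep));
}
```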
This PR fixes two problems with the `expected_error`
parameter in `server_add` and `servers_add`.
1. It didn't work in `server_add` if the cluster was empty
because of an incorrect attempt to connect the driver.
2. It didn't work in `servers_add` completely because the
`seeds` parameter was handled incorrectly.
This PR only adds improvements in the testing framework,
no need to backport it.
Closes scylladb/scylladb#19255
* github.com:scylladb/scylladb:
test: manager_client, scylla_cluster: fix type annotations in add_servers
test: manager_client: don't connect driver after failed server_{add, start}
test: scylla_cluster: pass seeds to add_servers
On each shard of each node we store the view update backlogs of
other nodes to, depending on their size, delay responses to incoming
writes, lowering the load on these nodes and helping them get their
backlog to normal if it were too high.
These backlogs are propagated between nodes in two ways: the first
one is adding them to replica write responses. The second one
is gossiping any changes to the node's backlog every 1s. The gossip
becomes useful when writes stop to some node for some time and we
stop getting the backlog using the first method, but we still want
to be able to select a proper delay for new writes coming to this
node. It will also be needed for the mv admission control.
Currently, the backlog is gossiped from shard 0, as expected.
However, we also receive the backlog only on shard 0 and only
update this shard's backlogs for the other node. Instead, we'd
want to have the backlogs updated on all shards, allowing us
to use proper delays also when requests are received on shards
different than 0.
This patch changes the backlog update code, so that the backlogs
on all shards are updated instead. This will only be performed
up to once per second for each other node, and is done with
a lower priority, so it won't severely impact other work.
Fixes: scylladb/scylladb#19232
Closes scylladb/scylladb#19268
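A hypothetical sketch of the fan-out described above; the real storage_proxy types and member names differ, the stand-ins below only illustrate updating every shard instead of shard 0:
```
#include <seastar/core/sharded.hh>
#include <seastar/core/future.hh>
#include <map>

using host_id = int;                                    // stand-in type
struct update_backlog { float current; float max; };    // stand-in type

struct proxy_like {
    std::map<host_id, update_backlog> _remote_backlogs; // per-shard copy
    void note_remote_backlog(host_id node, update_backlog b) { _remote_backlogs[node] = b; }
};

// Fan the received backlog out to every shard; previously only shard 0 was updated.
seastar::future<> publish_backlog_on_all_shards(seastar::sharded<proxy_like>& proxy,
                                                host_id node, update_backlog backlog) {
    return proxy.invoke_on_all([node, backlog] (proxy_like& p) {
        p.note_remote_backlog(node, backlog);
    });
}
```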
In CI, tests are always executed with the option --repeat=3, which generates 3 test results with the same name. The Junit plugin in CI cannot correctly distinguish between these results. When there are two passes and one failure, the link to the test result will sometimes be redirected to the incorrect one because the test name is the same.
To fix this, a ReportPlugin is added that modifies the test case name during junit report generation, appending the mode and run id to the test name.
Fixes: https://github.com/scylladb/scylladb/issues/17851
Fixes: https://github.com/scylladb/scylladb/issues/15973
For various reasons, a view building write may fail. When that
happens, the view building should not finish until these writes
are successfully retried and they should not interfere with any
writes that are performed to the base table while the view is
building.
The test introduced in this patch confirms that this is the case.
Refs scylladb/scylladb#19261
Closes scylladb/scylladb#19263
Update the maximum size tested by the testcase. The test always created
only one chunk as the maximum size tested by it (1 << 12 = 4KB) was less
than the default max chunk size (12.8 KB). So, use twice the
max_chunk_capacity as the test size distribution upper limit to verify
that reserve_partial can reserve multiple chunks.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Fix the method comment and return types of chunked_managed_vector's
reserve_partial() similar to chunked_vector's reserve_partial() as it
has the same issues mentioned in #19254. Also update the usage in the
chunked_managed_vector_test.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Since reserve_partial does not depend on the number of remaining
capacity to be reserved, there is no need to return anything from it and
the make_room method.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Fix the usage of reserve_partial in the testcase. Also update the
maximum chunk size used by the testcase. The test always created only
one chunk as the maximum size tested by it (1 << 12 = 4KB) was less
than the default max chunk size (128 KB). So, use smaller chunk size,
512 bytes, to verify that reserve_partial can reserve multiple chunks.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
in 57c408ab, we dropped operator<< for `parsed::path`, but we forgot
to drop the friend declaration for it along with the operator. so in
this change, let's drop the friend declaration.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19287
these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning.
also, add api to iwyu github workflow's CLEANER_DIR, to prevent future violations.
---
it's a cleanup, hence no need to backport.
Closes scylladb/scylladb#19269
* github.com:scylladb/scylladb:
.github: add api to iwyu's CLEANER_DIR
api: do not include unused headers
before this change, we use "scylla" as the dependency of
unstripped_dist_pkg, but that implies the scylla built with the
default mode. if the build rules are generated using a
multi-config generator, the default mode is not necessarily
identical to the current `$<CONFIG>`, so let's be more explicit.
otherwise, we could run into a build failure like
```
FAILED: dist/RelWithDebInfo/scylla-unstripped-6.1.0~dev-0.20240614.5f36888e7fbd.x86_64.tar.gz /jenkins/workspace/scylla-master/scylla-ci/scylla/build/dist/RelWithDebInfo/scylla-unstripped-6.1.0~dev-0.20240614.5f36888e7fbd.x86_64.tar.gz
cd /jenkins/workspace/scylla-master/scylla-ci/scylla && scripts/create-relocatable-package.py --build-dir /jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo --node-exporter-dir /jenkins/workspace/scylla-master/scylla-ci/scylla/build/node_exporter --debian-dir /jenkins/workspace/scylla-master/scylla-ci/scylla/build/debian /jenkins/workspace/scylla-master/scylla-ci/scylla/build/dist/RelWithDebInfo/scylla-unstripped-6.1.0~dev-0.20240614.5f36888e7fbd.x86_64.tar.gz
ldd: /jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/scylla: No such file or directory
Traceback (most recent call last):
File "/jenkins/workspace/scylla-master/scylla-ci/scylla/scripts/create-relocatable-package.py", line 109, in <module>
libs.update(ldd(exe))
^^^^^^^^
File "/jenkins/workspace/scylla-master/scylla-ci/scylla/scripts/create-relocatable-package.py", line 37, in ldd
for ldd_line in subprocess.check_output(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/subprocess.py", line 466, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ldd', '/jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/scylla']' returned non-zero exit status 1.
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change is created in the same spirit as 1186ddef16, which
updated the rule for generating the stripped dist pkg, but it
failed to update the one for generating the unstripped dist pkg.
that's why we have a build failure when the workflow is looking for
the unstripped tar.gz:
```
08:02:47 ++ ls /jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/dist/tar/scylla-unstripped-6.1.0~dev-0.20240613.d5bdddaeb40b.x86_64.tar.gz
08:02:47 ls: cannot access '/jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/dist/tar/scylla-unstripped-6.1.0~dev-0.20240613.d5bdddaeb40b.x86_64.tar.gz': No such file or directory
```
so, in this change, we fix the path.
Refs #2717
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
backport not needed, these are just cleanups.
Closes scylladb/scylladb#19260
* github.com:scylladb/scylladb:
replica: simplify perform_cleanup_compaction()
replica: return storage_group by reference on storage_group_for*()
replica: devirtualize storage_group_of()
Before work on tablets was completed, it was noticed that — due to some missing pieces of implementation — Scylla doesn't properly close sstables for migrated-away tablets. Because of this, disk space wasn't being reclaimed properly.
Since the missing pieces of implementation were added, the problem should be gone now. This patch adds a test which was used to reproduce the problem earlier. It's expected to pass now, validating that the issue was fixed.
Should be backported to branch-6.0, because the tested problem was also affecting that branch.
Fixes #16946
Closes scylladb/scylladb#18906
* github.com:scylladb/scylladb:
test_tablets: add test_tablet_storage_freeing
test: pylib: add get_sstables_disk_usage()
with tablets, we're expected to have a worst case of ~100 tablets in a given
table and shard, so let's avoid a linear search when looking for the
memtable_list in a range scan. we're bounded by ~100 elements, so it
shouldn't be a big problem, but it's an inefficiency we can easily
get rid of.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#19286
The method reserve_partial(), when used as documented, quits before the
intended capacity can be reserved fully. This can lead to overallocation
of memory in the last chunk when data is inserted to the chunked vector.
The method itself doesn't have any bug but the way it is being used by
the callers needs to be updated to get the desired behaviour.
Instead of calling it repeatedly with the value returned from the
previous call until it returns zero, it should be repeatedly called with
the intended size until the vector's capacity reaches that size.
This commit updates the method comment and all the callers to use the
right way.
Fixes #19254
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
TWCS off-strategy suffers with 100% space overhead, so a big TWCS table
can cause scylla to run out of disk space during node ops.
To not penalize TWCS tables, that take a small percentage of disk,
with increased write ampl, TWCS off-strategy will be restricted to
10% of free disk space. Then small tables can still compact all
disjoint sstables in a single round.
Fixes #16514.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
That will be used in turn to restrict reshape to 10% of available space
in underlying storage.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
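A minimal sketch of the sizing rule described above; the names are illustrative, only the 10%-of-free-space budget comes from the commit message:
```
#include <cstdint>
#include <vector>
#include <algorithm>

// Pick sstables for off-strategy reshape until their combined size would
// exceed ~10% of the free space in the underlying storage; the rest is left
// for a later round, bounding the transient space overhead.
std::vector<uint64_t> pick_reshape_input(std::vector<uint64_t> sstable_sizes,
                                         uint64_t available_space) {
    const uint64_t budget = available_space / 10;
    std::sort(sstable_sizes.begin(), sstable_sizes.end());
    std::vector<uint64_t> picked;
    uint64_t total = 0;
    for (uint64_t size : sstable_sizes) {
        if (total + size > budget) {
            break;
        }
        picked.push_back(size);
        total += size;
    }
    return picked;
}
```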
compaction_group sits in replica layer and compaction layer is
supposed to talk to it through compaction::table_state only.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
As it turns out, each sstable carries its own schema in its serialization header (Statistics component). This schema is incomplete -- the names of the key columns are not stored, just their type. Static and regular columns do have names and types stored however. This bare-bones schema is enough to parse and display the content of the sstable. Another thing missing is schema options (the stuff after the `WITH` keyword, except the clustering order). The only options stored are the compression options (in the CompressionInfo component), this is actually needed to read the Data component.
This series adds a new method to `tools/schema_loader.cc` to extract the schema stored in the sstable itself. This new schema load method is used as the last fall-back for obtaining the schema, in case scylla-sstable is trying to autodetect the schema of the sstable. Although, right now this bare-bones schema is enough for everything scylla-sstable does, it is more future proof to stick to the "full" schema if possible, so this new method is the last resort for now.
Fixes: https://github.com/scylladb/scylladb/issues/17869
Fixes: https://github.com/scylladb/scylladb/issues/18809
New functionality, no backport needed.
Closes scylladb/scylladb#19169
* github.com:scylladb/scylladb:
tools/scylla-sstable: log loaded schema with trace level
tools/scylla-sstable: load schema from the sstable as fallback
tools/schema_loader: introduce load_schema_from_sstable()
test/lib/random_schema: remove assert on min number of regular columns
sstables: introduce load_metadata()
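A sketch of the fallback order described above. `load_schema_from_sstable()` is the new method named in this series; the other loaders and the exact signatures are illustrative stand-ins:
```
#include <filesystem>
#include <memory>

struct schema;                              // stand-in for Scylla's schema class
using schema_ptr = std::shared_ptr<schema>;

schema_ptr try_load_schema_from_system_tables(const std::filesystem::path&); // hypothetical
schema_ptr try_load_schema_from_schema_file(const std::filesystem::path&);   // hypothetical
schema_ptr load_schema_from_sstable(const std::filesystem::path&);           // new method

schema_ptr autodetect_schema(const std::filesystem::path& sstable_path) {
    if (auto s = try_load_schema_from_system_tables(sstable_path)) {
        return s;                           // prefer a "full" schema when available
    }
    if (auto s = try_load_schema_from_schema_file(sstable_path)) {
        return s;
    }
    // Last resort: the bare-bones schema every sstable carries in its
    // serialization header (key column types without names, no options).
    return load_schema_from_sstable(sstable_path);
}
```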
Making the count resources on the maintenance (streaming) semaphore live
update via config. This will allow us to improve repair speed on
mixed-shard clusters, where we suspect that reader thrashing -- due to
the combination of high number of readers on each shard and very
conservative reader count limit (10) -- is the main cause of the
slowness.
Making this count limit configurable allows us to start experimenting
with this fix, without committing to a count limit increase (or
removal), addressing the pain in the field.
So that the amount of count resources can be changed at run-time,
triggered by e.g. a config change.
Previous constant-count based constructor is left intact, to avoid
patching all clients, as only a small subset will want the new
functionality.
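A minimal sketch of the wiring, assuming Scylla's `utils::updateable_value`/`utils::observer` API; the class and member names are illustrative, not the actual reader_concurrency_semaphore code:
```
#include "utils/updateable_value.hh"   // Scylla-internal header
#include <cstdint>
#include <utility>

class maintenance_semaphore_sketch {
    utils::updateable_value<uint32_t> _count_cfg;   // keeps the live-update link alive
    int _count;
    utils::observer<uint32_t> _count_observer;

    void set_count(uint32_t new_count) {
        // adjust the available count resources; a run-time config change
        // lands here without restarting the node
        _count = static_cast<int>(new_count);
    }
public:
    explicit maintenance_semaphore_sketch(utils::updateable_value<uint32_t> count)
        : _count_cfg(std::move(count))
        , _count(static_cast<int>(_count_cfg()))
        , _count_observer(_count_cfg.observe([this] (uint32_t v) { set_count(v); }))
    { }
};
```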
This patch includes extensive testing for what happens to an ongoing
paged query when a secondary index is suddenly added or dropped.
Issue #18992 was opened suggesting that this would be broken, and indeed
the tests included here show that it is indeed broken.
The four tests included in this patch are heavily commented to explain
what they are testing and why, but here is a short summary of what is
being tested by each of them:
1. A paged query filtering on v=17 continues correctly even if an
index is created on v.
2. A paged query filtering on v1 and v2 where v2 is indexed,
continues correctly even if an index is created on v1 (remember
that Scylla prefers to use the first index mentioned in the query).
3. A paged query using an index on v continues correctly even if that
index is deleted.
4. However, if the query doesn't say "ALLOW FILTERING", it cannot
be continued after the index is deleted.
All these tests pass on Cassandra, but all of them except the fourth
fail on Scylla, reproducing issue #18992. Somewhat to my surprise, the
failure of the query in all the failed tests is silent (i.e., trying to
fetch the next page just fetches nothing and says the iteration is done).
I was expecting more dramatic failures ("marshaling error" messages,
crashes, etc.) but didn't get them.
Refs #18992
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#19000
The schema of the sstable can be interesting, so log it with trace
level. Unfortunately, this is not the nice CQL statement we are used to
(that requires a database object), but the not-nearly-so-nice CFMetadata
printout. Still, it is better than nothing.
When auto-detecting the schema of the sstable, if all other methods
failed, load the schema from the sstable's serialization header. This
schema is incomplete. It is just enough to parse and display the content
of the sstable. Although parsing and displaying the content of the
sstable is all scylla-sstable does, it is more future-compatible to use
the full schema when possible. So the always-available but minimal
schema that each sstable carries is used just as a fallback.
The test which tested the case when all schema load attempts fail,
doesn't work now, because loading the serialization header always
succeeds. So convert this test into two positive tests, testing the
serialization header schema fallback instead.
Allows loading the schema from an sstable's serialization header. This
schema is incomplete, but it is enough to parse and display the content
of the sstable.
Change the format of sync points to use host IDs instead of IPs, to be consistent with the use of host IDs in the hinted handoff module.
Introduce a sync point v3 format which is the same as v2 except it stores host IDs instead of IPs.
The decoding supports both formats, with host IDs and with IPs, so a sync point now contains a variant of either type; in the case of the new format the translation is avoided.
Fixes #18653
Closes scylladb/scylladb#19134
* github.com:scylladb/scylladb:
db/hints: migrate sync point to host ID
db/hints: rename sync point structures with _v1 suffix to _v1_v2
In this commit, we add a new metric `sent_total_size`
keeping track of how many bytes of hints a node
has sent. The metric is supposed to complement its
counterpart in storage proxy that counts how many
bytes of hints a node has received. That information
should prove useful in analyzing statistics of
a cluster -- load on given nodes and where it comes
from.
We also change the name of the metric `sent`
to `sent_total` to avoid the conflict of prefixes
between the two metrics.
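A sketch of registering such a counter with Seastar's metrics API; the group name and description are assumptions, only the metric name comes from the commit message:
```
#include <seastar/core/metrics.hh>
#include <cstdint>

namespace sm = seastar::metrics;

void register_hints_sent_size(sm::metric_groups& metrics, const uint64_t& sent_bytes) {
    metrics.add_group("hints_manager", {
        sm::make_counter("sent_total_size", sent_bytes,
                         sm::description("Total size in bytes of hints sent by this node")),
    });
}
```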
those functions cannot return nullptr; they throw when the group is not
found, so it is better to return a reference instead.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
If adding or starting a server fails expectedly, there is no reason
to update or connect the driver. Moreover, before this patch, we
couldn't use `server_add` and `servers_add` with `expected_error`
if the cluster was empty. After expected bootstrap failures, we
tried to connect the driver, which rightfully failed on
`assert len(hosts) > 0` in `cluster_con`.
This parameter was incorrectly missing. For this reason,
`expected_error` was passed from `add_servers` to `add_server` as
`seeds`, which caused strange crashes.
Some time ago it turned out that if an unrecognized feature name is encountered in scylla.yaml, the whole experimental features list is ignored, but scylla continues to boot. There's an UNUSED feature which is the proper way to deprecate a feature, and this PR improves its handling in several ways.
1. The recently removed "tablets" feature is partially brought back, but marked as UNUSED
2. Any UNUSED features met while parsing are printed into logs
3. The enum_option<> helper is enlightened along the way
refs: #18968
Closes scylladb/scylladb#19230
* github.com:scylladb/scylladb:
config: Mark tablets feature as unused
main: Warn unused features
enum_option: Carry optional key on board
enum_option: Remove on-board _map member
The API endpoints are registered for particular services (with rare exceptions), and once the corresponding service is ready, its endpoints section can be registered too. The same, in reverse order, applies to shutdown, and it's automatic with deferred actions.
refs: #2737
Closes scylladb/scylladb#19208
* github.com:scylladb/scylladb:
main: Register task manager API next to task manager itself
main: Register messaging API next to messaging service
main: Register repair API next to repair service
its declaration was removed in 84a9d2fa, but that commit failed to remove
the implementation from the .cc file.
in this change, let's remove operator<< for role_or_anonymous
completely.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19243
If we receive a message in the same term but from a different leader
than we expect, we print:
```
Got append request/install snapshot/read_quorum from an unexpected leader
```
For some reason the message did not include the details (who the leader
was and who the sender was) which requires almost zero effort and might
be useful for debugging. So let's include them.
Ref: scylladb/scylla-enterprise#4276
Closes scylladb/scylladb#19238
Commit 882b2f4e9f (cql3, schema_tables: Generalize function creation)
erroneously says that optional<context> is not suitable for future<>
type, but in fact it is.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#19204
since gcc-13 is packaged by ppa:ubuntu-toolchain-r, and GCC-13 was
released 1 year ago, let's use it instead. fewer warnings, as the
standard library from GCC-13 is more standard compliant.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19162
to be prepared for changes from clang, and enjoy the new warnings/errors from this compiler.
* it is an improvement in our CI, no need to backport.
Closes scylladb/scylladb#19164
* github.com:scylladb/scylladb:
.github: add workflow to build with clang nightly
.github: rename clang-tidy-matcher.json to clang-matcher.json
Commit 47dbf23773 (Rework view services and system-distributed-keyspace
dependencies) made streaming and repair services depend on view builder,
but missed the fact that the builder itself starts much later.
Move the view builder earlier; that's safe, as no activity is started upon
that, and real building is kicked off much later when invoke_on_all(start)
happens.
Other than that, start the system distributed keyspace earlier, which also
looks safe, as it's also started "for real" later, by the storage service
when it joins the ring.
fixes: #19133
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#19250
upgrade/_common contains document fragments included by other documents.
but quite a few of the documents that previously included these fragments
were removed, and we didn't remove the fragments along with them.
in this change, we drop them.
Fixes #19245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19251
before this change, when performing memtable_test, we expect that
the memtables of ks.cf are the only memtables being flushed. and
we inject 4 failures in the code path of flush, and wait until 4
of them are triggered. but in the background, `dirty_memory_manager`
performs flush on all tables when necessary. so, the total number of
failures is not necessarily the total number of failures triggered
when flushing ks.cf, some of them could be triggered when flushing
system tables. that's why we have sporadic test failures from
this test, as we might check `t.min_memtable_timestamp()` too soon.
after this change, we increase the `unspooled_dirty_soft_limit` setting,
in order to disable `dirty_memory_manager`, so that the only flush
is performed by the test.
Fixes https://github.com/scylladb/scylladb/issues/19034
---
the issue applies to both 5.4 and 6.0, and this issue hurts the CI stability, hence we should backport it.
Closes scylladb/scylladb#19252
* github.com:scylladb/scylladb:
test: memtable_test: increase unspooled_dirty_soft_limit
test: memtable_test: replace BOOST_ASSERT with BOOST_REQUIRE
In this commit, we add two new metrics to storage proxy:
* `received_hints_total`,
* `received_hints_bytes_total`.
Before these changes, we had to rely solely on other
metrics indicating how many hints nodes have written,
rejected, sent, etc. Because hints are subject to
many more or less controllable factors, e.g. a target
node still being a replica for a mutation, it was
very difficult to approximate how many hints a given
node might have received or what part of its load
they were. The newly introduced metrics are supposed
to help reason about those.
before this change, when performing memtable_test, we expect that
the memtables of ks.cf are the only memtables being flushed. and
we inject 4 failures in the code path of flush, and wait until 4
of them are triggered. but in the background, `dirty_memory_manager`
performs flush on all tables when necessary. so, the total number of
failures is not necessarily the total number of failures triggered
when flushing ks.cf, some of them could be triggered when flushing
system tables. that's why we have sporadic test failures from
this test, as we might check `t.min_memtable_timestamp()` too soon.
after this change, we increase the `unspooled_dirty_soft_limit` setting,
in order to disable `dirty_memory_manager`, so that the only flush
is performed by the test.
Fixes #19034
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we verify the behavior of design under test using
`BOOST_ASSERT()`, which is a wrapper around `assert()`, so if a test
fails, the test just aborts. this is not very helpful for postmortem
debugging.
after this change, we use `BOOST_REQUIRE` macro for verifying the
behavior, so that Boost.Test prints out the condition if it does not
hold when we test it.
Refs #19034
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
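A tiny illustration of the difference (Boost.Test, header-only variant):
```
#define BOOST_TEST_MODULE example
#include <boost/test/included/unit_test.hpp>

BOOST_AUTO_TEST_CASE(triggered_failure_count) {
    int expected = 4;
    int triggered = 3;
    // BOOST_ASSERT(expected == triggered);    // wraps assert(): aborts with no context
    BOOST_REQUIRE_EQUAL(expected, triggered);  // fails the test and reports "4 != 3"
}
```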
in this changeset, we tighten the clang-include-cleaner workflow, and address the warnings in two more subdirectories in the source tree.
* it's a cleanup, no need to backport
Closes scylladb/scylladb#19155
* github.com:scylladb/scylladb:
.github: add alternator to iwyu's CLEANER_DIR
alternator: do not include unused headers
.github: change severity to error in clang-include-cleaner
exceptions: do not include unused headers
since we stopped using the generic container formatters, which in turn
use operator<< for formatting the elements, we can drop more
operator<< overloads.
so, in this change, we drop operator<< for proposal.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19156
ConfigParser.readfp was deprecated in Python 3.2 and removed in Python 3.12.
Under Fedora 40, the container fails to launch because it cannot parse its
configuration.
Fix by using the newer read_file().
Closes scylladb/scylladb#19236
This config item is propagated to the table object via table::config. Although the field in `table::config`, used to propagate the value, was `utils::updateable_value<T>`, it was assigned a constant and so the live-update chain was broken.
This series fixes this and adds a test which fails before the patch and passes after. The test needed new test infrastructure around the failure injection api, namely the ability to exfiltrate the value of an internal variable. This infrastructure is also added in this series.
Fixes: https://github.com/scylladb/scylladb/issues/18674
- [x] This patch has to be backported because it fixes broken functionality
Closes scylladb/scylladb#18705
* github.com:scylladb/scylladb:
test/topology_custom: add test for enable_compacting_data_for_streaming_and_repair live-update
test/pylib: rest_client: add get_injection()
api/error_injection: add getter for error_injection
utils/error_injection: add set_parameter()
replica/database: fix live-update enable_compacting_data_for_streaming_and_repair
in 44e85c7d, we remove coverage compiling options from the cflags
when building abseil. but in 535f2b21, these options were brought
back as part of cxx_flags.
so we need to remove them again from cxx_flags.
Fixes #19219
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19220
This feature used to be there for a while, but then it was removed by
83d491af02. This patch partially takes it
back, but maps to UNUSED, so that if met in config, it's warned, but
other features are parsed as well.
refs: #18968
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When seeing an UNUSED feature -- print it into log. This is where the
enum_option::key is in use. The thing is that experimental features map
different unused feature names into the single UNUSED feature enum
value, so once the feature is parsed its configured name only persists
in the option's key member (saved by previous patch).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It facilitates option formatting, but the main purpose is to be able to
find out the exact keys, not values, later (see next patch).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The map in question is immutable and can be obtained from the Mapper type
at any time; there's no need to keep a copy of it on each enum_option.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Change the format of sync points to use host ID instead of IPs, to be
consistent with the use of host IDs in hinted handoff module.
Introduce sync point v3 format which is the same as v2 except it stores
host IDs instead of IPs.
The encoding of sync points now always uses the new v3 format with host
IDs.
The decoding supports both formats with host IDs and IPs, so a sync point
now contains a variant of either type, and in the case of the new
format the translation from IP to host ID is avoided.
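An illustrative sketch of the decoded target with stand-in types (the real code uses gms::inet_address and locator::host_id); the point is that the v3 format needs no IP-to-host-ID translation:
```
#include <variant>
#include <unordered_map>
#include <string>

using ip_address = std::string;   // stand-in for gms::inet_address
using host_id = long;             // stand-in for locator::host_id
using sync_point_target = std::variant<ip_address, host_id>;

host_id resolve(const sync_point_target& target,
                const std::unordered_map<ip_address, host_id>& ip_to_id) {
    if (auto* id = std::get_if<host_id>(&target)) {
        return *id;                                     // v3: already a host ID
    }
    return ip_to_id.at(std::get<ip_address>(target));   // v1/v2: translate the IP
}
```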
* tools/java 88809606c8...01ba3c196f (3):
> Revert "build: don't add nonexistent directory 'lib' to relocatable packages"
> build: run antlr in a separate process
> build: don't add nonexistent directory 'lib' to relocatable packages
Allow external code to obtain information about an error injection
point, including whether it is enabled, and importantly, what its
parameters are. Together with the `set_parameter()` added in the
previous patch, this allows tests to read out the values of internal
parameters, via a set_parameter() injection point.
Allow injection points to write values into the parameter map, which
external code can then examine. This allows exfiltrating the values of
internal variables, to be examined by tests, without exposing these
variables via an "official" path.
Most of this test's time is spent waiting for a node to die. This change speeds it up roughly 3x.
Was
real 9m21,950s
user 1m11,439s
sys 1m26,022s
Now
real 3m37,780s
user 0m58,439s
sys 1m13,698s
refs: #17764
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#19222
Fixes scylladb/scylla-enterprise#4168
Unless listing all (including system) keyspaces, filter out "extension internal"
keyspaces. These are to be considered "system" for the purpose of exposing them to the
end user.
Closes scylladb/scylladb#19214
This config item is propagated to the table object via table::config.
Although the field in table::config, used to propagate the value, was
utils::updateable_value<T>, it was assigned a constant and so the
live-update chain was broken.
This patch fixes this.
Consider the following:
1) table A has N tablets and views
2) migration starts for a tablet of A from node 1 to 2.
3) migration is at write_both_read_old stage
4) coordinator will push writes to both nodes (pending and leaving)
5) A has view, so writes to it will also result in reads (table::push_view_replica_updates())
6) the tablet's update_effective_replication_map() does not refresh the tablet sstable set (for the new tablet migrating in)
7) so the read in step 5 is not able to find the sstable set for the tablet migrating in
Causes the following error:
"tablets - SSTable set wasn't found for tablet 21 of table mview.users"
which means loss of write on pending replica.
The fix will refresh the table's sstable set (tablet_sstable_set) and cache's snapshot.
It's not a problem to refresh the cache snapshot as long as the logical
state of the data hasn't changed, which is true when allocating new
tablet replicas. That's also done in the context of compactions for example.
Fixes #19052.
Fixes #19033.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#19099
Fixes scylladb/scylla-pkg#3845
Don't overwrite (or rather change) AWS credentials variables if already set in
enclosing environment. Ensures EAR tests for AWS KMS can run properly in CI.
v2:
* Allow environment variables in reading obj storage config - allows CI to
use real credentials in env without risking putting them into less secure
files
* Don't write credentials info from miniserver into config, instead use said
environment vars to propagate creds.
v3:
* Fix python launch scripts to not clear environment, thus retaining above aws envs.
Closes scylladb/scylladb#19086
This is a translation of Cassandra's CQL unit test source file
DistinctQueryPagingTest.java into our cql-pytest framework.
The 5 tests did not reproduce any previously-unknown bug, but did provide
additional reproducers for one already-known issue:
Refs #10354: SELECT DISTINCT should allow filter on static columns,
not just partition keys
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#18971
Adds a util for measuring the disk usage of the given table on the given
node.
Will be used in a follow-up patch for testing that sstables are freed
properly.
This commit removes the information that tablets are an experimental feature
from the CREATE KEYSPACE section.
In addition, it removes the notes and cautions that are redundant when
a feature is GA, especially the information and warnings about the future
plans.
Fixes https://github.com/scylladb/scylladb/issues/18670
Closes scylladb/scylladb#19063
The C++ standard allows copy elision in this case, and copy elision is
more performant than constructing the return value with a move
constructor, so there is no need to use `std::move()` here.
and GCC-14 rightfully points this out:
```
/home/kefu/dev/scylladb/lang/lua.cc: In member function ‘data_value {anonymous}::from_lua_visitor::operator()(const utf8_type_impl&)’:
/var/ssd/scylladb/lang/lua.cc:797:25: error: redundant move in return statement [-Werror=redundant-move]
797 | return std::move(s);
| ~~~~~~~~~^~~
/home/kefu/dev/scylladb/lang/lua.cc:797:25: note: remove ‘std::move’ call
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19187
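A minimal sketch of the rule GCC-14 enforces here with -Werror=redundant-move:
```
#include <string>

std::string make_name() {
    std::string s = "scylla";
    return s;                 // a named local is elided or implicitly moved
    // return std::move(s);   // flagged: redundant move in return statement
}
```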
before this change, when building abseil, we don't pass cxxflags
to the compiler, and the abseil libraries are built with the default
optimization level. in the case of clang, the default optimization
level is `-O0`: it compiles the fastest, but the emitted code is not
optimized for runtime performance, while we expect good performance
from the release build. a typical command line
for building abseil looks like
```
clang++ -I/home/kefu/dev/scylladb/master/abseil -ffile-prefix-map=/home/kefu/dev/scylladb/master=. -march=westmere -std=gnu++20 -Wall -Wextra -Wcast-qual -Wconversion -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Winfinite-recursion -Winvalid-constexpr -Wliteral-conversion -Wmissing-declarations -Woverlength-strings -Wpointer-arith -Wself-assign -Wshadow-all -Wshorten-64-to-32 -Wsign-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-zero-compare -Wundef -Wuninitialized -Wunreachable-code -Wunused-comparison -Wunused-local-typedefs -Wunused-result -Wvla -Wwrite-strings -Wno-float-conversion -Wno-implicit-float-conversion -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -DNOMINMAX -MD -MT absl/base/CMakeFiles/scoped_set_env.dir/internal/scoped_set_env.cc.o -MF absl/base/CMakeFiles/scoped_set_env.dir/internal/scoped_set_env.cc.o.d -o absl/base/CMakeFiles/scoped_set_env.dir/internal/scoped_set_env.cc.o -c /home/kefu/dev/scylladb/master/abseil/absl/base/internal/scoped_set_env.cc
```
so, in this change, we populate cxxflags to abseil, so that the
per-mode `-O` option can be populated when building abseil.
after this change, the command line building abseil in release mode
looks like
```
clang++ -I/home/kefu/dev/scylladb/master/abseil -ffunction-sections -fdata-sections -O3 -mllvm -inline-threshold=2500 -fno-slp-vectorize -DSCYLLA_BUILD_MODE=release -g -gz -ffile-prefix-map=/home/kefu/dev/scylladb/master=. -march=westmere -std=gnu++20 -Wall -Wextra -Wcast-qual -Wconversion -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Winfinite-recursion -Winvalid-constexpr -Wliteral-conversion -Wmissing-declarations -Woverlength-strings -Wpointer-arith -Wself-assign -Wshadow-all -Wshorten-64-to-32 -Wsign-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-zero-compare -Wundef -Wuninitialized -Wunreachable-code -Wunused-comparison -Wunused-local-typedefs -Wunused-result -Wvla -Wwrite-strings -Wno-float-conversion -Wno-implicit-float-conversion -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -DNOMINMAX -MD -MT absl/flags/CMakeFiles/flags_commandlineflag_internal.dir/internal/commandlineflag.cc.o -MF absl/flags/CMakeFiles/flags_commandlineflag_internal.dir/internal/commandlineflag.cc.o.d -o absl/flags/CMakeFiles/flags_commandlineflag_internal.dir/internal/commandlineflag.cc.o -c /home/kefu/dev/scylladb/master/abseil/absl/flags/internal/commandlineflag.cc
```
Refs 0b0e661a85
Fixes #19161
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19160
The check query may be executed on a node which doesn't yet see that
the downed server is down, as it is not shut down gracefully. The
query coordinator can choose the down node as a CL=1 replica for read
and time out.
To fix, wait for all nodes to notice the node is down before executing
the checking query.
Fixes #17938
Closes scylladb/scylladb#19137
After wasm udf appeared, code in main, create_function_statement and schema_tables got involved in the details of wasm engine management. Also, even prior to this, there was duplication in how the function context is created by the statement code and the schema_tables code.
This PR generalizes function context creation and encapsulates the management in sharded<lang::manager> service. Also it removes the wasm::startup_context thing and makes wasm start/stop be "classical" (see #2737)
Closes scylladb/scylladb#19166
* github.com:scylladb/scylladb:
code: Enlighten wasm headers usage
lang: Unfriend wasm context from manager
lang, cql3, schema_tables: Don't mess with db::config
lang: Don't use db::config to create lua context
lang: Don't use db::config to create wasm context
lang: Drop manager::precompile() method
cql3, schema_tables: Generalize function creation
wasm: Replace startup_context with wasm_config
lang: Add manager::start() method
lang: Move manager to lang namespace
lang: Move wasm::manager to its .cc/.hh files
in general, the value_type of a `const_iterator` is `T` instead of
`const T`; what carries the const qualifier is `reference`. this is because,
when dereferencing an iterator, the value type does not matter any
more, as it is always a copy.
and GCC-14 points this out:
```
/home/kefu/dev/scylladb/sstables/compress.hh:224:13: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers]
224 | value_type operator*() const {
| ^~~~~~~~~~
/home/kefu/dev/scylladb/sstables/compress.hh:228:13: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers]
228 | value_type operator[](ssize_t i) const {
| ^~~~~~~~~~
```
so, in this change, let's change the value_type to `uint64_t`.
please note, it's not typical to return `value_type` from `operator*`
or `operator[]` of an iterator. but due to the design of
segmented_offsets, we cannot return a reference, so let's keep it
this way.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19186
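A minimal sketch of the convention described above: `value_type` stays unqualified even on a const_iterator, and `operator*`/`operator[]` return it by value because no stable reference can be handed out:
```
#include <cstdint>
#include <cstddef>

class offsets_const_iterator {
    const uint64_t* _p = nullptr;
public:
    using value_type = uint64_t;   // not `const uint64_t`

    explicit offsets_const_iterator(const uint64_t* p) : _p(p) {}
    // Returning `const uint64_t` here would trigger GCC-14's
    // "type qualifiers ignored on function return type" warning.
    value_type operator*() const { return *_p; }
    value_type operator[](std::ptrdiff_t i) const { return _p[i]; }
};
```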
Alternator has a custom TTL implementation. This is based on a loop which scans existing rows in the table, then decides whether each row has reached its end-of-life and deletes it if it did. This work is done in the background, and therefore it uses the maintenance (streaming) scheduling group. However, it was observed that part of this work leaks into the statement scheduling group, competing with user workloads and negatively affecting their latencies. This was found to be caused by the reads and writes done on behalf of the alternator TTL, which loses its maintenance scheduling group when these have to go to a remote node. This is because the messaging service was not configured to recognize the streaming scheduling group when statement verbs like reads or writes are invoked. The messaging service currently recognizes two statement "tenants": the user tenant (statement scheduling group) and system (default scheduling group), as we used to have only user-initiated operations and system (internal) ones. With alternator TTL, there is now a need to distinguish between two kinds of system operation: foreground and background ones. The former should use the system tenant while the latter will use the new maintenance tenant (streaming scheduling group).
This series adds a streaming tenant to the messaging service configuration and it adds a test which confirms that with this change, alternator TTL is entirely contained in the maintenance scheduling group.
Fixes: #18719
- [x] Scans executed on behalf of alternator TTL are running in the statement group, disturbing user-workloads, this PR has to be backported to fix this.
Closes scylladb/scylladb#18729
* github.com:scylladb/scylladb:
alternator, scheduler: test reproducing RPC scheduling group bug
main: add maintenance tenant to messaging_service's scheduling config
GCC-14 rightfully points out:
```
/var/ssd/scylladb/mutation/mutation_rebuilder.hh: In member function ‘const mutation& mutation_rebuilder::consume_new_partition(const dht::decorated_key&)’:
/var/ssd/scylladb/mutation/mutation_rebuilder.hh:24:36: error: redundant move in initialization [-Werror=redundant-move]
   24 |         _m = mutation(_s, std::move(dk));
      |                           ~~~~~~~~~^~~~
/var/ssd/scylladb/mutation/mutation_rebuilder.hh:24:36: note: remove ‘std::move’ call
```
as `dk` is passed with a const reference, `std::move()` does not help
the callee to consume from it. so drop the `std::move()` here.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19188
The Alternator test test_metrics.py::test_item_latency confirms that
for several operation types (PutItem, GetItem, DeleteItem, UpdateItem)
we did not forget to measure their latencies.
The test checked that a latency was updated by checking that two metrics
increases:
scylla_alternator_op_latency_count
scylla_alternator_op_latency_sum
However, it turns out that the "sum" is only an approximate sum of all
latencies, and when the total sum grows large it sometimes does *not*
increase when a short latency is added to the statistics. When this
happens, this test fails on the assertion that the "sum" increases after
an operation. We saw this happening sometimes in CI runs.
The simple fix is to stop checking _sum at all, and only verify that
the _count increases - this is really an integer counter that
unconditionally increases when a latency is added to the histogram.
Don't worry that the strength of this test is reduced - this test was
never meant to check the accuracy or correctness of the histograms -
we should have different (and better) tests for that, unrelated to
Alternator. The purpose of *this* test is only to verify that for some
specific operation like PutItem, Alternator didn't forget to measure its
latency and update the histogram. We want to avoid a bug like we had
in counters in the past (#9406).
Fixes #18847.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#19080
struct compress is forward declared right before its definition. At some
point in the past there was probably some code there using it, but now
it's gone, so remove it.
Closes scylladb/scylladb#19168
in e7d4e080, we reenabled the background writes in this test, but
when running with tablets enabled, background writes are still
disabled because of #17025, which was fixed last week. so we can
enable background writes with tablets.
in this change,
* background writes are enabled with tablets.
* increase the number of nodes by 1 so that we have enough nodes
to fulfill the needs of tablets, which enforces that the number
of replicas should always satisfy RF.
* pass rf to `start_writes()` explicitly, so we have less
magic numbers in the test, and make the data dependencies
more obvious.
Fixes #17589
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18707
In 58784cd, aa4b06a and other commits migrating
hinted handoff from IPs to host IDs (scylladb/scylladb#15567),
we started ignoring hint directories of invalid names,
i.e. those that represent neither an IP address, nor a host ID.
They remain on disk and are taken into account while computing
e.g. the total size of hints, but they're not used in any way.
These changes add logs informing the user when Scylla
encounters such a directory.
Closes scylladb/scylladb#17566
storage_proxy has a throttling mechanism which attempts to limit the number
of background writes by forcefully raising CL to ALL
(it's not implemented exactly like that, but that's the effect) when
the amount of background and queued writes is above some fixed threshold.
If this is applied to a write, it becomes "throttled",
and its ID is appended to _throttled_writes.
Whenever the amount of background and queued writes falls below the threshold,
writes are "unthrottled" — some IDs are popped from _throttled_writes
and the writes represented by these IDs — if their handlers still exist —
have their CL lowered back.
The problem here is that IDs are only ever removed from _throttled_writes
if the number of queued and background writes falls below the threshold.
But this doesn't have to happen in any finite time, if there's constant write
pressure. And in fact, in one load test, it hasn't happened in 3 hours,
eventually causing the buffer to grow into gigabytes and trigger OOM.
This patch is intended to be a good-enough-in-practice fix for the problem.
Fixes scylladb/scylladb#17476
Fixes scylladb/scylladb#1834
Closes scylladb/scylladb#19136
Currently they both run in streaming group and it may become busy during
repair/mv building and affect group0 functionality. Move it to the
gossiper group where it should have more time to run.
Fixes scylladb/scylladb#18863
Closes scylladb/scylladb#19138
Now when function context creation is encapsulated in lang::manager,
some .cc files can stop using wasm-specific headers and just go with the
lang/manager.hh one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The friendship was needed to get the engine and instance cache from the manager,
but there's a shorter way to create the context with the info it needs.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now that function context creation is encapsulated in lang::manager, it's
possible to patch out a few more places that use the database as a config
provider.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Similarly to previous patch, lua context needs db::config for creation.
It's better to get the configurables via lang::manager::config.
One thing to note -- the lua config carries updateable_values on board, but
the respective db::config options are _not_ LiveUpdate-able, so the lua
config could just use simple data types. This patch keeps the updateable
values intact for brevity.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The manager needs to get two "fuel" configurables from db::config in
order to create a context. Instead of carrying the db config from callers,
keep the options on existing lang::manager::config and use them.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When a function is created with the CREATE FUNCTION statement, the
statement handler does all the necessary preparations on its own. The
very same code exists in schema_tables, when the function is loaded on
boot. This patch generalizes both and keeps function language-specific
context creation inside lang/ code.
The creation function returns context via argument reference. It would
have been nicer if it was returned via future<>, but it's not suitable
for future<T> type :(
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The lang::manager starts with the help of a context because it needs to
have std::shared_ptr<>s pointing to the cross-shard shared wasm engine and
runner thread. For that, a context is created in advance, which then helps
share the engine and runner across manager instances.
This patch removes the "context" and replaces it with classical
manager::config. With it, it's lang::manager who's now responsible for
initializing itself.
In order to have cross-shard engine and thread pointers, the start()
method uses invoke_on_others() facility to share the pointer.
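Here is a plain-C++ illustration (not the actual Seastar API) of the idea: one shard creates the shared engine, and the start step then copies the same shared_ptr into every other per-shard manager, so all shards end up sharing one engine:
```
#include <iostream>
#include <memory>
#include <vector>

struct engine {};

struct manager {
    std::shared_ptr<engine> _engine;
};

int main() {
    constexpr int n_shards = 4;
    std::vector<manager> per_shard(n_shards);

    // "start()" on shard 0: create the cross-shard engine once...
    per_shard[0]._engine = std::make_shared<engine>();

    // ...then the equivalent of invoke_on_others(): hand the pointer out.
    for (int shard = 1; shard < n_shards; ++shard) {
        per_shard[shard]._engine = per_shard[0]._engine;
    }

    std::cout << "use_count = " << per_shard[0]._engine.use_count() << "\n"; // 4
}
```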
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Just like any other sharded<> service, the lang::manager now starts and
stops in a classical sequence of
await sharded<manager>::start()
defer([] { await sharded<manager>::stop() })
await sharded<manager>::invoke_on_all(&manager::start)
For now the method is no-op, next patches will start using it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
And, while at it, rename the local variable that refers to it to "manager",
not "wasm". Query processor and database also have getters named
"wasm()", these are not renamed yet to keep patch smaller (and those
getters are going to be reworked further anyway).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's going to become a facade in front of both -- wasm and lua, so keep
it in files with language independent names.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In case multiple clients concurrently issue CREATE KEYSPACE IF NOT EXISTS
and later USE KEYSPACE, it can happen that the schema in the driver's session is
out of sync, because it syncs when it receives a special message from the
CREATE KEYSPACE response.
Similar situation occurs with other schema change statements.
In this patch we fix only create keyspace/table/type/view statements
by always sending created event. Behavior of any other schema altering
statements remains unchanged.
This should reduce the risk of re-introducing an issue similar to
the one fixed in ab6988c52f
When the grant code is closer to the actual creation code (announcing mutations)
there is a lower chance of those two effects being triggered differently;
if we ever call grant_permissions_to_creator and do not announce the mutations,
that's very likely a security vulnerability.
Additionally comment was rewritten to be more accurate.
Currently, there are 2 ways of sharing a backlog with other nodes: through
a gossip mechanism, and with responses to replica writes. In gossip, we
check each second if the backlog changed, and if it did we update other
nodes with it. However if the backlog for this node changed on another
node with a write response, the gossiped backlog is currently not updated,
so if after the response the backlog goes back to the value from the previous
gossip round, it will not get sent and the other node will stay with an
outdated backlog - this can be observed in the following scenario:
1. Cluster starts, all nodes gossip their empty view update backlog to one another
2. On node N, `view_update_backlog_broker` (the backlog gossiper) performs an iteration of its backlog update loop, sees no change (backlog has been empty since the start), schedules the next iteration after 1s
3. Within the next 1s, coordinator (different than N) sends a write to N causing a remote view update (which we do not wait for). As a result, node N replies immediately with an increased view update backlog, which is then noted by the coordinator.
4. Still within the 1s, node N finishes the view update in the background, dropping its view update backlog to 0.
5. In the next and following iterations of `view_update_backlog_broker` on N, backlog is empty, as it was in step 2, so no change is seen and no update is sent due to the check
```
auto backlog = _sp.local().get_view_update_backlog();
if (backlog_published && *backlog_published == backlog) {
    sleep_abortable(gms::gossiper::INTERVAL, _as).get();
    continue;
}
```
After this scenario happens, the coordinator keeps information about an increased view update backlog on N even though it's actually already empty.
This patch fixes the issue by notifying the gossip that a different backlog
was sent in a response, causing it to send an unchanged backlog to other
nodes in the following gossip round.
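A simplified, hypothetical model of that fix (the names backlog_broker, note_sent_in_response and should_publish are illustrative, not the real code): the gossip loop republishes when either the value changed or a different value was piggybacked on a write response since the last round.
```
#include <cstdint>
#include <optional>

struct backlog_broker {
    std::optional<uint64_t> _published;
    bool _sent_in_response_since_last_round = false;

    // Called by the write path when it piggybacks a backlog on a response.
    void note_sent_in_response(uint64_t sent) {
        if (!_published || *_published != sent) {
            _sent_in_response_since_last_round = true;
        }
    }

    // One iteration of the gossip loop.
    bool should_publish(uint64_t current) {
        bool changed = !_published || *_published != current;
        bool publish = changed || _sent_in_response_since_last_round;
        if (publish) {
            _published = current;
        }
        _sent_in_response_since_last_round = false;
        return publish;
    }
};

int main() {
    backlog_broker b;
    b.should_publish(0);            // initial gossip round: empty backlog
    b.note_sent_in_response(10);    // a write response advertised a bigger backlog
    return b.should_publish(0) ? 0 : 1; // backlog is back to 0, but we still republish
}
```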
Fixes: https://github.com/scylladb/scylladb/issues/18461
Similarly to https://github.com/scylladb/scylladb/pull/18646, without admission control (https://github.com/scylladb/scylladb/pull/18334), this patch doesn't affect much, so I'm marking it as backport/none
Tests: manual. Currently this patch only affects the length of MV flow control delay, which is not reliable to base a test on. A proper test will be added when MV admission control is added, so we'll be able to base the test on rejected requests
Closes scylladb/scylladb#18663
* github.com:scylladb/scylladb:
mv: gossip the same backlog if a different backlog was sent in a response
node_update_backlog: divide adding and fetching backlogs
Currently, when generating and propagating view updates, if we notice
that we've already exceeded the time limit, we throw an exception
inheriting from `request_timeout_exception`, to later catch and
log it when finishing request handling. However, when catching, we
only check timeouts by matching the `timed_out_error` exception,
so the exception thrown in the view update code is not registered
as a timeout exception, but an unknown one. This can cause tests
which were based on the log output to start failing, as in the past
we were noticing the timeout at the end of the request handling
and using the `timed_out_error` to keep processing it and now, even
though we do notice the timeout even earlier, due to its type we
log an error to the log, instead of treating it as a regular timeout.
In this patch we make the error thrown on timeout during view updates
inherit from `timed_out_error` instead of the `request_timeout_exception`
(it is also moved from the "exceptions" directory, where we define
exceptions returned to the user).
Aside from helping with the issue described above, we also improve our
metrics, as the `request_timeout_exception` is also not checked for
in the `is_timeout_exception` method, and because we're using it to
check whether we should update write timeout metrics, they will only
start getting updated after this patch.
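A minimal sketch of the inheritance change, with a local stand-in for Seastar's timed_out_error and a hypothetical mv_timeout_error name, just to show why the catch/metrics code starts recognizing the exception:
```
#include <exception>
#include <iostream>

struct timed_out_error : public std::exception {
    const char* what() const noexcept override { return "timed out"; }
};

// Before: derived from a user-facing request_timeout_exception, so generic
// timeout handling did not recognize it. After: derived from timed_out_error.
struct mv_timeout_error : public timed_out_error {};

bool is_timeout_exception(const std::exception& e) {
    return dynamic_cast<const timed_out_error*>(&e) != nullptr;
}

int main() {
    mv_timeout_error err;
    std::cout << std::boolalpha << is_timeout_exception(err) << "\n"; // true
}
```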
Closes scylladb/scylladb#19102
as the matcher actually applies to all warnings from clang frontend,
and hence can be reused when building the tree with clang, so let's
rename it before using it in the clang build workflows.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This commit updates the configuration for ScyllaDB documentation so that:
6.0 is the latest version.
6.0 is removed from the list of unstable versions.
It must be merged when ScyllaDB 6.0 is released.
No backport is required.
Closes scylladb/scylladb#19003
before this change, we used "RPC or native". before thrift support
was removed, "RPC" implied "thrift"; now that we've dropped thrift
support, "RPC" could be confusing here, so let's be more specific
and put all connection types in place of "RPC or native".
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
replace all occurrences of "rpc" in function names and debugging
messages with "thrift", as "rpc" is way too general, and since we
are removing "thrift" support, let's take this opportunity to
use a more specific name.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
so we don't create new sstables with this unused column, but we
can still open old sstables of this table which was created with
the old schema.
Refs #3811
Refs #18416
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
since we've dropped the thrift support, the `client_type` is always
`cql`, there is no need to differentiate different clients anymore.
so, we change `make_client_key()` so that it only returns the IP address
and port.
Refs #3811
Refs #18416
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
since we've addressed all warnings, we are ready to tighten the
standards of this workflow, so that contributors are made aware of
violations of the include-what-you-use policy.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
thrift support has been deprecated since ScyllaDB 5.2
> Thrift API - legacy ScyllaDB (and Apache Cassandra) API is
> deprecated and will be removed in followup release. Thrift has
> been disabled by default.
so let's drop it. in this change,
* thrift protocol support is dropped
* all references to thrift support in document are dropped
* the "thrift_version" column in system.local table is
preserved for backward compatibility, as we could load
from an existing system.local table which still contains
this column, so we need to write this column as well.
* "/storage_service/rpc_server" is only preserved for
backward compatibility with java-based nodetool.
* `rpc_port` and `start_rpc` options are preserved, but
they are marked as "Unused". so that the new release
of scylladb can consume existing scylla.yaml configurations
which might contain these settings. by marking them
deprecated, users will get warned, and can update
their configurations before we actually remove them
in the next major release.
Fixes #3811
Fixes #18416
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Due to the gradual raft introduction into the statements code, in cases when a single statement modified more than one table, or a mutation-producing function was composed out of simpler ones, we violated transactional logic and statement execution was not atomic as a whole.
This patch changes that, so now either all changes resulting from statement execution are applied or none. Affected statements types are:
- schema modification
- auth modifications
- service levels modifications
Fixes https://github.com/scylladb/scylladb/issues/17738
Closes scylladb/scylladb#17910
* github.com:scylladb/scylladb:
raft: rename mutations_collector to group0_batch
raft: rename announce to commit
cql3: raft: attach description to each mutations collector group
auth: unify mutations_generator type
auth: drop redundant 'this' keyword
auth: remove no longer used code from standard_role_manager::legacy_modify_membership
cql3: auth: use mutation collector for service levels statements
cql3: auth: use mutation collector for alter role
cql3: auth: use mutation collector for grant role and revoke role
cql3: auth: use mutation collector for drop role and auto-revoke
auth: add refactored modify_membership func in standard_role_manager
auth: implement empty revoke_all in allow_all_authorizer
auth: drop request_execution_exception handling from default_authorizer::revoke_all
Revert "Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks"
cql3: auth: use mutation collector for grant and revoke permissions
cql3: extract changes_tablets function in alter_keyspace_statement
cql3: auth: use mutation collector for create role statement
auth: move create_role code into service
auth: add a way to announce mutations having only client_state ref
auth: add collect_mutations common helper
auth: remove unused header in common.hh
auth: add class for gathering mutations without immediate announce
auth: cql3: use auth facade functions consistently on write path
auth: remove unused is_enforcing function
1. Fixed a single typo (send -> sent)
2. Rephrase 'How many' to 'Number of' and use less passive tense.
3. Be more specific in the description of the different metrics instead of the more generic descriptions.
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Closes scylladb/scylladb#19067
We want to exclude repair with tablet migrations to avoid races
between repair reads and writes with replica movement. Repair is not
prepared to handle topology transitions in the middle.
One reason why it's not safe is that repair may successfully write to
a leaving replica post streaming phase and consider all replicas to be
repaired, but in fact they are not, the new replica would not be
repaired.
Other kinds of races could result in repair failures. If repair writes
to a leaving replica which was already cleaned up, such writes will
fail, causing repair to fail.
Excluding works by keeping effective_replication_map_ptr in a version
which doesn't have table's tablets in transitions. That prevents later
transitions from starting because topology coordinator's barrier will
wait for that erm before moving to a stage later than
allow_write_both_read_old, so before any requests start using the new
topology. Also, if transitions are already running, repair waits for
them to finish.
A blocked tablet migration (e.g. due to down node) will block repair,
whereas before it would fail. Once admin resolves the cause of blocked migration,
repair will continue.
Fixes#17658.
Fixes#18561.
Closes scylladb/scylladb#18641
* github.com:scylladb/scylladb:
test: pylib: Do not block async reactor while removing directories
repair: Exclude tablet migrations with tablet repair
repair_service: Propagate topology_state_machine to repair_service
main, storage_service: Move topology_state_machine outside storage_service
storage_srvice, toplogy: Extract topology_state_machine::await_quiesced()
tablet_scheduler: Make disabling of balancing interrupt shuffle mode
tablet_scheduler: Log whether balancing is considered as enabled
The API already promises this, the comment on effective_replication_map says:
"Excludes replicas which are in the left state".
Tablet replicas on the replaced node are rebuilt after the node
already left. We may no longer have the IP mapping for the left node
so we should not include that node in the replica set. Otherwise,
storage_proxy may try to use the empty IP and fail:
storage_proxy - No mapping for :: in the passed effective replication map
It's fine to not include it, because storage proxy uses keyspace RF
and not replica list size to determine quorum. The node is not coming
up, so no one should need to contact it.
Users which need replica list stability should use the host_id-based API.
Fixes #18843
Closes scylladb/scylladb#18955
* github.com:scylladb/scylladb:
tablets: Filter-out left nodes in get_natural_endpoints()
test: pylib: Extract start_writes() load generator utility
Currently, there are 2 ways of sharing a backlog with other nodes: through
a gossip mechanism, and with responses to replica writes. In gossip, we
check each second if the backlog changed, and if it did we update other
nodes with it. However if the backlog for this node changed on another
node with a write response, the gossiped backlog is currently not updated,
so if after the response the backlog goes back to the value from the previous
gossip round, it will not get sent and the other node will stay with an
outdated backlog.
This patch changes this by notifying the gossip that the backlog changed
since the last gossip round, so a different backlog could have been sent
through the response piggyback mechanism. With that information, gossip
will send an unchanged backlog to other nodes in the following gossip round.
Fixes: https://github.com/scylladb/scylladb/issues/18461
Currently, we only update the backlogs in node_update_backlog at the
same time when we're fetching them. This is done using storage_proxy's
method get_view_update_backlog, which is confusing because it's a getter
with side-effects. Additionally, we don't always want to update the
backlog when we're reading it (as in gossip which is only on shard 0)
and we don't always want to read it when we're updating it (when we're
not handling any writes but the backlog drops due to background work
finishing).
This patch divides node_update_backlog::add_fetch as well as
storage_proxy::get_view_update_backlog, each into two methods; one
for updating and one for reading the backlog. This patch only replaces
the places where we're currently using the view backlog getter, more
situations where we should get/update the backlog should be considered
in a following patch.
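A hypothetical, heavily simplified sketch of the split -- one getter with side effects replaced by an explicit update step plus a pure read (the member and method names below are illustrative only):
```
#include <atomic>
#include <cstdint>
#include <iostream>

class node_update_backlog {
    std::atomic<uint64_t> _backlog{0};
public:
    // Before: one add_fetch() that both stored the new value and returned it.
    // After: callers choose which of the two operations they actually need.
    void add(uint64_t current) { _backlog.store(current, std::memory_order_relaxed); }
    uint64_t fetch() const { return _backlog.load(std::memory_order_relaxed); }
};

int main() {
    node_update_backlog b;
    b.add(42);                       // write path: update after a view update finishes
    std::cout << b.fetch() << "\n";  // gossip path: read without side effects
}
```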
It consists of a reading method and a parsing one, and it uses class fields to carry data between those two. The former is additionally built with curly continuation chains, while it's naturally linear, so turn it into a coroutine while at it.
Closes scylladb/scylladb#18994
* github.com:scylladb/scylladb:
snitch: Remove production_snitch_base::_prop_file_contents
snitch: Remove production_snitch_base::_prop_file_size
snitch: Coroutinize load_property_file()
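To illustrate the "continuation chain becomes a linear coroutine" idea from the load_property_file() change above, here is a self-contained C++20 sketch with a tiny eager task type and hypothetical read_file()/load_property_file() stand-ins; it is not the real snitch code:
```
#include <coroutine>
#include <iostream>
#include <string>

struct task {
    struct promise_type {
        task get_return_object() { return {}; }
        std::suspend_never initial_suspend() { return {}; }
        std::suspend_never final_suspend() noexcept { return {}; }
        void return_void() {}
        void unhandled_exception() {}
    };
};

struct ready {
    std::string value;
    bool await_ready() { return true; }
    void await_suspend(std::coroutine_handle<>) {}
    std::string await_resume() { return value; }
};

ready read_file() { return {"dc=dc1\nrack=r1"}; }

// Naturally linear: read, then parse, without carrying state in class fields
// or chaining .then() continuations.
task load_property_file() {
    std::string contents = co_await read_file();
    std::cout << "parsed " << contents.size() << " bytes\n";
}

int main() { load_property_file(); }
```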
All sharded<service>'s are supposed to have their own config and not use the global db::config one. The service config, in turn, is to be created by main/cql_test_env/whatever out of db::config and, maybe, other data. Gossiper is almost there, but it still uses db::config in a few places.
Closes scylladb/scylladb#19051
* github.com:scylladb/scylladb:
gossiper: Stop using db::config
gossiper: Move force_gossip_generation on gossip_config
gossiper: Move failure_detector_timeout_ms on gossip_config
main: Fix indentation after previous patch
main: Make gossiper config a sharded parameter
main: Add local variable for set of seeds
main: Add local variable for group0 id
main: Add local variable for cluster_name
Protocol servers are started last, and are registered in storage_service, which stops them. Also there are deferred actions scheduled to stop protocol servers on aborted start and a FIXME asking to make even this case rely on storage_service. Also, there's a (rather rare) aborted-start bug in alternator and redis. Yet, thrift can be left started in some weird circumstances. This patch fixes it all. As a side effect, the start-stop code becomes shorter and a bit better structured.
refs: #2737
Closes scylladb/scylladb#19042
* github.com:scylladb/scylladb:
main: Start alternator expiration service earlier
main: Start redis transparently
main: Start alternator transparently
main: Start thrift transparently
main: Start native transport transparently
storage_service: Make register_protocol_server() start the server
storage_service: Turn register_protocol_server() async method
storage_service: Outline register_protocol_server()
main: Schedule deferred drain_on_shutdown() prior to protocol servers
main: Move some trailing startup earlier
this series
* add annotation to the github pull request when extraneous `#include` preprocessor directives are identified
* add `exceptions` subdirectory to `CLEANER_DIRS` to demonstrate the annotation. we will fix the identified issue in a follow-up change.
---
* This is a CI workflow improvement. No backporting is required.
Closes scylladb/scylladb#19037
* github.com:scylladb/scylladb:
.github: add exception to CLEANER_DIRS
.github: annotate the report from clang-include-cleaner
.github: build headers before running clang-include-cleaner
Currently it gets the streaming/maintenance one from database, but it
can as well just assume that it's already running in the correct one,
and the main code fulfils this assumption.
This removes one more place that uses database as sched groups provider.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#19078
clang-19 introduced a change which enforces the change proposed by [CWG 96](https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#96), which was accepted by C++20 in [P1787R6](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1787r6.html), as [[temp.names]p5](https://eel.is/c++draft/temp.names#6).
so, to be future-proof and to be standard compliant, let's pass the
template arguments. otherwise we'd have build failure like
```
error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw]
```
---
no need to backport, as this change only addresses a FTBFS with a recent build of clang-19, and our CI does not use a clang built from llvm's main HEAD.
Closes scylladb/scylladb#19100
* github.com:scylladb/scylladb:
util/result_try: pass template arg list explicitly
util/result_try: pass func as `const F&` instead of `F&&`
We recently added to cql-pytest tests the ability to check if tablets
are enabled or not (for some tablet-specific tests). When running
tests against Cassandra or old pre-tablet versions of Scylla, this
fact is detected and "False" is returned immediately. However, we
still look at a system table which didn't exist on really ancient
versions of Scylla, and tests couldn't run against such versions.
The fix is trivial: if that system table is missing, just ignore the
error and return False (i.e., no tablets). There were no tablets on
such ancient versions of Scylla.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#19098
It's otherwise unused; populator uses it to print debugging messages, but it can
as well use table->dir() for that, just as sstable_directory does. One
message looks useless and is removed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#19113
Currently the code wraps a simple "if" with std::invoke over a lambda.
Also, the local variable that gets the result is declared const,
which prevents it from being std::move()-d in the very next line.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#19106
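A reduced, hypothetical illustration of the two issues mentioned above (the before()/after()/make() names are made up, not the real code): wrapping a plain "if" in std::invoke over a lambda, and declaring the result const so the following std::move() silently copies instead of moving.
```
#include <functional>
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> make(bool big);

std::vector<std::string> before(bool big) {
    const auto v = std::invoke([&] {   // needless lambda + std::invoke
        if (big) { return make(true); }
        return make(false);
    });
    return std::move(v);               // const => this is actually a copy
}

std::vector<std::string> after(bool big) {
    auto v = big ? make(true) : make(false); // a simple "if"/conditional is enough
    return v;                                // non-const, NRVO/move applies
}

std::vector<std::string> make(bool big) {
    return std::vector<std::string>(big ? 1000 : 10, "x");
}

int main() {
    return before(true).size() == after(true).size() ? 0 : 1;
}
```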
This vector of paths is only used to generate the same vector of paths
for table config, but the latter already has all the needed info.
It's part of the plan to stop using paths/directories in keyspaces
and tables, because with storage-options tables no longer keep their
data in "files on disk", so this information goes to sstables storage
manager (refs #12707)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#19119
before this change, unlike other services in scylla,
topology_coordinator is not properly stopped when it is aborted,
because the scylla instance is no longer a leader or is being shut down.
its `run()` method just stops the grand loop and bails out before
topology_coordinator is destroyed. but we are tracking the migration
state of tablets using a bunch of futures, which might not be
handled yet, and some of them could carry failures. in that case,
when the `future` instances with failure state get destroyed,
seastar calls `report_failed_future`. and seastar considers this
practice a source of bugs -- as one just fails to handle an error.
that's why we have the following error:
```
WARN 2024-05-19 23:00:42,895 [shard 0:strm] seastar - Exceptional future ignored: seastar::rpc::unknown_verb_error (unknown verb), backtrace: /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x56c14e /home/bhalevy/.ccm/scylla-repository/local_tarball/libre
loc/libseastar.so+0x56c770 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x56ca58 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x38c6ad 0x29cdd07 0x29b376b 0x29a5b65 0x108105a /home/bhalevy/.ccm/scylla-repository/local_tarbal
l/libreloc/libseastar.so+0x3ff1df /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x400367 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x3ff838 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x36de58
/home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x36d092 0x1017cba 0x1055080 0x1016ba7 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x27b89 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x27c4a 0x1015524
```
and the backtrace looks like:
```
seastar::current_backtrace_tasklocal() at ??:?
seastar::current_tasktrace() at ??:?
seastar::current_backtrace() at ??:?
seastar::report_failed_future(seastar::future_state_base::any&&) at ??:?
service::topology_coordinator::tablet_migration_state::~tablet_migration_state() at topology_coordinator.cc:?
service::topology_coordinator::~topology_coordinator() at topology_coordinator.cc:?
service::run_topology_coordinator(seastar::sharded<db::system_distributed_keyspace>&, gms::gossiper&, netw::messaging_service&, locator::shared_token_metadata&, db::system_keyspace&, replica::database&, service::raft_group0&, service::topology_state_machine&, seastar::abort_source&, raft::server&, seastar::noncopyable_function<seastar::future<service::raft_topology_cmd_result> (utils::tagged_tagged_integer<raft::internal::non_final, raft::term_tag, unsigned long>, unsigned long, service::raft_topology_cmd const&)>, service::tablet_allocator&, std::chrono::duration<long, std::ratio<1l, 1000l> >, service::endpoint_lifecycle_notifier&) [clone .resume] at topology_coordinator.cc:?
seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose() at main.cc:?
seastar::reactor::run_some_tasks() at ??:?
seastar::reactor::do_run() at ??:?
seastar::reactor::run() at ??:?
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ??:?
```
and even worse, these futures are indirectly owned by `topology_coordinator`.
so there are chances that they could be used even after `topology_coordinator`
is destroyed. this is a use-after-free issue. because the
`run_topology_coordinator` fiber exits when the scylla instance retires
from the leader's role, this use-after-free could be fatal to a
running instance due to undefined behavior.
so, in this change, we handle the futures in `_tablets`, and note
down the failures carried by them if any.
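The pattern -- drain every tracked future and record its failure instead of letting a failed future be destroyed unhandled -- looks roughly like the following standard-library sketch (std::future stands in for Seastar futures; this is not the actual topology_coordinator code):
```
#include <exception>
#include <future>
#include <iostream>
#include <vector>

int main() {
    std::vector<std::future<void>> tablet_migrations;
    tablet_migrations.push_back(std::async(std::launch::async, [] {}));
    tablet_migrations.push_back(std::async(std::launch::async, [] {
        throw std::runtime_error("unknown verb");
    }));

    // Equivalent of "handle the futures in _tablets and note down failures":
    // wait for each one and log the carried exception, so no failed future
    // is dropped on the floor when its owner is torn down.
    for (auto& f : tablet_migrations) {
        try {
            f.get();
        } catch (const std::exception& e) {
            std::cerr << "tablet migration failed: " << e.what() << "\n";
        }
    }
}
```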
Fixes#18745
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18991
* tools/cqlsh c8158555...0d58e5ce (6):
> cqlsh.py: fix server side describe after login command
> cqlsh: try server-side DESCRIBE, then client-side
> Refactor tests to accept both client and server side describe
> github actions: support testing with enterprise release
> Add the tab-completion support of SERVICE_LEVEL statements
> reloc/build_reloc.sh: don't use `--no-build-isolation`
Closes scylladb/scylladb#18990
Fetching only the first page is not the intuitive behavior expected by users.
This causes flakiness in some tests which generate a variable amount of
keys depending on execution speed and verify later that all keys were
written using a single SELECT statement. When the amount of keys
becomes larger than page size, the test fails.
Fixes #18774
Closes scylladb/scylladb#19004
This fixes a problem where suite cleanup schedules lots of uninstall()
tasks for servers started in the suite, which schedules lots of tasks,
which synchronously call rmtree(). These take over a minute to finish,
which blocks other tasks for tests which are still executing.
In particular, this was observed to cause
ManagerClient.server_stop_gracefully() to time-out. It has a timeout
of 60 seconds. The server was stopped quickly, but the RESTful API
response was not processed in time and the call timed out when it got
the async reactor.
We want to exclude repair with tablet migrations to avoid races
between repair reads and writes with replica movement. Repair is not
prepared to handle topology transitions in the middle.
One reason why it's not safe is that repair may successfully write to
a leaving replica post streaming phase and consider all replicas to be
repaired, but in fact they are not, the new replica would not be
repaired.
Other kinds of races could result in repair failures. If repair writes
to a leaving replica which was already cleaned up, such writes will
fail, causing repair to fail.
Excluding works by keeping effective_replication_map_ptr in a version
which doesn't have table's tablets in transitions. That prevents later
transitions from starting because topology coordinator's barrier will
wait for that erm before moving to a stage later than
allow_write_both_read_old, so before any requests start using the new
topology. Also, if transitions are already running, repair waits for
them to finish.
Fixes#17658.
Fixes#18561.
Will be used later in a place which doesn't have access to storage_service
but does have access to topology_state_machine.
It's not necessary to start group0 operation around polling because
the busy() state can be checked atomically and if it's false it means
the topology is no longer busy.
Tests will rely on that, they will run in shuffle mode, and disable
balancing around section which otherwise would be infinitely blocked
by ongoing shuffling (like repair).
Assigning to a member of an uninitialized optional
does not initialize the object before assigning to it.
This resulted in AddressSanitizer detecting an attempt
to double-free when the uninitialized string contained
an apparently bogus pointer.
The change emplaces the returned optional when needed
without resorting to the copy-assignment operator.
So it's not susceptible to assigning to uninitialized
memory, and it's more efficient as well...
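A standalone sketch of this bug class, with a hypothetical endpoint_info struct (not the actual type from the fix): copy-assigning into a member of a disengaged std::optional touches storage that was never constructed, while emplace() constructs the contained object first.
```
#include <iostream>
#include <optional>
#include <string>

struct endpoint_info {
    std::string dc;
    std::string rack;
};

std::optional<endpoint_info> get_info(const std::string& dc, const std::string& rack) {
    std::optional<endpoint_info> ret;
    // Buggy pattern: "ret->dc = dc;" while ret is disengaged assigns into
    // storage that was never constructed -- undefined behavior, which ASan
    // can report as a double free of the bogus string pointer.
    ret.emplace();      // construct the contained object first
    ret->dc = dc;       // now member assignment is well defined
    ret->rack = rack;
    return ret;
}

int main() {
    auto info = get_info("dc1", "rack1");
    std::cout << info->dc << "/" << info->rack << "\n";
}
```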
Fixes scylladb/scylladb#19041
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#19043
If a node restarts just before it stores the bootstrapping node's IP, it will
not have an ID to IP mapping for the bootstrapping node, which may cause a failure
on the write path. Detect this and fail bootstrapping if it happens.
Closes scylladb/scylladb#18927
* github.com:scylladb/scylladb:
raft topology: fix indentation after previous commit
raft topology: do not add bootstrapping node without IP as pending
test: add test of bootstrap where the coordinator crashes just before storing IP mapping
schema_tables: remove unused code
as the functor passed to `invoke()` is not an rvalue, if we specify
the template parameter explicitly, clang errors out like:
```
/home/kefu/.local/bin/clang++ -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -I/home/kefu/dev/scylladb/build -I/home/kefu/dev/scylladb/build/gen -isystem /home/kefu/dev/scylladb/build/rust -isystem /home/kefu/dev/scylladb/abseil -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT transport/CMakeFiles/transport.dir/RelWithDebInfo/server.cc.o -MF transport/CMakeFiles/transport.dir/RelWithDebInfo/server.cc.o.d -o transport/CMakeFiles/transport.dir/RelWithDebInfo/server.cc.o -c /home/kefu/dev/scylladb/transport/server.cc
In file included from /home/kefu/dev/scylladb/transport/server.cc:39:
/home/kefu/dev/scylladb/utils/result_try.hh:210:28: error: no matching function for call to 'invoke'
210 | return Converter::template invoke<const Cb, const Ex&>(_cb, ex);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/kefu/dev/scylladb/utils/result_try.hh:194:143: note: while substituting into a lambda expression here
194 | return [this, cont = std::forward<Continuation>(cont)] (bool& already_caught) mutable -> typename Converter::template wrapped_type<R> {
| ^
/home/kefu/dev/scylladb/utils/result_try.hh:327:40: note: in instantiation of function template specialization 'utils::internal::result_catcher<exceptions::unavailable_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:521:68)>::wrap_in_catch<boost::outcome_v2::basic_result<seastar::foreign_ptr<std::unique_ptr<cql_transport::response>>, utils::exception_container<exceptions::mutation_write_timeout_exception, exceptions::read_timeout_exception, exceptions::read_failure_exception, exceptions::rate_limit_exception>, utils::exception_container_throw_policy>, utils::internal::noop_converter, (lambda at /home/kefu/dev/scylladb/utils/result_try.hh:518:76)>' requested here
327 | first_handler.template wrap_in_catch<R, Converter, Continuation>(std::forward<Continuation>(cont)),
| ^
/home/kefu/dev/scylladb/utils/result_try.hh:518:54: note: in instantiation of function template specialization 'utils::internal::try_catch_chain_impl<boost::outcome_v2::basic_result<seastar::foreign_ptr<std::unique_ptr<cql_transport::response>>, utils::exception_container<exceptions::mutation_write_timeout_exception, exceptions::read_timeout_exception, exceptions::read_failure_exception, exceptions::rate_limit_exception>, utils::exception_container_throw_policy>, utils::internal::noop_converter, utils::internal::result_catcher<exceptions::unavailable_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:521:68)>, utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>, utils::internal::result_catcher<exceptions::read_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:531:69)>, utils::internal::result_catcher<exceptions::mutation_write_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:536:79)>, utils::internal::result_catcher<exceptions::mutation_write_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:541:79)>, utils::internal::result_catcher<exceptions::already_exists_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:546:71)>, utils::internal::result_catcher<exceptions::prepared_query_not_found_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:551:81)>, utils::internal::result_catcher<exceptions::function_execution_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:556:75)>, utils::internal::result_catcher<exceptions::rate_limit_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:561:67)>, utils::internal::result_catcher<exceptions::cassandra_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:566:66)>, utils::internal::result_catcher<std::exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:578:49)>, utils::internal::result_catcher_dots<(lambda at /home/kefu/dev/scylladb/transport/server.cc:591:38)>>::invoke_in_try_catch<(lambda at /home/kefu/dev/scylladb/utils/result_try.hh:518:76)>' requested here
518 | result_type res = try_catch_chain_type::template invoke_in_try_catch<>([&fun] (bool&) { return fun(); }, handlers...);
| ^
/home/kefu/dev/scylladb/transport/server.cc:484:83: note: in instantiation of function template specialization 'utils::result_try<(lambda at /home/kefu/dev/scylladb/transport/server.cc:484:94), utils::internal::result_catcher<exceptions::unavailable_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:521:68)>, utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>, utils::internal::result_catcher<exceptions::read_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:531:69)>, utils::internal::result_catcher<exceptions::mutation_write_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:536:79)>, utils::internal::result_catcher<exceptions::mutation_write_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:541:79)>, utils::internal::result_catcher<exceptions::already_exists_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:546:71)>, utils::internal::result_catcher<exceptions::prepared_query_not_found_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:551:81)>, utils::internal::result_catcher<exceptions::function_execution_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:556:75)>, utils::internal::result_catcher<exceptions::rate_limit_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:561:67)>, utils::internal::result_catcher<exceptions::cassandra_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:566:66)>, utils::internal::result_catcher<std::exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:578:49)>, utils::internal::result_catcher_dots<(lambda at /home/kefu/dev/scylladb/transport/server.cc:591:38)>>' requested here
484 | return utils::result_into_future<result_with_foreign_response_ptr>(utils::result_try([&] () -> result_with_foreign_response_ptr {
| ^
/home/kefu/dev/scylladb/utils/result_try.hh:33:5: note: candidate function template not viable: expects an rvalue for 1st argument
33 | invoke(F&& f, Args&&... args) {
| ^ ~~~~~
/home/kefu/dev/scylladb/utils/result_try.hh:210:28: error: no matching function for call to 'invoke'
210 | return Converter::template invoke<const Cb, const Ex&>(_cb, ex);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/kefu/dev/scylladb/utils/result_try.hh:194:143: note: while substituting into a lambda expression here
194 | return [this, cont = std::forward<Continuation>(cont)] (bool& already_caught) mutable -> typename Converter::template wrapped_type<R> {
| ^
/home/kefu/dev/scylladb/utils/result_try.hh:327:40: note: in instantiation of function template specialization 'utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>::wrap_in_catch<boost::outcome_v2::basic_result<seastar::foreign_ptr<std::unique_ptr<cql_transport::response>>, utils::exception_container<exceptions::mutation_write_timeout_exception, exceptions::read_timeout_exception, exceptions::read_failure_exception, exceptions::rate_limit_exception>, utils::exception_container_throw_policy>, utils::internal::noop_converter, (lambda at /home/kefu/dev/scylladb/utils/result_try.hh:194:16)>' requested here
327 | first_handler.template wrap_in_catch<R, Converter, Continuation>(std::forward<Continuation>(cont)),
| ^
/home/kefu/dev/scylladb/utils/result_try.hh:326:79: note: in instantiation of function template specialization 'utils::internal::try_catch_chain_impl<boost::outcome_v2::basic_result<seastar::foreign_ptr<std::unique_ptr<cql_transport::response>>, utils::exception_container<exceptions::mutation_write_timeout_exception, exceptions::read_timeout_exception, exceptions::read_failure_exception, exceptions::rate_limit_exception>, utils::exception_container_throw_policy>, utils::internal::noop_converter, utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>, utils::internal::result_catcher<exceptions::read_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:531:69)>, utils::internal::result_catcher<exceptions::mutation_write_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:536:79)>, utils::internal::result_catcher<exceptions::mutation_write_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:541:79)>, utils::internal::result_catcher<exceptions::already_exists_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:546:71)>, utils::internal::result_catcher<exceptions::prepared_query_not_found_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:551:81)>, utils::internal::result_catcher<exceptions::function_execution_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:556:75)>, utils::internal::result_catcher<exceptions::rate_limit_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:561:67)>, utils::internal::result_catcher<exceptions::cassandra_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:566:66)>, utils::internal::result_catcher<std::exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:578:49)>, utils::internal::result_catcher_dots<(lambda at /home/kefu/dev/scylladb/transport/server.cc:591:38)>>::invoke_in_try_catch<(lambda at /home/kefu/dev/scylladb/utils/result_try.hh:194:16)>' requested here
326 | return try_catch_chain_impl<R, Converter, CatchHandlers...>::template invoke_in_try_catch<>(
| ^
/home/kefu/dev/scylladb/utils/result_try.hh:518:54: note: in instantiation of function template specialization 'utils::internal::try_catch_chain_impl<boost::outcome_v2::basic_result<seastar::foreign_ptr<std::unique_ptr<cql_transport::response>>, utils::exception_container<exceptions::mutation_write_timeout_exception, exceptions::read_timeout_exception, exceptions::read_failure_exception, exceptions::rate_limit_exception>, utils::exception_container_throw_policy>, utils::internal::noop_converter, utils::internal::result_catcher<exceptions::unavailable_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:521:68)>, utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>, utils::internal::result_catcher<exceptions::read_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:531:69)>, utils::internal::result_catcher<exceptions::mutation_write_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:536:79)>, utils::internal::result_catcher<exceptions::mutation_write_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:541:79)>, utils::internal::result_catcher<exceptions::already_exists_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:546:71)>, utils::internal::result_catcher<exceptions::prepared_query_not_found_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:551:81)>, utils::internal::result_catcher<exceptions::function_execution_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:556:75)>, utils::internal::result_catcher<exceptions::rate_limit_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:561:67)>, utils::internal::result_catcher<exceptions::cassandra_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:566:66)>, utils::internal::result_catcher<std::exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:578:49)>, utils::internal::result_catcher_dots<(lambda at /home/kefu/dev/scylladb/transport/server.cc:591:38)>>::invoke_in_try_catch<(lambda at /home/kefu/dev/scylladb/utils/result_try.hh:518:76)>' requested here
518 | result_type res = try_catch_chain_type::template invoke_in_try_catch<>([&fun] (bool&) { return fun(); }, handlers...);
| ^
/home/kefu/dev/scylladb/transport/server.cc:484:83: note: in instantiation of function template specialization 'utils::result_try<(lambda at /home/kefu/dev/scylladb/transport/server.cc:484:94), utils::internal::result_catcher<exceptions::unavailable_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:521:68)>, utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>, utils::internal::result_catcher<exceptions::read_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:531:69)>, utils::internal::result_catcher<exceptions::mutation_write_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:536:79)>, utils::internal::result_catcher<exceptions::mutation_write_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:541:79)>, utils::internal::result_catcher<exceptions::already_exists_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:546:71)>, utils::internal::result_catcher<exceptions::prepared_query_not_found_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:551:81)>, utils::internal::result_catcher<exceptions::function_execution_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:556:75)>, utils::internal::result_catcher<exceptions::rate_limit_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:561:67)>, utils::internal::result_catcher<exceptions::cassandra_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:566:66)>, utils::internal::result_catcher<std::exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:578:49)>, utils::internal::result_catcher_dots<(lambda at /home/kefu/dev/scylladb/transport/server.cc:591:38)>>' requested here
484 | return utils::result_into_future<result_with_foreign_response_ptr>(utils::result_try([&] () -> result_with_foreign_response_ptr {
| ^
/home/kefu/dev/scylladb/utils/result_try.hh:33:5: note: candidate function template not viable: expects an rvalue for 1st argument
33 | invoke(F&& f, Args&&... args) {
| ^ ~~~~~
```
so to prepare for the change to pass the template parameter explicitly,
let's pass `f` as a `const` reference, instead of as an rvalue reference.
also, this parameter type matches our use case -- we always
pass a member variable `_cb` to `invoke`, and we don't expect that
`invoke()` would move it away.
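A reduced illustration of the parameter-type change (the helpers invoke_rvalue/invoke_cref and the handler struct are hypothetical, not the real result_try.hh code): with an F&& parameter, spelling the template argument explicitly demands an rvalue, which fails when a stored member like `_cb` (an lvalue) is passed; taking const F& accepts the lvalue too.
```
#include <iostream>

template <typename F>
void invoke_rvalue(F&& f) { f(); }      // invoke_rvalue<handler>(cb) needs an rvalue

template <typename F>
void invoke_cref(const F& f) { f(); }   // invoke_cref<handler>(cb) works with an lvalue

struct handler {
    void operator()() const {}
};

int main() {
    handler cb;                         // plays the role of the stored _cb member
    // invoke_rvalue<handler>(cb);      // error: expects an rvalue for 1st argument
    invoke_rvalue(cb);                  // fine only while the argument type is deduced
    invoke_cref<handler>(cb);           // fine even with the explicit template argument
    std::cout << "ok\n";
}
```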
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
* seastar 914a4241...9ce62705 (18):
> github: do not set --dpdk-machine haswell
> io_tester: correct calculation of writes count
> io-tester.md: update information about file size
> reactor: align used hint for extent size to 128KB for XFS
> Fix compilation failure on Ubuntu 22.04
> io_tester: align the used file size to 1MB
> circular_buffer_fixed_capacity: arrow operator instead of . operator
> posix-file-impl: Do not keep device-id on board
> github: s/clang++-18/clang++/
> include: include used headers
> include: include used headers
> iotune: allow user to set buffer size for random IO
> abort_source: add method to get exception pointer
> github: cancel a job if it takes longer than 40 minutes
> std-compat: remove #include:s which were added for pre C++17
> perf_tests: measure and report also cpu cycles
> linux_perf_events: add user_cpu_cycles_retired
> linux_perf_event: user_instructions_retired: exclude_idle
Closes scylladb/scylladb#19019
Next patches will put updateable_value's on it, but plain copy of them
across shards doesn't work (see #7316)
Indentation is deliberately left broken
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patch will do seeds assignment to gossiper config on each
shard, so it's good to have it once, then copy around
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patch will do group0_id assignment to gossiper config on each
shard, so it's good to have it once, then copy around
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's modified if it's empty; the next patch will make this code be called on
each shard, so the modification must happen only once
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This description is readable from raft log table.
Previously a single description was provided for the whole
announce call, but since it can contain mutations from
various subsystems, the description was moved to the
add_mutation(s)/add_generator function calls.
mutation_collector supports generators, but it was added to the
/service/raft code, so it couldn't depend on /auth/. Once generator
support is added there, we can remove the generator type from /auth/,
as /auth/ can depend on /service/raft.
The main theme of this commit is executing drop
keyspace/table/aggregate/function statements in a single
transaction together with auth auto-revoke logic.
This is the logic which cleans related permissions after
resource is deleted.
It contains several parts which couldn't easily be split
into separate commits mainly because mutation collector related
paths can't be mixed together. It would require holding multiple
guards which we don't support. Another reason is that with mutation
collector the changes are announced in a single place, at the end
of statement execution, if we'd announce something in the middle
then it'd lead to raft concurrent modification infinite loop as it'd
invalidate our guard taken at the beginning of statement execution.
So this commit contains:
- moving auto-revoke code to statement execution from migration_listener
* only for auth-v2 flow, to not break the old one
* it's now executed during statement execution and not merging schemas,
which means it produces mutations once as it should and not on each
node separately
* on_before callback family wasn't used because I consider it much
less readable code. Long term we want to remove
auth_migration_listener.
- adding mutation collector to revoke_all
* auto-revoke uses this function so it had to be changed,
auth::revoke_all free function wrapper was added as cql3
layer should not use underlying_authorizer() directly.
- adding mutation collector to drop_role
* because it depends on revoke_all and we can't mix old and new flows
* we need to switch all functions auth::drop_role call uses
* gradual use of previously introduced modify_membership, otherwise
we would need to switch even more code in this commit
The new function is simplified and handles only auth-v2 flow
with mutation_collector (single transaction logic).
It's not used in this commit and we'll switch code paths
gradually in subsequent commits.
The change applies only to auth-v2 code path.
It seems nothing in the code except cdc and truncate
throws this exception so it's probably dead code.
I'll keep it for now in other places to not accidentally
break things in auth-v1, in auth-v2 even if this exception
is used it should likely fail the query because otherwise
data consistency is silently violated.
This is done to achieve single transaction semantics.
The change includes auto-grant feature. In particular
for schema related auto-grant we don't use normal
mutation collector announce path but follow migration manager,
this may be unified in the future.
This is done to achieve single transaction semantics.
grant_permissions_to_creator is logically part of create role
but its change will be included in following commits
as it spans multiple usages.
Additionally we disabled rollback during create role as
it won't work and is not needed with single transaction logic.
We need this later as we'll add condition
based on legacy_mode(qp) and free function
doesn't have access to qp.
Moreover long term we should get rid of this
weird free function pattern bloat.
Statements code only has access to client_state, from
which it takes auth::service. It doesn't have abort_source
nor group0_client so we need to add them to auth::service.
Additionally since abort_source can't be const the whole
announce_mutations method needs non const auth::service
so we need to remove const from the getter function.
To achieve write atomicity across different tables we need to announce
mutations in a single transaction. So instead of each function doing
a separate announce we need to collect mutations and announce them once
at the end.
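A conceptual sketch of that "collect, then announce once" pattern, with hypothetical types (mutation is a stand-in string, the class is a simplification of the real collector): individual auth/schema steps only append mutations, and the single announce at the end of statement execution makes the whole change atomic.
```
#include <iostream>
#include <string>
#include <vector>

using mutation = std::string; // stand-in for the real mutation type

class mutation_collector {
    std::vector<mutation> _muts;
public:
    void add_mutation(mutation m, const std::string& description) {
        std::cout << "collected: " << description << "\n";
        _muts.push_back(std::move(m));
    }
    // Called exactly once, at the end of statement execution.
    void announce() {
        std::cout << "announcing " << _muts.size() << " mutations in one group0 command\n";
        _muts.clear();
    }
};

int main() {
    mutation_collector mc;
    mc.add_mutation("DROP TABLE ks.t", "drop table");
    mc.add_mutation("DELETE permissions ON ks.t", "auto-revoke permissions");
    mc.announce(); // either all of the above apply, or none do
}
```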
In d0f5873, we introduced IP–host ID mappings between hint directories and the hint endpoint managers managing them. As a consequence, it may happen that one hint directory stores hints towards multiple nodes at the same time. If any of those nodes leaves the cluster, we should drain the hint directory. However, before these changes that doesn't happen – we only drain it when the node of the same host ID as the hint endpoint manager leaves the cluster.
This PR fixes that draining issue in the pre-host-ID-based hinted handoff. Now no matter which of the nodes corresponding to a hint directory leaves the cluster, the directory will be drained.
We also introduce error injections to be able to test that it indeed happens.
Fixes scylladb/scylladb#18761
Closes scylladb/scylladb#18764
* github.com:scylladb/scylladb:
db/hints: Introduce an error injection to test draining
db/hints: Ensure that draining happens
Task manager's tasks stay in memory after they are finished.
Moreover, even if a child task is unregistered from task manager,
it is still alive since its parent keeps a foreign pointer to it. Also,
when a task has finished successfully there is no point in keeping
all of its descendants in memory.
The patch introduces folding of task manager's tasks. Whenever
a task which has a parent is finished it is unregistered from task
manager and foreign_ptr to it (kept in its parent) is replaced
with its status. Children's statuses of the task are dropped unless
they or one of their descendants failed. So for each operation we
keep a tree of tasks which contains:
- a root task and its direct children (status if they are finished, a task
otherwise);
- running tasks and their direct children (same as above);
- a statuses path from root to failed tasks.
/task_manager/wait_task/ does not unregister tasks anymore.
Refs: #16694.
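A simplified model of that folding, using hypothetical types (task, task_status and fold_child are illustrative, not the real task manager classes): a finished child is unregistered and its entry in the parent collapses from a live task pointer into a plain status, so only failed subtrees keep their detail.
```
#include <iostream>
#include <memory>
#include <string>
#include <variant>
#include <vector>

struct task_status {
    std::string state; // "done" or "failed"
};

struct task {
    std::string name;
    // Each child is either still running (owned task) or already folded (status).
    std::vector<std::variant<std::unique_ptr<task>, task_status>> children;

    void fold_child(size_t i, task_status status) {
        children[i] = std::move(status); // drops the child task and its finished descendants
    }
};

int main() {
    task root{"repair"};
    root.children.emplace_back(std::make_unique<task>(task{"shard 0 repair"}));
    root.children.emplace_back(std::make_unique<task>(task{"shard 1 repair"}));

    root.fold_child(0, {"done"});   // successful child: keep only its status
    root.fold_child(1, {"failed"}); // failed child: the status path to the failure is preserved

    for (auto& c : root.children) {
        std::cout << (std::holds_alternative<task_status>(c) ? "folded" : "running") << "\n";
    }
}
```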
- [ ] ** Backport reason (please explain below if this patch should be backported or not) **
Requires backport to 6.0 as task number exploded with tablets.
Closes scylladb/scylladb#18735
* github.com:scylladb/scylladb:
docs: describe task folding
test: rest_api: add test for task tree structure
test: rest_api: modify new_test_module
tasks: test: modify test_task methods
api: task_manager: do not unregister task in /task_manager/wait_task/
tasks: unregister tasks with parents when they are finished
tasks: fold finished tasks info their parents
tasks: make task_manager::task::impl::finish_failed noexcept
tasks: change _children type
Prior to registering drain_on_shutdown and all the protocol servers.
To keep the natural sequence
- start core
- register drain-on-shutdown
- start transport(s)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's now possible to start protocol server when registered. It will also
be stopped automatically on shutdown / aborted shutdown.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's now possible to start protocol server when registered. It will also
be stopped automatically on shutdown / aborted shutdown.
Also move the controller variable lower to keep it all next to each
other.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's now possible to start protocol server when registered. It will also
be stopped automatically on shutdown / aborted shutdown.
It also fixes a rare bug. If thrift is not asked to be started on boot,
its deferred shutdown action isn't created, so if it's later started via
the API, it won't be stopped on shutdown.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's now possible to start protocol server when registered. It will also
be stopped automatically on shutdown / aborted shutdown.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Auth interface is quite mixed-up but general rule is that cql
statements code calls auth::* free functions from auth/service.hh
to execute auth logic.
There are many exceptions where underlying_authorizer or
underlying_role_manager or an auth::service method is used instead.
Service should not leak its internal APIs to upper layers, so
functions like underlying_role_manager should not exist.
In this commit we fix tiny fragment related to auth write path.
Currently, the backlog used for MV flow control is only updated
after we generate view updates as a result of a write request.
However, when the resources are no longer used, we should also
notice that to prevent excessive slowdowns caused by the MV
flow control calculating the delays based on an outdated, large
backlog.
This patch makes it so the backlogs are updated every time
a view update finishes, and not only when the updates start.
Fixes #18783
Closes scylladb/scylladb#18804
Tablet allocation does not guarantee fairness of
the first replica in the replicas set across dcs.
The lack of this fix causes the following dtest to fail:
repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_multi_dc
Use the tablet_map get_primary_replica or get_primary_replica_within_dc,
respectively to see if this node is the primary replica for each tablet
or not.
Fixes https://github.com/scylladb/scylladb/issues/17752
No backport is required before 6.0 as tablets (and tablet repair) are introduced in 6.0
Closes scylladb/scylladb#18784
* github.com:scylladb/scylladb:
repair: repair_tablets: use get_primary_replica
repair: repair_tablets: no need to check ranges_specified per tablet
locator: tablet_map: add get_primary_replica_within_dc
locator: tablet_map: get_primary_replica: do not copy tablet info
locator: tablet_map: get_primary_replica: return tablet_replica
when creating the build rules using CMake 3.28 and up, it generates
the rules to scan for C++20 modules for C++20 projects by default.
but this slows down the compilation, and introduces unnecessary
dependencies for each of the targets when building .cc files. also,
it prevents the static analysis tools from running on a repo which
only has its build system generated, but not yet built, as
these tools would need to process the source files just like a compiler
does, and if any of the included header files is missing, they just
fail.
so, before we migrate to C++20 modules, let's disable this feature.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#19038
After a protocol server is registered, it can be instantly started by
the main code. It makes sense to generalize this sequence by teaching
register_protocol_server() start it.
For now it's a no-op change, as "start_instantly" is false by default,
but next patches will make use of it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patches will remove protocol servers' deferred stops and will rely
on drain_on_shutdown -> stop_transport to do it, so the drain deferred
action should come before protocol servers' registration.
This also fixes a bug. Currently alternator and redis both rely on
protocol servers to stop them on shutdown. However, when startup is
aborted prior to drain_on_shutdown() registration, protocol servers are
not stopped and alternator and redis can remain running.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The set_abort_on_ebadf() call and some api endpoints registration come
after protocol servers. The latter is going to be shuffled, so move the
former earlier so it doesn't hang around.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
to cover more directories, to prevent regressions violating
the "include what you use" policy in them.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, the user has to click into the "Details" link to
access the report from clang-include-cleaner. but this is neither
convenient nor obvious.
after this change, the report is annotated in the github web interface.
this helps the reviewers and contributors to use this tool in a
more efficient way.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
clang-include-cleaner actually interprets the preprocessor macros,
and looks at the symbols. so we have to prepare the included headers
before using it.
but in ScyllaDB, we don't have a single target for building all the
used headers, so we have to build them either in batch or separately.
in this change, we build the included headers before running
clang-include-cleaner. this allows us to run clang-include-cleaner on
more source files.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The test test_table.py::test_concurrent_create_and_delete_table failed
on Amazon DynamoDB because of a silly typo - "false" instead of "False".
A function detecting Scylla tried to return false when noticing this
isn't Scylla - but had a typo, trying to return "false" instead of "False".
This patch fixes this typo, and the test now works on DynamoDB:
test/alternator/run --aws test_table.py::test_concurrent_create_and_delete_table
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17799
In order to avoid per-table tablet load imbalance from forming
in the cluster after adding nodes, the load balancer now picks the
candidate tablet at random. This should keep the per-table
distribution on the target node similar to the distribution on the
source nodes.
Currently, candidate selection picks the first tablet in the
unordered_set, so the distribution depends on hashing in the unordered
set. Due to the way hash is calculated, table id dominates the hash
and a single table can be chosen more often for migration away. This
can result in imbalance of tablets for any given table after
bootstrapping a new node.
For example, consider the following results of a simulation which
starts with a 6-node cluster and does a sequence of node bootstraps
and decommissions. One table has 4096 tablets and RF=1, and the other
has 256 tablets and RF=2. Before the patch, the smaller table has
node overcommit of 2.34 in the worst topology state, while after the
patch it has overcommit of 1.65. overcommit is calculated as max load
(tablet count per node) divided by perfect average load (all tablets / nodes):
Run #861, params: {iterations=6, nodes=6, tablets1=4096 (10.7/sh), tablets2=256 (1.3/sh), rf1=1, rf2=2, shards=64}
Overcommit : init : {table1={shard=1.03, node=1.00}, table2={shard=1.51, node=1.01}}
Overcommit : worst: {table1={shard=1.23, node=1.10}, table2={shard=9.85, node=1.65}}
Overcommit (old) : init : {table1={shard=1.03, node=1.00}, table2={shard=1.51, node=1.01}}
Overcommit (old) : worst: {table1={shard=1.31, node=1.12}, table2={shard=64.00, node=2.34}}
The worst state before the patch had the following distribution of tablets for the smaller table:
Load on host ba7f866d...: total=171, min=1, max=7, spread=6, avg=2.67, overcommit=2.62
Load on host 4049ae8d...: total=102, min=0, max=6, spread=6, avg=1.59, overcommit=3.76
Load on host 3b499995...: total=89, min=0, max=4, spread=4, avg=1.39, overcommit=2.88
Load on host ad33bede...: total=63, min=0, max=3, spread=3, avg=0.98, overcommit=3.05
Load on host 0c2e65dc...: total=57, min=0, max=3, spread=3, avg=0.89, overcommit=3.37
Load on host 3f2d32d4...: total=27, min=0, max=2, spread=2, avg=0.42, overcommit=4.74
Load on host 9de9f71b...: total=3, min=0, max=1, spread=1, avg=0.05, overcommit=21.33
One node has as many as 171 tablets of that table and another one has as few as 3.
After the patch, the worst distribution looks like this:
Load on host 94a02049...: total=121, min=1, max=6, spread=5, avg=1.89, overcommit=3.17
Load on host 65ac6145...: total=87, min=0, max=5, spread=5, avg=1.36, overcommit=3.68
Load on host 856a66d1...: total=80, min=0, max=5, spread=5, avg=1.25, overcommit=4.00
Load on host e3ac4a41...: total=77, min=0, max=4, spread=4, avg=1.20, overcommit=3.32
Load on host 81af623f...: total=66, min=0, max=4, spread=4, avg=1.03, overcommit=3.88
Load on host 4a038569...: total=47, min=0, max=2, spread=2, avg=0.73, overcommit=2.72
Load on host c6ab3fe9...: total=34, min=0, max=3, spread=3, avg=0.53, overcommit=5.65
Most-loaded node has 121 tablets and least loaded node has 34 tablets.
It's still not good, a better distribution is possible, but it's an improvement.
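For reference, a minimal sketch of how the node overcommit figure above can be computed from per-node tablet counts (an illustration, not the actual load balancer code):
```c++
#include <algorithm>
#include <numeric>
#include <vector>

// overcommit = max per-node tablet count / perfect average (total tablets / nodes)
double node_overcommit(const std::vector<unsigned>& tablets_per_node) {
    if (tablets_per_node.empty()) {
        return 0.0;
    }
    unsigned total = std::accumulate(tablets_per_node.begin(), tablets_per_node.end(), 0u);
    unsigned max_load = *std::max_element(tablets_per_node.begin(), tablets_per_node.end());
    double perfect_avg = double(total) / tablets_per_node.size();
    return max_load / perfect_avg;
}

// The pre-patch worst distribution above, {171, 102, 89, 63, 57, 27, 3},
// gives 171 / (512 / 7.0) ~= 2.34, matching the reported node overcommit.
```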
Refs #16824
Closes scylladb/scylladb#18885
* github.com:scylladb/scylladb:
tablets: load balancer: Use random selection of candidates when moving tablets
test: perf: Add test for tablet load balancer effectiveness
load_sketch: Extract get_shard_minmax()
load_sketch: Allow populating only for a given table
Tablet allocation does not guarantee fairness of
the first replica in the replicas set across dcs.
The lack of this fix causes the following dtest to fail:
repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_multi_dc
Use the tablet_map get_primary_replica* functions to get
the primary replica for each tablet, possibly within a dc.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The code already turns off `primary_replica_only`
if `!ranges_specified.empty()`, so there's no need to
check it again inside the per-tablet loop.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, the function needlessly copies the tablet_info
(all tablet replicas in particular) to a local variable.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
In order to avoid per-table tablet load imbalance from forming
in the cluster after adding nodes, the load balancer now picks the
candidate tablet at random. This should keep the per-table
distribution on the target node similar to the distribution on the
source nodes.
Currently, candidate selection picks the first tablet in the
unordered_set, so the distribution depends on hashing in the unordered
set. Due to the way hash is calculated, table id dominates the hash
and a single table can be chosen more often for migration away. This
can result in imbalance of tablets for any given table after
bootstrapping a new node.
For example, consider the following results of a simulation which
starts with a 6-node cluster and does a sequence of node bootstraps
and decommissions. One table has 4096 tablets and RF=1, and the other
has 256 tablets and RF=2. Before the patch, the smaller table has
node overcommit of 2.34 in the worst topology state, while after the
patch it has overcommit of 1.65. overcommit is calculated as max load
(tablet count per node) divided by perfect average load (all tablets / nodes):
Run #861, params: {iterations=6, nodes=6, tablets1=4096 (10.7/sh), tablets2=256 (1.3/sh), rf1=1, rf2=2, shards=64}
Overcommit : init : {table1={shard=1.03, node=1.00}, table2={shard=1.51, node=1.01}}
Overcommit : worst: {table1={shard=1.23, node=1.10}, table2={shard=9.85, node=1.65}}
Overcommit (old) : init : {table1={shard=1.03, node=1.00}, table2={shard=1.51, node=1.01}}
Overcommit (old) : worst: {table1={shard=1.31, node=1.12}, table2={shard=64.00, node=2.34}}
The worst state before the patch had the following distribution of tablets for the smaller table:
Load on host ba7f866d...: total=171, min=1, max=7, spread=6, avg=2.67, overcommit=2.62
Load on host 4049ae8d...: total=102, min=0, max=6, spread=6, avg=1.59, overcommit=3.76
Load on host 3b499995...: total=89, min=0, max=4, spread=4, avg=1.39, overcommit=2.88
Load on host ad33bede...: total=63, min=0, max=3, spread=3, avg=0.98, overcommit=3.05
Load on host 0c2e65dc...: total=57, min=0, max=3, spread=3, avg=0.89, overcommit=3.37
Load on host 3f2d32d4...: total=27, min=0, max=2, spread=2, avg=0.42, overcommit=4.74
Load on host 9de9f71b...: total=3, min=0, max=1, spread=1, avg=0.05, overcommit=21.33
One node has as many as 171 tablets of that table and another one has as few as 3.
After the patch, the worst distribution looks like this:
Load on host 94a02049...: total=121, min=1, max=6, spread=5, avg=1.89, overcommit=3.17
Load on host 65ac6145...: total=87, min=0, max=5, spread=5, avg=1.36, overcommit=3.68
Load on host 856a66d1...: total=80, min=0, max=5, spread=5, avg=1.25, overcommit=4.00
Load on host e3ac4a41...: total=77, min=0, max=4, spread=4, avg=1.20, overcommit=3.32
Load on host 81af623f...: total=66, min=0, max=4, spread=4, avg=1.03, overcommit=3.88
Load on host 4a038569...: total=47, min=0, max=2, spread=2, avg=0.73, overcommit=2.72
Load on host c6ab3fe9...: total=34, min=0, max=3, spread=3, avg=0.53, overcommit=5.65
Most-loaded node has 121 tablets and least loaded node has 34 tablets.
It's still not good, a better distribution is possible, but it's an improvement.
Refs #16824
We have two paths for generating the json text representation, one
for large items and one for small items, but the large item path is
lacking:
- it doesn't yield, so a response with many items will stall
- it doesn't wait for network sends to be accepted by the network
stack, so it will allocate a lot of memory
Fix by moving the generation to a thread. This allows us to wait for
the network stack, which incidentally also fixes stalls.
The cost of the thread is amortized by the fact we're emitting a large
response.
Fixes #18806
Closes scylladb/scylladb#18807
Update docs for backup procedure to use `DESC SCHEMA WITH INTERNALS`
instead of plain `DESC SCHEMA`.
Add a note to use cqlsh in a proper version (at least 6.0.19).
Closes scylladb/scylladb#18953
If /task_manager/wait_task/ unregisters the task, then there is no
way to examine children failures, since their statuses can be checked
only through their parent.
Currently, when a child task is unregistered, it is still kept by its parent. This leads
to excessive memory usage, especially when the tasks are configured to be kept in task
manager after they are finished (task_ttl_in_seconds).
Introduce a task_essentials struct which keeps only the data necessary for the task manager API.
When a task which has a parent is finished, a foreign pointer to it in its parent is replaced
with respective task_essentials. Once a parent task is finished it is also folded into
its parent (if it has one). Children details of a folded task are lost, unless they
(or some of their subtrees) failed. That is, when a task is finished, we keep:
- a root task (until it is unregistered);
- task_essentials of root's direct children;
- a path (of task_essentials) from root to each failed task (so that the reason
of a failure could be examined).
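A rough sketch of the shape of such a summary record (the field names here are illustrative, not the actual Scylla task manager types):
```c++
#include <string>
#include <vector>

// Hypothetical illustration: when a child task finishes, the parent replaces
// the foreign pointer to it with a small value-type summary like this one,
// keeping only what the task manager API needs to report.
struct task_essentials {
    std::string id;
    std::string state;                            // e.g. "done" or "failed"
    std::string error;                            // non-empty for failed tasks
    std::vector<task_essentials> failed_children; // retained path to failed subtasks
};
```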
This effectively removes the "finally" block, so if
authorized_prepared_cache.stop() resolves with an exception,
prepared_cache.stop() is skipped. But that's not a problem -- even if
.stop() throws, the whole scylla stop aborts, so we don't really care
whether it was clean or not.
Also, authorized_prepared_cache.stop() closes the gate and cancels the
timer. None of those can resolve with exception.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#19001
Currently, there is no indication of tablets in the logged KSMetaData.
Print the tablets configuration of either the `initial` number of tablets,
if enabled, or {'enabled':false} otherwise.
For example:
```
migration_manager - Create new Keyspace: KSMetaData{name=tablets_ks, strategyClass=org.apache.cassandra.locator.NetworkTopologyStrategy, strategyOptions={"datacenter1": "1"}, cfMetaData={}, durable_writes=true, tablets={"initial":0}, userTypes=org.apache.cassandra.config.UTMetaData@0x600004d446a8}
migration_manager - Create new Keyspace: KSMetaData{name=vnodes_ks, strategyClass=org.apache.cassandra.locator.NetworkTopologyStrategy, strategyOptions={"datacenter1": "1"}, cfMetaData={}, durable_writes=true, tablets={"enabled":false}, userTypes=org.apache.cassandra.config.UTMetaData@0x600004c33ea8}
```
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#18998
This patch adds a test reproducing the known issue #7963, where
after adding a secondary-index to a table, queries might immediately
start to use this index - even before it is built - and produce wrong
results.
The issue is still open and unfixed, so the new test is marked "xfail".
Interestingly, even though Cassandra claims to have found and fixed
a similar bug in 2015 (CASSANDRA-8505), this test also fails on
Cassandra - trying a query right after CREATE INDEX and before it
was fully built may cause the query to fail.
Refs #7963
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#18993
tablet snapshot, used by migration, can race with compaction and
can find files deleted. That won't cause data loss because the
error is propagated back into the coordinator that decides to
retry streaming stage. So the consequence is delayed migration,
which might in turn reduce node operation throughput (e.g.
when decommissioning a node). It should be rare though, so
shouldn't have drastic consequences.
Fixes #18977.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#18979
Incremented the components_memory_reclaim_threshold config's default
value to 0.2 as the previous value was too strict and caused unnecessary
eviction in otherwise healthy clusters.
Fixes #18607
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#18964
Currently the loader is called via API, which inherits the maintenance scheduling group from API http server. The loader then can either do load_and_stream() or call (legacy) distributed_loader::upload_new_sstables(). The latter first switches into streaming scheduling group, but the former doesn't and continues running in the maintenance one.
All this is not really a problem, because the streaming sched group and the maintenance sched group are one group under two different variable names. However, it's messy and worth delegating the sched group switch (even if it's a no-op) to the sstables-loader. As a nice side effect, this patch removes one place that uses database as a proxy object to get configuration parameters.
Closes scylladb/scylladb#18928
* github.com:scylladb/scylladb:
sstables-loader: Run loading in its scheduling group
sstables-loader: Add scheduling group to constructor
... and replace it with boolean enable_tablets option. All the places
in the code are patched to check the latter option instead of the former
feature.
The option is OFF by default, but the default scylla.yaml file sets this
to true, so that newly installed clusters turn tablets ON.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18898
This field was used to carry the string with property file contents into
parse_property_file(), but it can be passed as an argument just as well
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This field was used to carry the property file size across then-lambdas; now
the code is coroutinized and can live with an on-stack variable
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
before this change, we rely on `seastar/util/std-compat.hh` to
include the used headers provided by the standard library. this was
necessary before we moved to a C++20 compliant standard library
implementation. but since Seastar has dropped C++17 support, its
`seastar/util/std-compat.hh` is not responsible for providing these
headers anymore.
so, in this change, we include the used header directly instead
of relying on `seastar/util/std-compat.hh`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18986
before this change, in order to avoid repeating/hardwiring the
compiling options set by Seastar, we just inherit the compiling
options of Seastar for building Abseil, as the former exposes the
options to enable sanitizers.
this works fine, despite that, strictly speaking, not all options
are necessary for building abseil, as abseil is not a Seastar
application -- it is just a C++ library.
but when we introduce dependencies which are only generated at
build time, and these dependencies are passed to the compiler
at build time, this breaks the build of Abseil. because these
dependencies are exposed by Seastar's .pc file, and consumed
by Abseil. when building Abseil, apparently, the building process
driven by ninja is not started yet, so we are not able to build
Abseil with these settings due to missing dependencies.
so instead of inheriting the compiling options from Seastar, just
set the sanitizer related compiling options directly, to avoid
referencing these missing dependencies.
the upside is that we pass a much smaller set of compiling options
to the compiler when building Abseil; the downside is that we hardwire
these sanitizer-related options manually, even though they are also detected
by Seastar's build system. but fortunately, these options are
relatively stable across the build environments we support.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18987
We want to verify that a hint directory is drained
when any of the nodes corresponding to it leaves
the cluster. The test scenario should happen before
the whole cluster has been migrated to
the host-ID-based hinted handoff, so when we still
rely on the mappings between hint endpoint managers
and the hint directories managed by them.
To make such a test possible, in these changes we
introduce an error injection rejecting incoming
hints. We want to test a scenario when:
1. hints are saved towards a given node -- node N1,
2. N1 changes its IP to a different one,
3. some other node -- node N2 -- changes its IP
to the original IP of N1,
4. hints are saved towards N2 and they are stored
in the same directory as the hints saved towards
N1 before,
5. we start draining N2.
Because at some point N2 needs to be stopped,
it may happen that some mutations towards
a distributed system table generate a hint
to N2 BEFORE it has finished changing its IP,
effectively creating another hint directory
where ALL of the hints towards the node
will be stored from there on. That would disturb
the test scenario. Hence, this error injection is
necessary to ensure that all of the steps in the
test proceed as expected.
Before hinted handoff is migrated to using host IDs
to identify nodes in the cluster, we keep track
of mappings between hint endpoint managers
identified by host IDs and the hint directories
managed by them and represented by IP addresses.
As a consequence, it may happen that one hint
directory corresponds to multiple nodes
-- it's intended. See 64ba620 for more details.
Before these changes, we only started the draining
process of a hint directory if the node leaving
the cluster corresponded to that hint directory
AND was identified by the same host ID as
the hint endpoint manager managing that directory.
As a result, the draining did not always happen
when it was supposed to.
Draining should start no matter which of the nodes
corresponding to a hint directory is leaving
the cluster. This commit ensures that it happens.
In the test_mv_topology_change case, we use an injection to
delay the view updates application, so that the ERMs have
a chance to change in the process. This injection was also
enabled on a new node in the test, which was later decommissioned.
During the shutdown, writes were still being performed, causing
view update generation and delays due to the injection which in
turn delayed the node shutdown, causing the test to timeout.
This patch removes the injection for the node being shut down.
At the same time, the force_gossip_topology_changes=True option
is also removed from its config, but for that option it's enough
to enable it on the first node in the cluster for all nodes to use it.
Fixes: https://github.com/scylladb/scylladb/issues/18941
Closes scylladb/scylladb#18958
The Alternator test suite usually runs on a specific configuration of
Scylla set up by test.py or test/alternator/run. However, we do consider
it an important design goal of this test suite that developers should be
able to run these tests against any DynamoDB-API implementation, including
any version of Scylla manually run by the developer in *any way* he or she
pleases.
The recent commit dc80b5dafe changed the way
we retrieve the configured authentication key, which is needed if Scylla is
run with --alternator-enforce-authorization. However, the new code assumed
that Scylla was also run with
--authenticator PasswordAuthenticator --authorizer CassandraAuthorizer
so that the default role of "cassandra" has a valid, non-null, password
(namely, "cassandra"). If the developer ran Scylla manually without
these options, the test initialization code broke, and all tests in the
suite failed.
This patch fixes this breakage. You can now run the Alternator test
suite against Scylla run manually without any of the aforementioned
options, and everything will work except some tests in test_authorization.py
will fail as expected.
This patch has no effect on the usual test.py or test/alternator/run
runs, as they already run Scylla with all the aforementioned options
and weren't exposed to the problem fixed here.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#18957
storage_proxy is responsible for intersecting the range of the read
with tablets, and calling replica with a single tablet range, therefore
it makes sense to avoid touching memtables of tablets that don't
intersect with a particular range.
Note this is a performance issue, not a correctness one, as memtable
readers that don't intersect with current range won't produce any
data, but cpu is wasted until that's realized (they're added to list
of readers in mutation_reader_merger, more allocations, more data
sources to peek into, etc).
That's also important for streaming e.g. after decommission, that
will consume one tablet at a time through a reader, so we don't want
memtables of streamed tablets (that weren't cleaned up yet) to
be consumed.
Refs #18904.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#18907
The API already promises this, the comment on effective_replication_map says:
"Excludes replicas which are in the left state".
Tablet replicas on the replaced node are rebuilt after the node
already left. We may no longer have the IP mapping for the left node
so we should not include that node in the replica set. Otherwise,
storage_proxy may try to use the empty IP and fail:
storage_proxy - No mapping for :: in the passed effective replication map
It's fine to not include it, because storage proxy uses keyspace RF
and not replica list size to determine quorum. The node is not coming
up, so no one should need to contact it.
Users which need replica list stability should use the host_id-based API.
Fixes #18843
This change supports changing replication factor in tablets-enabled keyspaces.
This covers both increasing and decreasing the number of tablets replicas through
first building topology mutations (`alter_keyspace_statement.cc`) and then
tablets/topology/schema mutations (`topology_coordinator.cc`).
For the limitations of the current solution, please see the docs changes attached to this PR.
Fixes: #16129
Closes scylladb/scylladb#16723
* github.com:scylladb/scylladb:
test: Do not check tablets mutations on nodes that don't have them
test: Fix the way tablets RF-change test parses mutation_fragments
test/tablets: Unmark RF-changing test with xfail
docs: document ALTER KEYSPACE with tablets
Return response only when tablets are reallocated
cql-pytest: Verify RF is changes by at most 1 when tablets on
cql3/alter_keyspace_statement: Do not allow for change of RF by more than 1
Reject ALTER with 'replication_factor' tag
Implement ALTER tablets KEYSPACE statement support
Parameterize migration_manager::announce by type to allow executing different raft commands
Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks
Extend system.topology with 3 new columns to store data required to process alter ks global topo req
Allow query_processor to check if global topo queue is empty
Introduce new global topo `keyspace_rf_change` req
New raft cmd for both schema & topo changes
Add storage service to query processor
tablets: tests for adding/removing replicas
tablet_allocator: make load_balancer_stats_manager configurable by name
If there is no mapping from host id to ip while a node is in bootstrap
state there is no point adding it to pending endpoint since write
handler will not be able to map it back to host id anyway. If the
transition state requires double writes, though, we still want to fail.
In case the state is write_both_read_old we fail the barrier that will
cause topology operation to rollback and in case of write_both_read_new
we assert but this should not happen since the mapping is persisted by
this point (or we failed in write_both_read_old state).
Fixes: scylladb/scylladb#18676
On the next boot there is no host ID to IP mapping, which causes the node to
crash again with "No mapping for :: in the passed effective replication map"
assertion.
`tablets_options->erase(it);` invalidates `it`, but it's still referred
to later in the code in the last `else`, and when that code is invoked,
we get a `heap-use-after-free` crash.
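A minimal illustration of the failure pattern (a generic sketch with a std::map standing in for the options container, not the actual Scylla code); once `erase(it)` runs, any later branch that dereferences `it` is undefined behavior, so the fix is to stop touching the iterator (or re-find the key) after the erase:
```c++
#include <iostream>
#include <map>
#include <string>

// Hypothetical sketch of the bug pattern.
void process(std::map<std::string, std::string>& tablets_options) {
    auto it = tablets_options.find("initial");
    if (it == tablets_options.end()) {
        return;
    }
    if (it->second.empty()) {
        tablets_options.erase(it);        // invalidates `it`
        // BUG: from here on, `it` must not be dereferenced anymore.
    } else {
        std::cout << it->second << "\n";  // fine: this path did not erase
    }
}
```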
Fixes: #18926
Closes scylladb/scylladb#18936
This patch cleans up a bit the code in Alternator which splits up
the operation's X-Amz-Target header (the second part of it is the
name of the operation, e.g., CreateTable).
The patch doesn't change any functionality or change performance in
any meaningful way. I was just reviewing this code and was annoyed by
the unnecessary variable and unnecessary creation of strings and
vectors for such a simple operation - and wanted to clean it up.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#18830
The test creates two partitions and passes them through the reader, but
the partitions are out-of-order. This is benign but best to fix it
anyway.
Found after bumping validation level inside the compactor.
Closes scylladb/scylladb#18848
Tests in test_tombstone_gc.py are parametrized with string instead
of bool values. Fix that. Use the value to create a keyspace with
or without tablets.
Fixes: #18888.
Closes scylladb/scylladb#18893
otherwise when compiling with the new seastar, which removed
`#include <variant>` from `std-compat.hh`, the {mode}-headers
target would fail to build, like:
```
./data_dictionary/storage_options.hh:34:29: error: no template named 'variant' in namespace 'std'
using value_type = std::variant<local, s3>;
                   ~~~~~^
./data_dictionary/storage_options.hh:35:5: error: unknown type name 'value_type'; did you mean 'std::_Bit_const_iterator::value_type'?
    value_type value = local{};
    ^~~~~~~~~~
    std::_Bit_const_iterator::value_type
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18921
This commit adds the information that the manual recovery procedure
is not supported if tablets are enabled.
In addition, the content in the Manual Recovery Procedure is reorganized
by adding the Prerequisites and Procedure subsections - in this way,
we can limit the number of Note and Warning boxes that made the page
hard to follow.
Fixes https://github.com/scylladb/scylladb/issues/18895
Closes scylladb/scylladb#18935
This patch adds a test for issue #18719: Although the Alternator TTL
work is supposedly done in the "streaming" scheduling group, it turned
out we had a bug where work sent on behalf of that code to other nodes
failed to inherit the correct scheduling group, and was done in the
normal ("statement") group.
Because this problem only happens when more than one node is involved,
the test is in the multi-node test framework test/topology_experimental_raft.
The test uses the Alternator API. We already had in that framework a
test using the Alternator API (a test for alternator+tablets), so in
this patch we move the common Alternator utility functions to a common
file, test_alternator.py, where I also put the new test.
The test is based on metrics: We write expiring data, wait for it to expire,
and then check the metrics on how much CPU work was done in the wrong
scheduling group ("statement"). Before #18719 was fixed, a lot of work
was done there (more than half of the work done in the right group).
After the issue was fixed in the previous patch, the work on the wrong
scheduling group went down to zero.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Currently only the user tenant (statement scheduling group) and system
(default scheduling group) tenants exist, as we used to have only
user-initiated operations and sytem (internal) ones. Now there is need
to distinguish between two kinds of system operation: foreground and
background ones. The former should use the system tenant while the
latter will use the new maintenance tenant (streaming scheduling group).
When calculating the base-view mapping while the topology
is changing, we may encounter a situation where the base
table noticed the change in its effective replication map
while the view table hasn't, or vice-versa. This can happen
because the ERM update may be performed during the preemption
between taking the base ERM and view ERM, or, due to f2ff701,
the update may have just been performed partially when we are
taking the ERMs.
Until now, we assumed that the ERMs are synchronized while calling
finding the base-view endpoint mapping, so in particular, we were
using the topology from the base's ERM to check the datacenters of
all endpoints. Now that the ERMs are more likely to not be the same,
we may try to get the datacenter of a view endpoint that doesn't
exist in the base's topology, causing us to crash.
This is fixed in this patch by using the view table's topology for
endpoints coming from the view ERM. The mapping resulting from the
call might now be a temporary mapping between endpoints in different
topologies, but it still maps base and view replicas 1-to-1.
Fixes: #17786
Fixes: #18709
Closes scylladb/scylladb#18816
This series ignores errors in `load_history()` to prevent `abort_requested_exception` coming from `get_repair_module().check_in_shutdown()` from escaping during `repair_service::stop()`, causing
```
repair_service::~repair_service(): Assertion `_stopped' failed.
```
Fixes https://github.com/scylladb/scylladb/issues/18889
Backport to 6.0 required due to 523895145d
Closes scylladb/scylladb#18890
* github.com:scylladb/scylladb:
repair: load_history: warn and ignore all errors
repair_service: debug stop
The check is performed by selecting from mutation_fragments(table), but
it's known that this query crashes Scylla when there's no tablet replica
on that node.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When the test changes RF from 2 to 3, the extra node executes "rebuild"
transition which means that it streams tablets replicas from two other
peers. When doing it, the node receives two sets of sstables with
mutations from the given tablet. The test part that checks if the extra
node received the mutations notices two mutation fragments on the new
replica and erroneously fails, seeing that RF=3 is not equal to the
number of mutations found, which is 4.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Up until now we waited until mutations are in place and then returned
directly to the caller of the ALTER statement, but that doesn't imply
that tablets were deleted/created, so we must wait until the whole
processing is done and return only then.
We want to ensure that when the replication factor
of a keyspace changes, it changes by at most 1 per DC
if it uses tablets. The rationale for that is to make
sure that the old and new quorums overlap by at least
one node.
After these changes, attempts to change the RF of
a keyspace in any DC by more than 1 will fail.
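As a worked check of that rationale (a sketch, not Scylla code), with quorum size floor(RF/2) + 1, two quorums are guaranteed to intersect only when their sizes sum to more than the larger replica set, which holds for an RF change of 1 but not necessarily for a change of 2:
```c++
// quorum size for a given replication factor
constexpr int quorum(int rf) { return rf / 2 + 1; }

// Two quorums over the old and new replica sets (the larger set containing
// the smaller) must share a node if their sizes sum to more than the larger set.
constexpr bool quorums_overlap(int old_rf, int new_rf) {
    int larger = old_rf > new_rf ? old_rf : new_rf;
    return quorum(old_rf) + quorum(new_rf) > larger;
}

static_assert(quorums_overlap(3, 4));   // RF 3 -> 4: 2 + 3 > 4, always overlap
static_assert(!quorums_overlap(3, 5));  // RF 3 -> 5: 2 + 3 == 5, overlap not guaranteed
```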
This patch removes the support for the "wildcard" replication_factor
option for ALTER KEYSPACE when the keyspace supports tablets.
It will still be supported for CREATE KEYSPACE so that a user doesn't
have to know all datacenter names when creating the keyspace,
but ALTER KEYSPACE will require that and the user will have to
specify the exact change in replication factors they wish to make by
explicitly specifying the datacenter names.
Expanding the replication_factor option in the ALTER case is
unintuitive and it's a trap many users fell into.
See #8881, #15391, #16115
This commit adds support for executing ALTER KS for keyspaces with
tablets and utilizes all the previous commits.
The ALTER KS is handled in alter_keyspace_statement, where a global
topology request in generated with data attached to system.topology
table. Then, once topology state machine is ready, it starts to handle
this global topology event, which results in producing mutations
required to change the schema of the keyspace, delete the
system.topology's global req, produce tablets mutations and additional
mutations for a table tracking the lifetime of the whole req. Tracking
the lifetime is necessary to not return control to the user too
early, so the query processor only returns the response once the
mutations are sent.
Since ALTER KS requires creating topology_change raft command, some
functions need to be extended to handle it. RAFT commands are recognized
by types, so some functions are just going to be parameterized by type,
i.e. made into templates.
These templates are already instantiated, so that only one instance of
each template exists across the whole code base, to avoid compiling it
in each translation unit.
Because ALTER KS will result in creating a global topo req, we'll have
to pass the req data to topology coordinator's state machine, and the
easiest way to do it is through the system.topology table, which is going to
be extended with 3 extra columns carrying all the data required to
execute ALTER KS from within topology coordinator.
With current implementation only 1 global topo req can be executed at a
time, so when ALTER KS is executed, we'll have to check if any other
global topo req is ongoing and fail the req if that's the case.
This PR ensures that CDC keeps working correctly in the recovery
mode after leaving the raft-based topology.
We update `system.cdc_local` in `topology_state_load` to ensure
a node restarting in the recovery mode sees the last CDC generation
created by the topology coordinator.
Additionally, we extend the topology recovery test to verify
that the CDC keeps working correctly during the whole recovery
process. In particular, we test that after restarting nodes in the
recovery mode, they correctly use the active CDC generation created
by the topology coordinator.
Fixes scylladb/scylladb#17409
Fixes scylladb/scylladb#17819
Closes scylladb/scylladb#18820
* github.com:scylladb/scylladb:
test: test_topology_recovery_basic: test CDC during recovery
test: util: start_writes_to_cdc_table: add FIXME to increase CL
test: util: start_writes_to_cdc_table: allow restarting with new cql
storage_service: update system.cdc_local in topology_state_load
This patch fixes two bugs in `test_topology_ops`:
1. The values of `tablets_enabled` were nonempty strings, so they
always evaluated to `True` in the if statement responsible for
enabling writing workers only if tablets are disabled. Hence, the
writing workers were always disabled.
2. The `topology_experimental_raft` suite uses tablets by default,
so we need a config with empty `experimental_features` to disable
them.
Ensuring this test works with and without tablets is considered
a part of 6.0, so we should backport this patch.
Closes scylladb/scylladb#18900
Now the loading code has two different paths, and only one of them
switches sched group. It's cleaner and more natural to switch the sched
group in the loader itself, so that all code paths run in it and don't
care switching.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
If the user wants to change the default initial tablets value, they use the ALTER KEYSPACE statement. However, specifying `WITH tablets = { initial: $value }` alone takes no effect, because the statement analyzer only applies the `tablets` parameters together with the `replication` ones, so the working statement has to be `WITH replication = $old_parameters AND tablets = ...`, which is not very convenient.
This PR changes the analyzer so that altering `tablets` happens independently from `replication`. Test included.
fixes: #18801
Closes scylladb/scylladb#18899
* github.com:scylladb/scylladb:
cql-pytest: Add validation of ALTER KEYSPACE WITH TABLETS
cql3: Fix parsing of ALTER KEYSPACE's tablets parameters
cql3: Remove unused ks_prop_defs/prepare_options() argument
before this change, we rely on `seastar/util/std-compat.hh` to
include the used headers provided by the standard library. this was
necessary before we moved to a C++20 compliant standard library
implementation. but since Seastar has dropped C++17 support, its
`seastar/util/std-compat.hh` is not responsible for providing these
headers anymore.
so, in this change, we include the used headers directly instead
of relying on `seastar/util/std-compat.hh`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18883
This commit adds the main description of tablets and their
benefits.
The article can be used as a reference in other places
across the docs where we mention tablets.
Closes scylladb/scylladb#18619
Retrieval of tablet stats must be serialized with mutation to token metadata, as the former requires tablet id stability.
If tablet split is finalized while retrieving stats, the saved erm, used by all shards, can have a lower tablet count than the one in a particular shard, causing an abort as the tablet map requires that any id fed into it is lower than its current tablet count.
Fixes #18085.
Closes scylladb/scylladb#18287
* github.com:scylladb/scylladb:
test: Fix flakiness in topology_experimental_raft/test_tablets
service: Use tablet read selector to determine which replica to account table stats
storage_service: Fix race between tablet split and stats retrieval
There's a test that checks how ALTER changes the initial tablets value,
but it equips the statement with `replication` parameters because of
limitations that parser used to impose. Now the `tablets` parameters can
come on their own, so add a new test. The old one is kept from
compatibility considerations.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When the `WITH` doesn't include the `replication` parameters, the
`tablets` one is ignored, even if it's present in the statement. That's
not great, those two parameter sets are pretty much independent and
should be parsed individually.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently, the call to `get_repair_module().check_in_shutdown()`
may throw `abort_requested_exception` that causes
`repair_service::stop()` to fail, and trigger assertion
failure in `~repair_service`.
We already ignore failure from `update_repair_time`,
so expand the logic to cover the whole function body.
Fixes scylladb/scylladb#18889
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
`test_topology_ops` is flaky, which has been uncovered by gating
in scylladb/scylladb#18707. However, debugging it is harder than it
should be because write workers can flood the logs. They may send
a lot of failed writes before the test fails. Then, the log file
can become huge, even up to 20 GB.
Fix this issue by stopping a write worker after the first error.
This test is important for 6.0, so we can backport this change.
Closes scylladb/scylladb#18851
Raft service levels are read-only in recovery mode. This patch adds check and proper error message when a user tries to modify service levels in recovery mode.
Fixes https://github.com/scylladb/scylladb/issues/18827
Closes scylladb/scylladb#18841
* github.com:scylladb/scylladb:
test/auth_cluster/test_raft_service_levels: try to create sl in recovery
service/qos/raft_sl_dda: reject changes to service levels in recovery mode
service/qos/raft_sl_dda: extract raft_sl_dda steps to common function
iwyu is short for "include what you use". this workflow is added to
identify missing "#include" and extraneous "#include" in C++ source
files.
This workflow is triggered when a pull request is created targeting
the "master" branch. It uses the clang-include-cleaner tool provided
by clang-tools package to analyze all the ".cc" and ".hh" source files.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18122
Note we're suppressing a UBSanitizer overflow error in UTs. That's
because our linter complains about a possible overflow, which never
happens, but tests are still failing because of it.
This is needed because the same name cannot be used for 2 separate
entities -- we'd get a double-metrics-registration error -- thus
the names have to be configurable, not hardcoded.
Users of prepared_statement reference it with the help of "smart" pointers. None of the users are supposed to modify the object they point to, so mark the respective pointer type as `pointer<const prepared_statement>`. Also mark the fields of prepared_statement itself as const (some of them already are).
Closes scylladb/scylladb#18872
* github.com:scylladb/scylladb:
cql3: Mark prepared_statement's fields const
cql3: Define prepared_statement weak pointer as const
In topology on raft, management of CDC generations is moved to the
topology coordinator. We extend the topology recovery test to verify
that the CDC keeps working correctly during the whole recovery
process. In particular, we test that after restarting nodes in the
recovery mode, they correctly use the active CDC generation created
by the topology coordinator. A node restarting in the recovery mode
should learn about the active generation from `system.cdc_local`
(or from gossip, but we don't want to rely on it). Then, it should
load its data from `system.cdc_generations_v3`.
Fixes scylladb/scylladb#17409
This patch allows us to restart writing (to the same table with
CDC enabled) with a new CQL session. It is useful when we want to
continue writing after closing the first CQL session, which
happens during the `reconnect_driver` call. We must stop writing
before calling `reconnect_driver`. If a write started just before
the first CQL session was closed, it would time out on the client.
We rename `finish_and_verify` - `stop_and_verify` is a better
name after introducing `restart`.
When the node with CDC enabled and with the topology on raft
disabled bootstraps, it reads system.cdc_local for the last
generation. Nodes with both enabled use group0 to get the last
generation.
In the following scenario with a cluster of one node:
1. the node is created with CDC and the topology on raft enabled
2. the user creates table T
3. the node is restarted in the recovery mode
4. the CDC log of T is extended with new entries
5. the node restarts in normal mode
The generation created in step 3 is seen in
system_distributed.cdc_generation_timestamps but not in
system.cdc_generations_v3, thus there are streams in use that the CDC
based on raft doesn't know about. Instead of creating a new
generation, the node should use the generation already committed
to group0.
Save the last CDC generation in system.cdc_local while loading
the topology state so that it is visible to CDC not based on raft.
Fixes scylladb/scylladb#17819
The system-distributed-keyspace and view-update-generator often go in a pair, because streaming, repair and sstables-loader (via distributed-loader) need them both to check if an sstable is staging and register it if so. The check is performed by messing directly with the system_distributed.view_build_status table, and the registration happens via view-update-generator.
That's not nice, other services shouldn't know that view status is kept in a system table. Also, view-update-generator is a service to generate and push view updates; the fact that it keeps the staging sstables list is an implementation detail.
This PR replaces dependencies on the mentioned pair of services with the single dependency on view-builder (repair, sstables-loader and stream-manager are enlightened) and hides the view building-vs-staging details inside the view_builder.
Along the way, some simplification of repair_writer_impl class is done.
Closes scylladb/scylladb#18706
* github.com:scylladb/scylladb:
stream_manager: Remove system_distributed_keyspace and view_update_generator
repair: Remove system_distributed_keyspace and view_update_generator
streaming: Remove system_distributed_keyspace and view_update_generator
sstables_loader: Remove system_distributed_keyspace and view_update_generator
distributed_loader: Remove system_distributed_keyspace and view_update_generator
view: Make register_staging_sstable() a method of view_builder
view: Make check_view_build_ongoing() helper a method of view_builder
streaming: Proparage view_builder& down to make_streaming_consumer()
repair: Keep view_builder& on repair_writer_impl
distributed_loader: Propagate view_builder& via process_upload_dir()
stream_manager: Add view builder dependency
repair_service: Add view builder dependency
sstables_loader: Add view_bulder dependency
main: Start sstables loader later
repair: Remove unwanted local references from repair_meta
Database used to be (and still is in many ways) an object used to get configuration from. Part of the configuration is the set of pre-configured scheduling groups. That's not nice, services should use each other for some real need, not as proxies to configuration. This patch changes the places that explicitly switch to the statement group so that they do _not_ use database to get the group itself.
fixes: #17643
Closes scylladb/scylladb#18799
* github.com:scylladb/scylladb:
database: Don't export statement scheduling group
test: Use async attrs and cql-test-env scheduling groups
test: Use get_scheduling_groups() to get scheduling groups
api: Don't switch sched group to start/stop protocol servers
main: Don't switch sched group to start protocol servers
code: Switch to sched group in request_stop_server()
code: Switch to server sched group in start()
protocol_server: Keep scheduling group on board
code: Add scheduling group to controllers
redis: Coroutinize start() method
in 4c1b6f04, we added a concept for fmt::is_formattable<>. but it
was not necessary. the fmt::is_formattable<> trait was enough. the
reason is that 4c1b6f04 was actually a leftover of a bigger change which
tried to add a trait for the cases which fmt::is_formattable<> was
not able to cover. but that was based on the wrong impression that
fmt::is_formattable<> should be able to work with container types
without including, for instance, `fmt/ranges.h`. but in 222dbf2c,
we include `fmt/ranges.h` in tests, where the range-alike formatter
is used, which enables `fmt::is_formattable<>` to tell that container
types are formattable.
in short, 4c1b6f04 was created based on a misunderstanding, and
it was a reduced type trait, which proved to be unnecessary.
so, in this change, it is dropped. but the type constraint is
preserved to make the build failure more explicit if the fallback
formatter does not match the type to be formatted by Boost.test.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18879
Separate keyspace which also behaves as system brings
little benefit while creating some compatibility problems
like schema digest mismatch during rollback. So we decided
to move auth tables into system keyspace.
Fixes https://github.com/scylladb/scylladb/issues/18098
Closes scylladb/scylladb#18769
this series disables operator<<s for vector and unordered_map, and drops operator<< for mutation, because we don't have to keep it to work with these operators anymore. this change is a follow-up of https://github.com/scylladb/seastar/issues/1544
this change is a cleanup. so no need to backport
Closes scylladb/scylladb#18866
* github.com:scylladb/scylladb:
mutation,db: drop operator<< for mutation and seed_provider_type&
build: disable operator<< for vector and unordered_map
db/heat_load_balance: include used header
test: define a more generic boost_test_print_type
test/boost: define fmt::formatter for service_level_controller_test.cc
test/boost: include test/lib/test_utils.hh
in theory, std::result_of_t should have been removed in C++20. and
std::invoke_result_t is available since C++17. thanks to libstdc++,
the tree is compiling. but we should not rely on this.
so, in this change, we replace all `std::result_of_t` with
`std::invoke_result_t`. actually, clang + libstdc++ is already warning
us like:
```
In file included from /home/runner/work/scylladb/scylladb/multishard_mutation_query.cc:9:
In file included from /home/runner/work/scylladb/scylladb/schema/schema_registry.hh:11:
In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/unordered_map:38:
Warning: /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/type_traits:2624:5: warning: 'result_of<void (noop_compacted_fragments_consumer::*(noop_compacted_fragments_consumer &))()>' is deprecated: use 'std::invoke_result' instead [-Wdeprecated-declarations]
2624 | using result_of_t = typename result_of<_Tp>::type;
| ^
/home/runner/work/scylladb/scylladb/mutation/mutation_compactor.hh:518:43: note: in instantiation of template type alias 'result_of_t' requested here
518 | if constexpr (std::is_same_v<std::result_of_t<decltype(&GCConsumer::consume_end_of_stream)(GCConsumer&)>, void>) {
|
```
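For reference, a minimal sketch of the substitution involved (the `consumer` type below is illustrative): `std::result_of_t<F(Args...)>` becomes `std::invoke_result_t<F, Args...>`.
```c++
#include <type_traits>

struct consumer {
    int consume_end_of_stream();
};

// Deprecated spelling, removed in C++20:
//   using result = std::result_of_t<decltype(&consumer::consume_end_of_stream)(consumer&)>;
// C++17 replacement with the same meaning:
using result = std::invoke_result_t<decltype(&consumer::consume_end_of_stream), consumer&>;

static_assert(std::is_same_v<result, int>);
```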
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18835
When selecting from mutation_fragments(table) one may want to apply
token() filtering against the partition key. This currently doesn't work,
but it used to crash. This patch adds a regression test for that.
refs: #18637
refs: #18768
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18759
this change is inspired by clang-tidy. it warns like:
```
[752/852] Building CXX object service/CMakeFiles/service.dir/migration_manager.cc.o
Warning: /home/runner/work/scylladb/scylladb/service/migration_manager.cc:891:71: warning: 'view' used after it was moved [bugprone-use-after-move]
891 | db.get_notifier().before_create_column_family(*keyspace, *view, mutations, ts);
| ^
/home/runner/work/scylladb/scylladb/service/migration_manager.cc:886:86: note: move occurred here
886 | auto mutations = db::schema_tables::make_create_view_mutations(keyspace, std::move(view), ts);
| ^
```
in which, `view` is an instance of view_ptr which is a type with the
semantics of shared pointer, it's backed by a member variable of
`seastar::lw_shared_ptr<const schema>`, whose move-ctor actually resets
the original instance. so we are actually accessing the moved-away
pointer in
```c++
db.get_notifier().before_create_column_family(*keyspace, *view, mutations, ts)
```
so, in this change, instead of moving away from `view`, we create
a copy, and pass the copy to
`db::schema_tables::make_create_view_mutations()`. this should be fine,
as the behavior of `db::schema_tables::make_create_view_mutations()`
does not rely on whether the `view` passed to it was moved from or not.
the change which introduced this use-after-move was 88a5ddabce
Refs 88a5ddabce
Fixes #18837
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18838
since we've migrated away from the generic homebrew formatters
for range-alike containers, there is no need to keep these operator<<s
around -- they were preserved in order to work with the container
formatters which expect operator<< of the elements.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
seastar provides an option named `Seastar_DEPRECATED_OSTREAM_FORMATTERS`
to enable the operator<< for `std::vector` and `std::unordered_map`,
and this option is enabled by default. but we intend to avoid using
them, so that we can use the fmt::formatter specializations when
Boost.test prints variables. if we keep these two operator<< enabled,
Boost.test would use them when printing variables to be compared
when a check fails, but if the elements in the vector or unordered_map
to be compared do not provide operator<<, compiling would fail.
so, in this change, let's disable these operator<< implementations.
this allows us to ditch the operator<< implementations which are
preserved only for testing.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
in this header, we use `hr_logger.trace("returned _pp={}", p)` to
print a `vector<float>`, so we need to include `fmt/ranges.h`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
fmt::is_formattable<T>::value is false, even if
* T is a container of U, and
* fmt::is_formattable<U>, and
* U can be formatted using fmt::formatter
so, we have to define a more generic boost_test_print_type()
for all the types supported by {fmt}. it will help us to ditch the
operator<< for vector and unordered_map in Seastar, and allow us
to use the fmt::formatter specialization of the element
types.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
since we are moving away from operator<<-based formatters, more and more
types now only have {fmt} based formatters. the same will apply to the
STL container types after ditching the generic homebrew formatter in
to_string.hh, so to be prepared for the change, let's add the
fmt::formatter for tests as well.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change was created in the same spirit as 505900f18f. because
we are deprecating the operator<< for vector and unordered_map in
Seastar, some tests do not compile anymore if we disable these
operators. so to be prepared for the change disabling them, let's
include test/lib/test_utils.hh for accessing the printer dedicated
for Boost.test. and also '#include <fmt/ranges.h>' when necessary,
because, in order to format the ranges using {fmt}, we need to
use fmt/ranges.h.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Not only do users of prepared_statement point to an immutable object, but the
class itself doesn't assume modifications of its fields, so mark them
const too.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The pointer points to an immutable prepared_statement, so tune up the type
accordingly. Tracing has its own alias for it, fix that one too.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The modified lines of code intend to await the first appearance of a log
on one of the nodes.
But due to misplaced parentheses, instead of creating a list of log-awaiting
tasks with a list comprehension, they pass a generator expression to
asyncio.create_task().
This is nonsense, and it fails immediately with a type error.
But since they don't actually check the result of the await,
the test just assumes that the search completed successfully.
This was uncovered by an upgrade to Python 3.12, because its typing is stronger
and asyncio.create_task() screams when it's passed a regular generator.
This patch fixes the bad list comprehension, and also adds an error check
on the completed awaitables (by calling `await` on them).
Fixes #18740
Closes scylladb/scylladb#18754
Continuation of the previous patch, but with its own flavor. There's a
manual test that wants to run seastar thread in statement scheduling
group and gets one from database. This patch makes it get the group from
cql-test-env and, while at it, makes it switch to that group using
thread attributes passed to async() method.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's such a helper in cql-test-env that other tests use to get sched
groups from. Few other tests (ab)use databse for that, this patch fixes
those remnants.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All the protocol servers implementations now maintain scheduling group
on their own, so the API handler can stop caring
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This method is used to stop protocol server in the runtime (via the
API). Since it's not just "kick it and wait to wrap up", it's needed to
perform this in the inherited sched group too.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This patch makes all protocol servers implementations use the inherited
sched group in their start methods.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The group is now mandatory for the real protocol server implementations
to initialize. The previous patch made all of them get the sched group as
a constructor argument, so that's where to take it from.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are four of them currently -- transport, thrift, alternator and
redis. This patch makes main pass the statement scheduling group to all of
them as a constructor argument. The next patches will make use of it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Its callers had just checked whether an sstable still has some views
building, so they should talk to the view-builder to register the sstable
that's now considered to be staging.
Effectively, this is to hide the view-update-generator from other
services and make them communicate with the builder only.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This helper checks if there's an ongoing build of a view, and it's in
fact internal to the view-builder, which keeps its status in one of its
system tables.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This service is on its own; nothing depends on it. Nor can it work
before the system distributed keyspace is started, so move it lower.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When constructed, the class copies local references to services just to
push them into make_repair_writer() later in the same initializer list.
There's no need to keep those references.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This doesn't apply to auth-v2, as we improved data placement and
removed the cassandra quirk which was setting a different CL for some
operations involving the default superuser.
Fixes #18773
Closes scylladb/scylladb#18785
This commit enables publishing documentation
from branch-6.0. The docs will be published
as UNSTABLE (the warning about version 6.0
being unstable will be displayed).
Closes scylladb/scylladb#18832
When a cluster goes into recovery mode after service levels have been migrated
to raft, service levels become temporarily read-only.
This commit adds a proper error message in case a user tries to make any
changes.
When setting/dropping a service level using the raft data accessor, the same
validation steps are executed (this_shard_id == 0 and the guard is present).
To avoid duplicating the calls in both functions, they can be extracted into a
helper function.
One source of flakiness is in test_tablet_metadata_propagates_with_schema_changes_in_snapshot_mode,
due to the gossiper being aborted prematurely and causing a reconnection
storm.
Another is test_tablet_missing_data_repair, which is flaky due to an issue
in the python driver where a session might not reconnect on rolling restart
(tracked by https://github.com/scylladb/python-driver/issues/230).
Refs #15356.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
File-based tablet streaming calls every shard to return data of every
group that intersects with a given range.
After dynamic group allocation, that breaks as the tablet range will
only be present in a single shard, so an exception is thrown, causing
migration to halt during the streaming phase.
Ideally, only one shard would be invoked, but that's out of the scope of this
fix; instead, compaction_groups_for_token_range() should return an empty
result if none of the local groups intersect with the range.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#18798
This commit fixes the incorrect Raft-related information on the Handling Cluster Membership Change Failures page
introduced with https://github.com/scylladb/scylladb/pull/17500.
The page describes the procedure for when Raft is disabled. Since 6.0, Raft for consistent schema management
is enabled and mandatory (it cannot be disabled), so this commit adds the procedure for Raft-enabled setups.
Closes scylladb/scylladb#18803
Since we introduced the ability to revert migrations, we can no longer
rely on the ordering of transition stages to determine whether to account
the pending or the leaving replica. Let's use the read selector instead, which
correctly indicates which replica type has the correct stats.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
If a tablet split is finalized while stats are being retrieved, the saved erm,
used by all shards, will be invalidated. This can either cause incorrect
behavior or a crash if the id is not available.
It's worked around by feeding the local tablet map into the "coordinator"
collecting stats from all shards. We will also no longer have a snapshot
of the erm shared between shards to help intra-node migration. This is
simplified by serializing token metadata changes and the retrieval of
the stats (the latter should complete pretty fast, so it shouldn't block
the former for any significant time).
Fixes #18085.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Since commit 952dfc6157 "repair: Introduce
repair_partition_count_estimation_ratio config option", get_config() is
used. We need to include db/config.hh for that.
Spotted when backporting to 5.4 branch.
Refs #18615
Closes scylladb/scylladb#18780
As part of the Alternator test suite, we check Alternator's support for
authentication. Alternator maps Scylla's existing CQL roles to AWS's
authentication:
* AWS's access_key_id <- the name of the CQL role
* AWS's secret_access_key <- the salted hash of the password of the CQL role
Before this patch, the Alternator test suite created a new role with a
preset salted hash (role "alternator", salted hash "secret_pass")
and then used that in the tests. However, with the advent of Raft-based
metadata it is wrong to write directly to the roles table, and starting
with #17952 such writes will be outright forbidden.
But we don't actually need to create a new CQL role! We already have
a perfectly good CQL role called "cassandra", and our tests already use
it. So what this patch does is to have the Alternator tests (conftest.py)
read from the roles system-table the salted hash of the "cassandra" role,
and then use that - instead of the hard-coded pair alternator/secret_pass -
in the tests.
A couple more tests assumed that the role name used was
"alternator", but now it has been changed to "cassandra", so those tests
needed minor fixes as well.
After this patch, the Alternator tests no longer *write* to the roles
system table. Moreover, after this patch, test/alternator/run and
test/alternator/suite.yaml (used when testing with test.py) no longer
need to do extra ugly CQL setup before starting the Alternator tests.
Fixes #18744
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#18771
* seastar 42f15a5f...914a4241 (33):
> sstring: deprecate formatters for vector and unordered_map
> github: use fedora:40 image for testing
> github: add 2 testing combinations back to the matrix
> github: extract test.yaml into a resusable workflow
> build: use initial-exec TLS when building seastar as shared library
> coroutine: preserve this->container before calling dtor
> smp: allocate hugepages eagerly when kernel support is available
> shared_mutex: Add tests for std::shared_lock and std::unique_lock
> shared_mutex: Add RAII locks
> README.md: replace C++17 with C++23
> treewide: do not check for SEASTAR_COROUTINES_ENABLED
> build: support enabled options when building seastar-module
> treewide: include required header files
> build: move add_subdirectory(src) down
> README.md: replace CircleCI badge with GitHub badge
> weak_ptr: Make it possible to convert to "compatible" pointers
> circleci: remove circleci CI tests
> build: use DPDK_MACHINE=haswell when testing dpdk build on github-hosted runner
> build: add --dpdk-machine option to configure.py
> build: stop translating -march option to names recognized by DPDK
> github: encode matrix.enables in cache key
> doc/prometheus.md: add metrics? in URL exporter URI
> tests/unit/metrics_tester: use deferred_stop() when appropriate
> httpd: mark http_server_control::stop() noexcept
> reactor: print scheduling group along with backtrace
> reactor: update lowres_clock when max_task_backlog is exceeded
> tests: add test for prometheus exporter
> tests: move apps/metrics_tester to tests/unit
> apps/metrics_tester: keep metrics with "private" labels
> apps/metrics_tester: support "labels" in conf.yaml
> apps/metrics_tester: stop server properly
> apps/metrics_tester: always start exporter
> apps/metrics_tester: fix typo in conf-example.yaml
Closes scylladb/scylladb#18800
There's a test that checks system.tablets contents to see that after
changing the ks replication factor via ALTER KEYSPACE the tablet map is
updated properly. This patch extends the test so that it also validates that
mutations themselves are replicated according to the desired replication
factor.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18644
Coroutinization will help improve readability and allow easier changes planned for this code.
This work was separated from https://github.com/scylladb/scylladb/pull/17910 to make it smoother to review and merge.
Closes scylladb/scylladb#18788
* github.com:scylladb/scylladb:
cql3: coroutinize create/alter/drop service levels
auth: coroutinize alter_role and drop_role
auth: coroutinize grant_permissions and revoke_permissions
auth: coroutinize create_role
cql3: statements: co-routinize auth related statements
cql3: statements: release unused guard explicitly in auth related statements
The sl-controller is stopped in three steps. The first (and immediately after it the second) is unsubscribing from lifecycle notifications and draining. The third is the stop itself. The first two steps are "out of order" compared to the desired start-stop sequence of any service; this patch fixes these steps.
After this PR the drain_on_shutdown() (the call that drains the node upon stop) finally becomes clean and tidy and is no longer accompanied by ad-hoc fellow drains/stops/aborts/whatever.
refs: #2737
Closes scylladb/scylladb#18731
* github.com:scylladb/scylladb:
sl_controller: Remove drain() method
sl_controller: Move abort kicking into do_abort()
main,sl_controller: Subscribe for early abort
main: Unsubscribe sl controller next to subscribing
This commit updates the documentation about Raft in version 6.0.
- "Introduction": The outdated information about consistent topology updates not being supported
is removed and replaced with the correct information.
- "Enabling Raft": The relevant information is moved to other sections. The irrelevant information
is removed. The section no longer exists.
- "Verifying that the Raft upgrade procedure finished successfully" - moved under Schema
(in the same document). I additionally removed the include saying that after you verify
that schema on Raft is enabled, you MUST enable topology changes on Raft (it is not mandatory;
also, it should be part of the upgrade guide, not the Raft document).
- Unnecessary or incorrect references to versions are removed.
Refs https://github.com/scylladb/scylladb/issues/18580
Closes scylladb/scylladb#18689
This commit replaces the 5.4-to-5.5 upgrade guide with the 5.4-to-6.0 upgrade guide,
including the metrics update information.
The guide references the "Enable Consistent Topology Updates" document,
as enabling consistent topology updates is a new step when upgrading to version 6.0.
Also, a procedure for image upgrades has been added (as verified by @yaronkaikov).
Fixes scylladb/scylladb#18254
Fixes scylladb/scylladb#17896
Refs scylladb/scylladb#18580
Closes scylladb/scylladb#18728
Currently, we do not explicitly set a scheduling group for the schema
commitlog which causes it to run in the default scheduling group (called
"main"). However:
- It is important and significant enough that it should run in a
scheduling group that is separate from the main one,
- It should not run in the existing "commitlog" group as user writes may
sometimes need to wait for schema commitlog writes (e.g. read barrier
done to learn the schema necessary to interpret the user write) and we
want to avoid priority inversion issues.
Therefore, introduce a new scheduling group dedicated to the schema
commitlog.
Fixes: scylladb/scylladb#15566
Closes scylladb/scylladb#18715
Update `docs/dev/isolation.md`:
* Update the list of scheduling groups
* Remove IO priority groups (they were folded into scheduling groups)
* Add section on RPC isolation
Closes scylladb/scylladb#18749
* github.com:scylladb/scylladb:
docs: isolation.md: add section on RPC call isolation
docs: isolation.md: remove mention of IO priority groups
docs: isolation.md: update scheduling group list, add aliases
we are not interested in the code coverage of the abseil library, so there is
no need to apply the compile options enabling the coverage instrumentation
when building the abseil library.
moreover, since the path of the file passed to `-fprofile-list` is a relative
path, when building with coverage enabled, the build fails when building
abseil, like:
```
/usr/lib64/ccache/clang++ -I/jenkins/workspace/scylla-master/scylla-ci/scylla/abseil -std=c++20 -I/jenkins/workspace/scylla-master/scylla-ci/scylla/seastar/include -I/jenkins/workspace/scylla-master/scylla-ci/scylla/build/debug/seastar/gen/include -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -DSEASTAR_API_LEVEL=7 -DSEASTAR_BUILD_SHARED_LIBS -DSEASTAR_SSTRING -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_DEBUG -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEBUG_PROMISE -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_TYPE_ERASE_MORE -DBOOST_NO_CXX98_FUNCTION_BASE -DFMT_SHARED -I/usr/include/p11-kit-1 -fprofile-instr-generate -fcoverage-mapping -fprofile-list=./coverage_sources.list -std=gnu++20 -Wall -Wextra -Wcast-qual -Wconversion -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Winfinite-recursion -Winvalid-constexpr -Wliteral-conversion -Wmissing-declarations -Woverlength-strings -Wpointer-arith -Wself-assign -Wshadow-all -Wshorten-64-to-32 -Wsign-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-zero-compare -Wundef -Wuninitialized -Wunreachable-code -Wunused-comparison -Wunused-local-typedefs -Wunused-result -Wvla -Wwrite-strings -Wno-float-conversion -Wno-implicit-float-conversion -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -DNOMINMAX -MD -MT absl/strings/CMakeFiles/strings.dir/str_cat.cc.o -MF absl/strings/CMakeFiles/strings.dir/str_cat.cc.o.d -o absl/strings/CMakeFiles/strings.dir/str_cat.cc.o -c /jenkins/workspace/scylla-master/scylla-ci/scylla/abseil/absl/strings/str_cat.cc
clang-16: error: no such file or directory: './coverage_sources.list'
```
in this change, we just remove the compile options enabling the
coverage instrumentation from the cflags when building abseil.
Fixes #18686
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18748
Currently the guard is released immediately because those functions are
based on continuations and the guard's lifetime is not extended. In the
following commit we rewrite those functions to coroutines, and the lifetime
will be automatically extended. This would deadlock the client because we'd
try to take a second guard inside the auth code without releasing this unused
one.
In future commits the auth guard will be removed and the one from the
statement will be used, but this needs some more code re-arrangements.
the C++ standard does not define the order in which the arguments
passed to a function are evaluated. so, in theory, in
```c++
reusable_sst(sst->get_schema(), std::move(sst));
```
`std::move(sst)` could be evaluated before `sst->get_schema()`.
but please note, `std::move(sst)` does not move `sst`
away; it merely casts `sst` to an rvalue reference. it is
`reusable_sst()` which *could* move `sst` away by
consuming it. so the following call is much more dangerous
than the above one:
```c++
reusable_sst(sst->get_schema(), modify_sst(std::move(sst)))
```
nevertheless, this usage is still confusing, so we pass a copy of `sst`
to `reusable_sst()` instead.
this change is inspired by clang-tidy, it warns like:
```
Warning: /home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:25: warning: 'sst' used after it was moved [bugprone-use-after-move]
397 | return reusable_sst(sst->get_schema(), std::move(sst));
| ^
/home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:44: note: move occurred here
397 | return reusable_sst(sst->get_schema(), std::move(sst));
| ^
/home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:25: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated
397 | return reusable_sst(sst->get_schema(), std::move(sst));
|
```
per the analysis above, this is a false alarm.
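to illustrate the sequencing concern, here is a simplified, hypothetical
sketch (not the actual test code) of an explicitly sequenced alternative:
```c++
// Hypothetical sketch: evaluating the schema into a named local before the
// call makes the order explicit and keeps clang-tidy quiet.
#include <memory>
#include <utility>

struct schema {};
struct sstable {
    std::shared_ptr<schema> get_schema() const { return s; }
    std::shared_ptr<schema> s = std::make_shared<schema>();
};

void reusable_sst(std::shared_ptr<schema>, std::shared_ptr<sstable>) {}

void example(std::shared_ptr<sstable> sst) {
    // Unsequenced: sst->get_schema() and std::move(sst) may be evaluated in
    // either order; safe only because std::move() itself moves nothing.
    //   reusable_sst(sst->get_schema(), std::move(sst));

    // Explicitly sequenced alternative:
    auto s = sst->get_schema();
    reusable_sst(std::move(s), std::move(sst));
}
```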
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18775
The existing inet_address::to_string() calls fmt::format("{}", *this)
anyway. However, the to_string() method is defined in the .cc file, while
the fmt formatter is in the header and is equipped with constexprs so
that converting an address to a string is done at compile time as much as
possible.
Also, though minor, fmt::to_string(foo) is believed to be even faster
than fmt::format("{}", foo).
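a tiny illustrative sketch of the difference (not the inet_address code):
```c++
// fmt::to_string(x) produces the same string as fmt::format("{}", x) but
// skips format-string processing entirely.
#include <fmt/format.h>
#include <string>

std::string example(int x) {
    std::string a = fmt::format("{}", x); // parses the "{}" format string
    std::string b = fmt::to_string(x);    // no format string involved
    return a + b;
}
```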
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18712
The draining now only consists of waiting for the data update future to
resolve. It can be safely moved to .stop() (i.e. -- later) because its
stopping had already been initiated by abort-source, and no other
services depend on sl-controller to be stopped and drained.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Draining the sl controller consists of two parts -- first, it kicks off the
wrap-up process by aborting operations, breaking semaphores, etc.; this is
the non-waiting part. Then comes the co_await of the completion future.
This patch moves the non-waiting part into the recently introduced abort
subscription, so that the wrap-up starts a bit earlier.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
in 0b0e661a, we brought the abseil submodule back, but we didn't update
the build.ninja rules properly -- we should have added the abseil
libraries to the dependencies of the binaries so that the abseil
libraries are always built before a given binary is built.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18753
There's a stop-signal in main that fires an abort source on stop. Lots of
other services are subscribed to it; add the sl-controller too. For now
it's a no-op, but the next patches will make use of it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The subscription only handles on_leave_cluster() and only for the local
node, so even if the controller stays subscribed for longer, it won't do any
harm.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It seems that having the checkbox in the PR template and failing the action is confusing and not very clear. Let's remove it completely and just add to the template an explanation of the backport reason.
Closes scylladb/scylladb#18708
There's a set of API endpoints that toggle per-table auto-compaction and tombstone-gc booleans. They all live in two different .cc files under the api/ directory and duplicate each other's code. This PR generalizes those handlers, places them next to each other, fixes a leak on stop and, as a nice side effect, lightens the database.hh header.
Closes scylladb/scylladb#18703
* github.com:scylladb/scylladb:
api,database: Move auto-compaction toggle guard
api: Move some table manipulation helpers from storage_service
api: Move table-related calls from storage_service domain
api: Reimplement some endpoints using existing helpers
api: Lost unset of tombstone-gc endpoints
User-defined types can depend on each other, creating a directed acyclic graph.
In order to support restoring schema from `DESC SCHEMA`, UDTs should be
ordered topologically, not alphabetically as they were until now.
This patch changes the way UDTs are ordered in `DESC SCHEMA`/`DESC KEYSPACE <ks>` statements, so the output can be safely copy-pasted to restore the schema.
Fixes #18539
Closes scylladb/scylladb#18302
* github.com:scylladb/scylladb:
test/cql-pytest/test_describe: add test for UDTs ordering
cql3/statements/describe_statement: UDTs topological sorting
cql3/statements/describe_statement: allow to skip alphabetical sorting
types: add a method to get all referenced user types
db/cql_type_parser: use generic topological sorting
db/cql_type_parses: futurize raw_builder::build()
test/boost: add test for topological sorting
utils: introduce generic topological sorting algorithm
There's a nasty scenario where this searching plays a bad joke.
When CI picks up a new branch and notices that a test has changed, it
spawns a custom job with test.py --repeat 100 $changed_test_name in
it. Next, when test.py tries opt-in test name matching, it uses the
wildcard search and can pick up extra unwanted tests into the run.
To solve this, the case-selection syntax is extended. Now, if the caller
specifies `suite/test::*` as the test, the test file is selected by exact
name match, but no specific test-case is selected; the `*` makes it
run all cases.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18704
PR https://github.com/scylladb/scylladb/pull/18186 introduced a fiber that reloads reclaimed bloom filters when memory becomes available. Use maintenance scheduling group to run that fiber instead of running it in the main scheduling group.
Fixes #18675
Closes scylladb/scylladb#18721
* github.com:scylladb/scylladb:
sstables_manager: use maintenance scheduling group to run components reload fiber
sstables_manager: add member to store maintenance scheduling group
In 7ce6962141 we dropped openssh-server; that also dropped the systemd
package and caused an error on Scylla Operator (#17787).
This reverts dropping the systemd package and fixes the issue.
Fixes #17787
Closes scylladb/scylladb#18643
This PR resolves an issue with double counting of test results for topology tests. They will not appear in the consolidated report anymore.
Another fix is to provide a better view of which test failed by modifying the test case name in the report, enriching it with mode and run id, making the names unique across the run.
The scope of this change is:
1. Modify the test name to have the run id in the name
2. Add handlers to get the test.py and pytest logs related to a test, rather than to the full suite, in one file
3. Stop aggregating topology tests on a suite level in Junit results
4. Add a link to the logs related to the failed tests in Junit results, so it will be easier to navigate to all logs related to a test
5. Gather logs related to a failed test into one directory for easier log investigation
Ref: scylladb/scylladb#17851
Closes scylladb/scylladb#18277
this change is inspired by following warning from clang-tidy
```
Warning: /home/runner/work/scylladb/scylladb/service/storage_proxy.cc:884:13: warning: 'tr_state' used after it was moved [bugprone-use-after-move]
884 | if (tr_state) {
| ^
/home/runner/work/scylladb/scylladb/service/storage_proxy.cc:872:139: note: move occurred here
872 | auto f = get_schema_for_read(proposal.update.schema_version(), src_addr, *timeout).then([&sp = _sp, &sys_ks = _sys_ks, tr_state = std::move(tr_state),
| ^
```
this is not a false positive: `tr_state` is captured by move to
construct a variable in the capture list of a lambda which is in
turn passed to the expression evaluating to `f`.
even though that expression itself is not evaluated yet when we reference
`tr_state` to check if it is empty after preparing the expression,
`tr_state` has already been moved away into the captured variable. so
at that moment, the statement `f = f.finally(...)` is never
executed, because `tr_state` is always empty by then.
so before this change, the trace message was never recorded.
in this change, we address this issue by capturing `tr_state` by
copy. as `tr_state` is backed by a `lw_shared_ptr`, the overhead is
negligible.
after this change, the tracing message is recorded.
the change that introduced this issue was 548767f91e.
please note, we could coroutinize this function to improve its
readability, but since this is a fix and should be backported,
let's start with a minimal fix, and worry about readability
in a follow-up change.
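a minimal standalone sketch of the pitfall, using std::shared_ptr in place
of lw_shared_ptr (names are illustrative):
```c++
// The capture list is evaluated when the lambda object is constructed, so a
// by-move capture empties the outer variable before the lambda ever runs.
#include <cassert>
#include <memory>
#include <utility>

int main() {
    auto tr_state = std::make_shared<int>(42);
    auto cont = [tr = std::move(tr_state)] { return *tr; };
    assert(!tr_state); // already empty, so a later `if (tr_state)` never fires

    // Capturing by copy keeps the outer pointer usable; for a
    // reference-counted pointer the extra copy is cheap.
    auto tr_state2 = std::make_shared<int>(42);
    auto cont2 = [tr = tr_state2] { return *tr; };
    assert(tr_state2); // still valid after constructing the lambda

    return cont() + cont2() - 84;
}
```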
Refs 548767f91e
Fixes #18725
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18702
in this change, we trade the `boost_test_print_type()` overloads
for the generic template of `boost_test_print_type()`, except for
those in the very small tests, which presumably want to keep
themselves relatively self-contained.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18727
This series grandfathers the following features:
MD_SSTABLE_FORMAT
ME_SSTABLE feature
VIEW_VIRTUAL_COLUMNS
DIGEST_INSENSITIVE_TO_EXPIRY
CDC
NONFROZEN_UDTS
PER_TABLE_PARTITIONERS
PER_TABLE_CACHING
DIGEST_FOR_NULL_VALUES
CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX
Note that for the last (CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX) some code remains to support indexes created before the new feature was adopted.
Each patch names the version where the feature was introduced.
Closes scylladb/scylladb#18428
* github.com:scylladb/scylladb:
feature, index: grandfather CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX
feature: grandfather DIGEST_FOR_NULL_VALUES
storage_proxy: drop use of MD5 as a digest algorithm
feature: grandfather PER_TABLE_CACHING
feature: grandfather LWT
feature: grandfather HINTED_HANDOFF_SEPARATE_CONNECTION
feature: grandfather PER_TABLE_PARTITIONERS
test: schema_change_test: regenerate digest for PER_TABLE_PARTITIONERS
test: test_schema_change_digest: drop unneeded reference digests
feature: grandfather NONFROZEN_UDTS
feature: grandfather CDC
feature: grandfather DIGEST_INSENSITIVE_TO_EXPIRY
feature: grandfather VIEW_VIRTUAL_COLUMNS
feature: grandfather ME_SSTABLE feature
feature: grandfather MD_SSTABLE_FORMAT
Store the maintenance scheduling group inside the sstables_manager. The
next patch will use this to run the components reloader fiber.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
This feature corrected how we store the token in secondary indexes. It
was introduced in 7ff72b0ba5 (2020; 4.4) and can now be assumed present
everywhere. Note that we still support indexes created with the old format.
The DIGEST_FOR_NULL_VALUES feature was added in 21a77612b3 (2020; 4.4)
and can now be assumed to be always present. The hasher which it invoked
is removed.
The XXHASH feature was introduced in 0bab3e59c2 (2017; 2.2) and made
mandatory in defe6f49df (2020; 4.4), but some vestiges remain.
Remove them now. Note that md5_hasher itself is still in use by
other components, so it cannot be removed.
The PER_TABLE_PARTITIONERS feature was added in 90df9a44ce (2020; 4.0)
and can now be assumed to be always present. We also remove the associated
schema_feature.
The first digest tested was generated without the PER_TABLE_PARTITIONERS
schema feature. We're about to make that feature mandatory, so we won't
be able (and won't need) to generate a digest without it.
Update the digest to include the feature. Note it wasn't untested before,
we have a test with schema_features::full().
The CDC feature was made non-experimental in e9072542c1 (2020; 4.4)
and can now be assumed to be always present. We also remove the corresponding
schema_feature.
The DIGEST_INSENSITIVE_TO_EXPIRY feature was added in 9de071d214 (2019; 3.2)
and can now be assumed to be always present. We enable the corresponding
schema_feature unconditionally.
We do not remove the corresponding schema feature, because it can be disabled
when the related TABLE_DIGEST_INSENSITIVE_TO_EXPIRY is present.
The VIEW_VIRTUAL_COLUMNS feature was added in a108df09f9 (2019; 3.1)
and can now be assumed to be always present.
The corresponding schema_feature is removed. Note schema_features are not sent
over the wire. A digest calculation without VIEW_VIRTUAL_COLUMNS is no longer tested.
"me" format sstables were introduced in d370558279 (Jan 2022; 5.1)
and so can be assumed always present. The listener that checks when
the cluster understands ME_SSTABLE was removed and in its place
we default to sstable_version_types::me (and call on_enabled()
immediately).
"md" sstable support was introduced in e8d7744040 (2020; 4.4)
and so can be assumed to be present on all versions we upgrade from.
Nothing appears to depend on it.
These tests were marked as xfail because they used to fail with tablets.
They don't anymore, so remove the xfail.
Fixes: #16486
Closes scylladb/scylladb#18671
Even when configured to not do any validation at all, the validator still did some. This small series fixes this, and adds a test to check that validation levels in general are respected, and the validator doesn't validate more than it is asked to.
Fixes: #18662
Closes scylladb/scylladb#18667
* github.com:scylladb/scylladb:
test/boost/mutation_fragment_test.cc: add test for validator validation levels
mutation: mutation_fragment_stream_validating_filter: fix validation_level::none
mutation: mutation_fragment_stream_validating_filter: add raises_error ctor parameter
This commit brings several new features to scylla_cluster.py to fix runaway asyncio task problems in topology tests:
- Start-Stop Lock and Stop Event in ScyllaServer
- Tasks History, Wait for tasks from Tasks History and Manager broken state in ScyllaClusterManager
- make the ManagerClient object function scope
- test_finished_event in ManagerClient
Fixes: scylladb/scylladb#16472
Fixes: scylladb/scylladb#16651
Closes scylladb/scylladb#18236
* github.com:scylladb/scylladb:
test/pylib: Introduce ManagerClient.test_finished_event
test/topology: make ManagerClient object function scope
test/pylib: Introduce Manager broken state:
test/pylib: Wait for tasks from Tasks History:
test/pylib: Introduce Tasks History:
test/pylib: Introduce Stop Event
test/pylib: Introduce Start-Stop Lock:
before this change, we used `update_item_suffix` as a format string
fed to `format(...)`, which resolves to `seastar::format()`.
but with a patch that migrates `seastar::format()` to the backend
with compile-time format checking, the call sites using `format()` would
fail to build, because `update_item_suffix` is not `constexpr`:
```
/home/kefu/.local/bin/clang++ -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -isystem /home/kefu/dev/scylladb/build/rust -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT test/perf/CMakeFiles/test-perf.dir/RelWithDebInfo/perf_alternator.cc.o -MF test/perf/CMakeFiles/test-perf.dir/RelWithDebInfo/perf_alternator.cc.o.d -o test/perf/CMakeFiles/test-perf.dir/RelWithDebInfo/perf_alternator.cc.o -c /home/kefu/dev/scylladb/test/perf/perf_alternator.cc
/home/kefu/dev/scylladb/test/perf/perf_alternator.cc:249:69: error: call to consteval function 'fmt::basic_format_string<char, const char (&)[1]>::basic_format_string<const char *, 0>' is not a constant expression
249 | return make_request(cli, "UpdateItem", prefix + seastar::format(update_item_suffix, ""));
| ^
/usr/include/fmt/core.h:2776:67: note: read of non-constexpr variable 'update_item_suffix' is not allowed in a constant expression
2776 | FMT_CONSTEVAL FMT_INLINE basic_format_string(const S& s) : str_(s) {
| ^
/home/kefu/dev/scylladb/test/perf/perf_alternator.cc:249:69: note: in call to 'basic_format_string<const char *, 0>(update_item_suffix)'
249 | return make_request(cli, "UpdateItem", prefix + seastar::format(update_item_suffix, ""));
| ^~~~~~~~~~~~~~~~~~
/home/kefu/dev/scylladb/test/perf/perf_alternator.cc:198:6: note: declared here
198 | auto update_item_suffix = R"(
| ^
```
so, to prepare for the change switching to compile-time format checking,
let's mark this variable `static constexpr`. this is also more correct,
as this variable is
* a compile-time constant, and
* not shared across different compilation units.
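a minimal sketch of the difference, using fmt::format directly and made-up
strings (not the perf_alternator code):
```c++
// With compile-time format checking, the format string must be usable in a
// constant expression; a plain local variable is not.
#include <fmt/core.h>
#include <string>

std::string bad() {
    auto suffix = R"(, "value": "{}")";            // non-constexpr local
    // return fmt::format(suffix, "x");            // would fail to compile
    return fmt::format(fmt::runtime(suffix), "x"); // runtime-checked escape hatch
}

std::string good() {
    static constexpr auto suffix = R"(, "value": "{}")";
    return fmt::format(suffix, "x");               // checked at compile time
}
```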
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18685
With a large tablet count, e.g. 128k, forward_service::dispatch() can potentially stall when grouping ranges per endpoint.
` Reactor stalled for 4 ms on shard 1. Backtrace: 0x5eb15ea 0x5eb09f5 0x5eb1daf 0x3dbaf 0x2d01e57 0x33f7d1e 0x348255f 0x2d005d4 0x2d3d017 0x2d3d58c 0x2d3d225 0x5e59622 0x5ec328f 0x5ec4577 0x5ee84e0 0x5e8394a 0x8c946 0x11296f
`
Also, there are inefficient copies, which are being removed; the partition_range_vector for a single endpoint can grow beyond 1M.
Closes scylladb/scylladb#18695
* github.com:scylladb/scylladb:
service: fix indentation in dispatch()
service: fix reactor stall with large tablet count
service: avoid potential expensive copies in forward_service::dispatch()
service: coroutinize forward_service::dispatch()
Toggling the per-table auto-compaction enable bit is guarded with an
on-database boolean and an RAII guard. It's only used by a single
file, api/column_family.cc, so it can live there.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Continuation of the previous patch -- helpers toggling tombstone_gc and
auto_compaction on tables should live in the same file that uses them.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The storage_service/(enable|disable)_(tombstone_gc|auto_compaction)
endpoints are not handled by the storage_service _service_ and should rather
live in the column_family/ domain, which is handled by replica::database.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The (enable|disable)_(tombstone_gc|auto_compaction) endpoints living in
column_family domain can benefit from the helpers that do the same in
the storage_service domain. The "difference" is that c.f. endpoints do
it per-table, while s.s. ones operate on a vector of tables, so the
former is a corner case of the latter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
User-defined types can depend on each other, creating a directed acyclic
graph.
In order to support restoring schema from `DESC SCHEMA`, UDTs should be
ordered topologically, not alphabetically as they were until now.
Until now, we have implemented topological sorting in
db/cql_type_parser.cc, but it is specific to its usage.
Now we want to use topological sorting in another place,
so a generic sorting algorithm provides one implementation
to be reused in several places.
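a rough sketch of such a generic algorithm -- a plain DFS-based topological
sort, not the actual utils implementation:
```c++
// Items are emitted only after everything they depend on, so the output can
// be replayed (e.g. CREATE TYPE statements) without forward references.
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// deps(x) returns the items x depends on; the result lists dependencies first.
std::vector<std::string> topo_sort(
        const std::vector<std::string>& items,
        std::function<std::vector<std::string>(const std::string&)> deps) {
    enum class mark { none, in_progress, done };
    std::unordered_map<std::string, mark> marks;
    std::vector<std::string> result;
    std::function<void(const std::string&)> visit = [&](const std::string& x) {
        if (marks[x] == mark::done) return;
        if (marks[x] == mark::in_progress) throw std::runtime_error("cycle detected");
        marks[x] = mark::in_progress;
        for (auto& d : deps(x)) visit(d);       // dependencies come first
        marks[x] = mark::done;
        result.push_back(x);
    };
    for (auto& x : items) visit(x);
    return result;
}
```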
The function intersection(r1,r2) in statement_restrictions.cc is used
when several WHERE restrictions were applied to the same column.
For example, for "WHERE b<1 AND b<2" the intersection of the two ranges
is calculated to be b<1.
As noted in issue #18690, Scylla is inconsistent in where it allows or
doesn't allow these intersecting restrictions. But where they are
allowed they must be implemented correctly. And it turns out the
function intersection() had a bug that caused it to sometimes enter
an infinite loop - when the intent was only to call itself once with
swapped parameters.
This patch includes a test reproducing this bug, and a fix for the
bug. The test hangs before the fix, and passes after the fix.
While at it, I carefully reviewed the entire code used to implement
the intersection() function to try to make sure that the bug we found
was the only one. I also added a few more comments where I thought they
were needed to understand complicated logic of the code.
The bug, the fix and the test were originally discovered by
Michał Chojnowski.
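a simplified sketch of the swapped-parameter pattern and a guard against
re-swapping (hypothetical range type, not the actual statement_restrictions
code):
```c++
// intersection() normalizes the arguments by swapping at most once; without
// such a guarantee, a wrong swap condition can recurse forever.
#include <algorithm>
#include <optional>

struct range { int start; int end; }; // half-open [start, end)

std::optional<range> intersection(range r1, range r2, bool swapped = false) {
    if (r1.start > r2.start) {
        // Handle the asymmetric case by swapping, but never swap back again.
        return swapped ? std::nullopt : intersection(r2, r1, true);
    }
    if (r2.start >= r1.end) {
        return std::nullopt; // disjoint
    }
    return range{r2.start, std::min(r1.end, r2.end)};
}
```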
Fixes #18688
Refs #18690
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#18694
This series is a reupload of #13792 with a few modifications, namely a test is added and the conflicts with recent tablet related changes are fixed.
See https://github.com/scylladb/scylladb/issues/12379 and https://github.com/scylladb/scylladb/pull/13583 for a detailed description of the problem and discussions.
This PR aims to extend the existing throttling mechanism to work with requests that internally generate a large amount of view updates, as suggested by @nyh.
The existing mechanism works in the following way:
* Client sends a request, we generate the view updates corresponding to the request and spawn background tasks which will send these updates to remote nodes
* Each background task consumes some units from the `view_update_concurrency_semaphore`, but doesn't wait for these units, it's just for tracking
* We keep track of the percent of consumed units on each node, this is called `view update backlog`.
* Before sending a response to the client we sleep for a short amount of time. The amount of time to sleep for is based on the fullness of this `view update backlog`. For a well behaved client with limited concurrency this will limit the amount of incoming requests to a manageable level.
This mechanism doesn't handle large DELETE queries. Deleting a partition is fast for the base table, but it requires us to generate a view update for every single deleted row. The number of deleted rows per single client request can be in the millions. Delaying response to the request doesn't help when a single request can generate millions of updates.
To deal with this we could treat the view update generator just like any other client and force it to wait a bit of time before sending the next batch of updates. The amount of time to wait for is calculated just like in the existing throttling code, it's based on the fullness of `view update backlogs`.
The new algorithm of view update generation looks something like this:
```c++
for(;;) {
auto updates = generate_updates_batch_with_max_100_rows();
co_await seastar::sleep(calculate_sleep_time_from_backlogs());
spawn_background_tasks_for_updates(updates);
}
```
Fixes: https://github.com/scylladb/scylladb/issues/12379
Closes scylladb/scylladb#16819
* github.com:scylladb/scylladb:
test: add test for bad_allocs during large mv queries
mv: throttle view update generation for large queries
exceptions: add read_write_timeout_exception, a subclass of request_timeout_exception
db/view: extract view throttling delay calculation to a global function
view_update_generator: add get_storage_proxy()
storage_proxy: make view backlog getters public
The database has token-metadata onboard, and other services use it to get the topology from. The repair code has simpler and cleaner ways to get access to the topology.
Closes scylladb/scylladb#18677
* github.com:scylladb/scylladb:
repair: Get topology via replication map
repair: Use repair_service::my_address() in handlers
repair: Remove repair_meta::_myip
repair: Use repair_meta::myip() everywhere
repair: Add repair_service::my_address() method
With intra-node migration, all the movement is local, so we can make
streaming faster by just cloning the sstable set of the leaving replica
and loading it into the pending one.
This cloning is specific to the underlying storage, but s3 doesn't support
snapshot() yet (the sstables::storage procedure which clone is built
upon). It's only supported by the file system, with the help of hard links.
A new generation is picked for each newly cloned sstable, and it will
live in the same directory as the original.
A challenge I bumped into was to understand why the table refused to
load the sstables at the pending replica, as it considered them foreign.
Later I realized that the sharder (for reads) at this stage of migration
will point only to the leaving replica. It didn't fail with mutation-based
streaming, because the sstable writer considers the shard --
the one the sstable was written into -- as its owner, regardless of what
the sharder says. That was fixed by mimicking this behavior during
loading at the pending replica.
test:
./test.py --mode=dev intranode --repeat=100 passes.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This check would lead to correctness issues with intra-node migration
because the shard may switch during read, from "read old" to "read
new". If the coordinator used "read old" for shard routing, but table
on the old shard is already using "read new" erm, such a read would
observe empty result, which is wrong.
Drop the optimization. In the scenario above, read will observe all
past writes because:
1) writes are still using "write both"
2) writes are switched to "write new" only after all requests which
might be using "read old" are done
Replica-side coordinators should already route single-key requests to
the correct shard, so it's not important as an optimization.
This issue shows how assumptions about static sharding are embedded in
the current code base and how intra-node migration, by violating those
assumptions, can lead to correctness issues.
In preparation for intra-node tablet migration, to avoid
using deprecated sharder APIs.
This function is used for generating sstable sharding metadata.
For tablets, it is not invoked, so we can safely work with the
static sharder. The call site already passes static_sharder only.
In preparation for tablet intra-node migration.
Existing uses are for reads, so it's safe to use shard_for_reads():
- in multishard reader
- in forward_service
The ring_position_range_vector_sharder is used when computing sstable
shards, which for intra-node migration should use the view for
reads. If we haven't completed streaming, sstables should be attached
to the old shard (used by reads). When in write-both-read-new stage,
streaming is complete, reads are using the new shard, and we should
attach sstables to the new shard.
When not in intra-node migration, the view for reads on the pending
node will return the pending shard even if the read selector is "read old".
So if the pending node restarts during streaming, we will attach sstables
to the shard which is used by writes even though we're using the selector
for reads.
I analyzed all the uses and all except the alternator/ttl.cc seem to
be interested in the result for the purpose of reading.
Alternator is not supported with tablets yet, so the use was annotated
with a relevant issue.
Note: there is a potential problem with rate-limit count going out of sync
during intra-node migration between old and the new shard.
Before this patch, when the coordinator had already accounted and admitted the
request (so the rate_limit_info passed to apply_locally() is
account_only), it was converted to std::monostate for requests to the
local replica. This makes sense because the request was already
accounted by the coordinator.
However, during intra-node migration when we do double writes to two
shards locally, that means that the new shard will not account the
write, it will have lower count than the limiter on the old
shard. This means that the new shard may accept writes which will end
up being rejected. This is not desirable, but not the end of the world
since it's temporary, and the new shard will still protect itself from
overload based on its own rate limiter.
shard_for_writes() is appropriate, because we're writing. It can
happen that the tablet was migrated away and no shard is the owner. In
that case the mutation is dropped, as it should be, because "shards"
is empty.
Instead, use shard_for_reads(). The justification is that:
1) In cas_shard(), we need to pick a single request coordinator.
shard_for_reads() gives that, which is equivalent to shard_of()
if there is no intra-node migration.
2) In paxos handler for prepare(), the shard we execute it on is
the shard from which we read, so shard_for_reads() is the one.
3) Updates of paxos state are separate CQL requests, and use their
own sharding.
4) Handler for learn is executing updates using calls to
storage_proxy::mutate_locally() which will use the right sharder for writes
However, the code is still not prepared for intra-node migration, and
possibly regular migration too in case of abandoned requests, because
the locking of paxos state assumes that the shard is static. That
would have to be fixed separately, e.g. by locking both shards
(shard_for_writes()) during migration, so that the set of locked
shards always intersects during migration and local serialization of
paxos state updates is achieved. I left FIXMEs for that.
Before the patch, dht::sharder could be instantiated and it would
behave like a static sharder. This is not safe with regards to
extensions of the API because if a derived implementation forgets to
override some method, it would incorrectly default to the
implementation from static sharder. Better to fail the compilation in
this case, so extract static sharder logic to dht::static_sharder
class and make all methods in dht::sharder pure virtual.
This also allows us to have algorithms indicate that they only work
with static sharder by accepting the type, and have compile-time
safety for this requirement.
schema::get_sharder() is changed to return the static_sharder&.
Require users to specify whether we want shard for reads or for writes
by switching to appropriate non-deprecated variant.
For example, shard_of() can be replaced with shard_for_reads() or
shard_for_writes().
The next_shard/token_for_next_shard APIs have only for-reads variant,
and the act of switching will be a testimony to the fact that the code
is valid for intra-node migration.
Intra-node migrations are scheduled for each node independently with
the aim to equalize per-shard tablet count on each node.
This is needed to avoid severe imbalance between shards which can
happen when some table grows and is split. The inter-node balance can
be equal, so inter-node migration cannot fix the imbalance. Also, if
RF=N then there is not even a possibility of moving tablets around to
fix the imbalance. The only way to bring the system to balance is to
move tablets within the nodes.
After scheduling inter-node migrations, the algorithm schedules
intra-node migrations. This means that across-node migrations can
proceed in parallel with intra-node migrations if there is free
capacity to carry them out, but across-node migrations have higher
priority.
Fixes #16594
Currently the load balancer only generates an inter-node plan, and
the algorithm is embedded in make_plan(). The method will become even
harder to follow once we add more kinds of plan-generating steps,
e.g. the intra-node plan. Extract the inter-node plan generation to make it
easier to add other plans and see the grand flow.
The node_load datastructure was not updated to reflect migration
decisions on the target node. This is not needed for inter-node
migration because target nodes are not considered as sources. But we
want it to reflect migration decisions so that later inter-node
migration sees an accurate picture with earlier migrations reflected
in node_load.
During streaming for intra-node migration we want to write only to the
new shard. To achieve that, allow altering write selector in
sharder::shard_for_writes() and per-instance of
auto_refreshing_sharder.
This writer is used by streaming, on tablet migration and
load-and-stream.
The caller of distribute_reader_and_consume_on_shards(), which provides
a sharder, is supposed to ensure that effective_replication_map is kept
alive around it, in order for topology coordinator to wait for any writes
which may be in flight to reach their shards before tablet replica starts
another migration. This is already the case:
1) repair and load-and-stream keep the erm around writing.
2) tablet migration uses autorefreshing_sharder, so it does not, but
it keeps the topology_guard around the operation in the consumer,
which serves the same purpose.
When sharder says that the write should go to multiple shards,
we need to consider the write as applied only if it was applied
to all those shards.
This can happen during intra-node tablet migration. During such migration,
the request coordinator on storage_proxy side is coordinating to hosts
as if no migration was in progress. The replica-side coordinator coordinates
to shards based on sharder response.
One way to think about it is that
effective_replication_map::get_natural_endpoints()/get_pending_endpoints()
tells how to coordinate between nodes, and sharder tells how to
coordinate between shards. Both work with some snapshot of tablet
metadata, which should be kept alive around the operation. Sharder is
associated with its own effective_replication_map, which marks the
topology version as used and allows barriers to synchronize with
replica-side operations.
Tablet sharder is adjusted to handle intra-migration where a tablet
can have two replicas on the same host. For reads, sharder uses the
read selector to resolve the conflict. For writes, the write selector
is used.
The old shard_of() API is kept to represent shard for reads, and new
method is introduced to query the shards for writing:
shard_for_writes(). All writers should be switched to that API, which
is not done in this patch yet.
The request handler on replica side acts as a second-level
coordinator, using sharder to determine routing to shards. A given
sharder has a scope of a single topology version, a single
effective_replication_map_ptr, which should be kept alive during
writes.
We need a separate transition kind for intra-node migration so that we
don't have to recover this information from the replica set in an
expensive way. This information is needed in the hot path - in
effective_replication_map, to not return the pending tablet replica to
the coordinator. From its perspective, the replica set is not
transitional.
The transition will also be used to alter the behavior of the
sharder. When not in intra-node migration, the sharder should
advertise the shard which is either in the previous or next replica
set. During intra-node migration, that's not possible as there may be
two such shards. So it will return the shard according to the current
read selector.
balance_tablets() is invoked in a loop, so only the first call will
see a non-empty skiplist.
This bug starts to manifest after adding the intra-node migration plan,
causing failures of the test_load_balancing_with_skiplist test
case. The reason is that rebalancing will now require multiple passes
before convergence is reached, due to intra-node migrations, and later
calls will not see the skiplist and try to balance skipped nodes,
violating the test's assertions.
This is for the purpose of the scylla-gdb.py command "scylla
active-sstables". Before the patch, readers were located by scanning
the heap for live objects with vtable pointers corresponding to
readers. It was observed that the test scylla_gdb/test_misc.py::test_active_sstables started failing like this:
gdb.error: Error occurred in Python: Cannot access memory at address 0x300000000000000
This could be explained by there being a live object on the heap which
used to be a reader but now is a different object, and the _sst field
contains some other data which is not a pointer.
To fix, track readers explicitly in a linked list so that the gdb
script can reliably walk readers.
Fixes #18618.
Topology version may be updated, for example, by executing a RESTful
API call to move a tablet. If that is done concurrently with an
ongoing token metadata barrier executed by topology coordinator
(because there is active tablet migration, for example), then some
requests may fail due to being fenced out unnecessarily.
The problem is that barrier function assumes no concurrent topology
updates so it sets the fence version to the one which is current after
other nodes are drained. This patch changes it to set the fence to the
version which was current before other nodes were drained. Semantics
of the barrier are preserved because it only guarantees that topology
state from before the invocation of barrier is propagated.
Fixes #18699
Use std::uninitialized_{copy,move} and std::destroy,
which have optimizations for trivially copyable and
trivially moveable types.
In those cases, the memory can be copied onto the uninitialized
memory, rather than invoking the respective copy/move constructors
one item at a time.
perf-simple-query results:
```
base: median 95954.90 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42312 insns/op, 0 errors)
post: median 97530.65 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42331 insns/op, 0 errors)
```
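a small illustration of the idea (not the actual container code):
```c++
// For trivially copyable/movable element types, std::uninitialized_move and
// std::destroy boil down to a bulk memory copy and a no-op, instead of
// per-element constructor and destructor calls.
#include <cstdlib>
#include <memory>

template <typename T>
void relocate(T* src, std::size_t n, T* dst /* uninitialized */) {
    std::uninitialized_move(src, src + n, dst); // memcpy-like for trivial T
    std::destroy(src, src + n);                 // no-op for trivial T
}

int main() {
    int src[4] = {1, 2, 3, 4};
    void* raw = std::malloc(sizeof(src));
    relocate(src, 4, static_cast<int*>(raw));
    std::free(raw);
}
```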
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#18609
with a large tablet count, e.g. 128k, forward_service::dispatch() can
potentially stall when grouping ranges per endpoint.
Reactor stalled for 4 ms on shard 1. Backtrace: 0x5eb15ea 0x5eb09f5 0x5eb1daf 0x3dbaf 0x2d01e57 0x33f7d1e 0x348255f 0x2d005d4 0x2d3d017 0x2d3d58c 0x2d3d225 0x5e59622 0x5ec328f 0x5ec4577 0x5ee84e0 0x5e8394a 0x8c946 0x11296f
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
each partition_range_vector might grow to ~9600 elements, assuming
96-shard nodes, each with 100 tablets.
~9600 elements, where each is 120 bytes (sizeof(partition_range)),
can result in a vector with a capacity of ~2M due to the growth factor
of 2.
we're copying each range 3x in dispatch(), and we can easily avoid
it.
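a rough sketch of the shape of the fix (hypothetical types; the real
dispatch() code differs): move each range into its per-endpoint bucket
instead of copying, and yield periodically so the loop cannot stall the
reactor.
```c++
// Hypothetical sketch: group ranges per endpoint without copying, yielding
// inside the loop to avoid reactor stalls with very large tablet counts.
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <seastar/coroutine/maybe_yield.hh>
#include <unordered_map>
#include <utility>
#include <vector>

struct partition_range { char payload[120]; }; // ~120 bytes, as noted above

seastar::future<std::unordered_map<int, std::vector<partition_range>>>
group_ranges(std::vector<std::pair<int, partition_range>> ranges) {
    std::unordered_map<int, std::vector<partition_range>> per_endpoint;
    for (auto& [endpoint, r] : ranges) {
        per_endpoint[endpoint].push_back(std::move(r)); // move, don't copy
        co_await seastar::coroutine::maybe_yield();      // break up long loops
    }
    co_return std::move(per_endpoint);
}
```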
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The token() getter function first tries to find a schema for the underlying
table and continues with nullptr if there is none. Later, when creating
token_fct, the schema is passed as-is and referenced. If it's null, a crash
happens.
It used to throw on a missing schema before 5983e9e7b2 (cql3: test_assignment:
pass optional schema everywhere), but this commit changed the way the
schema is looked up, so nullptr is now possible.
fixes: #18637
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18639
When row_level_repair is constructed it sorts the provided list of endpoints.
For that it needs to get the topology from somewhere, and it goes through the
database->token_metadata->topology chain. Patch this place to get the
topology from the erm instead. It's consistent with how other code in
row_level_repair gets it and removes one more place that uses the database
as a token metadata "provider".
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Some handlers want to print the local node address in logs. Now that the
repair_service has a method to get it, those places can stop getting
it via the database->token_metadata dependency chain.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In favor of the recently introduced my_address().
One nice side effect of this change is one fewer place that gets token
metadata from the database.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method returns _myip and some places in this class use _myip
directly. Next patch is going to remove _myip, so prepare for that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently, invoking `nodetool ring` on a tablet keyspace fails with an error, because it doesn't pass the required table parameter to `/storage_service/ownership/{keyspace}`. Further to this, the command will currently always output the vnode ring, regardless of the keyspace and table parameter. This series fixes this, adding tablet support to `/storage_service/tokens_endpoint`, which will now return the tablet ring (tablet token -> tablet primary replica mapping) if the new keyspace and table parameters are provided.
`nodetool status` also gets a touch-up, to provide the tablet ring's token count (the tablet count) when invoked with a tablet keyspace and table.
Fixes: #17889
Fixes: #18474
- [x] ** native-nodetool is new functionality, no backport is needed **
Closes scylladb/scylladb#18608
* github.com:scylladb/scylladb:
test/nodetool: make test pass with cassandra nodetool
tools/scylla-nodetool: status: fix token count for tablets
tools/scylla-nodetool: add tablet support to ring command
api/storage_service: add tablet support for /storage_service/tokens_endpoint
service/storage_service: introduce get_tablet_to_endpoint_map()
locator/tablets: introduce the primary replica concept
Introduce ManagerClient.test_finished_event
to block access to the REST client object from the test if the
ManagerClient.after_test method was called
(test teardown started).
Recent commit 12f160045b (Get rid of fb_utilities) replaced the usage of the global fb_utilities and made all services use topology::my_address() in order to get the local node broadcast address. In some places this resulted in long dependency-chain dereferences to get to the topology. This PR fixes some of them.
Closes scylladb/scylladb#18672
* github.com:scylladb/scylladb:
service_level_controller_test: Use topology::is_me() helper
service_level_controller: Add dependency on shared_token_metadata
tracing: Get my_address() via proxy
storage_proxy: Get token metadata via local member, not database
Currently, when a view update backlog is changed and sent
using gossip, we check whether the strtoll/strtoull
function used for reading the backlog returned
LLONG_MAX/ULLONG_MAX, signaling an error of a value
exceeding the type's limit, and if so, we do not store
it as the new value for the node.
However, the ULLONG_MAX value can also be used as the max
backlog size when sending empty backlogs that were never
updated. In theory, we could avoid sending the default
backlog because each node has its real backlog (based on
the node's memory, different than the ULLONG_MAX used in
the default backlog). In practice, if the node's
backlog changed to 0, the backlog it sends will likely be
the default backlog, because when selecting the biggest
backlog across the node's shards we use operator<=>(),
which treats the default backlog as equal to an empty
backlog, and we may get the default backlog during the
comparison if the backlog of some shard was never changed
(it is also the initial maximum value we compare the
shards' backlogs against).
This patch removes the (U)LLONG_MAX check and replaces it
with a check of errno, which is set to ERANGE on a strtoll
error and which won't prevent empty backlogs from being read.
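A minimal sketch of the errno-based parsing described above (illustrative names, not the actual Scylla code):
```cpp
#include <cerrno>
#include <cstdlib>
#include <cstdint>
#include <optional>
#include <string>

// Reject only genuinely out-of-range or malformed input; a legitimate
// ULLONG_MAX (the "default backlog") value is still accepted.
std::optional<uint64_t> parse_backlog(const std::string& s) {
    errno = 0;
    char* end = nullptr;
    unsigned long long v = std::strtoull(s.c_str(), &end, 10);
    if (errno == ERANGE || end == s.c_str()) {
        return std::nullopt; // overflow or no digits parsed
    }
    return v;
}
```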
Fixes: #18462
Closes scylladb/scylladb#18560
The replica namespace is broken in the middle by the sstable_list alias,
while the latter can be declared earlier.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18664
Currently all tables are printed in statements like `DESC TABLES`, `DESC KEYSPACE ks` or `DESC SCHEMA`.
But when we create a table with cdc enabled, an additional table with the `_scylla_cdc_log` suffix is created.
Those tables shouldn't be recreated manually but are created automatically when the base table is created.
This patch hides tables with the `_scylla_cdc_log` suffix in all describe statements.
To preserve the property values of those tables, an `ALTER TABLE` statement with all properties and their current values for the cdc log table is added to the description of the base table.
Fixes #18459
Closes scylladb/scylladb#18467
* github.com:scylladb/scylladb:
test/cql-pytest/test_describe: add test for hiding cdc tables
cql3/statements/describe_statement: hide cdc tables
schema: add a method to generate ALTER statement with all properties
schema: extract schema's properties generation
The on_leave_cluster() callback needs to check if the leaving node is
the local one. It currently compares the endpoint with the my_address()
obtained via a pretty long dependency chain of
auth_service->query_processor->storage_proxy->database->token_metadata.
This patch makes the whole thing _much_ shorter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The my_address() helper method gets the address via a long
qp->proxy->database->token_metadata->topology chain. That's overkill;
storage_proxy has a public my_address() method. The latter also
accesses the topology, but without the help of the database. This
change also makes the tracing code a bit shorter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The my_address() method eventually needs to access the topology and goes
a long way via sharded<database>. There is no need for that: shared token
metadata is available on the proxy itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Move ManagerClient object creation/cleanup
to function scope instead of session scope
to prevent test cases from affecting each other,
by no longer sharing connections to the cluster between tests.
Waiting for all tasks does not guarantee
that a test will not spawn new tasks while we wait.
The manager's broken state prevents all future put requests in case of:
1) a failure during task waiting
2) a test continuing to create tasks in the test_after stage
To ensure the atomicity of tests and recycle clusters without any issues, it is crucial
that all active requests in ScyllaClusterManager are completed before proceeding further.
Topology tests might spawn asynchronous tasks in parallel in ScyllaClusterManager.
A task history is introduced to be able to log and analyze all actions
against the cluster in case of failures.
The methods stop, stop_gracefully, and start in ScyllaServer
are not designed for parallel execution.
To circumvent issues arising from concurrent calls,
a start_stop_lock has been introduced.
This lock ensures that these methods are executed sequentially.
Currently, if the fill ctor throws an exception,
the destructor won't be called, as the object is not fully constructed yet.
Call the default ctor first (which doesn't throw)
to make sure the destructor will be called on exception.
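A minimal sketch of the pattern, assuming a simplified container (not the actual chunked_vector code):
```cpp
#include <cstddef>
#include <new>

template <typename T>
class fill_safe_vector {
    T* _data = nullptr;
    std::size_t _size = 0;
public:
    fill_safe_vector() noexcept = default;   // never throws

    // Delegating to the default ctor first means *this counts as fully
    // constructed, so the destructor runs even if fill() throws below.
    fill_safe_vector(std::size_t n, const T& value) : fill_safe_vector() {
        fill(n, value);
    }

    ~fill_safe_vector() {
        for (std::size_t i = 0; i < _size; ++i) {
            _data[i].~T();
        }
        operator delete(_data);
    }

private:
    void fill(std::size_t n, const T& value) {
        _data = static_cast<T*>(operator new(n * sizeof(T)));
        for (; _size < n; ++_size) {
            new (&_data[_size]) T(value);    // copy construction may throw
        }
    }
};
```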
Fixesscylladb/scylladb#18635
- [x] Although the fix is for a rare bug, it has very low risk and so it's worth backporting to all live versions
Closesscylladb/scylladb#18636
* github.com:scylladb/scylladb:
chunked_vector_test: add more exception safety tests
chunked_vector_test: exception_safe_class: count also moved objects
utils: chunked_vector: fill ctor: make exception safe
Despite its name, this validation level still did some validation. Fix
this, by short-circuiting the catch-all operator(), preventing any
validation when the user asked for none.
When set to false, no exceptions will be raised from the validator on
validation error. Instead, it will just return false from the respective
validator methods. This makes testing simpler, as asserting exceptions is
clunky.
When true (default), the previous behaviour will remain: any validation
error will invoke on_internal_error(), resulting in either std::abort()
or an exception.
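A rough illustration of the two modes, with invented names (the real validator and on_internal_error() plumbing differ):
```cpp
#include <stdexcept>
#include <string_view>

class name_validator {
    bool _abort_on_error; // true: previous behaviour, false: test-friendly mode
public:
    explicit name_validator(bool abort_on_error = true)
        : _abort_on_error(abort_on_error) {}

    bool validate(std::string_view name) const {
        if (!name.empty()) {
            return true;
        }
        if (_abort_on_error) {
            // stands in for on_internal_error(): abort or throw
            throw std::logic_error("validation failed: empty name");
        }
        return false; // just report the failure, no exception to assert on
    }
};
```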
We don't attempt to create an endpoint manager for a hint directory if there is no host ID–IP mapping corresponding to the directory's name (an IP address). That prevents a segmentation fault.
Fixes scylladb/scylladb#18649
Closes scylladb/scylladb#18650
* github.com:scylladb/scylladb:
db/hints: Remove an unused header
db/hints: Remove migrating flag before initializing endpoint managers
db/hints: Prevent segmentation fault when initializing endpoint managers
This patch adds metrics that will be reported per-table per-node.
The added metrics (that are part of the per-table per-shard metrics)
are:
scylla_column_family_cache_hit_rate
scylla_column_family_read_latency
scylla_column_family_write_latency
scylla_column_family_live_disk_space
Fixes#18642
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Closesscylladb/scylladb#18645
incremental_reader_selector is the mechanism for incremental consumption
of disjoint sstables on range reads.
tablet_sstable_set was implemented such that the selector is efficient with
tablets.
The problem is that the selector is vnode-centric and will only consider a given
set exhausted when the maximum token is reached.
With tablets, that means a range read on first tablet of a given shard
will also consume other tablets living in the same shard. That results
in combined reader having to work with empty sstable readers of tablets
that don't intersect with the range of the read. It won't cause extra
I/O because the underlying sstables don't intersect with the range of
the read. It's only unnecessary CPU work, as it involves creating
readers (= allocation), feeding them into combined reader, which will
in turn invoke the sstable readers only to realize they don't have any
data for that range.
With 100k tablets (ranges), and 100 tablets per shard, and ~5 sstables
per tablet, there will be this amount of readers (empty or not):
(100k * ((100^2 + 100) / 2) * avg_sstable_per_tablet=5) = ~2.5 billion.
That's ~5000 times more readers; it can be quite significant additional CPU
work, even though I/O dominates in scans. It's an inefficiency
that we'd rather get rid of.
The behavior can be observed from logs (there's 1 sstable for each of
4 tablets, but note how readers are created for every single one of
them when reading only 1 tablet range):
```
table - make_reader_v2 - range=(-inf, {-4611686018427387905, end}]
incremental_reader_selector - create_new_readers(null): selecting on pos {minimum token, w=-1}
sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._34qn42... that has range [{-9151620220812943033, start},{-4813568684827439727, end}]
incremental_reader_selector - create_new_readers(null): selecting on pos {-4611686018427387904, w=-1}
sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._368nk2... that has range [{-4599560452460784857, start},{-78043747517466964, end}]
incremental_reader_selector - create_new_readers(null): selecting on pos {0, w=-1}
sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._38lj42... that has range [{851021166589397842, start},{3516631334339266977, end}]
incremental_reader_selector - create_new_readers(null): selecting on pos {4611686018427387904, w=-1}
sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._3dba82... that has range [{5065088566032249228, start},{9215673076482556375, end}]
```
The fix makes sure the tablet set won't select past the
supplied range of the read.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#18556
This patch adds a test for reproducing issue #12379, which is
being fixed in #16819.
The test case works by creating a table with a materialized
view, and then performing a partition delete query on it.
At the same time, it uses injections to limit the memory
to a level lower than usual, in order to increase the
consistency of the test, and to limit its runtime.
Before #16819, the test would exceed the limit and fail,
and now the next allocation is throttled using a sleep.
For every mutation applied to the base table we have to
generate the corresponding materialized view table updates.
In case of simple requests, like INSERT or UPDATE, the number
of view updates generated per base table mutation is limited
to at most a few view table updates per base table update.
The situation is different for DELETE queries, which can delete
the whole partitions or clustering ranges. Range deletions are
fast on the base table, but for the view table the situation
is different. Deleting a single partition in the base table
will generate as many singular view updates as there are rows
in the deleted partition, which could potentially be in the millions.
To prevent OOM, view updates are generated in batches of at most 100 rows.
There is a loop which generates the next batch of updates, spawns tasks
to send them to remote nodes, generates another batch and so on.
The problem is that there is no concurrency control - each batch is scheduled
to be sent in the background, but the following batch is generated without
waiting for the previously generated updates to be sent. This can lead to
unbounded concurrency and OOM.
To protect against this view update generation should be limited somehow.
There is an existing mechanism for limiting view updates - throttling.
We keep track of how many pending view updates there are, in the view backlog,
and delay responses to the client based on this backlog's fullness.
For a well behaved client with limited concurrency this will slow down
the amount of incoming requests until it reaches an optimal point.
This works for simple queries (INSERT, UPDATE, ...), but it doesn't do anything
for range DELETEs. A DELETE is a single request that generates millions of view
updates, delaying client response doesn't help.
The throttling mechanism could be extended to cover this case - we could treat the
DELETE request like any other client and force it to wait before sending more updates.
This commit implements this approach - before sending the next batch of updates
the generator is forced to sleep for a bit of time, calculated using the existing
throttling equation.
The fuller the backlog gets, the longer the generator will have to sleep,
and hopefully this will prevent overloading the system with view updates.
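A self-contained sketch of the batched, throttled generation loop described above (all names are illustrative; the real code is a Seastar coroutine and uses the shared backlog-based delay equation):
```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>

struct view_update_batch {};

void generate_view_updates_throttled(
        std::function<bool()> has_more_rows,                        // more base rows left to delete?
        std::function<view_update_batch(std::size_t)> next_batch,   // build the next batch (<= 100 rows)
        std::function<void(view_update_batch)> send_async,          // ship the batch in the background
        std::function<std::chrono::microseconds()> backlog_delay) { // shared throttling equation
    while (has_more_rows()) {
        auto batch = next_batch(100);
        send_async(std::move(batch));
        auto delay = backlog_delay();           // grows as the view-update backlog fills up
        if (delay.count() > 0) {
            std::this_thread::sleep_for(delay); // the real code co_awaits a seastar sleep instead
        }
    }
}
```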
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
The `request_timeout_exception` is thrown when a client request can't be completed in time.
Previously this class included some fields specific to read/write timeouts:
```
db::consistency_level consistency;
int32_t received;
int32_t block_for;
```
The problem is that a request can timeout for reasons other than read/write timeout,
for example the request might timeout due to materialized view update generation taking
too long.
In such cases of non read/write timeouts we would like to be able to use request_timeout_exception,
but it contains fields that aren't relevant in these cases.
To deal with this let's create read_write_timeout_exception, which inherits
from request_timeout_exception. read_write_timeout_exception will contain all
of these fields that are specific to read/write timeouts. request_timeout_exception
will become the base class that doesn't have any fields, the other case-specific
exceptions will derive from it and add the desired fields.
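A sketch of the resulting hierarchy (simplified; db::consistency_level is replaced with a plain int here):
```cpp
#include <cstdint>
#include <exception>

// Field-less base: any cause of a client request timing out.
class request_timeout_exception : public std::exception {
public:
    const char* what() const noexcept override { return "request timed out"; }
};

// Read/write timeouts keep the fields that only make sense for them.
class read_write_timeout_exception : public request_timeout_exception {
public:
    int consistency;      // stand-in for db::consistency_level
    int32_t received;
    int32_t block_for;
    read_write_timeout_exception(int cl, int32_t r, int32_t b)
        : consistency(cl), received(r), block_for(b) {}
};

// Other causes (e.g. view update generation taking too long) derive from
// the base directly and add whatever fields they need.
class view_update_timeout_exception : public request_timeout_exception {};
```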
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
In order to prevent overload caused by too many view updates,
their number is limited by delaying client responses.
The amount of time to delay for is calculated based on the
fullness of the view update backlog.
Currently this is done in the function calculate_delay,
used by abstract_write_response_handler.
In the following commits I will introduce another throttling
mechanism that uses the same equation to calculate wait time,
so it would be good to reuse the existing function.
Let's make the function globally accessible.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Currently, all documentation links that appear anywhere in the help output of scylla-nodetool are hard-coded to point to the documentation of the latest stable release. As our documentation is version and product (open-source or enterprise) specific, this is not correct. This PR addresses this by generating documentation links such that they point to the documentation appropriate for the product and version of the scylladb release.
Fixes: https://github.com/scylladb/scylladb/issues/18276
- [x] the native nodetool is a new feature, no backport needed
Closesscylladb/scylladb#18476
* github.com:scylladb/scylladb:
tools/scylla-nodetool: make doc link version-specific
release: introduce doc_link()
build: pass scylla product to release.cc
There are two metrics to help observe base-write throttling:
* current_throttled_base_writes
* last_mv_flow_control_delay
Both show a snapshot of what is happening right at the time of querying
these metrics. This doesn't work well when one wants to investigate the
role throttling is playing in occasional write timeouts. Prometheus
scrapes metrics in multi-second intervals, and the probability of that
instant catching the throttling at play is very small (almost zero).
Add two new metrics:
* throttled_base_writes_total
* mv_flow_control_delay_total
These accumulate all values, allowing grafana to derive the values and
extract information about throttle events that happened in the past
(but not necessarily at the instant of the scrape).
Note that dividing the two values will yield the average delay for a
throttle, which is also useful.
Closesscylladb/scylladb#18435
Before these changes, if initializing endpoint
managers throws an exception after the migration
of hinted handoff to host IDs is done, we
don't remove the flag indicating the migration
is still in progress. However, the migration
has, in practice, finished -- all of the
hint directories have been mapped to host IDs
and all of the nodes in the cluster are
host-ID-based. Because of that, it makes sense
to remove the flag early on.
If hinted handoff is still IP-based and there is
a hint directory representing an IP without
a corresponding mapping to a host ID in
`locator::token_metadata`, an attempt to initialize
its endpoint manager will result in a segmentation
fault. This commit prevents that.
We have to account for moved objects as well
as copied objects so they will be balanced with
the respective `del_live_object` calls made
by the destructor.
However, since chunked_vector requires the
value_type to be nothrow_move_constructible,
just count the additional live object, but
do not modify _countdown or, respectively, throw
an exception, as this should be considered only
for the default and copy constructors.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, if the fill ctor throws an exception,
the destructor won't be called, as the object is not
fully constructed yet.
Call the default ctor first (which doesn't throw)
to make sure the destructor will be called on exception.
Fixesscylladb/scylladb#18635
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Tables with `_scylla_cdc_log` suffix are internal tables used by cdc.
We want to hide those tables in all describe statements, as they
shouldn't be created by the user but rather by Scylla when the user creates a
table with cdc enabled.
Instead, we include `ALTER TABLE <cdc log table> WITH <all table properties>`
to the description of cdc base table, so all changes to cdc log table's
properties are preserved in backup.
In the describe statement, we need to generate `ALTER TABLE` statement
with all schema's properties for some tables (cdc log tables).
The method prints a valid CQL statement with the current values of
the properties.
In commit 642f9a1966 (repair: Improve
estimated_partitions to reduce memory usage), a 10% hard coded
estimation ratio is used.
This patch introduces a new config option to specify the estimation
ratio of partitions written by repair out of the total partitions.
It is set to 0.1 by default.
Fixes #18615
Closes scylladb/scylladb#18634
Closesscylladb/scylladb#18616
* github.com:scylladb/scylladb:
replica: Make it explicit table's sstable set is immutable
replica: avoid reallocations in tablet_sstable_set
replica: Avoid compound set if only one sstable set is filled
After the recent fixes 4 tests started failing with the java nodetool
implementation. We are about to ditch the java implementation, but until
we actually do, it is valuable to keep the tests passing with both the
native and java implementation.
So in this patch, these tests are fixed to pass with the java
implementation too.
There is one test, test_help.py, which fails only if run together with
all the tests. I couldn't confirm this 100%, but it seems like this is
due to JMX sending a rogue request on some timer, which happens to hit
this test. I don't think this is worth trying to fix.
Currently, the token count column is always based on the vnodes, which
makes no sense for tablet keyspaces. If a tablet keyspace is provided as
the keyspace argument, don't print the vnode token count. If the user
provided a table argument as well, print the tablet count, otherwise
print "?".
Add a table parameter. Pass both keyspace and table (when provided) to
the /storage_service/tokens_endpoint API endpoint, so that the returned
(and printed) token ring is that of the table's tablets, not the vnode
ring.
Also pass the table param to the ownership API, which will complain if
this param is missing for a tablet keyspace.
Add a keyspace and cf parameter. When specified, the endpoint will
return token -> primary replica mapping for the table's tablet tokens,
not the vnodes.
There's a loop that calculates the number of shard matches over a tablet
map. The check of the given shard against optional<shard> can be made
shorter.
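For illustration, the kind of shortening meant here (std::optional compares directly against the value type, and a disengaged optional never compares equal):
```cpp
#include <optional>

using shard_id = unsigned;

bool shard_matches(std::optional<shard_id> replica_shard, shard_id this_shard) {
    // Verbose form:
    //   return replica_shard.has_value() && *replica_shard == this_shard;
    // Shorter equivalent:
    return replica_shard == this_shard;
}
```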
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18592
As part of the unification process, alternator tests are migrated to the PythonTestSuite instead of using the RunTestSuite. The main idea is to have one suite, so it will be easier to maintain and introduce new features.
Introduce the prepare_sql option for suite.yaml to add the possibility to run CQL statements as a precondition for the test suite.
Related: https://github.com/scylladb/scylladb/issues/18188
Closes scylladb/scylladb#18442
The default limit of open file descriptors
per process may be too small for iotune on
certain machines with large number of cores.
In such a case iotune reports a failure due to the
inability to create files or to set up the seastar
framework.
This change configures the limit of open file
descriptors before running iotune to ensure
that the failure does not occur.
The limit is set via 'resource.setrlimit()' in
the parent process. The limit is then inherited
by the child process.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#18546
The primary replica is an arbitrary replica of the tablet's, which is
considered to be the "main" owner of the tablet, similar to how
replicas own tokens in the vnode world.
To avoid aliasing the primary replicas with a certain DC or rack,
primary replicas are rotated among the tablet's replicas, selecting
tablet_id % replica_count as the primary replica.
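A minimal sketch of the rotation rule (illustrative types, not the actual locator code):
```cpp
#include <cstddef>
#include <vector>

struct tablet_replica { unsigned host; unsigned shard; };

// The "primary" replica is simply the replica at index tablet_id % replica_count,
// so primaries rotate across the replica list instead of always landing on the
// same DC or rack.
const tablet_replica& primary_replica(std::size_t tablet_id,
                                      const std::vector<tablet_replica>& replicas) {
    return replicas[tablet_id % replicas.size()];
}
```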
In b4e66ddf1d (4.0) we added a new batchlog_manager configuration
named delay, but forgot to initialize it in cql_test_env. This somehow
worked, but doesn't with clang 18.
Fix it by initializing to 0 (there isn't a good reason to delay it).
Also provide a default to make it safer.
Closesscylladb/scylladb#18572
* tools/cqlsh e5f5eafd...c8158555 (11):
> cqlshlib/sslhandling: fix logic of `ssl_check_hostname`
> cqlshlib/sslhandling.py: don't use empty userkey/usercert
> Dockerfile: noninteractive isn't enough for answering yet on apt-get
> fix cqlsh version print
> cqlshlib/sslhandling: change `check_hostname` deafult to False
> Introduce new ssl configuration for disableing check_hostname
> set the hostname in ssl_options.server_hostname when SSL is used
> issue-73 Fixed a bug where username and password from the credentials file were ignored.
> issue-73 Fixed a bug where username and password from the credentials file were ignored.
> issue-73
> github actions: update `cibuildwheel==v2.16.5`
Fixes: scylladb/scylladb#18590
Closes scylladb/scylladb#18591
The code is based on a similar idea to perf_simple_query. The main differences are:
- it starts a full scylla process
- it communicates with alternator via http (localhost)
- it uses a richer table schema with all dynamoDB types instead of only strings
Testing code runs in the same process as scylla so we can easily get various perf counters (tps, instr, allocation, etc).
Results on my machine (with 1 vCPU):
> ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload read --duration 10 2> /dev/null
...
median 23402.59616090321
median absolute deviation: 598.77
maximum: 24014.41
minimum: 19990.34
> ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write --duration 10 2> /dev/null
...
median 16089.34211320635
median absolute deviation: 552.65
maximum: 16915.95
minimum: 14781.97
The above seem more realistic than results from perf_simple_query which are 96k and 49k tps (per core).
Related: https://github.com/scylladb/scylladb/issues/12518
Closes scylladb/scylladb#13121
* github.com:scylladb/scylladb:
test: perf: alternator: add option to skip data pre-population
perf-alternator-workloads: add operations-per-shard option
test: perf: add global secondary indexes write workload for alternator
test: perf: add option to continue after failed request
test: perf: add read modify write workload for alternator (lwt)
test: perf: add scan workload for alternator
test: perf: add end-to-end benchmark for alternator
test: perf: extract result aggregation logic to a separate struct
in 906700d5, we accepted 0 as well as the return code of
"nodetool <command> --help", because we needed to be prepared for
the newer seastar submodule while being compatible with the older
seastar versions. now that in 305f1bd3, we bumped up the seastar
module, and this commit picked up the change to return 0 when
handling "--help" command line option in seastar, we are able to
drop the workaround.
so, in this change, we only use "0" as the expected return code.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18627
in the same spirit of d57a82c156, this change adds `dist-unified` as one of the default targets, so that it is built by default. the unified package is required when redistributing the precompiled packages -- we publish the rpm, deb and tar balls to S3.
- [x] cmake related change, no need to backport
Closesscylladb/scylladb#18621
* github.com:scylladb/scylladb:
build: cmake: use paths to be compatible with CI
build: cmake build dist-unified by default
password_authenticator::create_default_if_missing() is a confusing mix of
coroutines and continuations, simplify it to a normal coroutine.
Closesscylladb/scylladb#18571
our CI workflow for publishing the packages expects the tar balls
to be located under `build/$buildMode/dist/tar`, where `$buildMode`
is "release" or "debug".
before this change, the CMake building system puts the tar balls
under "build/dist" when the multi-config generator is used. and
`configure.py` uses multi-config generator.
in this change, we put the tar balls for redistribution under
`build/$<CONFIG>/dist/tar`, where `$<CONFIG>` is "RelWithDebInfo"
or "Debug", this works better with the CI workflow -- we just need
to map "release" and "debug" to "RelWithDebInfo" and "Debug" respectively.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
in the same spirit of d57a82c156, this change adds `dist-unified`
as one of the default targets, so that it is built by default.
the unified package is required when redistributing the precompiled
packages -- we publish the rpm, deb and tar balls to S3.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Most of the time only the main set is filled, so we can avoid one layer
of indirection (= compound set) when the maintenance set is empty.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Currently empty storage_groups are allocated for tablets that are
not on this shard.
Allocate storage groups dynamically, i.e.:
- on table creation allocate only storage groups that are on this
shard;
- allocate a storage group for tablet that is moved to this shard;
- deallocate storage group for tablet that is cleaned up.
Stop compaction group before it's deallocated.
Add a flag to table::cleanup_tablet deciding whether to deallocate
sgs and use it in commitlog tests.
During compaction_group::cleanup the sstables set is updated, but
row_cache::_underlying still keeps a shared ptr to the old set.
Due to that, descriptors of deleted sstables aren't closed.
Refresh the snapshot in order to store the new sstables set in the
_underlying mutation source.
Add an rwlock which prevents storage groups from being added/deleted
while some other layer iterates over them (or their compaction
groups).
Add methods to iterate over storage groups with the lock held.
In the following patches, storage groups (and so also sstables sets)
will be allocated only for tablets that are located on this shard.
Some layers may try to read non-existing sstable sets.
Handle this case as if the sstables set was empty instead of calling
on_internal_error.
If allow_write_both_read_old tablet transition stage fails, move
to cleanup_target stage before reverting migration.
It's a preparation for further patches which deallocate storage
group of a tablet during cleanup.
Pass compaction group id to
shard_reshaping_compaction_task_impl::reshape_compaction_group.
Modify table::as_table_state to return table_state of the given
compaction group.
When a compaction strategy uses garbage collected sstables to track
expired tombstones, do not use complete partition estimates for them,
instead, use a fraction of it based on the droppable tombstone ratio
estimate.
Fixes#18283
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closesscylladb/scylladb#18465
PR #17771 introduced a threshold for the total memory used by all bloom filters across SSTables. When the total usage surpasses the threshold, the largest bloom filter will be removed from memory, bringing the total usage back under the threshold. This PR adds support for reloading such reclaimed bloom filters back into memory when memory becomes available (i.e., within the 10% of available memory earmarked for the reclaimable components).
The SSTables manager now maintains a list of all SSTables whose bloom filter was removed from memory and attempts to reload them when an SSTable, whose bloom filter is still in memory, gets deleted. The manager reloads from the smallest to the largest bloom filter to maximize the number of filters being reloaded into memory.
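A simplified sketch of the reload policy (not the real intrusive-set implementation): reclaimed filters are kept ordered by size, and when an SSTable deletion frees memory, the smallest filters are reloaded first, which maximizes how many fit back under the threshold.
```cpp
#include <cstddef>
#include <map>

struct reclaimed_bloom_filters {
    std::multimap<std::size_t, int> by_size; // filter size -> sstable id (stand-in)
    std::size_t in_memory = 0;               // bloom-filter memory currently loaded
    std::size_t threshold = 0;               // configured memory budget

    // Called when an sstable (whose filter was still loaded) gets deleted.
    void on_sstable_deleted(std::size_t freed_filter_bytes) {
        in_memory -= freed_filter_bytes;
        // Reload smallest-first while the total stays under the threshold.
        while (!by_size.empty() && in_memory + by_size.begin()->first <= threshold) {
            auto it = by_size.begin();
            in_memory += it->first;          // the real code reads the filter back from disk
            by_size.erase(it);
        }
    }
};
```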
Closesscylladb/scylladb#18186
* github.com:scylladb/scylladb:
sstable_datafile_test: add testcase to test reclaim during reload
sstable_datafile_test: add test to verify auto reload of reclaimed components
sstables_manager: reload previously reclaimed components when memory is available
sstables_manager: start a fiber to reload components
sstable_directory_test: fix generation in sstable_directory_test_table_scan_incomplete_sstables
sstable_datafile_test: add test to verify reclaimed components reload
sstables: support reloading reclaimed components
sstables_manager: add new intrusive set to track the reclaimed sstables
sstable: add link and comparator class to support new instrusive set
sstable: renamed intrusive list link type
sstable: track memory reclaimed from components per sstable
sstable: rename local variable in sstable::total_reclaimable_memory_size
* seastar b73e5e7d...42f15a5f (27):
> prometheus: revert the condition for enabling aggregation
> tests/unit: add a unit test for json2code
> seastar-json2code: fix the path param handling
> github/workflow: do not override <clang++,23,release>
> github/workflow: add a github workflow for running tests
> prometheus: support disabling aggregation at query time
> apps/httpd: free allocated http_server_control
> rpc: cast rpc::tuple to std::tuple when passing it to std::apply
> stall-analyser: move `args` into main()
> stall-analyser: move print_command_line_options() out of Graph
> stall-analyser: pass branch_threshold via parameter
> stall-analyser: move process_graph() into Graph class
> scripts: addr2line: cache the results of resolve_address()
> stall-analyser: document the parser of log lines
> stall-analyser: move resolver into main()
> stall-analyser: extract get_command_line_parser() out
> stall-analyser: move graph into main()
> stall-analyser: extract main() out
> stall-analyser: extract print_command_line_options() out
> stall-analyser: add more typing annotatins
> stall-analyser: surround top-level function with two empty lines
> core/app_template: return status code 0 for --help
> iotune: Print file alignments too
> seastar-json2code: extract Parameter class
> seastar-json2code: use f-string when appropriate
> seastar-json2code: use nickname in place of oper['nickname']
> seastar-json2code: use dict.get() when checking allowMultiple
Closesscylladb/scylladb#18598
The code in `global_token_metadata_barrier` allows drain to fail.
Then, it relies on fencing. However, we don't send the barrier
command to a decommissioning node, which may still receive requests.
The node may accept a write with a stale topology version. It makes
fencing ineffective.
Fix this issue by sending the barrier command to a decommissioning
node.
The raft-based topology is moved out of experimental in 6.0, no need
to backport the patch.
Fixes scylladb/scylladb#17108
Closes scylladb/scylladb#18599
Currently if any shard repair task fails,
`tablet_repair_task_impl` per-shard loop
breaks, since it doesn't handle the exception.
Although repair does return an error, which
is as expected, we change vnode-based repair
to make a best effort and try to repair
as much as it can, even if any of the ranges
failed.
This causes the `test_repair_with_down_nodes_2b`
dtest to fail with tablets, as seen in, e.g.
https://jenkins.scylladb.com/view/master/job/scylla-master/job/tablets/job/gating-dtest-release-with-tablets/52/testReport/repair_additional_test/TestRepairAdditional/FullDtest___full_split002___test_repair_with_down_nodes_2b/
```
AssertionError: assert 1765 == 2000
```
- [x] ** Backport reason (please explain below if this patch should be backported or not) **
Tablet repair code will be introduced in 6.0, no need to backport to earlier versions.
Closesscylladb/scylladb#18518
* github.com:scylladb/scylladb:
repair: tablet_repair_task_impl: modernize table lookup
repair: tablet_repair: make best effort in spite of errors
Due to scylladb/seastar#2231, creating a scheduling group and a
scheduling group key is not safe to do in parallel. The service level
code may attempt to create scheduling groups while
the cql_transport::cql_sg_stats scheduling group key is being created.
Until the seastar issue is fixed, move initialization of the cql sg
states before service level initialization.
Refs: scylladb/seastar#2231
Closes scylladb/scylladb#18581
When a tablet is migrated away, any inactive read which might be reading from said tablet has to be dropped. Otherwise these inactive reads can prevent sstables from being removed, and these sstables can potentially survive until the tablet is migrated back and resurrect data.
This series introduces the fix as well as a reproducer test.
Fixes: https://github.com/scylladb/scylladb/issues/18110
Closes scylladb/scylladb#18179
* github.com:scylladb/scylladb:
test: add test for cleaning up cached querier on tablet migration
querier: allow injecting cache entry ttl by error injector
replica/table: cleanup_tablet(): clear inactive reads for the tablet
replica/database: introduce clear_inactive_reads_for_tablet()
replica/database: introduce foreach_reader_concurrency_semaphore
reader_concurrency_semaphore: add range param to evict_inactive_reads_for_table()
reader_concurrency_semaphore: allow storing a range with the inactive reader
reader_concurrency_semaphore: avoid detach() in inactive_read_handle::abandon()
The templatized get_or_load() accepts a Loader template parameter and
static-asserts on its signature. A concept is more suitable here.
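For illustration, the shape of the change (invented names; the real get_or_load() works with futures):
```cpp
#include <concepts>

template <typename Fn, typename Key, typename Value>
concept loader_for = requires(Fn fn, const Key& k) {
    { fn(k) } -> std::convertible_to<Value>;
};

template <typename Key, typename Value>
struct cache {
    // Before: a plain `typename Loader` plus a static_assert in the body.
    // After: the requirement is part of the interface.
    template <loader_for<Key, Value> Loader>
    Value get_or_load(const Key& key, Loader&& load) {
        return load(key); // cache lookup elided; fall back to the loader on a miss
    }
};
```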
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18582
If we initiate the shutdown while starting the group 0 server,
we could catch `abort_requested_exception` in `start_server_for_group`
and call `on_internal_error`. Then, Scylla aborts with a coredump.
It causes problems in tests that shut down bootstrapping nodes.
The `abort_requested_exception` can be thrown from
`gossiper::lock_endpoint` called in
`storage_service::topology_state_load`. So, the issue is new and
applies only to the raft-based topology. Hence, there is no need
to backport the patch.
Fixes scylladb/scylladb#17794
Fixes scylladb/scylladb#18197
Closes scylladb/scylladb#18569
Currently, the loop that goes over all repair metas
checks for the table's existence using `find_column_family()`.
Although this is correct, it might cause an exception storm
if a table or keyspace is dropped during repair.
This can be avoided by using the more modern interface,
`get_table_if_exists` in the database `tables_metadata`,
which returns a `lw_shared_ptr<replica::table>`, exactly
what we need: it holds a value iff the table still exists,
without throwing any exception.
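Schematically (simplified; the exact replica::database accessors and signatures differ):
```cpp
// Before: throws no_such_column_family if the table was dropped mid-repair.
//   replica::table& t = db.find_column_family(table_id);
//
// After: the pointer is engaged iff the table still exists.
//   if (auto t = db.get_tables_metadata().get_table_if_exists(table_id)) {
//       // repair this table
//   } else {
//       // table or keyspace dropped during repair -- skip it
//   }
```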
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently if any shard repair task fails,
`tablet_repair_task_impl` per-shard loop
breaks, since it doesn't handle the exception.
Although repair does return an error, which
is as expected, we change vnode-based repair
to make a best effort and try to repair
as much as it can, even if any of the ranges
failed.
This causes the `test_repair_with_down_nodes_2b`
dtest to fail with tablets, as seen in, e.g.
https://jenkins.scylladb.com/view/master/job/scylla-master/job/tablets/job/gating-dtest-release-with-tablets/52/testReport/repair_additional_test/TestRepairAdditional/FullDtest___full_split002___test_repair_with_down_nodes_2b/
```
AssertionError: assert 1765 == 2000
```
This change adds a check for the keyspace and table presence
whenever an individual repair task fails, instead of the
global check at the end, so that failures due to dropping
of the keyspace or the table are logged as warnings, but
ignored for the purpose of failing the overall repair status.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When an SSTable is dropped, the associated bloom filter gets discarded
from memory, bringing down the total memory consumption of bloom
filters. Any bloom filter that was previously reclaimed from memory due
to the total usage crossing the threshold, can now be reloaded back into
memory if the total usage can still stay below the threshold. Added
support to reload such reclaimed filters back into memory when memory
becomes available.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Start a fiber that gets notified whenever an sstable gets deleted. The
fiber doesn't do anything yet but the following patch will add support
to reload reclaimed components if there is sufficient memory.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
The testcase uses an sstable whose mutation key and the generation are
owned by different shards. Due to this, when process_sstable_dir is
called, the sstable gets loaded into a different shard than the one that
was intended. This also means that the sstable and the sstable manager
end up in different shards.
The following patch will introduce a condition variable in sstables
manager which will be signalled from the sstables. If the sstable and
the sstable manager are in different shards, the signalling will cause
the testcase to fail in debug mode with this error : "Promise task was
set on shard x but made ready on shard y". So, fix it by supplying
appropriate generation number owned by the same shard which owns the
mutation key as well.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Added support to reload components from which memory was previously
reclaimed as the total memory of reclaimable components crossed a
threshold. The implementation is kept simple as only the bloom filters
are considered reclaimable for now.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
The new set holds the sstables from where the memory has been reclaimed
and is sorted in ascending order of the total memory reclaimed.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Renamed the intrusive list link type to differentiate it from the set
link type that will be added in an upcoming patch.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Added a member variable _total_memory_reclaimed to the sstable class
that tracks the total memory reclaimed from a sstable.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Renamed local variable in sstable::total_reclaimable_memory_size in
preparation for the next patch which adds a new member variable
_total_memory_reclaimed to the sstable class.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
The code is based on a similar idea to perf_simple_query. The main differences are:
- it starts a full scylla process
- it communicates with alternator via http (localhost)
- it uses a richer table schema with all dynamoDB types instead of only strings
Testing code runs in the same process as scylla so we can easily get various perf counters (tps, instr, allocation, etc).
Results on my machine (with 1 vCPU):
> ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload read --duration 10 2> /dev/null
...
median 23402.59616090321
median absolute deviation: 598.77
maximum: 24014.41
minimum: 19990.34
> ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write --duration 10 2> /dev/null
...
median 16089.34211320635
median absolute deviation: 552.65
maximum: 16915.95
minimum: 14781.97
The above seem more realistic than results from perf_simple_query which are 96k and 49k tps (per core).
Even if there is no endpoint for the given IP, the state can still belong to an existing endpoint that
was restarted with a different IP, so let's try to locate the endpoint by host id as well. Do it in raft
topology mode only so as not to impact gossiper mode.
Also make the test more robust in detecting a wrong number of entries in
the peers table. Today it may miss that there is a wrong entry there
because the map will squash two entries for the same host id into one.
Fixes: scylladb/scylladb#18419
Fixes: scylladb/scylladb#18457
std::optional formatting changed while moving from the home-grown formatter to
the fmt provided formatter; don't rely on it for user visible messages.
Here, the optional being formatted is known to be engaged, so just print its value.
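For example, instead of relying on how fmt renders the optional itself, dereference it (it is known to be engaged here):
```cpp
#include <fmt/core.h>
#include <optional>

int main() {
    std::optional<int> generation = 42;
    // Formatting `generation` directly would depend on the optional formatter;
    // printing the contained value keeps the message stable.
    fmt::print("generation {}\n", *generation);
}
```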
Closesscylladb/scylladb#18534
in newer seastar, 0 is returned as the returncode of the application
when handling `--help`. to prepare for this behavior, let's
accept it before updating the seastar submodule.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18574
async_utils.cc was introduced in e1411f39, so let's
update the cmake building system to build it. without
which, we'd run into link failure like:
```
ld.lld: error: undefined symbol: to_mutation_gently(canonical_mutation const&, seastar::lw_shared_ptr<schema const>)
>>> referenced by storage_service.cc
>>> storage_service.cc.o:(service::storage_service::merge_topology_snapshot(service::raft_snapshot)) in archive service/Dev/libservice.a
>>> referenced by group0_state_machine.cc
>>> group0_state_machine.cc.o:(service::write_mutations_to_database(service::storage_proxy&, gms::inet_address, std::vector<canonical_mutation, std::allocator<canonical_mutation>>)) in archive service/Dev/libservice.a
>>> referenced by group0_state_machine.cc
>>> group0_state_machine.cc.o:(service::write_mutations_to_database(service::storage_proxy&, gms::inet_address, std::vector<canonical_mutation, std::allocator<canonical_mutation>>) (.resume)) in archive service/Dev/libservice.a
>>> referenced 1 more times
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18524
this change is a followup of 0b0e661a. it helps to ensure that the header files in the
abseil submodule have higher priority when the compiler includes abseil headers
while building with CMake.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18523
This commit adds OS support in version 6.0.
In addition, it removes the information about version 5.2, as this version is no longer supported, according to our policy.
Closesscylladb/scylladb#18562
This PR:
- Removes the `.. only:: opensource` directive from Consistent Topology with Raft.
This feature is no longer an Open Source-only experimental feature.
- Removes redundant version-specific information.
- Moves the necessary version-specific information to a separate file.
This is a follow-up to 55b011902e.
Refs https://github.com/scylladb/scylladb/pull/18285/
Closes scylladb/scylladb#18553
Fixes#18488
Due to the discrepancy between bytes added to CL and bytes written to disk
(due to CRC sector overhead), we fail to account for the proper byte count
when issuing account_memory_usage in allocate (using bytes added) and in
cycle's notify_memory_written (disk bytes written).
This leads us to slowly but surely add to the semaphore all the time,
eventually rendering it useless.
Also, the terminate call would _not_ take any of this into account,
and the chunk overhead there would cause a (smaller) discrepancy
as well.
Fix by simply ensuring that buffer alloc handles its byte usage,
then accounting based on buffer position, not input byte size.
Closesscylladb/scylladb#18489
Some time ago #16558 was merged that moved view builder drain into generic drain. After this merge dtests started to fail from time to time, so the PR was reverted (see #18278). In #18295 the hang was found. View builder drain was moved from "before" stopping the messaging service to "after" it, and view update write handlers in the proxy hung for a hard-coded timeout of 5 minutes without being aborted. Tests don't wait for 5 minutes and kill scylla, then complain about it and fail.
This PR brings back the original PR as well as the necessary fix that cancels view update write handlers on stop.
Closesscylladb/scylladb#18408
* github.com:scylladb/scylladb:
Reapply "Merge 'Drain view_builder in generic drain' from ScyllaDB"
view: Abort pending view updates when draining
Currently the default task_ttl_in_seconds is 0, but scylla.yaml changes
the value to 10.
Change task_ttl_in_seconds in scylla.yaml to 0, so that there are
consistent defaults. Comment it out.
Fixes: #16714.
Closesscylladb/scylladb#18495
The name of the Scylla table backing an Alternator LSI looks like `basename:!lsiname`. Some REST API clients (including Scylla Manager) when they send a "!" character in the REST API request path may decide to "URL encode" it - convert it to `%21`.
Because of a Seastar bug (https://github.com/scylladb/seastar/issues/725) Scylla's REST API server forgets to do the URL decoding on the path part of the request, which leads to the REST API request failing to address the LSI table.
The first patch in this PR fixes the bug by using a new Seastar API introduced in https://github.com/scylladb/seastar/pull/2125 that does the URL decoding as appropriate. The second patch in the PR is a new test for this bug, which fails without the fix, and passes afterwards.
Fixes#5883.
Closesscylladb/scylladb#18286
* github.com:scylladb/scylladb:
test/alternator: test addressing LSI using REST API
REST API: stop using deprecated, buggy, path parameter
Someone thought that they actually represent real keys (the 'EXAMPLE' in their name was not enough).
Converted them to be as clear as can be: example data.
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Closesscylladb/scylladb#18565
The ranges_parallelism option is introduced in commit 9b3fd9407b.
Currently, this option works for vnode table repair only.
This patch enables it for tablet repair, since it is useful for
tablet repair too.
Fixes #18383
Closes scylladb/scylladb#18385
Currently, LWT is not supported with tablets.
In particular the interaction between paxos and tablet
migration is not handled yet.
Therefore, it is better to outright reject LWT queries
for tablets-enabled tables rather than support them
in a flaky way.
This commit also marks tests that depend on LWT
as expected to fail.
Fixesscylladb/scylladb#18066
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#18103
Shutdown of a bootstrapping node could hang on
`_topology_state_machine.event.when()` in
`wait_for_topology_request_completion`. It caused
scylladb/scylladb#17246 and scylladb/scylladb#17608.
On a normal node, `wait_for_group0_stop` would prevent it, but this
function won't be called before we join group 0. Solve it by adding
a new subscriber to `_abort_source`.
Additionally, trigger `_group0_as` to prevent other hang scenarios.
Note that if both the new subscriber and `wait_for_group0_stop` are
called, nothing will break. `abort_source::request_abort` and
`conditional_variable::broken` can be called multiple times.
The raft-based topology is moved out of experimental in 6.0, no need
to backport the patch.
Fixes scylladb/scylladb#17246
Fixes scylladb/scylladb#17608
Closes scylladb/scylladb#18549
This is the second half of the fix for issue #13968. The first half is already merged with PR #18346
Scylla issues warnings for partitions containing more rows than a configured threshold. The warning is issued by inserting a row into the `system.large_partitions` table. This row contains the information about the partition for which the warning is issued: keyspace, table, sstable, partition key and size, compaction time and the number of rows in the partition. A previous PR #18346 also added range tombstone count to this row.
This change adds a new counter for dead rows to the large_partitions table.
This change also adds cluster feature protection for writing into these new counters. This is needed in case a cluster is in the process of being upgraded to this new version, after which an upgraded node writes data with the new schema into `system.large_partitions`, and finally a node is then rolled back to an old version. This node will then revert the schema to the old version, but the written sstables will still contain data with the new counters, causing any readers of this table to throw errors when they encounter these cells.
This is an enhancement, and backporting is not needed.
Fixes #13968
Closes scylladb/scylladb#18458
* github.com:scylladb/scylladb:
sstable: added test for counting dead rows
sstable: added docs for system.large_partitions.dead_rows
sstable: added cluster feature for dead rows and range tombstones
sstable: write dead_rows count to system.large_partitions
sstable: added counter for dead rows
clang-tidy is a tool provided by Clang to perform static analysis on
C++ source files. here, we are mostly interested in using its
https://clang.llvm.org/extra/clang-tidy/checks/bugprone/use-after-move.html
check to reveal the potential issues.
this workflow is added to run clang-tidy when building the tree, so
that the warnings from clang-tidy can be noticed by developers.
a dedicated action is added so other github workflow can reuse it to
setup the building environment in an ubuntu:jammy runner.
clang-tidy-matcher.json is added to annotate the change, so that the
warnings are more visible on the github webpage.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18342
Generate documentation links such that they point to the documentation
page that is appropriate to the current product (open-source or
enterprise) and version. The documentation links are generated by a new
function and the documentation links are injected into the description
of nodetool command via fmt::format().
Allows generating documentation links that are appropriate for the
current product (open-source or enterprise) and version.
To be used in the next patch to make scylla-nodetool's documentation
links product and version appropriate.
Instead of performing a rolling restart by calling `restart` in a loop over every node in the cluster, use the dedicated
`manager.rolling_restart` function. This method waits until all other nodes see the currently processed node as up or down before proceeding to the next step. Not doing so may lead to surprising behavior.
In particular, in scylladb/scylladb#18369, a test failed shortly after restarting three nodes. Because nodes were restarted one after another too fast, when the third node was restarted it didn't send a notification to the second node because it still didn't know that the second node was alive. This led the second node to notice that the third node restarted by observing that it incremented its generation in gossip (it restarted too fast to be marked as down by the failure detector). In turn, this caused the second node to send "third node down" and "third node up" notifications to the driver in a quick succession, causing it to drop and reestablish all connections to that node. However, this happened _after_ rolling upgrade finished and _after_ the test logic confirmed that all nodes were alive. When the notifications were sent to the driver, the test was executing some statements necessary for the test to pass - as they broke, the test failed.
Fixes: scylladb/scylladb#18369
Closes scylladb/scylladb#18379
* github.com:scylladb/scylladb:
test: get rid of server-side server_restart
test: util: get rid of the `restart` helper
test: {auth,topology}: use manager.rolling_restart
Currently, in raft mode, when raft topology is reloaded from disk or a
notification is received from gossip about an endpoint change, token
metadata is updated accordingly. While updating token metadata we detect
whether some nodes are joining or are leaving and we notify endpoint
lifecycle subscribers if such an event occurs. These notifications are
fired _before_ we finish updating token metadata and before the updated
version is globally available.
This behavior, for "node leaving" notifications specifically, was not
present in legacy topology mode. Hinted handoff depends on token
metadata being updated before it is notified about a leaving node (we
had a similar issue before: scylladb/scylladb#5087, and we fixed it by
enforcing this property). Because this is not true right now for raft
mode, this causes the hint draining logic not to work properly - when a
node leaves the cluster, there should be an attempt to send out hints
for that node, but instead hints are not sent out and are kept on disk.
In order to fix the issue with hints, postpone notifying endpoint
lifecycle subscribers about joined and left nodes only after the final
token metadata is computed and replicated to all shards.
Fixes: scylladb/scylladb#17023
Closes scylladb/scylladb#18377
The direct failure detector design is simplistic. It sends pings
sequentially and times out listeners that reached the threshold (i.e.
didn't hear from a given endpoint for too long) in-between pings.
Given the sequential nature, the previous ping must finish so the next
ping can start. We timeout pings that take too long. The timeout was
hardcoded and set to 300ms. This is too low for wide-area setups --
latencies across the Earth can indeed go up to 300ms. 3 subsequent timed
out pings to a given node were sufficient for the Raft listener to "mark
server as down" (the listener used a threshold of 1s).
Increase the ping timeout to 600ms which should be enough even for
pinging the opposite side of Earth, and make it tunable.
Increase the Raft listener threshold from 1s to 2s. Without the
increased threshold, one timed out ping would be enough to mark the
server as down. Increasing it to 2s requires 3 timed out pings which
makes it more robust in presence of transient network hiccups.
In the future we'll most likely want to decrease the Raft listener
threshold again, if we use Raft for data path -- so leader elections
start quickly after leader failures. (Faster than 2s). To do that we'll
have to improve the design of the direct failure detector.
Ref: scylladb/scylladb#16410
Fixes: scylladb/scylladb#16607
---
I tested the change manually using `tc qdisc ... netem delay`, setting
network delay on local setup to ~300ms with jitter. Without the change,
the result is as observed in scylladb/scylladb#16410: interleaving
```
raft_group_registry - marking Raft server ... as dead for Raft groups
raft_group_registry - marking Raft server ... as alive for Raft groups
```
happening once every few seconds. The "marking as dead" happens whenever
we get 3 subsequent failed pings, which happens with a certain (high)
probability depending on the latency jitter. Then as soon as we get a
successful ping, we mark server back as alive.
With the change, the phenomenon no longer appears.
Closesscylladb/scylladb#18443
It needs some local naming cleanup, but otherwise it's pretty simple
Closesscylladb/scylladb#18510
* github.com:scylladb/scylladb:
generic_server: Fix indentation after previous patch
generic_server: Coroutinize listen() method
generic_server: Rename creds argument to builder
std::optional formatting changed while moving from the home-grown formatter to
the fmt provided formatter; don't rely on it for user visible messages.
Here, the optional being formatted is known to be engaged, so just print its value.
Closesscylladb/scylladb#18533
base timestamps are fed into the sstable writer for calculating
deltas, used by varints. given that expired ssts are bypassed, we
don't have to account for them. so if we compact fully expired and
new sstables together, we can save a bit by having a base ts closer
to the data actually written into the output. also I wanted to move
the calculation into the loop in setup(), to avoid two iterations
over an input set that can have even more than 1k elements.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#18504
Until https://github.com/scylladb/scylladb/issues/15356 is fixed, this
will be handled by explicitly closing the connection, so if scylla fails
to update gossiper state due to premature abort on shutdown, then we
won't be stuck in an endless reconnection attempt (later through
heartbeats (30s interval)), causing the test to timeout.
Manifests in scylla logs like this:
gossip - failure_detector_loop: Got error in the loop, live_nodes={127.147.5.10, 127.147.5.16}: seastar::sleep_aborted (Sleep is aborted)
gossip - failure_detector_loop: Finished main loop
migration_manager - stopping migration service
storage_service - Shutting down native transport server
gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested)
cql_server_controller - CQL server stopped
...
gossip - My status = NORMAL
gossip - Announcing shutdown
gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested)
gossip - Sending a GossipShutdown to 127.147.5.10 with generation 1714449924
gossip - Sending a GossipShutdown to 127.147.5.16 with generation 1714449924
gossip - === Gossip round FAIL: seastar::abort_requested_exception (abort requested)
Refs #14746.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#18484
We won't run:
- old pre auth-v1 migration code
- code creating auth-v1 tables
We will keep running:
- code creating default rows
- code creating the auth-v1 keyspace (needed due to a cqlsh legacy hack:
cqlsh errors when executing `list roles` or `list users` if
there is no system_auth keyspace, but it does support the case where
the expected tables are missing)
Fixes https://github.com/scylladb/scylladb/issues/17737
Closes scylladb/scylladb#17939
* github.com:scylladb/scylladb:
auth: don't run legacy migrations on auth-v2 startup
auth: fix indent in password_authenticator::start
auth: remove unused service::has_existing_legacy_users func
To avoid conflicts arising from the discrepancy between different
versions of the repository, use coroutines instead of continuations
in service_level_controller::notify_service_level_removed().
Closes scylladb/scylladb#18525
Restarting a node amounts to just shutting it down and then starting
again. There is no good reason to have a dedicated endpoint in the
ScyllaClusterManager for restarting when it can be implemented by
calling two endpoints in a sequence: stop and start - it's just code
duplication.
Remove the server_restart endpoint in ScyllaClusterManager and
reimplement it as two endpoint calls in the ManagerClient.
Instead of performing a rolling restart by calling `restart` in a loop
over every node in the cluster, use the dedicated
`manager.rolling_restart` function. This method waits until all other
nodes see the currently processed node as up or down before proceeding
to the next step. Not doing so may lead to surprising behavior.
In particular, in scylladb/scylladb#18369, a test failed shortly after
restarting three nodes. Because nodes were restarted one after another
too fast, when the third node was restarted it didn't send a
notification to the second node because it still didn't know that the
second node was alive. This led the second node to notice that the third
node restarted by observing that it incremented its generation in gossip
(it restarted too fast to be marked as down by the failure detector). In
turn, this caused the second node to send "third node down" and "third
node up" notifications to the driver in a quick succession, causing it
to drop and reestablish all connections to that node. However, this
happened _after_ rolling upgrade finished and _after_ the test logic
confirmed that all nodes were alive. When the notifications were sent to
the driver, the test was executing some statements necessary for the
test to pass - as they broke, the test failed.
Fixes: scylladb/scylladb#18369
We want to clear CDC generations that are no longer needed
(because all writes are already using a new generation) so they
don't take space and are not sent during snapshot transfers
(see e.g. https://github.com/scylladb/scylladb/issues/17545).
The condition used previously was that we clear generations which
were closed (i.e., a new generation started at this time) more than
24h ago. This is a safe choice, but too conservative: we could
easily end up with a large number of obsolete generations if we
boot multiple nodes during 24h (which is especially easy to do
with tablets.)
Change this bound from 24h to `5s + ring_delay`. The choice is
explained in a comment in the code.
Additionally, improve `test_raft_snapshot_request` that would
become flaky after the change so it's not sensitive to changes
anymore.
The raft-based topology was experimental before 6.0, no need
to backport.
Ref: scylladb/scylladb#17545
Closes scylladb/scylladb#18497
* github.com:scylladb/scylladb:
topology_coordinator: clear obsolete generations earlier
test: test_raft_snapshot_request: improve the last assertion
test: test_raft_snapshot_request: find raft leader after restart
test: test_raft_shanpshot_request: simplify appended_command
During upgrade to raft topology, information about service levels is copied from the legacy tables in system_distributed to the raft-managed tables of group 0. system_distributed has RF=3, so if the cluster has only one or two nodes we should use lower consistency level than ALL - and the current procedure does exactly that, it selects QUORUM in case of two nodes and ONE in case of only one node. The cluster size is determined based on the call to _gossiper.num_endpoints().
Despite its name, gossiper::num_endpoints() does not necessarily return the number of nodes in the cluster but rather the number of endpoint states in gossiper (this behavior is documented in a comment near the declaration of this function). In some cases, e.g. after gossiper-based nodetool remove, the state might be kept for some time after removal (3 days in this case).
The consequence of this is that gossiper::num_endpoints() might return more than the current number of nodes during upgrade, and that in turn might cause migration of data from one table to another to fail - causing the upgrade procedure to get stuck if there are only one or two nodes in the cluster.
In order to fix this, use token_metadata::get_all_endpoints() as a measure of the cluster size.
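For illustration, the consistency-level choice described above could be keyed off the real cluster size like this (names are illustrative stand-ins, not the actual Scylla code):
```cpp
// Sketch only: pick the consistency level for the service-level migration
// from the actual cluster size (e.g. token metadata's endpoint count),
// since system_distributed has RF=3.
#include <cstddef>

enum class consistency_level { one, quorum, all };

consistency_level cl_for_service_level_migration(size_t cluster_size) {
    if (cluster_size == 1) {
        return consistency_level::one;     // one node cannot reach QUORUM of RF=3
    }
    if (cluster_size == 2) {
        return consistency_level::quorum;  // two nodes cannot reach ALL of RF=3
    }
    return consistency_level::all;
}
```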
Fixes: scylladb/scylladb#18198
Closes scylladb/scylladb#18261
* github.com:scylladb/scylladb:
test: topology: test that upgrade succeeds after recent removal
topology_coordinator: compute cluster size correctly during upgrade
This pull request introduces host ID in the Hinted Handoff module. Nodes are now identified by their host IDs instead of their IPs. The conversion occurs on the boundary between the module and `storage_proxy.hh`, but aside from that, IPs have been erased.
The changes take into consideration that there might still be old hints, still identified by IPs, on disk – at start-up, we map them to host IDs if possible so that they're not lost.
Refs scylladb/scylladb#6403
Fixes scylladb/scylladb#12278
Closes scylladb/scylladb#15567
* github.com:scylladb/scylladb:
docs: Update Hinted Handoff documentation
db/hints: Add endpoint_downtime_not_bigger_than()
db/hints: Migrate hinted handoff when cluster feature is enabled
db/hints: Handle arbitrary directories in resource manager
db/hints: Start using hint_directory_manager
db/hints: Enforce providing IP in get_ep_manager()
db/hints: Introduce hint_directory_manager
db/hints/resource_manager: Update function description
db/hints: Coroutinize space_watchdog::scan_one_ep_dir()
db/hints: Expose update lock of space watchdog
db/hints: Add function for migrating hint directories to host ID
db/hints: Take both IP and host ID when storing hints
db/hints: Prepare initializing endpoint managers for migrating from IP to host ID
db/hints: Migrate to locator::host_id
db/hints: Remove noexcept in do_send_one_mutation()
service: Add locator::host_id to on_leave_cluster
service: Fix indentation
db/hints: Fix indentation
`system_keyspace::read_cdc_generation_opt` queries
`system.cdc_generations_v3`, which stores ids of CDC generations
as timeuuids. This function shouldn't be called with a normal uuid
(used by `system.cdc_generations_v2` to store generation ids).
Such a call would end with a marshaling error.
Before this patch, `retrieve_generation_data_v2` could call
`system_keyspace::read_cdc_generation_opt` with a normal uuid if
the generation wasn't present in `system.cdc_generations_v2`.
This logic caused a marshaling error while handling the
`check_and_repair_cdc_streams` request in the
`cdc_test.TestCdc.test_check_and_repair_cdc_streams_liveness` dtest.
This patch fixes the code being added in 6.0, no need to backport it.
Fixes scylladb/scylladb#18473
Closes scylladb/scylladb#18483
With topology over raft, all operations are already serialized by the
coordinator anyway, so there is no need to synchronize removenode using the API lock.
All other operations are still synchronized, since they cannot be executed in
parallel for the same node anyway.
* 'gleb/17681-fix' of github.com:scylladb/scylla-dev:
storage_service: do not take API lock for removenode operation if topology coordinator is enabled
test: return file mark from wait_for that points after the found string
because of https://bugzilla.redhat.com/show_bug.cgi?id=2278689,
the rebuilt abseil package provided by fedora has different settings
than the ones used when the tree is built with the sanitizer enabled. this
inconsistency leads to a crash.
to address this problem, we have to reinstate the abseil submodule, so
we can build it with the same compiler options with which we build the
tree.
in this change
* Revert "build: drop abseil submodule, replace with distribution abseil"
* update CMake building system with abseil header include settings
* bump up the abseil submodule to the latest LTS branch of abseil:
lts_2024_01_16
* update scylla-gdb.py to adapt to the new structure of
flat_hash_map
This reverts commit 8635d24424.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18511
More than three years ago, in issue #7949, we noticed that trying to
set a `map<ascii, int>` from JSON input (i.e., using INSERT JSON or the
fromJson() function) fails - the ascii key is incorrectly parsed.
We fixed that issue in commit 75109e9519
but unfortunately, did not do our due diligence: We did not write enough
tests inspired by this bug, and failed to discover that actually we have
the same bug for many other key types, not just for "ascii". Specifically,
the following key types have exactly the same bug:
* blob
* date
* inet
* time
* timestamp
* timeuuid
* uuid
Other types, like numbers or booleans, worked "by accident" - instead of
parsing them as a normal string, we asked the JSON parser to parse them
again after removing the quotes, and because unquoted numbers and
unquoted true/false happen to work in JSON, this didn't fail.
The fix here is very simple - for all *native* types (i.e., not
collections or tuples), the encoding of the key in JSON is simply a
quoted string - and removing the quotes is all we need to do and there's
no need to run the JSON parser a second time. Only for more elaborate
types - collections and tuples - we need to run the JSON parser a
second time on the key string to build the more elaborate object.
This patch also includes tests for fromJson() reading a map with all
native key types, confirming that all the aforementioned key types
were broken before this patch, and all key types (including the numbers
and booleans which worked even before this patch) work with this patch.
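A minimal sketch of the decoding rule described above, with hypothetical helper names standing in for the real implementation:
```cpp
// Sketch only: for native CQL key types the JSON map key is just a quoted
// string, so stripping the quotes is enough; collection/tuple keys are
// themselves JSON and need a second parse (parse_json() is a stand-in).
#include <stdexcept>
#include <string>
#include <string_view>

std::string_view strip_quotes(std::string_view key) {
    if (key.size() < 2 || key.front() != '"' || key.back() != '"') {
        throw std::runtime_error("malformed JSON map key");
    }
    return key.substr(1, key.size() - 2);
}

std::string decode_map_key(std::string_view raw_key, bool key_type_is_native) {
    if (key_type_is_native) {
        // ascii, blob, date, inet, time, timestamp, timeuuid, uuid, ...
        return std::string(strip_quotes(raw_key));
    }
    // collections and tuples: the key text is a JSON document of its own.
    return std::string(raw_key);   // placeholder for parse_json(raw_key)
}
```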
Fixes #18477.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#18482
`boost::range::random_shuffle()` uses the deprecated
`std::random_shuffle()` under the hood, so let's use
`std::ranges::shuffle()` which is available since C++20.
this change should address the warning like:
```
[312/753] CXX build/debug/test/boost/counter_test.o In file included from test/boost/counter_test.cc:17:
/usr/include/boost/range/algorithm/random_shuffle.hpp:106:13: warning: 'random_shuffle<__gnu_cxx::__normal_iterator<counter_shard *, std::vector<counter_shard>>>' is deprecated: use 'std::shuffle' instead [-Wdepr
ecated-declarations]
106 | detail::random_shuffle(boost::begin(rng), boost::end(rng));
| ^
test/boost/counter_test.cc:507:27: note: in instantiation of function template specialization 'boost::range::random_shuffle<std::vector<counter_shard>>' requested here
507 | boost::range::random_shuffle(shards);
| ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_algo.h:4489:5: note: 'random_shuffle<__gnu_cxx::__normal_iterator<counter_shard *, std::vector<counter_shard>>>' has been explicitly marked
deprecated here
4489 | _GLIBCXX14_DEPRECATED_SUGGEST("std::shuffle")
| ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/x86_64-redhat-linux/bits/c++config.h:1957:45: note: expanded from macro '_GLIBCXX14_DEPRECATED_SUGGEST'
1957 | # define _GLIBCXX14_DEPRECATED_SUGGEST(ALT) _GLIBCXX_DEPRECATED_SUGGEST(ALT)
| ^
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/x86_64-redhat-linux/bits/c++config.h:1941:19: note: expanded from macro '_GLIBCXX_DEPRECATED_SUGGEST'
1941 | __attribute__ ((__deprecated__ ("use '" ALT "' instead")))
| ^
```
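for illustration, the replacement looks like this (the container and RNG choice are arbitrary):
```cpp
// minimal example of the migration: std::ranges::shuffle (C++20) with an
// explicit RNG instead of the deprecated boost::range::random_shuffle.
#include <algorithm>
#include <random>
#include <vector>

int main() {
    std::vector<int> shards{1, 2, 3, 4, 5};
    std::mt19937 rng{std::random_device{}()};
    std::ranges::shuffle(shards, rng);   // was: boost::range::random_shuffle(shards);
}
```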
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18517
It's completely unused, likely in favor of the recently added formatter
for the type in question.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18502
So that it doesn't clash with local creds variable that will appear in
this method after its coroutinization.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
{fmt} v10.0.0 introduces a formatter for `std::optional`, so there
is no need to test it. furthermore, the behavior of this formatter
is different from our homebrew one. so let's skip this test if
{fmt} v10.0.0 or up is used.
Refs #18508
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18509
{fmt} version 10.0.0 has a regression which dropped the
formatter for `char *`, even though it does format `const char*`, as the
latter is convertible to
`fmt::string_view`.
this issue was addressed in 10.1.0 by 616a4937, which adds
the formatter for `Char *` back, where `Char` is a template parameter.
but we do need to print `vector<char*>`, so, to address the build
failure with {fmt} version 10.0.0, which is shipped along with
fedora 39, let's backport this formatter.
Fixes #18503
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18505
This series adds facilities to gently convert canonical mutations back to mutations
and to gently make canonical mutations or freeze mutations in a seastar thread.
Those are used in storage_service::merge_topology_snapshot to prevent reactor stalls
due to large mutations, as seen in the test_add_many_nodes_under_load dtest.
Also, migration_manager's migration_request was converted to use a seastar thread
to use the above facilities to prevent reactor stalls with large schema mutations,
e.g. with a large number of tables, and/or when reading tablets mutations with
a large number of tablets in a table.
perf-simple-query --write results:
Before:
```
median 79151.53 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53289 insns/op, 0 errors)
```
After:
```
median 79716.73 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53314 insns/op, 0 errors)
```
Closes scylladb/scylladb#18290
* github.com:scylladb/scylladb:
storage_proxy: add mutate_locally(vector<frozen_mutation_and_schema>) method
raft: group0_state_machine: write_mutations_to_database: freeze mutations gently
database: apply_in_memory: unfreeze_gently large mutations
storage_service: get_system_mutations: make_canonical_mutation_gently
tablets: read_tablet_mutations: make_canonical_mutation_gently
schema_tables: convert_schema_to_mutations: make_canonical_mutation_gently
schema_tables: redact_columns_for_missing_features: get input mutation using rvalue reference
storage_service: merge_topology_snapshot: freeze_gently
canonical_mutation: add make_canonical_mutation_gently
frozen_mutation: move unfreeze_gently to async_utils
mutation: add freeze_gently
idl-compiler: generate async serialization functions for stub members
raft: group0_state_machine: write_mutations_to_database: use to_mutation_gently
storage_service: merge_topology_snapshot: co_await to_mutation_gently
canonical_mutation: add to_mutation_gently
idl-compiler: emit include directive in generated impl header file
mutation_partition: add apply_gently
collection_mutation: improve collection_mutation_view formatting
mutation_partition: apply_monotonically: do not support schema upgrade
test/perf: report also log_allocations/op
When a table has secondary indexes on *multiple* columns, and several
such columns are used for filtering in a query, Scylla chooses one
of these indexes as the main driver of the query, and the second
column's restriction is implemented as filtering.
Before this patch, the index to use was chosen fairly randomly, based on
the order of the indexes in the schema. This order may be different in
different coordinators, and may even change across restarts on the same
coordinators. This is not only inconsistent, it can cause outright wrong
results when using *paging* and switching (or restarting) coordinators
in the middle of a paged scan... One coordinator saves one index's key
in the paging state, and then the other coordinator gets this paging
state and wrongly believes it is supposed to be a key of a *different*
index.
The fix in this patch is to pick the index suitable for the first
indexed column mentioned in the query. This has two benefits over
the situation before the patch:
1. The decision of which index to use no longer changes between
coordinators or across restarts - it just depends on the schema
and the specific query.
2. Different indexes can have different "specificity" so using one
or the other can change the query's performance. After this patch,
the user is in control over which index is used by changing the
order of terms in the query. A curious user can use tracing to
check which index was used to implement a particular query.
An xfailing test we had for this issue no longer fails, so the "xfail"
marker is removed.
Fixes #7969
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#14450
write_mutations_to_database might need to handle
large mutations from system tables, so to prevent
reactor stalls, freeze the mutations gently
and call proxy.mutate_locally in parallel on
the individual frozen mutations, rather than
calling the vector<mutation> based entry point
that eventually freezes each mutation synchronously.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
To prevent stalls due to large schema mutations.
While at it, reserve the result canonical_mutation vector.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The function upgrades the input mutation
only in certain cases. Currently it accepts
the input mutation by value, which may cause
an extraneous copy if the caller doesn't move
the mutation, as done in
`adjust_schema_for_schema_features`.
Getting an rvalue reference instead makes the
interface clearer.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Make a canonical mutation gently using an
async serialization function.
Similar to freeze_gently, yielding is considered
only in-between range tombstones and rows.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Unfreeze_gently doesn't have to be a method of
frozen_mutation. It might as well be implemented as
a free function reading from a frozen_mutation
and preparing a mutation gently.
The logic will be used in a later patch
to make a canonical mutation directly from
a frozen_mutation instead of unfreezing it
and then converting it to a canonical_mutation.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Allow yielding in between serializing of
range tombstones and rows to prevent reactor
stalls due to large mutations with many
rows or range tombstones.
Mutations that have many cells might still
stall, but those are considered infrequent enough
to ignore for now.
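A minimal sketch of the yielding pattern this series applies; the types are stand-ins, only seastar::coroutine::maybe_yield() is the real Seastar utility:
```cpp
// Sketch: insert a yield point between successive rows / range tombstones so
// serializing a large mutation does not monopolize the reactor.
#include <seastar/core/coroutine.hh>
#include <seastar/coroutine/maybe_yield.hh>
#include <vector>

struct row {};                                      // stand-in
struct writer { void write(const row&) {} };        // stand-in

seastar::future<> serialize_rows_gently(const std::vector<row>& rows, writer& out) {
    for (const auto& r : rows) {
        out.write(r);
        co_await seastar::coroutine::maybe_yield(); // yield between elements
    }
}
```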
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The generated implementation header file depends
on the generated header file for the types it uses.
Generate a respective #include directive to make it self-sufficient.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, if the input mutation_partition requires
schema upgrade, apply_monotonically always silently reverts to
being non-preemptible, even if the caller passed is_preemptible::yes.
To prevent that from happening, put the burden of upgrading
the mutation_partition schema on the caller, which is
today the apply() methods, which are synchronous anyhow.
With that, we reduce the proliferation of the
`apply_monotonically` overloads and keep only the
low level one (which could potentially be private as well,
as it's called only from within the mutation/ source files
and from tests)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Different sstable storage backends use slightly different notions of what an sstable location is. Filesystem storage knows it's a `/var/lib/data/ks/cf-uuid/state` path, while s3 storage keeps only this path's part without the state (and even that's not very accurate, because the bucket prefix is missing, while the "/var/lib/data" prefix is not needed and eventually should be omitted). Nonetheless, the sstable_directory still keeps the filesystem-like path, while it's really only needed by the filesystem lister. This PR removes it.
Closes scylladb/scylladb#18496
* github.com:scylladb/scylladb:
sstable_directory: Remove _sstable_dir member
sstable_directory: Create sstable path with make_path() when logging
sstable_directory: Use make_path to construct filesystem lister
sstable_directory: Move some logging around
We want to clear CDC generations that are no longer needed
(because all writes are already using a new generation) so they
don't take space and are not sent during snapshot transfers
(see e.g. scylladb/scylladb#17545).
The condition used previously was that we clear generations which
were closed (i.e., a new generation started at this time) more than
24h ago. This is a safe choice, but too conservative: we could
easily end up with a large number of obsolete generations if we
boot multiple nodes during 24h (which is especially easy to do
with tablets.)
Change this bound from 24h to `5s + ring_delay`. The choice is
explained in a comment in the code.
Also, prevent `test_cdc_generation_clearing` from being flaky by
firing the `increase_cdc_generation_leeway` error injection on
the server being the topology coordinator.
Ref: scylladb/scylladb#17545
The last assertion in the test is very sensitive to changes. The
constant has already been increased from 0 to 1 due to flakiness.
The old comment explains it.
In the following patch, we change the CDC generation publisher so
that it clears the obsolete CDC generations earlier. This change
would make this assertion flaky again. After restarting the servers,
the new topology coordinator could remove the first generation if it
became obsolete. This operation appends a new entry to the log. If
it happened after triggering snapshot, the assertion could fail
with `2 <= 1`.
We could increase the constant again to unflake the test, but we'd
better improve it once and for all. We change the assertion so
that it's not sensitive to changes in the code based on Raft. The
explanation is in the new comment.
Finding the new Raft leader after restart simplifies the test
and makes it easier to reason about. There are two improvements:
- we only need to wait until the leader appends a command, so
the read barrier becomes unnecessary,
- we only need to trigger snapshot on the leader.
We also use the knowledge about the leader in the following patch.
Following
b8634fb244
the machine image build started to fail with the following error:
```
10:44:59 googlecompute.gce: scylla-jmx package is not installed.
10:44:59 ==> googlecompute.gce: Traceback (most recent call last):
10:44:59 ==> googlecompute.gce: File "/home/ubuntu/scylla_install_image", line 135, in <module>
10:44:59 ==> googlecompute.gce: run('/opt/scylladb/scripts/scylla_setup --no-coredump-setup --no-sysconfig-setup --no-raid-setup --no-io-setup --no-ec2-check --no-swap-setup --no-cpuscaling-setup --no-ntp-setup', shell=True, check=True)
10:44:59 ==> googlecompute.gce: File "/usr/lib/python3.10/subprocess.py", line 526, in run
10:44:59 ==> googlecompute.gce: raise CalledProcessError(retcode, process.args,
10:44:59 ==> googlecompute.gce: subprocess.CalledProcessError: Command '/opt/scylladb/scripts/scylla_setup --no-coredump-setup --no-sysconfig-setup --no-raid-setup --no-io-setup --no-ec2-check --no-swap-setup --no-cpuscaling-setup --no-ntp-setup' returned non-zero exit status 1.
```
It seems we no longer need to verify that jmx and tools-java packages are installed.
Closes scylladb/scylladb#18494
The sstable_directory::sstable_filename() method should generate the name of an
sstable for log messages. It's not accurate, because it silently assumes
that the filename is on local storage, which might not be the case.
Fixing it is a large change, so for now replace _sstable_dir with an explicit
call to make_path(). The change is idempotent, as _sstable_dir is
initialized with the result of a make_path() call in the constructor.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
At the beginning of the .process() method there's a log message saying which path
and which storage is being processed. That's not quite right, because,
e.g., the filesystem lister may skip processing the quarantine directory. Also,
the registry lister doesn't list entries by their _sstable_dir, but
rather by its _location (spoiler: dir = location / state).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Previously, writing into system.large_partitions was done by calling
record_large_partition(). In order to write different data based on
the cluster feature flag, another level of indirection was added by
calling _record_large_partitions which is initialized to a lambda
which calls internal_record_large_partitions(). This function does
not record the values of the two new columns (dead_rows and
range_tombstones). After the cluster feature flag becomes true,
_record_large_partitions is set to a lambda which calls
internal_record_large_partitions_all_data(), which records the values
of the two new columns.
The name of the Scylla table backing an Alternator LSI looks like
basename:!lsiname. Some REST API clients (including Scylla Manager),
when they send a "!" character in a REST API request, may decide
to "URL encode" it - convert it to %21.
Because of a Seastar bug (https://github.com/scylladb/seastar/issues/725)
Scylla's REST API server forgets to do the URL decoding, which leads
to the REST API request failing to address the LSI table.
This patch introduces a test for this bug, which fails without the
Seastar issue being fixed, and passes afterwards (i.e., after the
previous patch that starts to use the new, fixed, Seastar API).
The test creates an LSI, uses the REST API to find its name and then
tries to call some REST API ("compaction_strategy") on this table name,
after deliberately URL-encoding it.
Refs #5883.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The API req->param["name"] to access parameters in the path part of the
URL was buggy - it forgot to do URL decoding and the result of our use
of it in Scylla was bugs like #5883 - where special characters in certain
REST API requests got botched up (encoded by the client, then not
decoded by the server).
The solution is to replace all uses of req->param["name"] by the new
req->get_path_param("name"), which does the decoding correctly.
Unfortunately we needed to change 104 (!) callers in this patch, but the
transformation is mostly mechanical and there are no functional changes in
this patch. Another set of changes was to bring req, not req->param, to
a few functions that want to get the path param.
This patch avoids the numerous deprecation warnings we had before, and
more importantly, it fixes #5883.
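For illustration, the per-call-site change is mechanical (handler context omitted; the example value is hypothetical):
```cpp
// before: raw, percent-encoded value, e.g. "tbl%21lsi"
//   auto name = req->param["name"];
// after: URL-decoded value, e.g. "tbl!lsi"
//   auto name = req->get_path_param("name");
```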
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
During view generation we would like to be able
to access information about the current state
of view update backlogs, but this information
is kept inside storage_proxy.
A reference to storage_proxy is kept inside view_update_generator,
so the easiest way to get access to it from the view update code
is by adding a public getter there.
There's already a similar getter for replica::database: get_db(),
so it's in line with the rest of the code.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Storage proxy maintains information about both local
and remote view update backlogs.
This information might also be useful outside of storage_proxy,
so let's expose the functions that allow access to backlog information.
There aren't any implementation quirks that would make
it unsafe to make the functions public, the worst that
can happen is that someone causes a lot of atomic operations
by repeatedly calling get_view_update_backlog().
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
When the view builder is drained (it now happens very early, but the next patch
moves this into regular drain), it waits for all on-going view build
steps to complete. This includes waiting for any outstanding proxy view
writes to complete as well.
View writes in proxy have a very high timeout of 5 minutes, but they are
cancellable. However, cancelling such writes happens in proxy's
drain_on_shutdown() call which, in turn, happens pretty late on
shutdown. Effectively, by the time it happens all view writes must have
completed already, so stop-time cancelling doesn't really work nowadays.
The next patch makes the view builder drain happen a bit later during shutdown,
namely -- _after_ shutting down the messaging service. When it happens that
late, non-working view write cancellation becomes critical, as the view
builder drain hangs for the aforementioned 5 minutes. This patch explicitly
cancels all view writes when the view builder stops.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
* seastar 2b43417d...b73e5e7d (11):
> treewide: inherit from formatter<string_view> not formatter<std::string_view>
> CMakeLists.txt: Apply CXX deprecated flags conditionally
> tls: add assignment operator for gnutls_datum
> tls: s/get0()/get()/
> io_queue: do not reference moved variable
> TLS: use helper function in get_distinguished_name & get_alt_name_information
> TLS: Add support for TLS1.3 session tickets
> iotune: ignore shards with id above max_iodepth
> core/future: remove a template parameter from set_callback()
> util: with_file_input_stream: always close file
> core/sleep: Use more raii-sh aproach to maintain sleeper
Fixes #5181
Closes scylladb/scylladb#18491
Since we added native nodetool, we no longer need to install scylla-tools
and scylla-jmx; drop them from the scylla metapackage and make them
optional packages.
Closes #18472
Closes scylladb/scylladb#18487
when compiling the tree with clang-18 and ragel 6.10, the compiler
warns like:
```
/usr/local/bin/cmake -E __run_co_compile --tidy="clang-tidy-18;--checks=-*,bugprone-use-after-move;--extra-arg-before=--driver-mode=g++" --source=/home/runner/work/scylladb/scylladb/redis/controller.cc -- /usr/bin/clang++-18 -DBOOST_NO_CXX98_FUNCTION_BASE -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -I/home/runner/work/scylladb/scylladb -I/home/runner/work/scylladb/scylladb/build/gen -I/home/runner/work/scylladb/scylladb/seastar/include -I/home/runner/work/scylladb/scylladb/build/seastar/gen/include -I/home/runner/work/scylladb/scylladb/build/seastar/gen/src -isystem /home/runner/work/scylladb/scylladb/cooking/include -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/runner/work/scylladb/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT redis/CMakeFiles/redis.dir/controller.cc.o -MF redis/CMakeFiles/redis.dir/controller.cc.o.d -o redis/CMakeFiles/redis.dir/controller.cc.o -c /home/runner/work/scylladb/scylladb/redis/controller.cc
error: too many errors emitted, stopping now [clang-diagnostic-error]
Error: /home/runner/work/scylladb/scylladb/build/gen/redis/protocol_parser.hh:110:1: error: unannotated fall-through between switch labels [clang-diagnostic-implicit-fallthrough]
110 | case 1:
| ^
/home/runner/work/scylladb/scylladb/build/gen/redis/protocol_parser.hh:110:1: note: insert 'FMT_FALLTHROUGH;' to silence this warning
110 | case 1:
| ^
| FMT_FALLTHROUGH;
```
since we have `-Werror`, warnings like this are treated as errors,
hence the build fails. in order to address this failure, let's silence
this warning when including this generated header file.
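a self-contained sketch of the approach; the real change wraps the include of the generated redis/protocol_parser.hh, while the function below is just a stand-in for ragel output:
```cpp
// suppress -Wimplicit-fallthrough only around the generated code instead of
// disabling the warning tree-wide.
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wimplicit-fallthrough"
// #include "redis/protocol_parser.hh"   // the real ragel-generated header
int classify(int st) {                    // stand-in for generated switch code
    switch (st) {
    case 0:
        ++st;                             // intentional fall-through, as ragel emits
    case 1:
        return st;
    default:
        return -1;
    }
}
#pragma clang diagnostic pop
```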
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18447
In our CDC implementation, the CDC log table for table "xyz" is always
called "xyz_scylla_cdc_log". If this table name is taken, and the user
tries to create a table "xyz" with CDC enabled - or enable CDC on the
table "xyz", the creation/enabling should fail gracefully, with a clear
error message. This test verifies this.
The new test passes - the code is already correct. I just wanted to
verify that it is (and to prevent future regressions).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#18485
There are two places that work around the db.column_family_exists() call with a fancy exception-catching lambda.
This PR makes things simpler.
Closes scylladb/scylladb#18441
* github.com:scylladb/scylladb:
view: Open-code one line lambda checking if table exists
view: Use non-throwoing check if a table exists
because tracing/trace_keyspace_helper.cc references symbols
defined by table_helper, which is in turn provided by scylla-main,
we should link tracing_tracing against scylla-main.
otherwise we could have the following link failure:
```
./build/./tracing/trace_keyspace_helper.cc:214: error: undefined reference to 'table_helper::setup_keyspace(cql3::query_processor&, service::migration_manager&, std::basic_string_view<char, std::char_traits<char> >, seastar::basic_sstring<char, unsigned int, 15u, true>, service::query_state&, std::vector<table_helper*, std::allocator<table_helper*> >)'
./build/./tracing/trace_keyspace_helper.cc:396: error: undefined reference to 'table_helper::cache_table_info(cql3::query_processor&, service::migration_manager&, service::query_state&)'
./table_helper.hh:92: error: undefined reference to 'table_helper::insert(cql3::query_processor&, service::migration_manager&, service::query_state&, seastar::noncopyable_function<cql3::query_options ()>)'
./table_helper.hh:92: error: undefined reference to 'table_helper::insert(cql3::query_processor&, service::migration_manager&, service::query_state&, seastar::noncopyable_function<cql3::query_options ()>)'
./table_helper.hh:92: error: undefined reference to 'table_helper::insert(cql3::query_processor&, service::migration_manager&, service::query_state&, seastar::noncopyable_function<cql3::query_options ()>)'
./table_helper.hh:92: error: undefined reference to 'table_helper::insert(cql3::query_processor&, service::migration_manager&, service::query_state&, seastar::noncopyable_function<cql3::query_options ()>)'
clang++-18: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18455
when building with CMake, there is a use case where $BUILDIR
has not been created yet when `reloc/build_rpm.sh` is launched. in order
to enable us to run this script without creating $BUILDIR first, let's
create this directory first.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18464
clang-tidy warns like:
```
[628/713] Building CXX object service/CMakeFiles/service.dir/raft/raft_group_registry.cc.o
Warning: /home/runner/work/scylladb/scylladb/service/raft/raft_group_registry.cc:543:66: warning: 'id' used after it was moved [bugprone-use-after-move]
543 | auto& rate_limit = _rate_limits.try_get_recent_entry(id, std::chrono::minutes(5));
| ^
/home/runner/work/scylladb/scylladb/service/raft/raft_group_registry.cc:539:19: note: move occurred here
539 | auto dst_id = raft::server_id{std::move(id)};
| ^
```
this is a false alarm, as the type of `id` is actually `utils::UUID`,
which is a struct enclosing two `int64_t` variables, and we don't
define a move constructor for `utils::UUID`. so the value of `id`
is intact after being moved away. but it is still confusing at
first glance, as we are indeed referencing a moved-from variable.
so in order to reduce the confusion and to silence the warning, let's
just not `std::move(id)`.
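a minimal sketch of the change, with the class names simplified to stand-ins:
```cpp
// sketch: a trivially copyable UUID gains nothing from std::move(), and
// dropping the move removes the bugprone-use-after-move false positive.
#include <cstdint>

struct uuid { int64_t msb = 0, lsb = 0; };   // stand-in for utils::UUID
struct server_id { uuid id; };               // stand-in for raft::server_id

server_id make_server_id(const uuid& id) {
    // before: server_id{std::move(id)} -- later uses of `id` looked like
    // use-after-move to clang-tidy, although the "move" was a plain copy.
    return server_id{id};                    // just copy; no std::move
}
```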
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18449
This tests not only the coordinator's ability to retry on failure, but also
that the replica will be able to properly continue cleanup of a storage
group from where it left off (when the failure happened), not leaving any
sstables behind.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#18426
Finalization of tablet split was only synchronizing with migrations, but
that's not enough, as we want to make sure that all processes like repair
complete first, since they might hold an erm and therefore will be working
with a "stale" version of token metadata.
For synchronization to work properly, handling of tablet split finalize
will now take over the state machine, when possible, and execute a
global token metadata barrier to guarantee that update in topology by
split won't cause problems. Repair for example could be writing a
sstable with stale metadata, and therefore, could generate a sstable
that spans multiple tablets. We don't want that to happen, therefore
we need the barrier.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#18380
Doing so is a pitfall that will make one waste a lot of time rebuilding
the packages, just because at the end it turns out that the version has
illegal characters in it. The author of this patch has certainly fallen
into this pitfall a lot of times.
Closes scylladb/scylladb#18429
On boot, sstables are populated from the normal location as well as from quarantine and staging. It turned out that sstables listed in the registry (S3-backed ones) are not populated from non-normal states.
Closes scylladb/scylladb#18439
* github.com:scylladb/scylladb:
test: Add test for how quarantined sstables registry entries are loaded
sstable_directory: Use sstable location to initialize registry lister
Currently, push_back or emplace_back reallocate the last chunk
before constructing the new element.
If the arg passed to push_back/emplace_back is a reference to an
existing element in the vector, reallocating the last chunk will
invalidate the arg reference before it is used.
This patch changes the order when reallocating
the last chunk in reserve_for_emplace_back:
First, a new chunk_ptr is allocated.
Then, the back_element is emplaced in the
newly allocated array.
And only then, existing elements in the current
last chunk are migrated to the new chunk.
Eventually, the new chunk replaces the existing chunk.
If no reservation is required, the back element
is emplaced "in place" in the current last chunk.
Fixes scylladb/scylladb#18072
Closes scylladb/scylladb#18073
* github.com:scylladb/scylladb:
test: chunked_managed_vector_test: add test_push_back_using_existing_element
utils: chunked_vector: reserve_for_emplace_back: emplace before migrating existing elements
utils: chunked_vector: push_back: call emplace_back
utils: chunked_vector: define min_chunk_capacity
utils: chunked*vector: use std::clamp
in UUID_gen.cc, we are using `std::atomic<int64_t>` in
`make_thread_local_node()`, but this template is not defined by
any of the included headers. we should include the headers for what
we use, so the file is self-contained.
when compiling on ubuntu:jammy with libstdc++-13, we have the following
error:
```
/usr/local/bin/cmake -E __run_co_compile --tidy="clang-tidy-18;--checks=-*,bugprone-use-after-move;--extra-arg-before=--driver-mode=g++" --source=/home/runner/work/scylladb/scylladb/utils/UUID_gen.cc -- /usr/bin/clang++-18 -DBOOST_ALL_NO_LIB -DBOOST_NO_CXX98_FUNCTION_BASE -DBOOST_REGEX_DYN_LINK -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -I/home/runner/work/scylladb/scylladb -I/home/runner/work/scylladb/scylladb/seastar/include -I/home/runner/work/scylladb/scylladb/build/seastar/gen/include -I/home/runner/work/scylladb/scylladb/build/seastar/gen/src -isystem /home/runner/work/scylladb/scylladb/cooking/include -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overl
Error: /home/runner/work/scylladb/scylladb/utils/UUID_gen.cc:29:33: error: implicit instantiation of undefined template 'std::atomic<long>' [clang-diagnostic-error]
29 | static std::atomic<int64_t> thread_id_counter;
| ^
/usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/shared_ptr_atomic.h:361:11: note: template is declared here
361 | class atomic;
| ^
```
so, in this change, we include `<atomic>` to address this
build failure.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18387
in Lua 5.3, lua_resume() only accepts three parameters, while in Lua 5.4,
this function accepts four parameters. so in order to be compatible with
Lua 5.3, we should not pass the 4th parameter to this function.
a macro is defined to conditionally pass this parameter based on the
Lua version.
see https://www.lua.org/manual/5.3/manual.html#lua_resume
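a hedged sketch of such a macro (the macro name is illustrative; the two lua_resume() signatures are the documented 5.3 and 5.4 ones):
```cpp
// Lua 5.4 added an int* out-parameter for the number of results;
// LUA_VERSION_NUM is 503 for Lua 5.3 and 504 for Lua 5.4.
#include <lua.hpp>

#if LUA_VERSION_NUM >= 504
#define SCYLLA_LUA_RESUME(L, from, nargs, nresults) \
    lua_resume((L), (from), (nargs), (nresults))
#else
#define SCYLLA_LUA_RESUME(L, from, nargs, nresults) \
    lua_resume((L), (from), (nargs))
#endif
```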
Refs 5b5b8b3264
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18450
Check that a cached querier, which exists prior to a migration, will be
cleaned up afterwards. This reproduces #18110.
The test fails before the fix for the above and passes afterwards.
This avoids any resource surviving the cleanup via some inactive read
pinning it. Such pinning can cause data resurrection if the tablet is later
migrated back and the pinned data source is added back to the tablet.
Currently we have a single method -- detach_column_family() -- which
does something with each semaphore. Soon there will be another one.
Introduce a method to do something with all semaphores, to make this
smoother. Enterprise has a different set of semaphores, and this will
reduce friction.
When the new optional parameter has a value, evict only inactive reads
whose ranges overlap with the provided range. The range for the inactive
read is provided in `register_inactive_read()`. If the inactive read has
no range, overlap is assumed and the read is evicted.
This will be used to evict all inactive reads that could potentially use
a cleaned-up tablet.
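A minimal sketch of the eviction rule, with a simplified stand-in for the range type:
```cpp
// Sketch: evict an inactive read if it declared no range, or if its declared
// range overlaps the range being cleaned up (e.g. a migrated-away tablet).
#include <optional>

struct range { int start, end; };   // simplified stand-in for a token/partition range

bool overlaps(const range& a, const range& b) {
    return a.start <= b.end && b.start <= a.end;
}

bool should_evict(const std::optional<range>& read_range, const range& cleaned_up) {
    // no declared range: overlap is assumed, as described above
    return !read_range || overlaps(*read_range, cleaned_up);
}
```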
This allows specifying the range the inactive read is reading from. To
be used in the next patch to selectively evict inactive reads whose
range overlaps with a certain (tablet) range.
inactive_read_handle::abandon() evicts and destroys the inactive read,
so it is not left behind. Currently, while doing so, it triggers the
inactive_read's own version of abandon(): detach(). The two have a bad
interaction when the inactive_read_handle stores the last permit
instance, causing a (so far benign) use-after-free. Prevent triggering
detach() to avoid this bad interaction altogether.
In order to correctly restore schema from `DESC SCHEMA WITH INTERNALS`, we need a way to drop a column with a timestamp in the past.
Example:
- table t(a int pk, b int)
- insert some data1
- drop column b
- add column b int
- insert some data2
If the sstables weren't compacted, after restoring the schema from description:
- we will lose column b in data2 if we simply do `ALTER TABLE t DROP b` and `ALTER TABLE t ADD b int`
- we will resurrect column b in data1 if we skip dropping and re-adding the column
Test for this: https://github.com/scylladb/scylla-dtest/pull/4122
Fixes #16482
Closes scylladb/scylladb#18115
* github.com:scylladb/scylladb:
docs/cql: update ALTER TABLE docs
test/cqlpytest: add test for prepared `ALTER TABLE ... DROP ... USING TIMESTAMP ?`
test/cql-pytest: remove `xfail` from alter table with timestamp tests
cql3/statements: extend `ALTER TABLE ... DROP` to allow specifying timestamp of column drop
cql3/statements: pass `query_options` to `prepare_schema_mutations()`
cql3/statements: add bound terms to alter table statement
cql3/statements: split alter_table_statement into raw and prepared
schema: allow to specify timestamp of dropped column
Adds a regression test for scylladb/scylladb#18198 - start a two node
cluster in legacy topology mode, use nodetool removenode on one of the
nodes, upgrade the remaining 1-node cluster and observe that it
succeeds.
During upgrade to raft topology, information about service levels is
copied from the legacy tables in system_distributed to the raft-managed
tables of group 0. system_distributed has RF=3, so if the cluster has
only one or two nodes we should use lower consistency level than ALL -
and the current procedure does exactly that, it selects QUORUM in case
of two nodes and ONE in case of only one node. The cluster size is
determined based on the call to _gossiper.num_endpoints().
Despite its name, gossiper::num_endpoints() does not necessarily return
the number of nodes in the cluster but rather the number of endpoint
states in gossiper (this behavior is documented in a comment near the
declaration of this function). In some cases, e.g. after gossiper-based
nodetool remove, the state might be kept for some time after removal (3
days in this case).
The consequence of this is that gossiper::num_endpoints() might return
more than the current number of nodes during upgrade, and that in turn
might cause migration of data from one table to another to fail -
causing the upgrade procedure to get stuck if there are only one or two
nodes in the cluster.
In order to fix this, use token_metadata::get_all_endpoints() as a
measure of the cluster size.
Fixes: scylladb/scylladb#18198
In be3776ec2a, we changed outdir to an
absolute path.
This causes an "unknown target" error when we build Scylla using a relative
path, something like "ninja build/dev/scylla", since the target name
becomes an absolute path.
Revert the change to be able to build with a relative path.
Also, change optimized_clang.sh to use a relative path for --builddir:
since we reference "../../$builddir/SCYLLA-*-FILE" when we build the
submodule, it won't work with an absolute path.
Fixes #18321
Closes scylladb/scylladb#18338
before this change, `partition_version` uses a hand-crafted move
constructor. but it suffers from a clang-tidy warning, which
believes there is a use-after-move issue, as the inner instance of
its parent class is constructed using
`anchorless_list_base_hook(std::move(pv))`, and its other member
variables are initialized like `_partition(std::move(pv._partition))`.
`std::move(pv)` does not do anything, but *indicates* `pv` may be
moved from. and what is moved away is only the part belonging to its
parent class. so this issue is benign.
but, it's still annoying, as we need to tell the genuine issues
reported by clang-tidy from the false alarms. so we have at least
these options:
- stop using clang-tidy
- ignore this warning
- silence this warning using a LINT directive in a comment
- use another way to implement the move constructor
in this change, we just cast the moved instance to its
base class and move that instead, which should appease
clang-tidy.
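a minimal sketch of the pattern, with simplified stand-in class names:
```cpp
// sketch: move only the base-class subobject through an explicit cast, so the
// subsequent member moves from `other` are not flagged as use-after-move.
#include <utility>

struct list_hook {                       // stand-in for anchorless_list_base_hook
    list_hook() = default;
    list_hook(list_hook&&) = default;
};

struct partition_data {};                // stand-in for the _partition member

struct version : list_hook {             // stand-in for partition_version
    partition_data _partition;

    version() = default;
    version(version&& other) noexcept
        : list_hook(std::move(static_cast<list_hook&>(other)))  // move the base part only
        , _partition(std::move(other._partition)) {
    }
};
```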
Fixes #18354
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18359
We add an auxiliary function checking if a node
hasn't been down for too long. Although
`gms::gossiper` already exposes a function
responsible for that, it requires that its
argument be an IP address. That's the reason
we add a new function.
These changes migrate hinted handoff to using
host ID as soon as the corresponding cluster
feature is enabled.
When a node starts, it defaults to creating
directories naming them after IP addresses.
When the whole cluster has upgraded
to a version of Scylla that can handle
directories representing host IDs,
we perform a migration of the IP folders,
i.e. we try to rename them to host IDs.
Invalid directories, i.e. those that
represent neither an IP address, nor a host
ID, are removed.
During the migration, hinted handoff is
disabled. It is necessary because we have
to modify the disk's contents, so new hints
cannot be saved until the migration finishes.
Before these changes, resource manager only handled
the case when directories it browsed represented
valid host IDs. However, since before migrating
hinted handoff to using host IDs we still name
directories after IP addresses, that would lead
to exceptions that shouldn't happen.
We make resource manager handle directories
of arbitrary names correctly.
We start keeping track of mappings IP - host ID.
The mappings are between endpoint managers
(identified by host IDs) and the hint directories
managed by them (represented by IP addresses).
This is a prelude to handling IP directories
by the hint shard manager.
The structure should only be used by the hint
manager before it's migrated to using host IDs.
The reason for that is that we rely on the
information obtained from the structure, but
it might not make sense later on.
When we start creating directories named after
host IDs and there are no longer directories
representing IP addresses, there is no relation
between host IDs and IPs -- precisely because
the structure is supposed to track the mapping between
endpoint managers and hint directories that
represent IP addresses. If they represent
host IDs, the connection between the two
is lost.
Still using the data structure could lead
to bugs, e.g. if we tried to associate
a given endpoint manager's host ID with its
corresponding IP address from
locator::token_metadata, it could happen that
two different host IDs would be bound to
the same IP address by the data structure:
node A has IP I1, node A changes its IP to I2,
node B changes its IP to I1. Though nodes
A and B have different host IDs (because they
are unique), the code would try to save hints
towards node B in node A's hint directory,
which should NOT happen.
Relying on the data structure is thus only
safe before migrating hinted handoff to using
host IDs. It may happen that we save a hint
in the hint directory of the wrong node indeed,
but since migration to using host IDs is
a process that only happens once, it's a price
we are ready to pay. It's only imperative to
prevent it from happening in normal
circumstances.
We drop the default argument in the function's signature.
Also, we adjust the code of change_host_filter() to
be able to perform calls to get_ep_manager().
This commit introduces a new class responsible
for keeping track of mappings IP-host ID.
Before hinted handoff is migrated to using
host IDs, hint directories still have to
represent IP addresses. However, since
we identify endpoint managers by host IDs
already, we need to be able to associate
them with the directories they manage.
This class serves this purpose.
We expose the update lock of space watchdog
to be able to prevent it from scanning
hint directories. It will be necessary in an
upcoming commit when we will be renaming hint
directories and possibly removing some of them.
Race conditions are unacceptable, so the resource
manager must not be able to access the directory
during that time.
We add a function that will be used while
migrating hinted handoff to using host IDs.
It iterates over existing hint directories
and tries to rename them to the corresponding
host IDs. In case of a failure, we remove
the directory, so that at the end of the function's execution
the only remaining directories are those
that represent host IDs.
The store_hint() method starts taking both an IP
and a host ID as its arguments. The rationale
for the change is that, depending on the stage of
the cluster (before an upgrade to the
host-ID-based hinted handoff and after it),
we might need to create a directory representing
either an IP address, or a host ID.
Because locator::topology can change between
the moment we obtain the host ID we pass
and the moment the function is executed,
we need to pass both parameters explicitly
to ensure consistency between them.
We extract the initialization of endpoint managers
from the start method of the hint manager
to a separate function and make it handle directories
that represent either IP addresses, or host IDs;
other directories are ignored.
It's necessary because before Scylla is upgraded
to a version that uses host-ID-based hinted handoff,
we need to continue only managing IP directories.
When Scylla has been upgraded, we will need to handle
host ID directories.
It may also happen that after an upgrade (but not
before it), Scylla fails while renaming
the directories, so we end up with some of them
representing IP address, and some representing
host IDs. After these changes, the code handles
that scenario as well.
We change the type of node identifiers
used within the module and fix compilation.
Directories storing hints to specific nodes
are now represented by host IDs instead of
IPs.
While the function is marked as noexcept, the returned
future can in fact store an exception. We remove the
specifier to reflect the actual behavior of the
function.
We extend the function
endpoint_lifecycle_subscriber::on_leave_cluster
by another argument -- locator::host_id.
It's more convenient to have a consistent
pair of IP and host ID.
Continuation of the previous patch. The lambda in question used to be
heavyweight(ish) code, but now it's a one-liner. And it's only called once,
so there's no more point in keeping it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Two places in view code check if a table exists by finding its schema ID
and catching the no_such_column_family exception. That's a bit heavyweight;
database has a column_family_exists() method for such cases.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When populating sstables on boot a bunch of sstable_directory objects is
created. For each there come three -- one each for the normal, quarantine
and staging states. Each is initialized with the sstable location (which is
now datadir/ks_name/cf_name-and-uuid) and the desired state (an enum
class). When created, the directory object wires up a component lister,
depending on which storage options are provided. For local sstables a
legacy filesystem lister is created and it's initialized with the path
where to search for files -- location + / + string(state). But for s3
sstables, which keep their entries in the registry, the lister is
erroneously initialized with the same location + / + string(state)
value. The mistake is that sstables in the registry keep location and state
in different columns, so for any state the lister should query the registry with
the same location value (it then filters entries by state on its own).
We move consistent cluster management out of experimental and
make it the default for new clusters in 6.0. In code, we make the
`consistent-topology-changes` flag unused and assumed to be true.
In 6.0, the topology upgrade procedure will be manual and
voluntary, so some clusters will still be using the gossip-based
topology even though they support the raft-based topology.
Therefore, we need to continue testing the gossip-based topology.
This is possible by using the `force-gossip-topology-changes` flag
introduced in scylladb/scylladb#18284.
Ref scylladb/scylladb#17802
Closes scylladb/scylladb#18285
* github.com:scylladb/scylladb:
docs: raft.rst: update after removing consistent-topology-changes
treewide: fix indentation after the previous patch
db: config: make consistent-topology-changes unused
test: lib: single_node_cql_env: restart a node in noninitial run_in_thread calls
test: test_read_required_hosts: run with force-gossip-topology-changes
storage_service: join_cluster: replace force_gossip_based_join with force-gossip-topology-changes
storage_service: join_token_ring: fix finish_setup_after_join calls
in `set_repair()`, even though the repair is performed asynchronously,
we check the options specified by the client immediately, and throw
`std::runtime_error` if any of them is not supported.
before this change, these unhandled exceptions were translated to an HTTP
500 error by the underlying HTTP router. but this is misleading, as
these errors are caused by the client, not the server.
in this change, we handle the `runtime_error` and translate it
into `httpd::bad_param_exception`, so that the client gets
HTTP 400 (Bad Request) instead of HTTP 500 (Internal Server Error),
along with an informative error message.
for instance, if we request a repair with "small_table_optimization" enabled
on a keyspace with tablets enabled, we get an HTTP 400 error
with "The small_table_optimization option is not supported for tablet repair"
as the body of the error. this is much more helpful.
Closes scylladb/scylladb#18389
* github.com:scylladb/scylladb:
api/storage_service: convert runtime_error from repair to http error
repair: change runtime_error to invalid_argument in do_repair_start()
api/storage_service: coroutinize set_repair()
The entry in the repair history map that is used to track repair status
internally for each repair job should be removed after the repair job is
done. We do the same for vnode repairs.
This patch adds the automatic history cleanup code which was missing
from the initial tablet repair support in commit 54239514af,
which did not support repair history updates back then.
Refs #17046
Closes scylladb/scylladb#18434
The populate_views() and generate_and_propagate_view_updates() methods both naturally belong to view_update_generator -- they don't need anything special from the table itself, but rather depend on some internals of the v.u.generator itself.
Moving them there lets us remove the view concurrency semaphore from keyspace and table, thus reducing cross-component dependencies.
Closes scylladb/scylladb#18421
* github.com:scylladb/scylladb:
replica: Do not carry view concurrency semaphore pointer around
view: Get concurrency semaphore via database, not table
view_update_generator: Mark mutate_MV() private
view: Move view_update_generator methods' code
view: Move table::generate_and_propagate_view_updates into view code
view: Move table::populate_views() into view_update_generator class
dclocal_read_repair_chance and read_repair_chance have been removed in Cassandra 3.11 and 4.x, see
https://issues.apache.org/jira/browse/CASSANDRA-13910. if we expose these properties via DDL, Cassandra would fail to consume the CQL statement creating the table when performing migration from Scylla to Cassandra 4.x, as the latter does not understand these properties anymore.
currently the default values of `dc_local_read_repair_chance` and `read_repair_chance` are both "0". so they are practically disabled, unless the user deliberately sets them to a value greater than 0.
also, as a side effect, Cassandra 4.x has better support for Python3. the cqlsh shipped along with Cassandra 3.11.16 only supports python2.7, see
https://github.com/apache/cassandra/blob/cassandra-3.11.16/bin/cqlsh.py. it errors out if the system only provides python3 with the error:
```
No appropriate python interpreter found.
```
but modern linux systems do not provide python2 anymore.
so, in this change, we deprecate these two options.
Fixes #3502
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18087
* github.com:scylladb/scylladb:
docs: drop documents related to {,dclocal_}read_repair_chance
treewide: remove {dclocal_,}read_repair_chance options
in `set_repair()`, despite that the repair is performed asynchronously,
we check the options specified by client immediately, and throw
`std::runtime_error`, if any of them is not supported.
before this change, these unhandled exceptions are translated to HTTP
500 error but the underlying HTTP router. but this is misleading, as
these errors are caused by client, not server. and the error message
is missing in the HTTP error message when performing the translation.
in this change, we handle the `runtime_error`, and translate them
into `httpd::bad_param_exception`, so that the client can have
HTTP 400 (Bad Request) instead of HTTP 500 (Internal Server Error),
and with informative error message.
for instance, if we apply repair with "small_table_optimization" enabled
on a keyspace with tablets enabled. we should have an HTTP error 400
with "The small_table_optimization option is not supported for tablet repair"
as the body of the error. this would much more helpful.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
if an error is caused by the option provided by user, would be better
to throw an `std::invalid_argument` instead of `std::runtime_error`,
so that the caller can make a better decision when handling the
thrown exceptions.
so, in this change, we change the exceptions raise directly in
`repair_service::do_repair_start()` from `std::runtime_error` to
`std::invalid_argument`. please note, in the lambda named `host2ip`,
since the hostname is not provided by user, so we are not changing
the exception type in that lambda.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, `set_repair()` uses a lambda for handling
the client-side requests. and this works great. but the underlying
`repair_start()` throws if any of the given options is not sane.
and we don't handle any of these throw exceptions in `set_repair()`,
from client's point of view, it would get an HTTP 500 error code,
which implies an "Internal Server Error". but actually, we should
blame the client for the error, not the server.
so, to prepare the error handling, let's take the opportunity to
coroutinize the lambda handling the request, so that we can handle
the exception in a more elegant way.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
* tools/java b810e8b00e...4ee15fd9ea (1):
> install.sh: don't install nodetool into /usr/bin
Add a bin/nodetool and install it to bin/ in install.sh. This script
simply forwards to scylla nodetool and it is the replacement for the
Java nodetool, which is dropped from the java-tools's install.sh, in the
submodule update also included in this patch.
With this change, we now hardwire the usage of the native nodetool, as
*the* nodetool, with the intermediary nodetool wrapper script removed
from the picture.
Bash completion was copied from the java tools repository and it is now
installed by the scylla package, together with nodetool.
The Java nodetool is still available as a fall-back, in case the
native nodetool has problems, at the path of
/opt/scylladb/share/cassandra/bin/nodetool.
Testing
I tested upgrades on a DEB and RPM distro: Ubuntu and Fedora.
First I installed scylla-5.4, then I installed the packages for this PR.
On Ubuntu, I had to use dpkg -i --auto-deconfigure, otherwise, dpkg would
refuse to install the new packages because they break the old ones. No
extra flags were required on Fedora.
In both cases, /usr/bin/nodetool was changed from a thunk calling the
Java nodetool (from 5.4) to the native launcher script from this PR.
/opt/scylladb/share/cassandra/bin/nodetool remained in place and still
works after the upgrade.
I also verified that --nonroot installs also work. Nodetool works both
when called with an absolute path, or when ~/scylladb/bin is added to
$PATH.
Fixes: #18226
Fixes: #17412
Closes scylladb/scylladb#18255
[avi: reset submodule to actual hash we ended up with]
Until now, alter table couldn't take any parameter marker, so the bound
terms were always 0.
Adding `USING TIMESTAMP` to `ALTER TABLE ... DROP` also adds the possibility
to prepare an alter table statement with a parameter marker.
Currently alter table doesn't prepare any parameters, so the raw statement
and the prepared one could be the same class.
A later commit will add attributes to the statement, which need to be
prepared; that's why I'm splitting them.
Repair may miss some tablets that migrated across nodes.
So if tombstones expire after some timeout, then we can
have data resurrection.
Set default tombstone_gc mode to "repair" for tables which
use tablets (if repair is required).
Fixes: #16627.
Closes scylladb/scylladb#18013
* github.com:scylladb/scylladb:
test: check default value of tombstone_gc
test: topology: move some functions to util.py
cql3: statements: change default tombstone_gc mode for tablets
When reclaiming memory from bloom filters, do not remove them from
_recognised_components, as that leads to the on-disk filter component
being left back on disk when the SSTable is deleted.
Fixes #18398
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#18400
The FIXME was added back then because we thought the interface of
compaction_group_for_sstable might have to be adjusted if a sstable
were allowed to temporarily span multiple tablets until it's split,
but we have gone a different path.
If a sstable's key range incorrectly spans more than one tablet,
that will be considered a bug and an exception is thrown.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#18410
We have a concurrent modification conflict in tests and suspect
duplicated requests but since we don't log successful requests
we have no way to verify if that's the case. The get_mutations_internal log
will help to tell which nodes are trying to push auth or
service levels mutations into raft.
Refs scylladb/scylladb#18319
Closes scylladb/scylladb#18413
Today with the backport automation, the developer adds the relevant backport label, but without any explanation of why.
Add a PR template with a placeholder for the developer to record the decision about whether to backport or not.
The placeholder is marked as a task, so once the explanation is added, the task must be checked as completed.
Also, adding another check to the PR summary will make it clear to the maintainer/reviewer whether the developer explained the backport decision.
Closes scylladb/scylladb#18275
* github.com:scylladb/scylladb:
[github] add action to verify PR tasks was completed
[github] add PR template
There are two places that get total:live stats for a table snapshot --
database::get_snapshot_details() and table::get_snapshot_details(). Both
do a pretty similar thing -- walk the table/snapshots/ directory, then
each of the found sub-directories, and accumulate the found files' sizes
into the snapshot details structure.
Both try to tell total from live sizes by checking whether an sstable
component found in snapshots is present in the table datadir. The
database code does it in a more correct way -- it not only checks the file
presence, but also compares whether it's a hardlink of the snapshot file,
while the table code just checks if a file of the same name exists.
This patch does both -- it makes database and table call the same
helper method for a single snapshot's details, and makes the generalized
version use the more elaborate collision check, thus fixing the per-table
details behavior.
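For illustration, a minimal, self-contained sketch of the hardlink-based collision check described above, written with plain stat(2); the helper name is invented and the real code goes through seastar's file APIs:
```cpp
#include <sys/stat.h>
#include <string>

// A snapshot component only counts towards the "live" size if it is a hard
// link of the component in the table's data directory (same device and same
// inode), not merely a file that happens to carry the same name.
bool is_hardlink_of(const std::string& snapshot_file, const std::string& datadir_file) {
    struct stat a{}, b{};
    if (::stat(snapshot_file.c_str(), &a) != 0 || ::stat(datadir_file.c_str(), &b) != 0) {
        return false; // one of the files is missing -- not a shared component
    }
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```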
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18347
The event is used in a loop.
Found by clang-tidy:
```
streaming/stream_result_future.cc:80:49: warning: 'event' used after it was moved [bugprone-use-after-move]
listener->handle_stream_event(std::move(event));
^
streaming/stream_result_future.cc:80:39: note: move occurred here
listener->handle_stream_event(std::move(event));
^
streaming/stream_result_future.cc:80:49: note: the use happens in a later loop iteration than the move
listener->handle_stream_event(std::move(event));
^
```
Fixes #18332
Closes scylladb/scylladb#18333
It's pretty straightforward, but prior to that, exception handling needs some care
Closes scylladb/scylladb#18378
* github.com:scylladb/scylladb:
view-builder: Coroutinize stop()
view_builder: Do not try to handle step join exceptions on stop
we are using `fmt::ostream_formatter` which was introduced in
{fmt} v9.0.0, see https://github.com/fmtlib/fmt/releases/tag/9.0.0 .
before this change, we depend on Seastar to find {fmt}. but
the minimal version of {fmt} required by Seastar is 5.0.0, which
is not enough to build scylladb.
in this change, we find the {fmt} package in scylla and specify a
minimal required version of 9.0.0, so the build fails at
configuration time if it is not met. {fmt} v8 may still be used by some users.
for instance, ubuntu:jammy comes with libfmt-dev 8.1.1. and
ubuntu:jammy is EOL in Apr 2027, see
https://ubuntu.com/about/release-cycle .
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18386
Metric family config lets a user configure a metric family's aggregate labels.
This patch modifies the existing relabel-config-from-file mechanism to accept
a metric family config.
Similar to the existing relabel_config, it adds a metric_family_configs
section. For example, the following configuration demonstrates changing
aggregate labels by name and regular expression.
```
metric_family_configs:
  - name: storage_service
    aggregate_labels: [shard]
  - regex: (storage_proxy.*)
    aggregate_labels: [shard, scheduling_group_name]
```
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Closes scylladb/scylladb#18339
We make the `consistent-topology-changes` experimental feature
unused and assumed to be true in 6.0. We remove code branches that
executed if `consistent-topology-changes` was disabled.
In the following commit, we make the `consistent-topology-changes`
experimental feature unused. Then, all unit tests in the boost suite
will start using the raft-based topology by default. Unfortunately,
tests with multiple `single_node_cql_env::run_in_thread` calls
(usually coming from the `do_with_cql_env_thread` calls) would fail.
In a noninitial `run_in_thread` call, a node is started as if it
booted for the first time. On the other hand, it has its persistent
state from previous boots. Hence, the node can behave strangely and
unexpectedly. In particular, `SYSTEM.TOPOLOGY` is not empty and the
assertion that expects it to be empty when we boot for the first
time fails.
We fix this issue by making noninitial `run_in_thread` calls
behave as normal restarts.
After this change,
`test_schema_digest_does_not_change_with_disabled_features` starts
failing. This test copies the data directory before booting for the
first time, so the new
`_sys_ks.local().build_bootstrap_info().get();` makes the node
incorrectly think it restarts. Then, after noticing it is not a part
of group 0, the node would start the raft upgrade procedure if we
didn't run it in the raft RECOVERY mode. This procedure would get
stuck because it depends on messaging being enabled even if the node
communicates only with itself and messaging is disabled in boost tests.
In one of the following commits, we make the
`consistent-topology-changes` experimental feature unused. Then,
all unit tests in the boost suite will start using the raft-based
topology by default. Unfortunately, some tests would start failing
and `test_read_required_hosts` is one of them.
`tablet_cql_test_config` in `tablets_test.cc` doesn't use
`consistent-topology-changes`, so all test cases in this file
run incorrectly with the gossip-based topology changes. With
`consistent-topology-changes`, only `test_read_required_hosts`
fails. The failure happens on `auto table2 = add_table(e).get();`:
```
ERROR 2024-04-17 11:14:16,083 [shard 0:main] load_balancer -
Replica 9b94d710-fbfb-11ee-9c4f-448617b47e11:0 of tablet
9b94d713-fbfb-11ee-9c4f-448617b47e11:0 not found in topology
```
This test case needs to be investigated and rewritten so that
it passes with the raft-based topology. However, we don't want
this issue to block the process of making the
`consistent-topology-changes` experimental feature unused. We
leave a FIXME and we will open a new issue to track it.
The `force_gossip_based_join` error injection does exactly what we
expect from `force-gossip-topology-changes` so we can do a simple
replacement.
We prefer a flag over an error injection because we will use it
a lot in CI jobs' configurations, some tests, manual testing etc.
It's much more convenient.
Moreover, the flag can be used in the release mode, so we re-enable
all tests that were disabled in release mode only because of using
the `force_gossip_based_join` error injection.
The name of the `force-gossip-topology-changes` flag suggests that
using it should always successfully force the gossip-based topology
or, if forcing is not possible, the booting should fail. We don't
want a node with `force-gossip-topology-changes=true` that silently
boots in the raft-topology mode. We achieve it by throwing a
runtime error from `join_cluster` in two cases:
- the node is restarting in the cluster that is using raft topology
- the node is joining the cluster that is using raft topology
The `topology_change_enabled` parameter of `finish_setup_after_join`
is used underneath to enable pulling raft topology snapshots in two
cases:
- when the node joins the cluster that uses the raft-based topology,
- when the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES feature is enabled.
The first case happens in the first changed call.
`_raft_experimental_topology` always equals true there. The second
call was incorrect as it could enable pulling snapshots before
SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES was enabled. It could cause
problems during rolling upgrade to 6.0. For more information see
07aba3abc4.
The API endpoint in question calls table::get_snapshot_detail(), which just walks the table/snapshots/ directory. This can clash with creating a new snapshot. The database-wide walk is guarded with snapshot-ctl's locking, so the per-table API should be guarded too.
Closes scylladb/scylladb#18414
* github.com:scylladb/scylladb:
snapshot: Get per-table snapshot size under snapshot lock
snapshot: Move per-table snap API to other snapshot endpoints
in `partition_entry::apply_to_incomplete()`, we pass `*dst_snp` and
`std::move(dst_snp)` to build the capture variable list of a lambda,
but the order of evaluation of these captures is unspecified.
fortunately, we haven't run into any issues so far, but this
is not future-proof. so, let's avoid this by storing a reference
to the dereferenced smart pointer and using it later on.
this issue is identified by clang-tidy:
```
/home/kefu/dev/scylladb/mutation/partition_version.cc:500:53: warning: 'dst_snp' used after it was moved [bugprone-use-after-move]
500 | cur = partition_snapshot_row_cursor(s, *dst_snp),
| ^
/home/kefu/dev/scylladb/mutation/partition_version.cc:502:23: note: move occurred here
502 | dst_snp = std::move(dst_snp),
| ^
/home/kefu/dev/scylladb/mutation/partition_version.cc:500:53: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated
500 | cur = partition_snapshot_row_cursor(s, *dst_snp),
| ^
/home/kefu/dev/scylladb/mutation/partition_version.cc:501:57: warning: 'src_snp' used after it was moved [bugprone-use-after-move]
501 | src_cur = partition_snapshot_row_cursor(s, *src_snp, can_move),
| ^
/home/kefu/dev/scylladb/mutation/partition_version.cc:504:23: note: move occurred here
504 | src_snp = std::move(src_snp),
| ^
/home/kefu/dev/scylladb/mutation/partition_version.cc:501:57: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated
501 | src_cur = partition_snapshot_row_cursor(s, *src_snp, can_move),
| ^
```
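As a self-contained illustration of the pattern (the types and names below are invented stand-ins, not the ScyllaDB ones): dereference the pointer once, keep a reference, and only then move the owning pointer into the capture list.
```cpp
#include <memory>

struct cursor {
    explicit cursor(int& target) : t(&target) {}
    int* t;
};

void build_closure(std::unique_ptr<int> snp) {
    // Problematic: *snp and std::move(snp) in the same capture list; the
    // read and the move are unsequenced relative to each other.
    //   auto f = [cur = cursor(*snp), snp = std::move(snp)] { ... };

    // Safer (the spirit of the fix): take the reference first, then move.
    int& target = *snp;
    auto f = [cur = cursor(target), snp = std::move(snp)]() mutable {
        *cur.t += 1; // the cursor still points into the object owned by snp
    };
    f();
}

int main() {
    build_closure(std::make_unique<int>(41));
}
```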
Fixes #18360
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18361
The _view_update_concurrency_sem field on database propagates itself via
keyspace config down to table config, and view_update_generator then
grabs it via a table:: helper. That's overkill: view_update_generator
has a direct reference to the database and can get this semaphore from
there.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now that the two methods belong to another class, move the code itself
to db/view, where that class resides.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Similarly to populate_views() method, this one also naturally belongs to
view_update_generator class.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method in question has little to do with table; effectively it only
needs stats and the concurrency semaphore. And the semaphore in question is
obtained from table indirectly -- it really resides on database. On the
other hand, the method carries lots of bits from db::view, e.g. the
view_update_builder class, the memory_usage_of() helper and a bit more.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
since "read_repair_chance" and "dclocal_read_repair_chance" are
removed, and not supported anymore. let's stop documenting them.
Refs #3502
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
dclocal_read_repair_chance and read_repair_chance have been removed
in Cassandra 3.11 and 4.x, see
https://issues.apache.org/jira/browse/CASSANDRA-13910.
if we expose the properties via DDL, Cassandra would fail to consume
the CQL statement creating the table when performing a migration
from Scylla to Cassandra 4.x, as the latter does not understand
these properties anymore.
currently the default values of `dc_local_read_repair_chance` and
`read_repair_chance` are both "0". so they are practically disabled,
unless the user deliberately sets them to a value greater than 0.
also, as a side effect, Cassandra 4.x has better support of
Python3. the cqlsh shipped along with Cassandra 3.11.16 only
supports python2.7, see
https://github.com/apache/cassandra/blob/cassandra-3.11.16/bin/cqlsh.py
it errors out if the system only provides python3 with the error
of
```
No appropriate python interpreter found.
```
but modern linux systems do not provide python2 anymore.
so, in this change, we deprecate these two options.
Fixes #3502
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
When issuing warnings about partitions with the number of rows above a configured threshold, the large partitions handler does not take into consideration the number of range tombstone markers in the total rows count. This fix adds the number of range tombstone markers to the total number of rows and saves this total in system.large_partitions.rows (if it is above the threshold). It also adds a new column range_tombstones to the system.large_partitions table which only contains the number of range tombstone markers for the given partition.
This PR fixes the first part of issue #13968
It does not cover distinguishing between live and dead rows. A subsequent PR will handle that.
Closes scylladb/scylladb#18346
* github.com:scylladb/scylladb:
sstables: add docs changes for system.large_partitions
sstable: large data handler needs to count range tombstones as rows
before this change, if we generate the build system with the plain
`Ninja` generator instead of `Ninja Multi-Config` using cmake, the build
fails, because `${scylla_build_mode_${CMAKE_BUILD_TYPE}}` is not
defined. so the profile used for building the rust library would be
"rust-", which does not match any of the profiles defined by
`Cargo.toml`.
in this change, we use `$CMAKE_BUILD_TYPE` instead of "$config", as
the former is defined for the non-multi-config generator, while the latter
is not. see https://cmake.org/cmake/help/latest/generator/Ninja%20Multi-Config.html
with this change, we are able to generate the build system properly
with the "Ninja" generator. if we just want to run some static analyzer
against the source tree or just want to build scylladb with a single
configuration, the "Ninja" generator is a good fit.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18353
Walking the per-table snapshot directory without a lock is racy. There's
snapshot-ctl locking that's used to get db-wide snapshot details; it
should be used to get per-table snapshot details too.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
So that they are collected in one place, and to facilitate the next patch,
which is going to use snapshot-ctl for the per-table API too.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This commit adds the description and usage instructions of Scylla Doctor
to the "How to Report a ScyllaDB Problem" page.
Scylla Doctor replaces Health Check Report, so the description of
and references to the latter are removed with this commit.
Fixes https://github.com/scylladb/scylladb/issues/16276
Closes scylladb/scylladb#17617
The off_strategy_updater is used during repair to update the automatic
off strategy timer so off_strategy compaction starts automatically only
after repair finishes. We still use off_strategy for tablets. So we
should still turn on the updater.
The update logic is also used for vnode tables. We could share the code with
the vnode path instead of copying it, but since there is a possibility we
could disable off_strategy for tablets, we'd better postpone the code
sharing as a follow-up. If we later decide to disable off_strategy for
tablets, we can simply remove the updater for tablets.
Fixes #18196
Closes scylladb/scylladb#18266
Currently a new node is marked as alive too late, after it is already
reported as a pending node. The patch series changes replace procedure
to be the same as what node_ops do: first stop reporting the IP of the
node that is being replaced as a natural replica for writes, then mark
the IP as alive, and only after that report the IP as a pending endpoint.
Fixes: scylladb/scylladb#17421
* 'gleb/17421-fix-v2' of github.com:scylladb/scylla-dev:
test_replace_reuse_ip: add data plane load
sync_raft_topology_nodes: make replace procedure similar to nodeops one
storage_service: topology_coordinator: fix indentation after previous patch
storage_service: topology coordinator: drop ring check in node_state::replacing state
In this commit we enhance test_replace_reuse_ip
to reproduce #17421. We create a test table and run
insert queries on it while the first node is
being replaced. In this form the test fails
without the fix from the previous commit. Some
insert requests fail with [Unavailable exception]
"Cannot achieve consistency level for cl QUORUM...".
In replace-with-same-ip a new node calls gossiper.start_gossiping
from join_token_ring with the 'advertise' parameter set to false.
This means that this node will fail echo RPC-s from other nodes,
making it appear as not alive to them. The node changes this only
in storage_service::join_node_response_handler, when the topology
coordinator notifies it that it's actually allowed to join the
cluster. The node calls _gossiper.advertise_to_nodes({}), and
only from this moment other nodes can see it as alive.
The problem is that topology coordinator sends this notification
in topology::transition_state::join_group0 state. In this state
nodes of the cluster already see the new node as pending,
they react with calling tmpr->add_replacing_endpoint and
update_topology_change_info when they process the corresponding
raft notification in sync_raft_topology_nodes. When the new
token_metadata is published, assure_sufficient_live_nodes
sees the new node in pending_endpoints. All of this happens
before the new node has handled the successful join notification,
so it's not alive yet. Suppose we had a cluster with three
nodes and we're replacing one of them with a fourth node.
For cl=quorum assure_sufficient_live_nodes throws if
live < need + pending, which in our case becomes 2 < 2 + 1.
The end effect is that during replace-with-same-ip
data plane requests can fail with unavailable_exception,
breaking availability.
The patch makes the boot procedure more similar to the node ops one.
It splits the marking of a node as "being replaced" and adding it to the
pending set into different steps and marks it as alive in the middle.
So when the node is in the topology::transition_state::join_group0 state
it is marked as "being replaced", which means it will no longer be used for
reads and writes. Then, in the next state, the new node is marked as alive
and is added to the pending list.
Fixes scylladb/scylladb#17421
Currently, the last mutation emitted by split_mutation could be empty.
It can happen as follows:
- consume range tombstone change at pos `1` with some timestamp
- consume clustering row at pos `2`
- flush: this will create mutation with range tombstone (1, 2) and
clustering row at 2
- consume range tombstone change at pos `2` with no timestamp (i.e.
closing rtc)
- end of partition
since the closing rtc has the same position as the clustering row, no
additional range tombstone will be emitted -- the only necessary range
tombstone was already emitted in the previous mutation.
On the other hand, `test_split_mutations` expects all emitted mutations
to be non-empty, which is a sane expectation for this function.
The test caught a case like this with random-seed=629157129.
Fix this by skipping the last mutation if it turns out to be empty.
Fixes: scylladb/scylladb#18042
Closes scylladb/scylladb#18375
```
service/storage_service.cc:4288:62: warning: 'req' used after it was moved [bugprone-use-after-move]
node_ops_insert(ops_uuid, coordinator, std::move(req.ignore_nodes), [this, coordinator, req = std::move(req)] () mutable {
^
service/storage_service.cc:4288:107: note: move occurred here
node_ops_insert(ops_uuid, coordinator, std::move(req.ignore_nodes), [this, coordinator, req = std::move(req)] () mutable {
^
service/storage_service.cc:4288:62: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated
node_ops_insert(ops_uuid, coordinator, std::move(req.ignore_nodes), [this, coordinator, req = std::move(req)] () mutable {
^
```
if evaluation order is right-to-left (GCC), req is moved first, and req.ignore_nodes will be empty,
so nodes that should be ignored will still be considered, potentially resulting in a failure during
replace.
https://godbolt.org/z/jPcM6GEx1
courtesy of clang-tidy.
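A self-contained sketch of the safe shape (all names here are invented stand-ins for the real ones): pull the member out of the object before anything moves it, so the two operations can no longer be reordered against each other.
```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct request {
    std::vector<std::string> ignore_nodes;
};

void insert_op(std::vector<std::string> ignore, std::function<void()> run) {
    // stand-in for the real bookkeeping; just run the task here
    (void)ignore;
    run();
}

void schedule(request req) {
    // Problematic: the two arguments below are evaluated in an unspecified
    // order, so req.ignore_nodes may be read after req was already moved:
    //   insert_op(std::move(req.ignore_nodes), [req = std::move(req)] { ... });

    // Safer: detach the member first, then move the rest.
    auto ignore = std::move(req.ignore_nodes);
    insert_op(std::move(ignore), [req = std::move(req)]() mutable {
        // use req here
    });
}

int main() {
    schedule(request{{"node1", "node2"}});
}
```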
Fixes #18324.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#18366
Currently, if tombstone_gc mode isn't specified for a table,
then "timeout" is used by default. With tablets, running
"nodetool repair -pr" may miss a tablet if it migrated across
the nodes. Then, if we expire tombstones for ranges that
weren't repaired, we may get data resurrection.
Set default tombstone_gc mode value for DDLs that don't
specify it. It's set to "repair" for tables which use tablets
unless they use local replication strategy or rf = 1.
Otherwise it's set to "timeout".
Commit 23c891923e (main: make sure view_builder doesn't propagate
semaphore errors) ignored some exceptions that could pop up from the
_build_step/do_build_step() serialized action, since they are "benign"
on stop.
Later there came b56b10a4bb (view_builder: do_build_step: handle
unexpected exceptions) that plugged any exception from the action in
question, regardless of whether it happens on stop or at run time.
Apparently, the latter commit supersedes the former.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This commit excludes OSS-specific links and content
added in https://github.com/scylladb/scylladb/pull/17624
to separate files and adds the include directive `.. scylladb_include_flag::`
to include these files in the doc source files.
Reason: Adding the link to the Open Source upgrade guide
(/upgrade/upgrade-opensource/upgrade-guide-from-5.4-to-6.0/enable-consistent-topology)
breaks the Enterprise documentation because the Enterprise docs don't
contain that upgrade guide. We must add separate files for OSS and
Enterprise to prevent failing the Enterprise build and breaking the
links.
Closes scylladb/scylladb#18372
```
sstables/storage.cc:152:21: warning: 'file_path' used after it was moved [bugprone-use-after-move]
remove_file(file_path).get();
^
sstables/storage.cc:145:64: note: move occurred here
auto w = file_writer(output_stream<char>(std::move(sink)), std::move(file_path));
```
It's a regression that happens when a TOC is found for a new sstable and we try to delete the temporary TOC.
courtesy of clang-tidy.
Fixes #18323.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#18367
In order to check if a snapshot of a certain name exists, the checking
method opens the directory. This can be done with a more lightweight call.
Also, though not critical, it forgets to close the directory.
Coroutinize the method while at it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18365
There's a database::get_snapshot_details() method that returns collection of all snapshots for all ks.cf out there and there are several *snapshot_details* aux structures around it. This PR keeps only one "details" and cleans up the way it propagates from database up to the respective API calls.
Closes scylladb/scylladb#18317
* github.com:scylladb/scylladb:
snapshot_ctl: Brush up true_snapshots_size() internals
snapshot_ctl: Remove unused details struct
snapshot_ctl: No double recoding of details
database,snapshots: Move database::snapshot_details into snapshot_ctl
database,snapshots: Make database::get_snapshot_details() return map, not vector
table,snapshots: Move table::snapshot_details into snapshot_ctl
`future::get0()` was deprecated in favor of `future::get()`. so
let's use the latter instead. this change silences a `-Wdeprecated`
warning.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18357
when compiling with clang-tidy, it warns:
```
[6/9] Building CXX object readers/CMakeFiles/readers.dir/multishard.cc.o
/home/kefu/dev/scylladb/readers/multishard.cc:84:53: warning: 'fut_and_result' used after it was moved [bugprone-use-after-move]
84 | auto result = std::get<1>(std::move(fut_and_result));
| ^
/home/kefu/dev/scylladb/readers/multishard.cc:79:34: note: move occurred here
79 | _read_ahead_future = std::get<0>(std::move(fut_and_result));
| ^
```
but this warning is a false alarm: we are not moving away
the *whole* tuple, we are just moving one element out of it. but
clang-tidy cannot tell which element we are actually moving. so, silence
the warning at both `std::move()` sites.
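A small self-contained example of why this is a false positive: `std::get<N>(std::move(tuple))` only yields an rvalue reference to element N, so moving element 0 out leaves element 1 intact.
```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <utility>

int main() {
    std::tuple<std::string, std::string> fut_and_result{"future", "result"};
    // Move element 0 out; element 1 is untouched by this.
    auto fut = std::get<0>(std::move(fut_and_result));
    // Moving element 1 out afterwards is still perfectly well-defined.
    auto result = std::get<1>(std::move(fut_and_result));
    assert(fut == "future" && result == "result");
}
```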
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18363
The semaphore in question is used to limit parallelism of manipulations with table's sstables. It's currently used in two places -- sstable_directory (mainly on boot) and by table::take_snapshot() to take snapshot. For the latter, there's also a database -> sharded<directory_semaphore> reference.
This PR sanitizes the semaphore usage. The results are
- directory_semaphore no longer needs to friend several classes that mess with its internals
- database no longer references directory_semaphore
Closes scylladb/scylladb#18281
* github.com:scylladb/scylladb:
database: Keep local directory_semaphore to initialize sstables managers
database: Don't reference directory_semaphore
table: Use directory semaphore from sstables manager
table: Indentation fix after previous patch
table: Use directory_semaphore for rate-limited snapshot taking
sstables: Move directory_semaphore::parallel_for_each() to header
sstables: Move parallel_for_each_restricted to directory_semaphore
table: Use smp::all_cpus() to iterate over all CPUs locally
before this change, `fmt::formatter<auth::resource_kind>` is located at
line 250 in this file, but it is used at line 130. so, {fmt} is not able
to find it:
```
/usr/include/fmt/core.h:2593:45: error: implicit instantiation of undefined template 'fmt::detail::type_is_unformattable_for<auth::resource_kind, char>'
2593 | type_is_unformattable_for<T, char_type> _;
| ^
/usr/include/fmt/core.h:2656:23: note: in instantiation of function template specialization 'fmt::detail::parse_format_specs<auth::resource_kind, fmt::detail::compile_parse_context<char>>' requested here
2656 | parse_funcs_{&parse_format_specs<Args, parse_context_type>...} {}
| ^
/usr/include/fmt/core.h:2787:47: note: in instantiation of member function 'fmt::detail::format_string_checker<char, auth::resource_kind, auth::resource_kind>::format_string_checker' requested here
2787 | detail::parse_format_string<true>(str_, checker(s));
| ^
/home/kefu/dev/scylladb/auth/resource.hh:130:29: note: in instantiation of function template specialization 'fmt::basic_format_string<char, auth::resource_kind &, auth::resource_kind &>::basic_format_string<char[65], 0>' requested here
130 | seastar::format("This resource has kind '{}', but was expected to have kind '{}'.", actual, expected)) {
| ^
/usr/include/fmt/core.h:1578:45: note: template is declared here
1578 | template <typename T, typename Char> struct type_is_unformattable_for;
| ^
```
in this change, `fmt::formatter<auth::resource_kind>` is moved up to
where `auth::resource_kind` is defined, so that it is visible to its
users.
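As a generic, self-contained illustration of the rule this relies on (the enum and strings below are made up, not the real `auth::resource_kind`): the `fmt::formatter` specialization must appear before the first point where the type is formatted, otherwise {fmt} rejects the type at compile time.
```cpp
#include <fmt/format.h>

enum class resource_kind { data, role };

// Must be visible *before* any fmt::format()/fmt::print() call that uses
// resource_kind; placing it below such a call reproduces the error above.
template <>
struct fmt::formatter<resource_kind> {
    constexpr auto parse(format_parse_context& ctx) { return ctx.begin(); }
    auto format(resource_kind k, format_context& ctx) const {
        return fmt::format_to(ctx.out(), "{}", k == resource_kind::data ? "data" : "role");
    }
};

int main() {
    fmt::print("kind = {}\n", resource_kind::role);
}
```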
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18316
in to_string.hh, we define the specialization of
`fmt::formatter<std::optional<T>>`, which {fmt} itself provides in v10
and up. to avoid conditionally including `utils/to_string.hh` and
`fmt/std.h` in all source files formatting `std::optional<T>` using
{fmt}, let's include `fmt/std.h` if {fmt}'s version is greater than or
equal to 10. in the future, we should drop the specialization and use
`fmt/std.h` directly.
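A sketch of what such a version-gated include could look like (the header names are the ones mentioned above; `FMT_VERSION` encodes major*10000 + minor*100 + patch, so v10.0.0 is 100000):
```cpp
#include <fmt/core.h>   // defines FMT_VERSION

#if FMT_VERSION >= 100000
// {fmt} v10+ ships a formatter for std::optional<T> in fmt/std.h.
#include <fmt/std.h>
#else
// Older {fmt}: fall back to the local specialization in utils/to_string.hh.
#include "utils/to_string.hh"
#endif
```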
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18325
in this changeset, we install `libxxhash-dev` and `cargo` for debian, and install cxxbridge for all distros, so that at least debian can be built without further preparations after running `install-dependencies.sh`.
Closes scylladb/scylladb#18335
* github.com:scylladb/scylladb:
install-dependencies.sh: move cargo out of fedora branch
install-dependencies: install cargo and wabt for debian
install-dependencies.sh: add libxxhash-dev for debian
When building scylla with cmake, it fails due to missing includes in
serializer_impl.hh and sstables/compress.hh files. Fix that by adding
the appropriate include files.
Fixes #18343
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#18344
in handler.cc, `make_non_overlapping_ranges()` references a moved-from
instance of `ColumnSlice` when something unexpected happens, in order to
format the error message in an exception. the move constructor of
`ColumnSlice` is default-generated, so the members' move constructors
are used to construct the new instance. this could lead to undefined
behavior when dereferencing the moved-from instance.
in this change, in order to avoid the use-after-free, let's keep
a copy of the referenced member variables and reference them when
formatting the error message in the exception.
this use-after-move issue was introduced in 822a315dfa, which implemented
the `get_multi_slice` verb and this piece in the first place. since both 5.2
and 5.4 include this commit, we should backport this change to them.
Refs 822a315dfa
Fixes #18356
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18358
before this change, we dereference `linfo` after moving it away.
and clang-tidy warns us like
```
[19/171] Building CXX object CMakeFiles/scylla.dir/main.cc.o
/home/kefu/dev/scylladb/main.cc:559:12: warning: 'linfo' used after it was moved [bugprone-use-after-move]
559 | return linfo.host_id;
| ^
/home/kefu/dev/scylladb/main.cc:558:36: note: move occurred here
558 | sys_ks.local().save_local_info(std::move(linfo), snitch.local()->get_location(), broadcast_address, broadcast_rpc_address).get();
| ^
```
the default-generated move constructor of `local_info` uses the
default-generated move constructor of `locator::host_id`, which in turn
uses the default-generated move constructor of
`utils::tagged_uuid<struct host_id_tag>`, and then `utils::UUID`'s
move constructor. since `UUID` does not contain any movable resources --
all it has is two `int64_t` member variables -- this is a benign
issue. but still, it is distracting.
in this change, we keep the value of `host_id` locally, and return it
instead to silence this warning, and to improve the maintainability.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18362
We are going to make the `consistent-topology-changes` experimental
feature unused in 6.0. However, the topology upgrade procedure will
be manual and voluntary, so some 6.0 clusters will be using the
gossip-based topology. Therefore, we need to continue testing the
gossip-based topology. The solution is introducing a new flag,
`force-gossip-topology-changes`, that will enforce the gossip-based
topology in a fresh cluster.
In this patch, we only introduce the parameter without any effect.
Here is the explanation. Making `consistent-topology-changes` unused
and introducing `force-gossip-topology-changes` requires adjustments
in scylla-dtest. We want to merge changes to scylladb and scylla-dtest
in a way that ensures all tests are run correctly during the whole
process. If we merged all changes to scylladb first, before merging
the scylla-dtest changes, all tests would run with the raft-based
topology and the ones excluded in the raft-based topology would fail.
We also can't merge all changes to scylla-dtest first. However, we
can follow this plan:
1. scylladb: merge this patch
2. scylla-dtest: start using `force-gossip-topology-changes`
in jobs that run without the raft-based topology
3. scylladb: merge the rest of the changes
4. scylla-dtest: merge the rest of the changes
Ref scylladb/scylladb#17802
Closes scylladb/scylladb#18284
transformed_cr is moved in a loop, in each iteration. This is harmless
because the variable is const and the move has no effect, yet it is
confusing to readers and triggers false positives in clang-tidy
(moved-from object reused). Remove it.
Fixes: #18322
Closes scylladb/scylladb#18348
The series contains fixes for some problems found during scalability
testing and one clean up patch.
Ref: scylladb/scylladb#17545
* 'gleb/topology-fixes-v4' of github.com:scylladb/scylla-dev:
gossiper: disable status check for endpoints in raft mode
storage_service: introduce a setter for topology_change_kind
topology coordinator: drop unused structure
storage_service: yield in get_system_mutations
Fixes #18329
named_file::assign call uses old object "known_size" after a move
of the object. While this is wholly ok, since the attribute accessed
will not be modified/destroyed by the move, it causes warnings in
"tidy" runs, and might confuse or cause real errors should impl. change.
Closes scylladb/scylladb#18337
When issuing warnings about partitions with the number of rows above a configured threshold,
the large partitions handler does not take into consideration the number of range tombstone
markers in the total rows count. This fix adds the number of range tombstone markers to the
total number of rows and saves this total in system.large_partitions.rows (if it is above
the threshold). It also adds a new column range_tombstones to the system.large_partitions
table which only contains the number of range tombstone markers for the given partition.
This PR fixes the first part of issue #13968
It does not cover distinguishing between live and dead rows. A subsequent PR will handle that.
cargo is used for installing cxxbridge-cmd, which is in turn used
when building the cxx bindings for the rust modules. so we need it
on all distros.
in this change, we add cargo for debian, so that we don't have
a build failure like:
```
CMake Error at rust/CMakeLists.txt:32 (find_program):
Could not find CXXBRIDGE using the following names: cxxbridge
```
for similar reason, we also need wabt, which provides wasm2wat,
without which, we'd have
```
CMake Error at test/resource/wasm/CMakeLists.txt:1 (find_program):
Could not find WASM2WAT using the following names: wasm2wat
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
libxxhash is used for building on both fedora and debian. `xxhash-devel`
is already listed in `fedora_packages`, we should have its counterpart
in `debian_base_packages`. otherwise the build on debian and its
derivatives could fail like
```
CMake Error at /usr/local/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find xxHash (missing: xxhash_LIBRARY xxhash_INCLUDE_DIR) (found
version "")
Call Stack (most recent call first):
/usr/local/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
cmake/FindxxHash.cmake:30 (find_package_handle_standard_args)
CMakeLists.txt:75 (find_package)
```
if we are using CMake to generate the build system. if we use
`configure.py` to generate `build.ninja`, the build would fail at
build time instead.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Gossiper automatically removes endpoints that do not have tokens in
normal state and either do not send gossiper updates or are dead for a
long time. We do not need this with topology coordinator mode since in
this mode the coordinator is responsible for managing the set of nodes in
the cluster. In addition, the patch disables quarantined endpoint
maintenance in the gossiper in raft mode and uses the left-nodes list from the
topology coordinator to ignore updates for nodes that are no longer part
of the topology.
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we include `fmt/ranges.h` and/or `fmt/std.h` for formatting the container types, like vector, map, optional and variant, using {fmt} instead of the homebrew formatter based on operator<<.
with this change, the changes adding fmt::formatter and the changes using ostream formatter explicitly, we are allowed to drop `FMT_DEPRECATED_OSTREAM` macro.
Refs scylladb#13245
Closes scylladb/scylladb#17968
* github.com:scylladb/scylladb:
treewide: do not define FMT_DEPRECATED_OSTREAM
treewide: include fmt/ranges.h and/or fmt/std.h
utils/managed_bytes: add support for fmt::to_string() to bytes and friends
Waiting for gossip to settle slows down the bootstrap of the cluster.
It is safe to disable it if the topology is based on Raft.
Fixes scylladb/scylladb#16055
Closes scylladb/scylladb#17960
Previous patches broke indentation in this method. Fix it by shortening
the summation loop with the help of std::accumulate()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently database::get_snapshot_details() returns a collection of
snapshots. The snapshot_ctl converts this collection into similarly
looking one with slightly different structures inside. The resulting
collection is converted one more time on the API layer into another
similarly looking map.
This patch removes the intermediate conversion.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
So that it's in-sync with table::get_snapshot_details(). Next patches
will improve this place even further.
Also, there can be many snapshots and vector can grow large, but that's
less of an issue here.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
since we do not rely on FMT_DEPRECATED_OSTREAM to define the
fmt::formatter for us anymore, let's stop defining `FMT_DEPRECATED_OSTREAM`.
in this change,
* utils: drop the range formatters in to_string.hh and to_string.c, as
we don't use them anymore. and the tests for them in
test/boost/string_format_test.cc are removed accordingly.
* utils: use fmt to print chunk_vector and small_vector. as
we are not able to print the elements using operator<< anymore
after switching to {fmt} formatters.
* test/boost: specialize fmt::details::is_std_string_like<bytes>
due to a bug in {fmt} v9, {fmt} fails to format a range whose
element type is `basic_sstring<uint8_t>`, as it considers it
a string-like type, but `basic_sstring<uint8_t>`'s char type
is `uint8_t`, not `char`. this issue does not exist in {fmt} v10.
so, in this change, we add a workaround that explicitly specializes
the type trait to ensure that {fmt} formats this type using its
`fmt::formatter` specialization instead of trying to format it
as a string. also, {fmt}'s generic ranges formatter calls the
pair formatter's `set_brackets()` and `set_separator()` methods
when printing the range, but the operator<< based formatter does not
provide these methods, so we have to include this change in the change
switching to {fmt}; otherwise the change specializing
`fmt::details::is_std_string_like<bytes>` won't compile.
* test/boost: in tests, we use `BOOST_REQUIRE_EQUAL()` and its friends
for comparing values. but without the operator<< based formatters,
Boost.Test would not be able to print them. after removing
the homebrew formatters, we need to use the generic
`boost_test_print_type()` helper to do this job. so we are
including `test_utils.hh` in tests so that we can print
the formattable types.
* treewide: add "#include "utils/to_string.hh" where
`fmt::formatter<optional<>>` is used.
* configure.py: do not define FMT_DEPRECATED_OSTREAM
* cmake: do not define FMT_DEPRECATED_OSTREAM
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting the container types, like vector, map,
optional and variant, using {fmt} instead of the homebrew
formatter based on operator<<.
with this change, the changes adding fmt::formatter and
the changes using ostream formatter explicitly, we are
allowed to drop `FMT_DEPRECATED_OSTREAM` macro.
Refs scylladb#13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
in 3835ebfcdc, `fmt::formatter` specializations were added to `bytes` and
friends, but their `format()` methods were intentionally implemented as plain
methods, which only accept `fmt::format_context`. it was a deliberate
decision. the intention was to reduce the usage of templates, to speed
up the compilation at the expense of dropping the support of other
appenders, notably the one used by `fmt::to_string()`, where the type
of "format_context" is not a `fmt::format_context`, but a string
appender. but it turns out we still have users in tests using
`fmt::to_string()`, to convert, for instance, `bytes` to `std::string`,
so, to make their life easier, we add the templated `format()` to
these types. an alternative is to change the callers to use something
like `fmt::format("{}", v)`, which is less convenient though.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
in 2f0f53ac, we added logging of parsed command line options so that we
can see how scylla was launched in case it fails to boot. but when scylla
is called interactively from a console, this echo is a little bit annoying.
see the following console session:
```console
$ scylla --help-loggers
Scylla version 5.5.0~dev-0.20240419.3c9651adf297 with build-id 7dd6a110e608535e5c259a03548eda6517ab4bde starting ...
command used: "./RelWithDebInfo/scylla --help-loggers"
pid: 996503
parsed command line options: [help-loggers]
Available loggers:
BatchStatement
LeveledManifest
alter_keyspace
alter_table
...
```
so in this change, we check if stdin is associated with a terminal
device. if that is the case, we don't print the scylla version, parsed
command line and pid, and the interactive session looks like:
```console
$ scylla --help-loggers
Available loggers:
BatchStatement
LeveledManifest
alter_keyspace
alter_table
```
no more distracting information printed. the original behavior
can be tested like:
```console
$ : | ./RelWithDebInfo/scylla --help-loggers
```
assuming scylla is always launched with systemd, which connects
stdin to /dev/null. see
https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#Logging%20and%20Standard%20Input/Output
. so this behavior is preserved with this change.
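A minimal, self-contained sketch of the terminal check this describes (the banner text is illustrative; the real code prints through scylla's startup logging):
```cpp
#include <unistd.h>
#include <cstdio>

int main() {
    // Only print the startup banner when stdin is NOT a terminal. systemd
    // connects stdin to /dev/null, so services keep the banner; interactive
    // invocations like `scylla --help-loggers` stay quiet.
    if (!::isatty(STDIN_FILENO)) {
        std::printf("Scylla version ... starting ...\n");
        std::printf("parsed command line options: ...\n");
    }
    std::printf("Available loggers:\n");
    return 0;
}
```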
Refs #4203
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18309
This hourly reevaluation is there to help tablets that have very low
write activity, which can go a long time without flushing a memtable,
and it's important to reevaluate compaction as data can get expired.
Today it can happen that we reevaluate a table that is being compacted
actively, which is a waste of CPU, as the reevaluation will happen anyway
when there are changes to the sstable set. This waste can be amplified with
a significant tablet count in a given shard.
Eventually, we could make the reevaluation time per table based on an
expiration histogram, but until we get there, let's avoid this waste
by only reevaluating tables that have been compaction idle for more than 1h.
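In pseudocode terms, the guard could look roughly like the following sketch; the struct, field and helper names are invented for illustration and do not match the real compaction manager types:
```cpp
#include <chrono>

using namespace std::chrono;

struct table_compaction_state {
    steady_clock::time_point last_compaction_activity; // hypothetical field
};

// Hypothetical periodic-reevaluation guard: tables that compacted recently
// are skipped, since the sstable-set-change path will reevaluate them anyway.
bool needs_periodic_reevaluation(const table_compaction_state& t,
                                 steady_clock::time_point now) {
    return now - t.last_compaction_activity >= hours(1);
}

int main() {
    table_compaction_state t{steady_clock::now() - hours(2)};
    return needs_periodic_reevaluation(t, steady_clock::now()) ? 0 : 1;
}
```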
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#18280
Now database is constructed with sharded<directory_semaphore>, but it no
longer needs sharded, local is enough.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It was only used by table taking snapshot code. Now it uses sstables
manager's reference and database can stop carrying it around.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's natural for a table to iterate over its sstables and get the semaphore
from the manager of its sstables, not from the database.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The table::take_snapshot() method limits its parallelism with the help of the
directory semaphore already, but implements it "by hand". There's already a
parallel_for_each() method on the dir.sem. class that does exactly that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's a template and in order to use it in other .cc files it's more
convenient to move it into a header file
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In order not to make sstable_directory mess with private members of this
class. Next patch will also make use of this new method.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently it uses irange(0, smp::count), but seastar provides a
convenient helper call for the very same range object.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
in order to reduce the confusion like:
> I cannot find foobar in the list, is it supported?
also, take this opportunity to use "console" instead of "shell" for
rendering the code block. it's a better fit in this case, since we
are using pygments for syntax highlighting; see
https://pygments.org/docs/lexers/#pygments.lexers.shell.BashSessionLexer
for details on the "console" lexer.
and add a prompt before the command line, so that the "console" lexer
can render the command line and output better.
also, add a note explaining that the user should refer to the output of
`scylla` to see the list of logger classes.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18311
clang-19 complains with `-Wdeprecated-this-capture`:
```
/home/kefu/dev/scylladb/service/storage_service.cc:5837:22: error: implicit capture of 'this' with a capture default of '=' is deprecated [-Werror,-Wdeprecated-this-capture]
5837 | auto* node = get_token_metadata().get_topology().find_node(dst.host);
| ^
/home/kefu/dev/scylladb/service/storage_service.cc:5830:44: note: add an explicit capture of 'this' to capture '*this' by reference
5830 | co_await transit_tablet(table, token, [=] (const locator::tablet_map& tmap, api::timestamp_type write_timestamp) {
| ^
| , this
```
since https://open-std.org/JTC1/SC22/WG21/docs/papers/2018/p0806r2.html
was approved (see https://eel.is/c++draft/depr.capture.this) and newer
versions of C++ compilers implement it, we need to capture `this`
explicitly to be more standard-compliant, and to be more future-proof
in this regard.
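A small, self-contained example of the change in spirit; the class and lambda are made up, only the capture list matters:
```cpp
#include <cstdio>

struct service {
    int count = 0;
    void run() {
        // `[=]` captures `this` implicitly, which is deprecated since C++20
        // (P0806) and now triggers -Wdeprecated-this-capture on newer clang.
        // Spelling the capture out keeps the same semantics:
        auto task = [=, this] { std::printf("count=%d\n", count); };
        task();
    }
};

int main() {
    service s;
    s.run();
}
```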
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18306
{fmt} before v10 provides the specialization of `fmt::formatter<..>`
for `std::string_view` as well as the specialization of `fmt::formatter<..>`
for `fmt::string_view`, which is a builtin {fmt} implementation kept for
pre-C++17 compatibility. this type is used even if the code is
compiled with a C++ standard greater than or equal to C++17. also, before v10,
`fmt::formatter<std::string_view>::format()` is defined so it accepts
`std::string_view`. starting with v10, `fmt::formatter<std::string_view>` still
exists, but it is now defined using the `format_as()` machinery, so its
`format()` method does not actually accept `std::string_view`, it
accepts `fmt::string_view`, as the former can be converted to
`fmt::string_view`.
this is why we can inherit from `fmt::formatter<std::string_view>` and
use `formatter<std::string_view>::format(foo, ctx);` to implement the
`format()` method with {fmt} v9, but we cannot do this with {fmt} v10,
and we would get the following compilation failure:
```
FAILED: service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o
/home/kefu/.local/bin/clang++ -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -MF service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o.d -o service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -c /home/kefu/dev/scylladb/service/topology_state_machine.cc
/home/kefu/dev/scylladb/service/topology_state_machine.cc:254:41: error: no matching member function for call to 'format'
254 | return formatter<std::string_view>::format(it->second, ctx);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
/usr/include/fmt/core.h:2759:22: note: candidate function template not viable: no known conversion from 'seastar::basic_sstring<char, unsigned int, 15>' to 'const fmt::basic_string_view<char>' for 1st argument
2759 | FMT_CONSTEXPR auto format(const T& val, FormatContext& ctx) const
| ^ ~~~~~~~~~~~~
```
because the inherited `format()` method actually comes from
`fmt::formatter<fmt::string_view>`. to reduce the confusion, in this
change, we just inherit from `fmt::formatter<string_view>`, where
`string_view` is actually `fmt::string_view`. this follows
the documentation at
https://fmt.dev/latest/api.html#formatting-user-defined-types,
and since there is less indirection under the hood -- we do not
use the specialization created by `FMT_FORMAT_AS`, which inherits
from `formatter<fmt::string_view>` -- hopefully this can improve
the compilation speed a little bit. also, this change addresses
the build failure with {fmt} v10.
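A small sketch of the resulting pattern, following the {fmt} documentation linked above; the `node_state` enum is an illustrative stand-in, not the actual topology_state_machine type:
```c++
#include <fmt/core.h>
#include <fmt/format.h>

enum class node_state { normal, joining };

// Inherit parse() from formatter<fmt::string_view>; format() is given a
// fmt::string_view, so this compiles with both {fmt} v9 and v10.
template <>
struct fmt::formatter<node_state> : fmt::formatter<fmt::string_view> {
    auto format(node_state s, fmt::format_context& ctx) const {
        fmt::string_view name = (s == node_state::normal) ? "normal" : "joining";
        return fmt::formatter<fmt::string_view>::format(name, ctx);
    }
};

int main() {
    fmt::print("{}\n", node_state::joining);  // prints "joining"
}
```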
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18299
This is the complete version of #12786, since I took over the issue from @mykaul.
Updates from the original version are:
- Support ARM64 build (disable BOLT for now since it isn't functioning)
- Changed toolchain settings so that current scylla is able to build (WASM support, etc.)
- Stop git-cloning the scylladb repo; create a git-archive of the current scylla directory and import it instead
- Update Clang to 17.0.6
- Save the entire clang directory in BUILD mode, not just the /usr/bin/clang binary
- Implemented INSTALL_PREBUILT mode to install a prebuilt image that was built in BUILD mode
Note that this patch drops cross-build support for the frozen toolchain, since building clang and scylla multiple times under qemu-user-static is far too slow to be usable.
Instead, we should build the image for each architecture natively.
----
This is a different attempt at adding a build of an optimized clang (using LTO, PGO and BOLT, based on compiling ScyllaDB) to dbuild. Per Avi's request, there are 3 options: skip this phase (which is the current default), build it, or build + install it to the default path.
Fixes: #10985
Fixes: scylladb/scylla-enterprise#2539
Closes scylladb/scylladb#17196
* github.com:scylladb/scylladb:
toolchain: support building an optimized clang
configure.py: add --build-dir option
This commit includes updates related to replacing system_auth with system_auth_v2.
- The keyspace name system_auth is renamed to system_auth_v2.
- The procedures are updated to account for system_auth_v2.
- system_auth RF changes that are no longer required are removed from the procedures.
- Information is added that if the consistent topology updates feature
was not enabled upon upgrade from 5.4, there are limitations or additional
steps to take (depending on the procedure).
The files with that kind of information are to be found in _common folders
and included as needed.
- The upgrade guide has been updated to reflect system_auth_v2 and related impacts.
Closesscylladb/scylladb#18077
when compiling the tree with clang-19, it complains:
```
/home/kefu/dev/scylladb/service/topology_coordinator.cc:1968:31: error: variable 'reject' set but not used [-Werror,-Wunused-but-set-variable]
1968 | if (auto* reject = std::get_if<join_node_response_params::rejected>(&validation_result)) {
| ^
1 error generated.
```
so, even though the result of the assignment is evaluated to see whether
it is true or false, the compiler still believes that the variable is
not used: only the truthiness of the pointer matters and the pointee is
never read. either way, let's use `std::holds_alternative<..>` instead
of `std::get_if<..>` to silence this warning; it also makes the `if`
statement a little more compact, with fewer tokens.
to keep this source file self-contained, we take the opportunity to
include `<variant>`, as a function declared in that header is used.
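A self-contained sketch of the change; the `accepted`/`rejected` types are illustrative stand-ins, not the real join_node_response_params:
```c++
#include <variant>

struct accepted {};
struct rejected { int reason; };
using validation_result = std::variant<accepted, rejected>;

bool is_rejected(const validation_result& r) {
    // Before: `reject` is only tested for null and never dereferenced,
    // so clang-19 emits -Wunused-but-set-variable.
    // if (auto* reject = std::get_if<rejected>(&r)) { return true; }

    // After: no named variable, same meaning, fewer tokens.
    return std::holds_alternative<rejected>(r);
}

int main() {
    return is_rejected(validation_result{accepted{}}) ? 1 : 0;
}
```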
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18291
This patch updates the seastar submodule to get the latest safety patch for the metric layer.
The latest patch allows manipulating metric_families early in the
start-up process and is safer if someone chooses to aggregate summaries.
* seastar f3058414...8fabb30a (4):
> stall-analyser: improve stall pattern matching
> TLS: Move background BYE handshake to engine::run_in_background
> metrics.cc: Safer set_metric_family_configs
> src/core/metrics.cc: handle SUMMARY add operator
Closesscylladb/scylladb#18293
Recently, the add_tablet_replica and del_tablet_replica API calls were added, copying a big portion of the existing move_tablet API call's logic. This PR generalizes the common parts.
Closesscylladb/scylladb#18272
* github.com:scylladb/scylladb:
tablets: Generalize transition mutations preparation
tablets: Generalize tablet-already-in-transition check
tablets: Generalize raft communications for tablet transition API calls
tablets: Drop src vs dst equality check from move_tablet()
If the coordinator node is killed, restarted, or becomes inoperable
during a topology operation, a new coordinator should be elected, the
operation should be aborted, and the cluster should be rolled back.
Error injection is used to kill the coordinator before streaming
starts.
Closesscylladb/scylladb#16197
Current code uses non-raft path to pull the schema, which violates
group0 linearizability because the node will have latest schema but
miss group0 updates of other system tables. In particular,
system.tablets. This manifests as repair errors due to missing
tablet_map for a given table when trying to access it. Tablet map is
always created together with the table in the same group0 command.
When a node is bootstrapping, repair calls sync_schema() to make
sure the local schema is up to date. This races with group0 catch-up,
and if sync_schema() wins, repair may fail on a missing tablet map.
Fix by making sync_schema() do a group0 read barrier when in raft
mode.
Fixes #18002
Closes scylladb/scylladb#18175
Today, with the backport automation, the developer adds the relevant backport label, but without any explanation of why.
Add a PR template with a placeholder for the developer to record the backport decision (yes or no).
The placeholder is marked as a task, so once the explanation is added, the task must be checked as completed.
Just like all the other commands already have it. These commands didn't have documentation at the point where they were implemented, hence the missing doc link.
The links don't work yet, but they will work once we release 6.0 and the current master documentation is promoted to stable.
Closesscylladb/scylladb#18147
* github.com:scylladb/scylladb:
tools/scylla-nodetool: fix typo: Fore -> For
tools/scylla-nodetool: add doc link for getsstables and sstableinfo commands
Currently, we use the sum of the estimated_partitions from each
participant node as the estimated_partitions for sstable produced by
repair. This way, the estimated_partitions is the biggest possible
number of partitions repair would write.
Since repair will write only the difference between repair participant
nodes, using the biggest possible estimation will overestimate the
partitions written by repair, most of the time.
The problem is that an overestimated partition count makes the bloom filter
consume more memory. It has been observed to cause OOM in the field.
This patch changes the estimation to use a fraction of the average
partitions per node instead of the sum. It is still not a perfect
estimation, but it already improves memory usage significantly.
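A rough sketch of the idea, with an assumed divisor; the real fraction and plumbing live in the repair code:
```c++
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

uint64_t estimate_repair_partitions(const std::vector<uint64_t>& per_node_estimates) {
    if (per_node_estimates.empty()) {
        return 0;
    }
    // Old behavior: sum over all participants -- the largest possible number of
    // partitions repair could write, which oversizes the bloom filter.
    // return std::accumulate(per_node_estimates.begin(), per_node_estimates.end(), uint64_t(0));

    // New behavior: a fraction of the average per-node estimate, since repair
    // writes only the differences between participants.
    constexpr uint64_t fraction = 10;  // hypothetical divisor, for illustration only
    uint64_t avg = std::accumulate(per_node_estimates.begin(), per_node_estimates.end(), uint64_t(0))
                   / per_node_estimates.size();
    return std::max<uint64_t>(avg / fraction, 1);
}
```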
Fixes #18140
Closes scylladb/scylladb#18141
Tablet transition handlers prepare two mutations -- one for the tablets
table, that sets the transition state, transition mode and a few other
fields; and another one for the topology table that "activates" the
tablet_migration state for the topology coordinator.
The latter is common to all three handlers.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Continuation of the previous patch -- there's a common sanity check of
tablet transition API handlers, namely that this tablet is not in
transition already.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are three transition calls -- move, add replica and del replica --
and all three work similarly. In a loop they try to get guard for raft
operation, then perform sanity checks on topology state, then prepare
mutations and then try to apply them to raft. After the loop finishes
all three wait for transition for the given tablet to complete.
This patch generalizes the raft kicking loop and the transition
completion waiting code.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The code here looks like this:
    if src.host == dst.host
        throw "Local migration not possible"
    if src == dst
        co_return;
The 2nd check is apparently never satisfied -- if src == dst this means
that src.host == dst.host and it should have thrown already
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
instead of using `operator<<`, use `fmt::print()` to
format and print, so we can ditch the `operator<<`-based formatters.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18259
since Boost.Test relies on operator<< or `boost_test_print_type()`
to print the value of variables being compared, instead of defining
the fallback formatter of `boost_test_print_type()` for each
individual test, let's define it in `test/lib/test_utils.hh`, so
that it can be shared across tests.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18260
there is a chance that `utils/small_vector.hh` is compiled without
`using namespace seastar` in effect, and even if it is, we should not rely
on it; if it is not, checkhh would fail. so let's include
"seastarx.hh" in this header, so it is self-contained.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18265
The problem this series solves is correctly ignoring DOWN nodes state
when replacing a node.
When a node is replaced and there are other nodes that are down, the
replacing node is told to ignore those DOWN nodes using the
`ignore_dead_nodes_for_replace` option.
Since the replacing node is bootstrapping it starts with an empty
system.peers table so it has no notion about any node state and it
learns about all other nodes via gossip shadow round done in
`storage_service::prepare_replacement_info`.
Normally, since the DOWN nodes to ignore already joined the ring, the
remaining nodes will have their endpoint state already in gossip, but if
the whole cluster was restarted while those DOWN nodes did not start,
the remaining nodes will only have a partial endpoint state from them,
which is loaded from system.peers.
Currently, the partial endpoint state contains only `HOST_ID` and
`TOKENS`, and in particular it lacks `STATUS`, `DC`, and `RACK`.
The first part of this series loads also `DC` and `RACK` from
system.peers to make them available to the replacing node as they are
crucial for building a correct replication map with network topology
replication strategy.
But still, without a `STATUS` those nodes are not considered as normal
token owners yet, and they do not go through handle_state_normal which
adds them to the topology and token_metadata.
The second part of this series uses the endpoint state retrieved in the
gossip shadow round to explicitly add the ignored nodes' state to
topology (including dc and rack) and token_metadata (tokens) in
`prepare_replacement_info`. If there are more DOWN nodes that are not
explicitly ignored, replace will fail (as it should).
Fixes scylladb/scylladb#15787
Closes scylladb/scylladb#15788
* github.com:scylladb/scylladb:
storage_service: join_token_ring: load ignored nodes state if replacing
storage_service: replacement_info: return ignore_nodes state
locator: host_id_or_endpoint: keep value as variant
gms: endpoint_state: add getters for host_id, dc_rack, and tokens
storage_service: topology_state_load: set local STATUS state using add_saved_endpoint
gossiper: add_saved_endpoint: set dc and rack
gossiper: add_saved_endpoint: fixup indentation
gossiper: add_saved_endpoint: make host_id mandatory
gossiper: add load_endpoint_state
gossiper: start_gossiping: log local state
It tries to call container().invoke_on_all() the hard way.
Calling it directly is not possible, because there's no
sharded::invoke_on_all() const overload
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18202
The cql-pytest framework allows running tests also against Cassandra,
but developers need to install Cassandra on their own because modern
distributions such as Fedora no longer carry a Cassandra package.
This patch adds clear and easy-to-follow (I think) instructions on how
to download a pre-compiled Cassandra, or alternatively how to download
and build Cassandra from source - and how either can be used with the
test/cql-pytest/run-cassandra script.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#18138
For view builder draining there's a dedicated deferred action in main, while all other services that need to be drained do it via storage_service. The latter approach unifies shutdown for services and makes `nodetool drain` drain everything, not just a part of it. This PR makes the view builder drain look the same. As a side effect it also moves `mark_existing_views_as_built` from the storage service to the view builder and generalizes this marking code inside the view builder itself.
refs: #2737
refs: #2795
Closes scylladb/scylladb#16558
* github.com:scylladb/scylladb:
storage_service: Drain view builder on drain too
view_builder: Generalize mark_as_built(view_ptr) method
view_builder: Move mark_existing_views_as_built from storage service
storage_service: Add view_builder& reference
main,cql_test_env: Move view_builder start up (and make unconditional)
The partitions_bigger_than_threshold counter is incremented only if the
previous check detects that the partition size exceeds the threshold. It's
done with an extra if, but it can be done without an (explicit) condition,
as the bool type is guaranteed by the standard to convert to integers as
true = 1 and false = 0.
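A minimal sketch of the transformation; the names are illustrative, not the real large-data handler code:
```c++
#include <cstdint>

void account_partition(uint64_t partition_size, uint64_t threshold,
                       uint64_t& partitions_bigger_than_threshold) {
    bool exceeds = partition_size > threshold;

    // Before: an explicit branch.
    // if (exceeds) { ++partitions_bigger_than_threshold; }

    // After: rely on the guaranteed bool -> integer conversion (true == 1, false == 0).
    partitions_bigger_than_threshold += exceeds;
}
```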
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18217
The current code counts the number of keys in it just to see if this number
is non-zero. Using the .contains() method is a better fit here.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18219
The sharded<database> is used only as an invoke_on_all() method provider;
there's no real need for the database itself. A simple smp::invoke_on_all()
works just as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18221
Some tests want to ignore the out_of_range exception in a continuation and
currently go a longer route to do that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18216
When constructing a vector with partition key data, the size of that
vector is known beforehand
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18239
There's a helper map-reducer that accepts a function to call on
commitlog. All callers accumulate statistics with it, so the commitlog
argument is const pointer.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18238
In testing, we've observed multiple cases where nodes would fail to
observe updated application states of other nodes in gossiper.
For example:
- in scylladb/scylladb#16902, a node would finish bootstrapping and enter
NORMAL state, propagating this information through gossiper. However,
other nodes would never observe that the node entered NORMAL state,
still thinking that it is in joining state. This would lead to further
bad consequences down the line.
- in scylladb/scylladb#15393, a node got stuck in bootstrap, waiting for
schema versions to converge. Convergence would never be achieved and the
test eventually timed out. The node was observing outdated schema state
of some existing node in gossip.
I created a test that would bootstrap 3 nodes, then wait until they all
observe each other as NORMAL, with timeout. Unfortunately, thousands of
runs of this test on different machines failed to reproduce the problem.
After banging my head against the wall failing to reproduce, I decided
to sprinkle randomized sleeps across multiple places in gossiper code
and finally: the test started catching the problem in about 1 in 1000
runs.
With additional logging and additional head-banging, I determined
the root cause.
The following scenario can happen, 2 nodes are sufficient, let's call
them A and B:
- Node B calls `add_local_application_state` to update its gossiper
state, for example, to propagate its new NORMAL status.
- `add_local_application_state` takes a copy of the endpoint_state, and
updates the copy:
```
auto local_state = *ep_state_before;
for (auto& p : states) {
auto& state = p.first;
auto& value = p.second;
value = versioned_value::clone_with_higher_version(value);
local_state.add_application_state(state, value);
}
```
`clone_with_higher_version` bumps `version` inside
gms/version_generator.cc.
- `add_local_application_state` calls `gossiper.replicate(...)`
- `replicate` works in 2 phases to achieve exception safety: in the first
phase it copies the updated `local_state` to all shards into a
separate map. In the second phase the values from the separate map are used
to overwrite the endpoint_state map used for gossiping.
Due to the cross-shard calls of the first phase, there is a yield before
the second phase. *During this yield* the following happens:
- `gossiper::run()` loop on B executes and bumps node B's `heart_beat`.
This uses the monotonic version_generator, so it uses a higher version
than the ones we used for the states added above. Let's call this new version
X. Note that X is larger than the versions used by application_states
added above.
- now node B handles a SYN or ACK message from node A, creating
an ACK or ACK2 message in response. This message contains:
- old application states (NOT including the update described above,
because `replicate` is still sleeping before phase 2),
- but bumped heart_beat == X from `gossiper::run()` loop,
and sends the message.
- node A receives the message and remembers that the max
version across all states (including heart_beat) of node B is X.
This means that it will no longer request or apply states from node B
with versions smaller than X.
- `gossiper.replicate(...)` on B wakes up, and overwrites the
endpoint_state with the one it saved in phase 1. In particular it
reverts heart_beat back to a smaller value, but the larger problem is that it
saves updated application_states that use versions smaller than X.
- now when node B sends the updated application_states in ACK or ACK2
message to node A, node A will ignore them, because their versions are
smaller than X. Or node B will never send them, because whenever node
A requests states from node B, it only requests states with versions >
X. Either way, node A will fail to observe new states of node B.
If I understand correctly, this is a regression introduced in
38c2347a3c, which introduced a yield in
`replicate`. Before that, the updated state would be saved atomically on
shard 0, there could be no `heart_beat` bump in-between making a copy of
the local state, updating it, and then saving it.
With the description above, it's easy to make a consistent
reproducer for the problem -- introduce a longer sleep in
`add_local_application_state` before second phase of replicate, to
increase the chance that gossiper loop will execute and bump heart_beat
version during the yield. Further commit adds a test based on that.
The fix is to bump the heart_beat under local endpoint lock, which is
also taken by `replicate`.
The PR also adds a regression test.
Fixes: scylladb/scylladb#15393
Fixes: scylladb/scylladb#15602
Fixes: scylladb/scylladb#16668
Fixes: scylladb/scylladb#16902
Fixes: scylladb/scylladb#17493
Fixes: scylladb/scylladb#18118
Ref: scylladb/scylla-enterprise#3720
Closes scylladb/scylladb#18184
* github.com:scylladb/scylladb:
test: reproducer for missing gossiper updates
gossiper: lock local endpoint when updating heart_beat
By default the suite name in the junit files generated by pytest
is `pytest` for all suites instead of the actual suite name, e.g. `topology_experimental_raft`.
With this change, the junit files will use the real suite name.
This change doesn't affect the Test Report in Jenkins, but it
arose as part of the other task of publishing the test results to
elasticsearch (https://github.com/scylladb/scylla-pkg/pull/3950),
where we parse the XMLs and need the correct suite name.
Closesscylladb/scylladb#18172
When altering the rf for a keyspace, all tablets in this ks may get fewer replicas. Part of this process is removing replicas from some node(s). This PR extends the tablets rebuild transition to handle this case by making pending_replica optional.
fixes: #18176
Closes scylladb/scylladb#18203
* github.com:scylladb/scylladb:
test: Tune up tablet-transition test to check del_replica
api: Add method to delete replica from tablet
tablet: Make pending replica optional
For that the test case is modified to have 3 nodes and 2 replicas on
start. Existing test cases are changed slightly in the way "from" host
is detected.
Also, the final check for data presence is modified to check that hosts
in "replicas" have data and other hosts don't have it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Copied from the add_replica counterpart
TODO: Generalize common parts of move_tablet and add_|del_tablet_replica
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Just like the leaving replica can be optional when adding a replica to a
tablet, the pending replica can be optional too if we're removing a
replica from a tablet.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Fixesscylladb/scylladb#18083
* seastar cd8a9133...f3058414 (18):
> src/core/metrics.cc: rewrite set_metric_family_configs
> include/seastar/core/metrics_api.hh: Revert d2929c2ade5bd0125a73d53280c82ae5da86218e
> sstring: include <fmt/format.h> instead of <fmt/ostream.h>
> seastar.cc: include used header
> tls: include used header of <unordered_set>
> docs: remove unused parameter from handle_connection function of echo-HTTP-server tutorial example
> stall-analyser: use 0 for the default value of --width
> http: Move parsed params and urls
> scripts: use raw string to avoid invalid escape sequences
> timed_out_error: add fmt::formatter for timed_out_error
> scripts/stall-analyser: change default branch-threshold to 3%
> scripts/stall-analyser: resolve string escape sequence warning
> io_queue: Use static vector for fair groups too
> io_queue: Use static vector to store fair queues
> stall-analyser: add space around '=' in param list
> stall-analyser: add a space between 'var: Type' in type annotation
> stall-analyser: move variables closer to where they are used
> memory: drop support for compilers that don't support aligned new
Closesscylladb/scylladb#18235
We won't run:
- old pre auth-v1 migration code
- code creating auth-v1 tables
We will keep running:
- code creating default rows
- code creating the auth-v1 keyspace (needed due to a cqlsh legacy hack:
cqlsh errors when executing `list roles` or `list users` if
there is no system_auth keyspace, though it does support the case when
the expected tables are missing)
When a node bootstraps or replaces a node after full cluster
shutdown and restart, some nodes may be down.
Existing nodes in the cluster load the down nodes TOKENS
(and recently, in this series, also DC and RACK) from system.peers
and then populate locator::topology and token_metadata
accordingly with the down nodes' tokens in storage_service::join_cluster.
However, a bootstrapping/replacing node has no persistent knowledge
of the down nodes, and it learns about their existence only from gossip.
But since the down nodes have unknown status, they never go
through `handle_state_normal` (in gossiper mode) and therefore
they are not accounted as normal token owners.
This is handled by `topology_state_load`, but not with
gossip-based node operations.
This patch updates the ignored nodes (for replace) state in topology
and token_metadata as if they were loaded from system tables,
after calling `prepare_replacement_info` when raft topology changes are
disabled, based on the endpoint_state retrieved in the shadow round
initiated in prepare_replacement_info.
Fixesscylladb/scylladb#15787
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Instead of `parse_node_list` resolving host ids to inet_address
let `prepare_replacement_info` get host_id_or_endpoint from
parse_node_list and prepare `loaded_endpoint_state` for
the ignored nodes so it can be used later by the callers.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than allowing both the
host_id and the endpoint to be kept, keep only one of them
and provide resolve functions that use the
token_metadata to resolve the host_id into
an inet_address or vice versa.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Allow getting metadata from the endpoint_state based
on the respective application states instead of going
through the gossiper.
To be used by the next patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When loading this node's endpoint state, if it has
tokens in token_metadata, its status can already be set
to normal.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When loading endpoint_state from system.peers,
pass the loaded nodes dc/rack info from
storage_service::join_token_ring to gossiper::add_saved_endpoint.
Load the endpoint DC/RACK information to the endpoint_state,
if available so they can propagate to bootstrapping nodes
via gossip, even if those nodes are DOWN after a full cluster-restart.
Note that this change makes the host_id presence
mandatory following https://github.com/scylladb/scylladb/pull/16376.
The reason to do so is that the other states (tokens, dc, and rack)
are useless without the host_id.
This change is backward compatible since the HOST_ID application state
has been written to system.peers since scylla's inception,
and it would be missing only due to a potential exception
in older versions that failed to write it.
In this case, manual intervention is needed and
the correct HOST_ID needs to be manually updated in system.peers.
Refs #15787
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Pack the topology-related data loaded from system.peers
in `gms::load_endpoint_state`, to be used in a following
patch for `add_saved_endpoint`.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
chunked_managed_vector isn't susceptible to #18072
since the elements it keeps are managed_ref<T> and
those must be constructed by the caller, before reallocation
takes place, so it's safer in that respect.
The unit test is added to verify that and prevent
regressions in the future.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, push_back or emplace_back reallocate the last chunk
before constructing the new element.
If the arg passed to push_back/emplace_back is a reference to an
existing element in the vector, reallocating the last chunk will
invalidate the arg reference before it is used.
This patch changes the order when reallocating
the last chunk in reserve_for_emplace_back:
First, a new chunk_ptr is allocated.
Then, the back_element is emplaced in the
newly allocated array.
And only then, existing elements in the current
last chunk are migrated to the new chunk.
Eventually, the new chunk replaces the existing chunk.
If no reservation is required, the back element
is emplaced "in place" in the current last chunk.
Fixesscylladb/scylladb#18072
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When pushing an element with a value referencing
an existing element in the vector, we currently
risk use-after-free when that element gets moved
to a reallocated chunk, if capacity needs to be reserved,
thereby invalidating the reference to the existing element
before it is used.
This patch prepares for fixing that in the emplace path
by converging to a single code path.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Expose the number of items in the first allocated chunk.
This will be used by a unit test in the next patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
this header was previously brought in by seastar's sstring.hh. but
since sstring.hh does not include <fmt/ostream.h> anymore,
`gms/application_state.cc` does not have access to this header.
also, `gms/application_state.cc` should `#include` the used header
by itself.
so, in this change, let's include <fmt/ostream.h> in `gms/application_state.cc`.
this change addresses the FTBFS with the latest seastar.
the same applies to other places changed in this commit.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18193
This is a different way attempting to combine building an optimized clang (using LTO, PGO and BOLT, based on compiling ScyllaDB) to dbuild. Per Avi's request, there are 3 options: skip this phase (which is the current default), build it and build + install it to the default path.
Fixes: #10985
Fixes: scylladb/scylla-enterprise#2539
This gets rid of a dangling deferred drain on stop and makes nodetool drain
more "consistent" by stopping one more unneeded background activity.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Storage service will need to drain the view builder on its drain. Also, on
cluster join it marks existing views as built, while it is the view builder's
job to do it. Both will be fixed by the next patches and this is a prerequisite.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Just starting sharded<view_builder> is lightweight: its constructor does
nothing but initialize member variables. The real work happens in
view_builder::start(), which is not moved.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The current test in boost/cql_query_large_test::test_large_data only checks whether notifications for large rows and cells are written into the system keyspace. It doesn't check this for partitions.
This change adds this check for partitions.
Closesscylladb/scylladb#18189
* github.com:scylladb/scylladb:
test/boost: added test for large row count warning
test/boost: add test for writing large partition notifications
The repair memory limit includes only the size of the frozen mutation
fragments in a repair row. The size of the other members of a repair
row may grow uncontrollably and cause out-of-memory.
Modify what is counted toward the repair memory limit.
Fixes: #16710.
Closesscylladb/scylladb#17785
* github.com:scylladb/scylladb:
test: add test for repair_row::size()
repair: fix memory accounting in repair_row
When altering the rf for a keyspace, all tablets in this ks will get more replicas. Part of this process is rebuilding tablets onto new node(s). This PR extends the tablets transition code to support rebuilding a tablet on a new replica.
fixes: #18030
Closes scylladb/scylladb#18082
* github.com:scylladb/scylladb:
test: Check data presense as well
test: Test how tablets are copied between nodes
test: Add sanity test for tablet migration
api: Add method to add replica to a tablet
tablet: Make leaving replica optional
The formatted_sstables_list is an auxiliary class that collects a bunch of
sstables::to_string(shared_sstable)-generated strings. One of the bad side
effects of this helper is that it allocates memory for the vector of
strings.
This patch achieves the same goal with the help of fmt::join() equipped
with the transformed boost adaptor.
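A minimal sketch of the pattern, with an integer stand-in for the sstable objects:
```c++
#include <fmt/format.h>
#include <fmt/ranges.h>
#include <boost/range/adaptor/transformed.hpp>
#include <vector>

int main() {
    std::vector<int> generations = {1, 2, 3};  // stand-in for shared_sstable objects
    auto names = generations | boost::adaptors::transformed([](int gen) {
        return fmt::format("sstable-{}", gen);
    });
    // fmt::join formats the lazily transformed range directly,
    // with no intermediate vector of strings.
    fmt::print("compacting {{{}}}\n", fmt::join(names, ", "));
}
```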
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18160
Other than making sure that system.tablets is updated with the correct
replica set, it's also good to check that the data is present on the
respective nodes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In repair, only the size of the frozen mutation fragments of a repair row
is counted toward the memory limit, so huge keys of repair rows may
lead to OOM.
Include the memory size of the other repair_row members in the repair
memory limit.
A few months ago, in merge d3c1be9107,
we decided that if Scylla has the experimental "tablets" feature enabled,
new Alternator tables should use this feature by default - exactly like
this is the default for new CQL tables.
Sadly, it was now decided to reverse this decision: We do not yet trust
enough LWT on tablets, and since Alternator often (if not always) relies
on LWT, we want Alternator tables to continue to use vnodes - not tablets.
The fix is trivial - just changing the default. No test needed to change
because anyway, all Alternator tests work correctly on Scylla with the
tablets experimental feature disabled. I added a new test to enshrine
the fact that Alternator does not use tablets.
An unfortunate result of this patch will be that Alternator tables
created on versions with this patch (e.g., Scylla 6.0) will not use
tablets and will continue to not use tablets even if Scylla is upgraded
(currently, the use of tablets is decided at table creation time, and
there is no way to "upgrade" a vnode-based table to be tablet based).
This patch should be reverted as soon as LWT support matures on tablets.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#18157
Loader wants to print a set of sstables' names. For that it collects the names
into a dedicated vector, then prints it using the fmt/ranges facility.
There's a way to achieve the same goal without allocating an extra vector
of names -- use fmt::format() and pass it a range converting sstables
into their names.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18159
Currently, Scylla logs a warning when it writes a cell, row or partition which is larger than a certain configured size. These warnings contain the partition key and, in the case of rows and cells, also the clustering key, which allows the large row or partition to be identified. However, these keys can contain user-private, sensitive information. The information which identifies the partition/row/cell is also inserted into the tables system.large_partitions, system.large_rows and system.large_cells respectively.
This change removes the partition and cluster keys from the log messages, but still inserts them into the system tables.
The logged data will look like this:
Large cells:
WARN 2024-04-02 16:49:48,602 [shard 3: mt] large_data - Writing large cell ks_name/tbl_name: cell_name (SIZE bytes) to sstable.db
Large rows:
WARN 2024-04-02 16:49:48,602 [shard 3: mt] large_data - Writing large row ks_name/tbl_name: (SIZE bytes) to sstable.db
Large partitions:
WARN 2024-04-02 16:49:48,602 [shard 3: mt] large_data - Writing large partition ks_name/tbl_name: (SIZE bytes) to sstable.db
Fixes #18041
Closes scylladb/scylladb#18166
in seastar's b28342fa5a301de3facf5e83dc691524a6b20604, we switched
* `io_queue::_streams` from
`boost::container::small_vector<fair_queue, 2>` to
`boost::container::static_vector<fair_queue, 2>`
* `io_queue::_fgs` from
`std::vector<std::unique_ptr<fair_group>>` to
`boost::container::static_vector<fair_group, 2>`
so we need to update the gdb script accordingly to reflect this
change, and to avoid the nested try-except blocks, we switch to
a `while` statement to simplify the code structure.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18165
This commit adds image information for the latest patch release
to the GCP and Azure deployment page.
The information now replaces the reference to the Download Center
so that the user doesn't have to jump to another website.
Fixes https://github.com/scylladb/scylladb/issues/18144
Closes scylladb/scylladb#18168
`database::find_column_family()` throws no_such_column_family
if an unknown ks.cf is fed to it. and we call into this function
without checking for the existence of ks.cf first. since
"/storage_service/tablets/move" is a public interface, we should
translate this error to a better http error.
in this change, we check for the existence of the given ks.cf, and
throw an exception so that it can be caught by seastar::httpd::routers,
and converted to an HTTP error.
Fixes#17198
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17217
This patches the previously introduced test by introducing the 'action'
test parameter and tweaking the final checking assertions around tablet
replicas read from system.tablets.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It just checks that after an api call to move_tablet the resulting replica
is in the expected state. This test will later be expanded to check for
the rebuild transition.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The new API submits rebuild transition with new replicas set to be old
(current) replicas plus the provided one. It looks and acts like the
move_tablet API call with several changes:
- lacks the "source" replica argument
- submits "rebuild" transition kind
- cross racks checks are not performed
The 'force' argument is inherited from move_tablet, but is unused now
and is left for the future.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When getting the leaving replica from tablet info and transition info,
the getter code assumes that this replica always exists. That's not going
to be the case soon, so make the return value optional (see the sketch
after the list below).
There are four places that mess with leaving replica:
- stream tablet handler: this place checks that the leaving replica is
_not_ current host. If leaving replica is missing, the check should
pass
- cleanup tablet handler: this place checks that the leaving replica
_is_ current host. If leaving replica is missing, the check should
fail as well
- topology coordinator: it gets leaving replica to call cleanup on. If
leaving replica is missing, the cleanup call is short-circuited to
succeed immediately
- load-stats calculator: it checks if the leaving replica is self. This
check is not patched as it's automatically satisfied by std::optional
comparison operator overload for wrapped type
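A sketch of how the optional comparisons behave in the first two cases; `host_id` and `transition_info` here are illustrative stand-ins for the locator types:
```c++
#include <optional>

using host_id = int;

struct transition_info {
    std::optional<host_id> leaving;  // may now be empty, e.g. for rebuild-style transitions
};

// stream handler: proceeds only if the leaving replica is NOT this host,
// so a missing leaving replica passes the check.
bool may_stream(const transition_info& t, host_id self) {
    return t.leaving != self;  // disengaged optional != value -> true
}

// cleanup handler: requires the leaving replica to BE this host,
// so a missing leaving replica fails the check.
bool may_cleanup(const transition_info& t, host_id self) {
    return t.leaving == self;  // disengaged optional == value -> false
}

int main() {
    transition_info no_leaving{};
    return (may_stream(no_leaving, 1) && !may_cleanup(no_leaving, 1)) ? 0 : 1;
}
```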
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In test_exception_safety_of_update_from_memtable, we have a potential
throw from external_updater.
external_updater is supposed to be infallible.
Scylla currently aborts when an external_updater throws, so a throw from
there just fails the test.
This isn't intended. We aren't testing external_updater in this test.
Fixes #18163
Closes scylladb/scylladb#18171
Before the patch, the selection of the auth version depended
on the consistent topology feature, but during the raft recovery
procedure this feature is disabled, so we need to persist
the version somewhere in order not to switch back to v1, as that
is not supported.
During recovery auth works in read-only mode; writes
will fail.
Fixes https://github.com/scylladb/scylladb/issues/17736
Closes scylladb/scylladb#18039
* github.com:scylladb/scylladb:
auth: keep auth version in scylla_local
auth: coroutinize service::start
We can cache tablet map in erm, to avoid looking it up on every write for
getting write replicas. We do that in tablet_sharder, but not in tablet
erm. Tablet map is immutable in the context of a given erm, so the
address of the map is stable during erm lifetime.
This caught my attention when looking at perf diff output
(comparing tablet and vnode modes).
It also helps when erm is called again on write completion for
checking locality, used for forwarding info to the driver if needed.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#18158
They result in poor distribution and poor cardinality, interfering with
tests which want to generate N partitions or rows.
Fixes: #17821
Closes scylladb/scylladb#17856
Maintainers are also allowed to commit their own backport PR. They are
allowed to backport their own code, opening a PR to get a CI run for a
backport doesn't change this.
Closesscylladb/scylladb#17727
Just like all the other commands already have it. These commands didn't
have documentation at the point where they were implemented, hence the
missing doc link.
The links don't work yet, but they will work once we release 6.0 and the
current master documentation is promoted to stable.
Upgrading raft topology is an important api call
that should be logged.
When failed, it is also important to log the
exception to get better visibility into why
the call failed.
Closesscylladb/scylladb#18143
* github.com:scylladb/scylladb:
api: storage_service: upgrade_to_raft_topology: fixup indentation
api: storage_service: upgrade_to_raft_topology: add logging
The vector in question is populted from the content of another map, so
its size is known in advance
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18155
Upgrading raft topology is an important api call
that should be logged.
When failed, it is also important to log the
exception to get better visibility into why
the call failed.
Indentation will be fixed in the next patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
we should include used header, to avoid compilation failures like:
```
cql3/statements/select_statement.cc:229:79: error: no member named 'filter' in namespace 'std::ranges::views'
for (const auto& used_function : used_functions | std::ranges::views::filter(not_native)) {
~~~~~~~~~~~~~~~~~~~~^
1 error generated.`
```
if some of the included header drops its own `#include <optional>`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18145
It now happens in initialize_virtual_tables(), but this function is split into sub-calls and iterates over virtual tables map several times to do its work. This PR squashes it into a straightforward code which is shorter and, hopefully, easier to read.
Closesscylladb/scylladb#18133
* github.com:scylladb/scylladb:
virtual_tables: Open-code install_virtual_readers_and_writers()
virtual_tables: Move readers setup loop into add_table()
virtual_tables: Move tables creation loop into add_table()
virtual_tables: Make add_tablet() a coroutine
virtual_tables: Open-code register_virtual_tables()
Currently, we load the repair history during boot up. If the number of
repair history entries is high, it might take a while to load them.
In my test, to load 10M entries, it took around 60 seconds.
It is not a must to load the entries during boot up. It is better to
load them in the background to speed up the boot time.
Fixes #17993
Closes scylladb/scylladb#17994
* github.com:scylladb/scylladb:
repair: Load repair history in background
repair: Abort load_history process in shutdown
We currently support the sync-label only in OSS. Since Scylla-enterprise
gets all the commits from the OSS repo, the sync-label workflow runs and fails
during checkout (since it's a private repo and should have a different
configuration).
For now, let's limit the workflows to the OSS repo.
Closesscylladb/scylladb#18142
Added support to track and limit the memory usage by sstable components. A reclaimable component of an SSTable is one from which memory can be reclaimed. SSTables and their managers now track such reclaimable memory and limit the component memory usage accordingly. A new configuration variable defines the memory reclaim threshold. If the total memory of the reclaimable components exceeds this limit, memory will be reclaimed to keep the usage under the limit. This PR considers only the bloom filters as reclaimable and adds support to track and limit them as required.
The feature can be manually verified by doing the following :
1. run a single-node single-shard 1GB cluster
2. create a table with bloom-filter-false-positive-chance of 0.001 (to intentionally cause large bloom filter)
3. populate with tiny partitions
4. watch the bloom filter metrics get capped at 100MB
The default value of the `components_memory_reclaim_threshold` config variable which controls the reclamation process is `.1`. This can also be reduced further during manual tests to easily hit the threshold and verify the feature.
Fixes #17747
Closes scylladb/scylladb#17771
* github.com:scylladb/scylladb:
test_bloom_filter.py: disable reclaiming memory from components
sstable_datafile_test: add tests to verify auto reclamation of components
test/lib: allow overriding available memory via test_env_config
sstables_manager: support reclaiming memory from components
sstables_manager: store available memory size
sstables_manager: add variable to track component memory usage
db/config: add a new variable to limit memory used by table components
sstable_datafile_test: add testcase to verify reclamation from sstables
sstables: support reclaiming memory from components
This patch makes the get_description.py script easier to use by the
documentation automation:
1. The script is now a library.
2. You can choose the output format of the script; currently pipe
and yml are supported.
You can still call it from the command line, like before, but you can
also call it from another python script.
For example the following python script would generate the documentation
for the metrics description of the ./alternator/ttl.cc file.
```
import get_description
metrics = get_description.get_metrics_from_file("./alternator/ttl.cc", "scylla", get_description.get_metrics_information("metrics-config.yml"))
get_description.write_metrics_to_file("out.yaml", metrics, "yml")
```
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Closesscylladb/scylladb#18136
Coredumps coming from CI are produced by a commit, which is not
available in the scylla.git repository, as CI runs on a merge commit
between the main branch (master or enterprise) and the tested PR branch.
Currently the script will attempt to checkout this commit and will fail
as the commit hash is unrecognized.
To work around this, add a --ci flag, which when used, will force the
main branch to be checked out, instead of the commit hash.
Closesscylladb/scylladb#18023
This reverts commit 97b203b1af.
since Seastar provides the formatter, it's not necessary to vendor it in
scylladb anymore.
Refs #13245
Closes scylladb/scylladb#18114
storage_group_id_for_token() was only needed from within
tablet_storage_group_manager, so we can kill
table::storage_group_id_for_token().
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#18134
Disabled reclaiming memory from sstable components in the testcase as it
interferes with the false positive calculation.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Reclaim memory from the SSTable that has the most reclaimable memory if
the total reclaimable memory has crossed the threshold. Only the bloom
filter memory is considered reclaimable for now.
Fixes#17747
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
The available memory size is required to calculate the reclaim memory
threshold, so store that within the sstables manager.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
sstables_manager::_total_reclaimable_memory variable tracks the total
memory that is reclaimable from all the SSTables managed by it.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
A new configuration variable, components_memory_reclaim_threshold, has
been added to configure the maximum allowed percentage of available
memory for all SSTable components in a shard. If the total memory usage
exceeds this threshold, it will be reclaimed from the components to
bring it back under the limit. Currently, only the memory used by the
bloom filters will be restricted.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Added support to track total memory from components that are reclaimable
and to reclaim memory from them if and when required. Right now only the
bloom filters are considered as reclaimable components but this can be
extended to any component in the future.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
It's pretty short already and is naturally a "part" of
initialize_virtual_tables(). Neither it installs writers any longer.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Similarly to the previous patch, after virtual tables are registered the
registry is iterated over to install virtual readers onto each entry.
Again, this can happen at registration time; no need for a dedicated
loop for that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Once the virtual_tables map is populated, it's iterated over to create
replica::table entries for each virtual table. This can be done in the
same place where the virtual table is created; no need for a dedicated loop
for it nowadays.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's naturally a "part" of initialize_virtual_tables(). Further patching
gets possible with it being open-coded.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
if a pull request's cover letter is empty, `pr.body` is None. in that
case we should not try to pass it to `re.findall()` as the "string"
parameter. otherwise, we'd get
```
TypeError: expected string or bytes-like object, got 'NoneType'
```
so, in this change, we just return an empty list if the PR in question
has an empty cover letter.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18125
The cluster manager library doesn't set the asan/ubsan options
to abort on error and create core dumps; this makes debugging much
harder.
Fix by preparing the environment correctly.
Fixes scylladb/scylladb#17510
Closes scylladb/scylladb#17511
what we need is but a single script, so instead of checking out the whole repo,
with all history for all tags and branches, let's just check out
a single file. it's faster this way.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18126
when adding a label to a PR request we keep getting the following error
message:
```
Traceback (most recent call last):
File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 93, in <module>
main()
File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 89, in main
sync_labels(repo, args.number, args.label, args.action, args.is_issue)
File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 74, in sync_labels
target.add_to_labels(label)
File "/usr/lib/python3/dist-packages/github/Issue.py", line 321, in add_to_labels
headers, data = self._requester.requestJsonAndCheck(
File "/usr/lib/python3/dist-packages/github/Requester.py", line 353, in requestJsonAndCheck
return self.__check(
File "/usr/lib/python3/dist-packages/github/Requester.py", line 378, in __check
raise self.__createException(status, responseHeaders, output)
github.GithubException.GithubException: 403 {"message": "Resource not accessible by integration", "documentation_url": "https://docs.github.com/rest/issues/labels#add-labels-to-an-issue"}
```
Based on
https://docs.github.com/en/actions/security-guides/automatic-token-authentication#permissions-for-the-github_token.
The maximum access for pull requests from public forked repositories is
set to `read`
Switching to `pull_request_target` to solve it
Fixes: https://github.com/scylladb/scylladb/issues/18102
Closes scylladb/scylladb#18052
The read_field is std::optional<View>. The raw_value::make_value()
accepts managed_bytes_opt which is std::optional<managed_bytes>.
Finally, there's the std::optional<T>::optional(std::optional<U>&&)
move constructor (and its copy-constructor peer).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18128
The schema_builder::build() method creates a copy of the raw schema
internally, in the hope that the builder will be updated and asked to build
the resulting schema again (e.g. alternator uses this).
However, there are places that build a schema using a temporary object once,
in a `return schema_builder().with_...().build()` manner. For those
invocations copying the raw schema is just a waste of cycles.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18094
Add --build-dir option to specify the build directory.
This is needed for optimized clang support, since it requires building
Scylla in tools/toolchain/prepare without deleting the current build/
directory.
currently, our homebrew formatter formats `std::map` like
```
{{k1, v1}, {k2, v2}}
```
while {fmt} formats a map like:
```
{k1: v1, k2: v2}
```
and if the type of key/value is string, {fmt} quotes it, so a
compaction strategy option is formatted like
```
{"max_threshold": "1"}
```
before switching the formatter to the ones supported by {fmt},
let's update the test to match with the new format. this should
reduce the overhead of reviewing the change of switching the
formatter. we can revert this change, and use a simpler approach
after the change of formatter lands.
Closesscylladb/scylladb#18058
* github.com:scylladb/scylladb:
test/cql-pytest: match error message formated using {fmt}
test/cql-pytest: extract scylla_error() for not allowed options test
before this change, `reclaim_timer::report()` calls
```c++
fmt::format(", at {}", current_backtrace())
```
which allocates a `std::string` on heap, so it can fail and throw. in
that case, `std::terminate()` is called. but at that moment, the reason
why `reclaim_timer::report()` gets called is that we fail to reclaim
memory for the caller. so we are more likely to run into this issue. anyway,
we should not allocate memory in this path.
in this change, a dedicated printer is created so that we don't format
to a temporary `std::string`, and instead write directly to the buffer
of logger. this avoids the memory allocation.
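A minimal sketch of the idea, with a hypothetical backtrace_printer standing in for the real backtrace state: the formatter writes frames straight into the sink's output iterator instead of building a temporary `std::string`:
```c++
#include <fmt/format.h>

struct backtrace_printer {
    int depth; // stand-in for the real backtrace state
};

template <>
struct fmt::formatter<backtrace_printer> {
    constexpr auto parse(fmt::format_parse_context& ctx) { return ctx.begin(); }
    auto format(const backtrace_printer& p, fmt::format_context& ctx) const {
        auto out = ctx.out();
        // frames are written directly to the output iterator, no intermediate string
        for (int i = 0; i < p.depth; ++i) {
            out = fmt::format_to(out, " 0x{:x}", 0x1000 + i);
        }
        return out;
    }
};

int main() {
    fmt::print("failed to reclaim memory, at{}\n", backtrace_printer{3});
}
```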
Fixes#18099
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18100
Currently, the error message on a failed RAPIDJSON_ASSERT() is this:
rjson::error (JSON error: condition not met: false)
This is printed e.g. when the code processing a json expects an object
but the JSON has a different type. Or if a JSON object is missing an
expected member. This message however is completely inadequate for
determining what went wrong. Change this to include a task-local
backtrace, like a real assert failure would. The new error looks like
this:
rjson::error (JSON assertion failed on condition '{}' at: libseastar.so+0x56dede 0x2bde95e 0x2cc18f3 0x2cf092d 0x2d2316b libseastar.so+0x46b623)
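A minimal, hypothetical sketch of the mechanism (not the actual rjson wrapper): rapidjson lets you redefine RAPIDJSON_ASSERT before including its headers, so a failed condition can throw an exception carrying the condition text; the real code additionally records a task-local backtrace:
```c++
#include <stdexcept>
#include <string>

struct json_assert_error : std::runtime_error {
    explicit json_assert_error(const char* cond)
        : std::runtime_error(std::string("JSON assertion failed on condition '") + cond + "'") {}
};

// must be defined before any rapidjson header is included
#define RAPIDJSON_ASSERT(x) do { if (!(x)) throw json_assert_error(#x); } while (0)

int main() {
    try {
        RAPIDJSON_ASSERT(1 == 2);
    } catch (const json_assert_error& e) {
        // e.what() now names the failed condition instead of "condition not met: false"
        return 0;
    }
    return 1;
}
```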
Closesscylladb/scylladb#18101
It's more applicable in this case.
Also, built tablet mutations are cast to canonical_mutations, but
when emplaced the compiler can pick up the canonical_mutation(const mutation&)
constructor, so the cast is not required.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18090
Test.py uses `ring_delay_ms = 0` by default. CDC creates generation's timestamp by adding `ring_delay_ms` to it.
In this test, nodes are learning about new generations (introduced by upgrade procedure and then by node bootstrap) concurrently with doing writes that should go to these generations.
Because of `ring_delay_ms = 0`, the generation could have been committed when it should have already been in use.
This can be seen in the following logs from a node:
```
ERROR 2024-03-22 12:29:55,431 [shard 0:strm] cdc - just learned about a CDC generation newer than the one used the last time streams were retrieved. This generation, or some newer one, should have been used instead (new generation's timestamp: 2024/03/22 12:29:54, last time streams were retrieved: 2024/03/22 12:29:55). The new generation probably arrived too late due to a network partition and we've made a write using the wrong set streams.
```
Creating writes during such a generation can result in assigning them a wrong generation or a failure. Failure may occur if it hits a short time window when `generation_service::handle_cdc_generation(cdc::generation_id_v2)` has executed
`svc._cdc_metadata.prepare(...)` but `_cdc_metadata.insert(...)` has not yet been executed. With a nonzero ring_delay_ms it's not a problem, because during this time window, the generation should not be in use.
Write can fail with the following response from a node:
```
cdc: attempted to get a stream from a generation that we know about, but weren't able to retrieve (generation timestamp: 2024/03/22 12:29:54, write timestamp: 2024/03/22 12:29:55). Make sure that the replicas which contain this generation's data are alive and reachable from this node.
```
Set ring_delay_ms to 15000 for the debug mode and 5000 in other modes. Wait for the last generation to be in use and sleep one second to make sure there are writes to the CDC table in this generation.
Fixesscylladb/scylladb#17977
Reapply b4144d14c6.
Closesscylladb/scylladb#17998
* github.com:scylladb/scylladb:
test.py: test_topology_upgrade_basic: make ring_delay_ms nonzero
Reapply "test.py: adjust the test for topology upgrade to write to and read from CDC tables"
Setting the data accessor implicitly depends on the node joining the cluster
with a raft leader elected, as only then is the service level mutation put
into the scylla_local table. Calling it after join_cluster avoids starting
a new cluster with an older version only to immediately migrate it to the
latest one in the background.
Closesscylladb/scylladb#18040
* github.com:scylladb/scylladb:
main: reload service levels data accessor after join_cluster
service: qos: create separate function for reloading data accessor
currently, our homebrew formatter formats `std::map` like
{{k1, v1}, {k2, v2}}
while {fmt} formats a map like:
{k1: v1, k2: v2}
and if the type of key/value is string, {fmt} quotes it, so a
compaction strategy option is formatted like
{"max_threshold": "1"}
before switching the formatter to the ones supported by {fmt},
let's update the test to match with the new format. this should
reduce the overhead of reviewing the change of switching the
formatter. we can revert this change, and use a simpler approach
after the change of formatter lands.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
currently, our homebrew formatter formats `std::map` like
{{k1, v1}, {k2, v2}}
while {fmt} formats a map like:
{k1: v1, k2: v2}
and if the type of key/value is string, {fmt} quotes it, so a
compaction strategy option is formatted like
{"max_threshold": "1"}
as we are switching to the formatters provided by {fmt}, it would be
better to support its convention directly.
so, in this change, to prepare for the migration to {fmt}, let's
refactor the test to support both formats by extracting a helper to
format the error message, so that we can change it to emit both
formats.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Test.py uses `ring_delay_ms = 0` by default. CDC creates generation's timestamp
by adding `ring_delay_ms` to it.
In this test, nodes are learning about new generations (introduced by upgrade
procedure and then by node bootstrap) concurrently with doing writes that
should go to these generations.
Because of `ring_delay_ms = 0`, the generation could have been committed when
it should have already been in use.
This can be seen in the following logs from a node:
```
ERROR 2024-03-22 12:29:55,431 [shard 0:strm] cdc - just learned about a CDC generation newer than the one used the last time streams were retrieved. This generation, or some newer one, should have been used instead (new generation's timestamp: 2024/03/22 12:29:54, last time streams were retrieved: 2024/03/22 12:29:55). The new generation probably arrived too late due to a network partition and we've made a write using the wrong set streams.
```
Creating writes during such a generation can result in assigning them a wrong
generation or a failure. Failure may occur if it hits a short time window when
`generation_service::handle_cdc_generation(cdc::generation_id_v2)` has executed
`svc._cdc_metadata.prepare(...)` but `_cdc_metadata.insert(...)` has not yet
been executed. With a nonzero ring_delay_ms it's not a problem, because during
this time window, the generation should not be in use.
Write can fail with the following response from a node:
```
cdc: attempted to get a stream from a generation that we know about, but weren't able to retrieve (generation timestamp: 2024/03/22 12:29:54, write timestamp: 2024/03/22 12:29:55). Make sure that the replicas which contain this generation's data are alive and reachable from this node.
```
Set ring_delay_ms to 15000 for the debug mode and 5000 in other modes.
Wait for the last generation to be in use and sleep one second to make sure
there are writes to the CDC table in this generation.
Fixes#17977
this series includes test-related changes to enable us to drop `FMT_DEPRECATED_OSTREAM`, deprecated in {fmt} v10.
Refs #13245
Closes scylladb/scylladb#18054
* github.com:scylladb/scylladb:
test: unit: add fmt::formatter for test_data in tests
test/lib: do not print with fmt::to_string()
test/boost: print runtime_error using e.what()
* 'gleb/raft_snapshot_rpc-v3' of github.com:scylladb/scylla-dev:
raft topology: drop RAFT_PULL_TOPOLOGY_SNAPSHOT RPC
Use correct limit for raft commands throughout the code.
When repairing multiple keyspaces, bail out on the first failed keyspace repair, instead of continuing and reporting all failures at the end. This is what Origin does as well.
To be able to test this, a bit of refactoring was needed, to be able to assert that `scylla-nodetool` doesn't make repair requests, beyond the expected ones.
Refs: https://github.com/scylladb/scylla-cluster-tests/issues/7226
Closes scylladb/scylladb#17678
* github.com:scylladb/scylladb:
tools/scylla-nodetool: repair: abort on first failed repair
test/nodetool: nodetool(): add check_return_code param
test/nodetool: nodetool(): return res object instead of just stdout
test/nodetool: count unexpected requests
This series provides a reallocate_tablets function, that's initially called by allocate_tablets_for_new_table.
The new allocation implementation is independent of vnodes/token ownership.
Rather than using the natural_endpoints_tracker, it implements its own tracking
based on dc/rack load (== number of replicas in rack), with the additional benefit
that tablet allocation will balance the allocation across racks, using a heap structure,
similar to the one we use to balance tablet allocation across shards in each node.
reallocate_tablets may also be called with an optional parameter pointing to the current tablet_map.
In this case the function either allocates more tablet replicas in datacenters for which the replication factor was increased,
or deallocates tablet replicas from datacenters for which the replication factor was decreased.
The NetworkTopologyStrategy_tablets_test unit test was extended to cover replication factor changes.
Closesscylladb/scylladb#17846
* github.com:scylladb/scylladb:
network_topology_strategy: reallocate_tablets: consider new_racks before existing racks
network_topology_startegy_test: add NetworkTopologyStrategy_tablet_allocation_balancing_test
network_topology_strategy: reallocate_tablets: support deallocation via rf change
network_topology_startegy_test: tablets_test: randomize cases
network_topology_strategy: allocate_tablets_for_new_table: do not rely on token ownership
network_topology_startegy_test: add NetworkTopologyStrategy_tablets_negative_test
network_topology_strategy_test: endpoints_check: use particular BOOST_CHECK_* functions
network_topology_strategy_test: endpoints_check: verify that replicas are placed on unique nodes
network_topology_strategy_test: endpoints_check: strictly check rf for tablets
network_topology_strategy_test: full_ring_check for tablets: drop unused options param
GCC-14 rightly points out that the constructor of `atomic_cell_view`
is marked private, and cannot be called from its formatter:
```
/usr/bin/g++-14 -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/var/ssd/scylladb -I/var/ssd/scylladb/build/gen -I/var/ssd/scylladb/seastar/include -I/var/ssd/scylladb/build/seastar/gen/include -I/var/ssd/scylladb/build/seastar/gen/src -g -Og -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unused-parameter -ffile-prefix-map=/var/ssd/scylladb=. -march=westmere -Wstack-usage=40960 -U_FORTIFY_SOURCE -Wno-maybe-uninitialized -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT mutation/CMakeFiles/mutation.dir/Debug/atomic_cell.cc.o -MF mutation/CMakeFiles/mutation.dir/Debug/atomic_cell.cc.o.d -o mutation/CMakeFiles/mutation.dir/Debug/atomic_cell.cc.o -c /var/ssd/scylladb/mutation/atomic_cell.cc
In file included from /var/ssd/scylladb/mutation/atomic_cell.cc:9:
/var/ssd/scylladb/mutation/atomic_cell.hh: In member function ‘auto fmt::v10::formatter<atomic_cell>::format(const atomic_cell&, fmt::v10::format_context&) const’:
/var/ssd/scylladb/mutation/atomic_cell.hh:413:67: error: ‘atomic_cell_view::atomic_cell_view(basic_atomic_cell_view<is_mutable>) [with mutable_view is_mutable = mutable_view::yes]’ is private within this context
413 | return fmt::format_to(ctx.out(), "{}", atomic_cell_view(ac));
| ^
/var/ssd/scylladb/mutation/atomic_cell.hh:275:5: note: declared private here
275 | atomic_cell_view(basic_atomic_cell_view<is_mutable> view)
| ^~~~~~~~~~~~~~~~
```
so, in this change, we make the formatter a friend of
`atomic_cell_view`.
since the operator<< was dropped, there is no need to keep its friend
declaration around, so it is dropped in this change.
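A minimal sketch of the approach on a hypothetical type (not the actual atomic_cell_view code): a class whose constructor or members are private can grant its fmt::formatter access with a friend declaration:
```c++
#include <fmt/format.h>

class secretive {
    explicit secretive(int v) : _v(v) {}
    int _v;
    // grant the formatter access to private members (in atomic_cell's case, a private constructor)
    friend struct fmt::formatter<secretive>;
public:
    static secretive make(int v) { return secretive(v); }
};

template <>
struct fmt::formatter<secretive> {
    constexpr auto parse(fmt::format_parse_context& ctx) { return ctx.begin(); }
    auto format(const secretive& s, fmt::format_context& ctx) const {
        return fmt::format_to(ctx.out(), "secretive({})", s._v);
    }
};

int main() {
    fmt::print("{}\n", secretive::make(7));
}
```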
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18081
our homebrew formatter for std::vector<string> formats like
```
{hello, world}
```
while {fmt}'s formatter for sequence-like container formats like
```
["hello", "world"]
```
since we are moving to {fmt} formatters, and in this context
quoting the verbatim text makes more sense to the user, let's
support the format used by {fmt} as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18057
according to {fmt}'s document at
https://fmt.dev/latest/api.html#formatting-user-defined-types,
```
// the range will contain "f} continued". The formatter should parse
// specifiers until '}' or the end of the range. In this example the
// formatter should parse the 'f' specifier and return an iterator
// pointing to '}'.
```
so we should check for _both_ '}' and end of the range. when building
scylla with {fmt} 10.2.1, it fails to build code like
```c++
fmt::format_to(out, "{}", fmt_hex(frag))
```
as {fmt}'s compile-time checker fails to parse this format string
along with the given argument, as at compile time,
```c++
throw format_error("invalid group_size")
```
is executed.
so, in this change, we check for both '}' and the end of the range.
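A minimal sketch of the fixed parse() shape, with a hypothetical fmt_hex_like type standing in for fmt_hex:
```c++
#include <fmt/format.h>

struct fmt_hex_like { unsigned value; };   // hypothetical stand-in for fmt_hex

template <>
struct fmt::formatter<fmt_hex_like> {
    constexpr auto parse(fmt::format_parse_context& ctx) {
        auto it = ctx.begin();
        auto end = ctx.end();
        // check for _both_ the end of the range and '}' before dereferencing;
        // the compile-time checker may hand us a range that is not '}'-terminated
        while (it != end && *it != '}') {
            ++it;   // a real formatter would interpret the specifiers here
        }
        return it;
    }
    auto format(const fmt_hex_like& v, fmt::format_context& ctx) const {
        return fmt::format_to(ctx.out(), "{:x}", v.value);
    }
};

int main() {
    fmt::print("{}\n", fmt_hex_like{0xdead});
}
```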
the change which introduced this formatter was
2f9dfba800
Refs 2f9dfba800
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18080
correctness when constructing range_streamer depends on compiler
evaluation order of params.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#18079
Raft uses the schema commitlog, so all its limits should be derived from
this commitlog's segment size, but many places used the regular commitlog size
to calculate the limits and did not do what they were really supposed to be
doing.
The storage_service::track_upgrade_progress_to_topology_coordinator
function is supposed to wait on the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
cluster feature (among other things) before starting the
raft_state_monitor_fiber. The wait is realized by passing a callback to
feature::when_enabled which sets a shared_promise that is waited on by
the tracking fiber. If the feature is already enabled, when_enabled will
call the callback immediately. However, if it's not, then it will return
a non-null listener_registration object - as long as it is alive, the
callback is registered. The listener_registration object was not
assigned to a variable which caused it to be destroyed shortly after the
when_enabled function returns.
Due to that, if upgrade was requested but the current group0 leader
didn't have the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES feature enabled
right after boot, the upgrade would not start until the leader is
changed to a node which has that cluster feature already enabled on
boot. Moreover, the topology coordinator would not start on such a node
until the node was rebooted.
Fix the issue by assigning the subscription to a variable.
Fixes: scylladb/scylladb#18049
Closes scylladb/scylladb#18051
* github.com:scylladb/scylladb:
gms: feature: mark when_enabled(func) with nodiscard
storage_service: keep subscription to raft topology feature alive
Before this series, Alternator's Query and Scan operations convert an
entire result page to JSON without yielding. For a page of maximum
size (1MB) and tiny rows, this can cause a significant stall - the
test included in this PR reported stalls of 14-26ms on my laptop.
The problem is the describe_items() function, which does this conversion
immediately, without yielding. This patch changes this function to
return a future, and use a new result_set::visit_gently() method
that does what visit() does, but with yields when needed.
This PR improves #17995, but does not completely fix it, as the stalls in the
test are not completely eliminated. But on my laptop it usually reduces the stalls
to around 5ms. It appears that the remaining stalls stem from other places
not fixed in this PR, such as perhaps query_page::handle_result(), and will need
to be fixed by additional patches.
Closesscylladb/scylladb#18036
* github.com:scylladb/scylladb:
alternator: reduce stall for Query and Scan with large pages
result_set: introduce visit_gently()
alternator: coroutinize do_query() function
Memtables are fickle, they can be flushed when there is memory pressure,
if there is too much commitlog or if there is too much data in them. The
tests in test_select_from_mutation_fragments.py currently assume the data
written is in the memtable. This is true most of the time but we have
seen some odd test failures that couldn't be understood. To make the
tests more robust, flush the data to the disk and read it from the
sstables. This means that some range scans need to filter to read from
just a single mutation source, but this does not influence the tests.
Also fix a use-after-return found when modifying the tests.
This PR tentatively fixes the below issues, based on our best guesses on why they failed (each was seen just once):
Fixes: scylladb/scylladb#16795
Fixes: scylladb/scylladb#17031
Closes scylladb/scylladb#17562
* github.com:scylladb/scylladb:
test/cql-pytest: test_select_from_mutation_fragments.py: move away from memtables
cql3: select_statement: mutation_fragments_select_statement: fix use-after-return
This change adds the missing Cassandra compaction option unchecked_tombstone_compaction.
Setting this option to true causes the compaction to ignore tombstone_threshold, and decide whether to do a compaction only based on the value of tombstone_compaction_interval
Fixes #1487
Closes scylladb/scylladb#17976
* github.com:scylladb/scylladb:
removed forward declaration of resharding_descriptor
compaction options and troubleshooting docs
cql-pytest/test_compaction_strategy_validation.py
test/boost/sstable_compaction_test.cc
compaction: implement unchecked_tombstone_compaction
before this change, we rely on the default-generated fmt::formatter
created from operator<<. but this depends on the
`FMT_DEPRECATED_OSTREAM` macro which is not respected in {fmt} v10.
this change addresses the formatting with fmtlib < 10, and without
`FMT_DEPRECATED_OSTREAM` defined. please note, in {fmt} v10 and up,
it defines formatter for classes derived from `std::exception`, so
our formatter is only added when compiled with {fmt} < 10.
in this change, `fmt::formatter<invalid_mutation_fragment_stream>`
is added for backward compatibility with {fmt} < 10.
Refs scylladb#13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18053
in order to help developers understand the transitions
of `node_state` and the `transition_state` within each `node_state`,
in this change, the nested state machine diagram is added to the
node state diagram.
please note, instead of trying to merge similar states like
bootstrapping and replacing into a single state, we keep them as
separate ones, and replicate the nested state machine diagram in them
as well, to be clearer.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18025
Calling `_next_row.get_iterator_in_latest()` is illegal when `_next_row` is not
pointing at a row. In particular, the iterator returned by such call might be
dangling.
We have observed this to cause a use-after-free in the field, when a reverse
read called `maybe_add_to_cache` after `_latest_it` was left dangling after
a dead row removal in `copy_from_cache_to_buffer`.
To fix this, we should ensure that we only call `_next_row.get_iterator_in_latest()`
when `_next_row` is pointing at a row.
Only the occurrences of this problem in `maybe_add_to_cache` are truly dangerous.
As far as I can see, other occurrences can't break anything as of now.
But we apply fixes to them anyway.
Closesscylladb/scylladb#18046
Fixesscylladb/scylladb#17893
* 'gleb/initial-token-v1' of github.com:scylladb/scylla-dev:
dht: drop unused parameter from get_random_bootstrap_tokens() function
test: add test for initial_token parameter
topology coordinator: use provided initial_token parameter to choose bootstrap tokens
topology cooordinator: propagate initial_token option to the coordinator
this change is created in the same spirit as d1c35f943d.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for test_data in
radix_tree_stress_test.cc, and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
we should not format a variable unless we want to print it. in this
case, we format `first_row` using `fmt::to_string()` to a string,
and then insert the string into another string. despite this being
in a cold path, it is still an anti-pattern -- both convoluted
and not performant.
so let's just pass `first_row` to `format()`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter. but fortunately, fmt v10 brings the builtin
formatter for classes derived from `std::exception`. but before
switching to {fmt} v10, and after dropping `FMT_DEPRECATED_OSTREAM`
macro, we need to print out `std::runtime_error`. so far, we don't
have a shared place for formatter for `std::runtime_error`. so we
are addressing the needs on a case-by-case basis.
in this change, we just print it using `e.what()`. its behavior
is identical to what we have now.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Allocate first from new (unpopulated) racks before
allocating from racks that are already populated
with replicas.
Still, rotate both new and existing racks by tablet id
to ensure fairness.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Add support for deallocating tablet replicas when the
datacenter replication factor is decreased.
We deallocate replicas in back-to-front order to maintain
replica pairing between the base table and
its materialized views.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Instead of deterministically testing a very small set of cases,
randomize the shard_count per node, the cluster topology
and the NetworkTopologyStrategy options.
The next patch will extend the test to also test
`reallocate_tablets` with randomized options.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Base initial tablet allocation for a new table
on the dc/rack topology, rather than on the token ring,
to remove the dependency on token ownership.
We keep the rack ordinal order in each dc
to facilitate in-rack pairing of base/view
replicas, and we apply load-balancing
principles by sorting the nodes in each rack
by their load (number of tablets allocated to
the node), and attempting to fill least-loaded
nodes first.
This method is more efficient than circling
the token ring and attempting to insert the endpoints
into the natural_endpoint_tracker until the replication
factor per dc is fulfilled, and it allows an easier
way to incrementally allocate more replicas after
rf is increased.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When repairing multiple keyspaces, bail out on the first failed keyspace
repair, instead of continuing and reporting all failures at the end.
This is what Origin does as well.
Test that attempting to allocate tablets
throws an error when there are not enough nodes
for the configured replication factor.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Using e.g. `BOOST_CHECK_EQUAL(endpoints.size(), total_rf)`
rather than `BOOST_CHECK(endpoints.size() == total_rf)`
prints a more detailed error message that includes the
runtime values, if it fails.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
With tablets we want to verify that the number of
replicas allocated per tablet per dc exactly matches
the replication strategy's per-dc replication factor options.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When set to false, the returncode is not checked; this is left to the
caller. This in turn allows for checking the expected and unexpected
requests, which is not done when the nodetool process fails.
This is used by utils._do_check_nodetool_fails_with(), so that expected
and unexpected requests are checked even for failed invocations.
Some tests need adjustment to the stricter checks.
The feature::when_enabled function takes a callback and returns a
listener_registration object. Unless the feature were enabled right from
the start, the listener_registration will be non-null and will keep the
callback registered until the registration is destroyed. If the
registration is destroyed before the feature is enabled, the callback
will not be called. It's easy to make a mistake and forget to keep the
returned registration alive - especially when, in tests, the feature is
enabled early in boot, because in that case when_enabled calls the
callback immediately and returns a null object instead.
In order to prevent issues with prematurely dropped
listener_registration in the future, mark feature::when_enabled with the
[[nodiscard]] attribute.
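A minimal sketch of the pattern, using hypothetical stand-ins for feature and listener_registration:
```c++
#include <functional>
#include <utility>
#include <vector>

struct listener_registration {
    bool live = false;   // non-empty while the callback is registered; dropping it unregisters
};

class feature_like {
    bool _enabled = false;
    std::vector<std::function<void()>> _listeners;
public:
    // marked [[nodiscard]] so the compiler warns when the registration is dropped
    [[nodiscard]] listener_registration when_enabled(std::function<void()> cb) {
        if (_enabled) {
            cb();               // already enabled: call now, nothing to keep alive
            return {};
        }
        _listeners.push_back(std::move(cb));
        return {true};          // the caller must hold this to keep cb registered
    }
};

int main() {
    feature_like f;
    auto reg = f.when_enabled([] { /* start the fiber */ });   // kept alive
    // f.when_enabled([] {});   // would trigger an unused-result warning
    return reg.live ? 0 : 1;
}
```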
The storage_service::track_upgrade_progress_to_topology_coordinator
function is supposed to wait on the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
cluster feature (among other things) before starting the
raft_state_monitor_fiber. The wait is realized by passing a callback to
feature::when_enabled which sets a shared_promise that is waited on by
the tracking fiber. If the feature is already enabled, when_enabled will
call the callback immediately. However, if it's not, then it will return
a non-null listener_registration object - as long as it is alive, the
callback is registered. The listener_registration object was not
assigned to a variable which caused it to be destroyed shortly after the
when_enabled function returns.
Due to that, if upgrade was requested but the current group0 leader
didn't have the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES feature enabled
right after boot, the upgrade would not start until the leader is
changed to a node which has that cluster feature already enabled on
boot. Moreover, the topology coordinator would not start on such a node
until the node was rebooted.
Fix the issue by assigning the subscription to a variable.
these two subcommands are provided by cassandra, and are also implemented natively in scylla. so let's document them.
Closesscylladb/scylladb#17982
* github.com:scylladb/scylladb:
docs/operating-scylla: document nodetool sstableinfo
docs/operating-scylla: document nodetool getsstables
We currently check at the end of each test, that all expected requests
set by the test were consumed. This patch adds a mechanism to count
unexpected requests -- requests which didn't match any of the expected
ones set by the test. This can be used to assert that nodetool didn't
make any request to the server, beyond what the test expected it to do.
Before this patch, requests like this would only be noticed by the test,
if the response of 404/500 caused nodetool to fail, which is not always
the case.
There are several places in the code that calculate replica sets associated
with a specific tablet transition. Having a helper to subtract two sets
improves code readability.
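A minimal sketch of such a helper (hypothetical name and types, not the actual Scylla signature):
```c++
#include <algorithm>
#include <iterator>
#include <vector>

using replica_id = int;   // stand-in for the real replica type

// returns the replicas present in `a` but not in `b`
std::vector<replica_id> subtract_sets(const std::vector<replica_id>& a,
                                      const std::vector<replica_id>& b) {
    std::vector<replica_id> result;
    std::copy_if(a.begin(), a.end(), std::back_inserter(result), [&] (replica_id r) {
        return std::find(b.begin(), b.end(), r) == b.end();
    });
    return result;
}

int main() {
    std::vector<replica_id> current{1, 2, 3}, next{2, 3, 4};
    auto leaving = subtract_sets(current, next);   // {1}: replicas only in the old set
    return leaving.size() == 1 ? 0 : 1;
}
```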
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18033
The CDC feature is not supported on a table that uses tablets
(Refs https://github.com/scylladb/scylladb/issues/16317), so if a user creates a keyspace with tablets enabled
they may be surprised later (perhaps much later) when they try to enable
CDC on the table and can't.
The LWT feature has always had issue https://github.com/scylladb/scylladb/issues/5251, but it has potentially become
more common with tablets.
So it was proposed that as long as we have missing features (like CDC or
LWT), every time a keyspace is created with tablets it should output a
warning (a bona-fide CQL warning, not a log message) that some features
are missing, and if you need them you should consider re-creating the
keyspace without tablets.
This PR does this.
The warning text which will be produced is the following (obviously, it can
be improved later, as we perhaps find more missing features):
> "Tables in this keyspace will be replicated using tablets, and will
> not support the CDC feature (issue https://github.com/scylladb/scylladb/issues/16317) and LWT may suffer from
> issue https://github.com/scylladb/scylladb/issues/5251 more often. If you want to use CDC or LWT, please drop
> this keyspace and re-create it without tablets, by adding AND TABLETS
> = {'enabled': false} to the CREATE KEYSPACE statement."
This PR also includes a test that checks that this warning is
indeed generated when a keyspace is created with tablets (either by default
or explicitly), and not generated if the keyspace is created without
tablets. It also fixes existing tests which didn't like the new warning.
Fixes https://github.com/scylladb/scylladb/issues/16807
Closes scylladb/scylladb#17318
* github.com:scylladb/scylladb:
tablets: add warning on CREATE KEYSPACE
test/cql-pytest: fix guadrail tests to not be sensitive to more warnings
Setting the data accessor implicitly depends on the node joining the cluster
with a raft leader elected, as only then is the service level mutation put
into the scylla_local table. Calling it after join_cluster avoids starting
a new cluster with an older version only to immediately migrate it to the
latest one in the background.
Before this patch, Alternator's Query and Scan operations convert an
entire result page to JSON without yielding. For a page of maximum
size (1MB) and tiny rows, this can cause a significant stall - the
test included in this patch reported stalls of 14-26ms on my laptop.
The problem is the describe_items() function, which does this conversion
immediately, without yielding. This patch changes this function to
return a future, and use the result_set::visit_gently() method
instead of visit() that yields when needed.
This patch does not completely eliminate stalls in the test, but
on my laptop usually reduces them to around 5ms. It appears that
the remaining stalls stem from other places not fixed in this PR,
such as perhaps query_page::handle_result(), and will need to be
fixed by additional patches.
The test included in this patch is useful for manually reproducing
the stall, but not useful as a regression test: It is slow (requiring
a couple of seconds to set up the large partition) and doesn't
check anything, and can't even report the stall without modifying the
test runner. So the test is skipped by default (using the "veryslow"
marker) and can be enabled and run manually by developers who want
to continue working on #17995.
Refs #17995.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Whereas result_set::visit() passes all the rows to the visitor and
returns void, this patch introduces a method visit_gently() that returns
a future, and may yield before visiting each row.
This method will be used in the next patch to allow Alternator, which
used visit() to convert a result_set into JSON format, to potentially
yield between rows and avoid large stalls when converting a large
result set.
Note that I decided to add the yield points in the new visit_gently()
between rows - not between each cell. Many places in our code (including
the memtable) already work on a per-row basis and do not yield in the
middle of a row, so it won't really be helpful to do this either.
But if we'll want, we will still be able to modify visit_gently() later
to be even more gentle, and yield between individual cells. The callers
shouldn't know or care.
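A minimal sketch of the shape of such a method, assuming a plain row container rather than the real result_set internals:
```c++
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <seastar/coroutine/maybe_yield.hh>
#include <vector>

struct row {};

// visits every row like visit() would, but returns a future and may yield between rows
template <typename Visitor>
seastar::future<> visit_gently(const std::vector<row>& rows, Visitor visitor) {
    for (const auto& r : rows) {
        visitor(r);
        co_await seastar::coroutine::maybe_yield(); // preempt only if the reactor asks for it
    }
}
```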
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
feature_service.hh is a high-level header that integrates much
of the system functionality, so including it in lower-level headers
causes unnecessary rebuilds. Specifically, when retiring features.
Fix by removing feature_service.hh from headers, and supply forward
declarations and includes in .cc where needed.
Closesscylladb/scylladb#18005
This patch changes the do_query() function, used to implement Alternator's
Query and Scan operations, from using continuations to be a coroutine.
There are no functional changes in this patch, it's just the necessary
changes to convert the function to a coroutine.
The new code is easier to read and less indented, but more importantly,
will be easier to extend in the next patch to add additional awaits
in the middle of the function.
In addition to the obvious changes, I also had to rename one local
variable (as the same name was used in two scopes), and to convert
pass-by-rvalue-reference to pass-by-value (these parameters are *moved*
by the caller, and moreover the old code had to move them again to a
continuation, so there is no performance penalty in this change).
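A minimal sketch of why by-value parameters are the safe choice for a coroutine that used to take rvalue references (hypothetical signature, not the real do_query()):
```c++
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <seastar/core/sleep.hh>
#include <chrono>
#include <string>

// by value, not &&: the coroutine frame takes ownership, so the arguments stay
// alive across suspension points even after the caller's temporaries are gone
seastar::future<std::string> do_query_like(std::string table, std::string key) {
    co_await seastar::sleep(std::chrono::milliseconds(1));   // suspension point
    co_return table + "/" + key;
}
```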
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
When loading topology state, nodes are checked to have or not to have
"tokens" field set. The check is done based on node state and it's
spread across the loading method.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#17957
These commands manage to avoid detection because they are not documented on https://opensource.docs.scylladb.com/stable/operating-scylla/nodetool.html.
They were discovered when running dtests, with ccm tuned to use the native nodetool directly. See https://github.com/scylladb/scylla-ccm/pull/565.
The commands come with tests, which pass with both the native and Java nodetools. I also checked that the relevant dtests pass with the native implementation.
Closesscylladb/scylladb#17979
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the sstableinfo command
tools/scylla-nodetool: implement the getsstables command
tools/scylla-nodetool: move get_ks_cfs() to the top of the file
test/nodetool: rest_api_mock.py: add expected_requests context manager
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
also, since it's impossible to partially specialize for a nested type of a
template class, we cannot specialize the `fmt::formatter` for
`stop_crash<M>::result_type`; as a workaround, a new type is
added.
in this change,
* define a new type named `stop_crash_result`
* add fmt::formatter for `stop_crash_result`
* define stop_crash::result_type as an alias of `stop_crash_result`
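A minimal sketch of the workaround with hypothetical names: the standalone type gets the formatter, and the nested name becomes an alias to it:
```c++
#include <fmt/format.h>

struct stop_crash_result_like {      // hypothetical stand-in for stop_crash_result
    bool crashed;
};

template <typename M>
struct stop_crash_like {
    using result_type = stop_crash_result_like;   // alias instead of a nested struct
};

// a formatter can be written for the standalone type, which was impossible
// for a nested type of the class template
template <>
struct fmt::formatter<stop_crash_result_like> {
    constexpr auto parse(fmt::format_parse_context& ctx) { return ctx.begin(); }
    auto format(const stop_crash_result_like& r, fmt::format_context& ctx) const {
        return fmt::format_to(ctx.out(), "crashed={}", r.crashed);
    }
};

int main() {
    stop_crash_like<int>::result_type r{true};
    fmt::print("{}\n", r);
}
```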
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18018
`UUID::to_sstring()` relies on `FMT_DEPRECATED_OSTREAM` to generate a `fmt::formatter` for `UUID`; this feature is deprecated in {fmt} v9 and dropped in {fmt} v10.
in this series, all callers of `UUID::to_sstring()` are switched to `fmt::to_string()`, and this function is dropped.
Closesscylladb/scylladb#18020
* github.com:scylladb/scylladb:
utils: UUID: drop UUID::to_sstring()
treewide: use fmt::to_string() to transform a UUID to std::string
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change:
* add `format_as()` for `segment` so we can use it as a fallback
after upgrading to {fmt} v10
* use fmt::streamed() when formatting `segment`; this will be used as
the intermediate solution before {fmt} v10, after dropping the
`FMT_DEPRECATED_OSTREAM` macro (see the sketch below)
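A minimal sketch of the two techniques on a hypothetical segment-like type (not the actual commitlog code):
```c++
#include <fmt/format.h>
#include <fmt/ostream.h>
#include <ostream>

struct segment_like {
    unsigned long id;
    friend std::ostream& operator<<(std::ostream& os, const segment_like& s) {
        return os << "segment " << s.id;
    }
};

// format_as(): found via ADL, lets {fmt} format segment_like as the returned value
inline unsigned long format_as(const segment_like& s) { return s.id; }

int main() {
    segment_like s{42};
    fmt::print("{}\n", fmt::streamed(s));  // intermediate solution: goes through operator<<
#if FMT_VERSION >= 100000
    fmt::print("{}\n", s);                 // with {fmt} v10, format_as() handles class types
#endif
}
```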
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18019
To create the list of tests to run there's a loop that first collects all tests from suites, then filters the list in two ways -- it excludes opted-out tests (disabled or matching the skip pattern) or leaves only the opted-in ones (those specified as positional arguments).
This patch keeps both pieces of list-checking code close to each other so that the intent is explicitly clear.
Closesscylladb/scylladb#17981
* github.com:scylladb/scylladb:
test.py: Give local variable meaningful name
test.py: Sanitize test list creation
We move the mode check so that the raft-based decommission also uses
it. Without this check, it hung after the drain operation instead
of instantly failing. `test_decommission_after_drain_is_invalid` was
failing because of it with the raft-based topology enabled.
Fixes scylladb/scylladb#16761
Closes scylladb/scylladb#18000
There are skip_in_<mode> lists in the suite yaml that tell test.py not to run the tests from them. This PR sanitizes these lists in two ways.
First, to skip pytests the skip-decorators are much more convenient, e.g. because they show the reason why the test is skipped.
Also, if a test wants to be opted in for some mode only, it's currently opted out in all other lists instead. There's a run_in_<mode> list in the suite for that.
Closesscylladb/scylladb#17964
* github.com:scylladb/scylladb:
test: Do not duplicate test name in several skip-lists
test: Mark tests with skip_mode instead of suite skip-list
this function is not used anymore, and it relies on
`FMT_DEPRECATED_OSTREAM` to generate a `fmt::formatter` for
`UUID`; this feature is deprecated in {fmt} v9 and
dropped in {fmt} v10.
in this change, let's drop this member function.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
without `FMT_DEPRECATED_OSTREAM` macro, `UUID::to_sstring()` is
implemented using its `fmt::formatter`, which is not available
at the end of this header file where `UUID` is defined. at this moment,
we still use `FMT_DEPRECATED_OSTREAM` and {fmt} v9, so we can
still use `UUID::to_sstring()`, but in {fmt} v10, we cannot.
so, in this change, we change all callers of `UUID::to_sstring()`
to `fmt::to_string()`, so that we don't depend on
`FMT_DEPRECATED_OSTREAM` and {fmt} v9 anymore.
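A minimal sketch of the replacement, with a hypothetical uuid_like type standing in for utils::UUID:
```c++
#include <fmt/format.h>
#include <string>

struct uuid_like {       // hypothetical stand-in for utils::UUID
    unsigned long msb, lsb;
};

template <>
struct fmt::formatter<uuid_like> {
    constexpr auto parse(fmt::format_parse_context& ctx) { return ctx.begin(); }
    auto format(const uuid_like& u, fmt::format_context& ctx) const {
        return fmt::format_to(ctx.out(), "{:016x}-{:016x}", u.msb, u.lsb);
    }
};

int main() {
    uuid_like id{0xdead, 0xbeef};
    std::string s = fmt::to_string(id);   // was: id.to_sstring(); only needs the formatter
    fmt::print("{}\n", s);
}
```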
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
When a view update has both a local and remote target endpoint,
it extends the lifetime of its memory tracking semaphore units
only until the end of the local update, while the resources are
actually used until the remote update finishes.
This patch changes the semaphore transferring so that in case
of both local and remote endpoints, both view updates share the
units, causing them to be released only after the update that
takes longer finishes.
Fixes #17890
Closes scylladb/scylladb#17891
So tests and fixtures can use `with expected_requests():` and have
cleanup be taken care of for them. I just discovered that some tests do not
clean up after themselves and when running all tests in a certain order,
this causes unrelated tests to fail.
Fix by using the context everywhere, getting guaranteed cleanup after
each test.
The test creates ut4 with a lot of fields;
this may take a while in debug builds, so
to avoid a raft operation timeout set the threshold
to some big value.
The error injector is disabled in release builds,
so this setting won't be applied to them.
This shouldn't be a problem since release builds
are fast enough, even on arm.
Fixes scylladb/scylladb#17987
Closes scylladb/scylladb#17997
Currently, the tests in test/cql-pytest can be run against both ScyllaDB and Cassandra.
Running the test for either will first output the test results, and subsequently
print the stdout output of the process under test. Using the command line
option --omit-scylla-output it is possible to disable this print for Scylla,
but it is not possible for tests run against Cassandra.
This change adds the option to suppress output for Cassandra tests, too. By default,
the stdout of the Cassandra run will still be printed after the test results, but
this can now be disabled with --omit-scylla-output
Closesscylladb/scylladb#17996
Some tests are only run in dev mode for some reason. For such tests
there's a run_in_dev list, so there is no need to put them in all the non-dev
skip_in_... ones.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are many tests that are skipped in release mode because they rely
on error-injection machinery which doesn't work in release mode. Most of
those tests are listed in the suite's skip_in_release, but it's not very
handy, mainly because it's not clear why the test is there. The
skip_mode decoration is much more convenient.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
To create the list of tests to run there's a loop that first collects all
tests from suites, then filters the list in two ways -- it excludes
opted-out tests (disabled or matching the skip pattern) or leaves
only the opted-in ones (those specified as positional arguments).
This patch keeps both list-checking code close to each other so that the
intent is explicitly clear.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Fixesscylladb/scylladb#17513
* 'gleb/raft-snitch-change-v3' of github.com:scylladb/scylla-dev:
doc: amend snitch changing procedure to work with raft
test: add test to check that snitch change takes effect.
raft topology: update rack/dc info in topology state on reboot if changed
To change snitch with raft all nodes need to be started simultaneously
since each node will try to update its state in the raft and for that
quorum is required.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, instead of printing the `unique_ptr` instance, we
print the pointee of it. since `server_impl` uses pimpl paradigm,
`_fsm` is always valid after `server_impl::start()`, we can always
deference it without checking for null.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17953
The test creates a two-node cluster with the default snitch (SimpleSnitch) and
checks that the dc and rack names are as expected. Then it changes the
config to use GossipingPropertyFileSnitch with different names, restarts the
nodes and checks that the peers table now has the new names.
before this change, we rely on the default-generated fmt::formatter
created from operator<<. but this depends on the
`FMT_DEPRECATED_OSTREAM` macro which is not respected in {fmt} v10.
this change addresses the formatting with fmtlib < 10, and without
`FMT_DEPRECATED_OSTREAM` defined. please note, in {fmt} v10 and up,
it defines formatter for classes derived from `std::exception`, so
our formatter is only added when compiled with {fmt} < 10.
in this change, `fmt::formatter<service::wait_for_ip_timeout>` is
added for backward compatibility with {fmt} < 10.
Refs scylladb#13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17955
before this change, SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE is generated
at the first run of `configure.py`. once these files are around, they
are not updated, because we don't rerun `SCYLLA_VERSION_GEN` at all
(not merely because `SCYLLA_VERSION_GEN` skips regenerating them as long
as the release string retrieved from the git sha1 is identical to the
one stored in `SCYLLA-RELEASE-FILE`).
but the pain is, when performing an incremental build, like other built
artifacts, these generated files stay in the build directory, so
even if the sha1 of the workspace changes, SCYLLA-RELEASE-FILE
stays the same -- it still contains the original git sha1 from when it
was created. this could lead to confusion if a developer or even our
CI performs an incremental build using the same workspace and build
directory, as the built scylla executables always report the same
version number.
in this change, we always rebuild the said
SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE files, and instruct ninja
to re-stat the output files, see
https://ninja-build.org/manual.html#ref_rule, in order to avoid
unnecessary rebuild. so the downside is that `SCYLLA_VERSION_GEN`
is executed every time we run `ninja` even if all targets are updated.
but the upside is that the release number reported by scylla is
accurate even if we perform incremental build.
also, since we encode the product, version and release stored
in the above files in the generated `build.ninja` file, in this change,
these three files are added as dependencies of `build.ninja`,
so that this file is regenerated if any of them is newer than
`build.ninja`.
Fixes#8255
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17974
before this change, SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE is generated
when CMake generates `build.ninja` for the first time; once these files
are around, they are not updated anymore, because we don't rerun
`SCYLLA_VERSION_GEN` at all (not merely because `SCYLLA_VERSION_GEN`
skips regenerating them as long as the release string retrieved from the
git sha1 is identical to the one stored in `SCYLLA-RELEASE-FILE`).
but the pain is, when performing an incremental build, like other built
artifacts, these generated files stay in the build directory, so
even if the sha1 of the workspace changes, SCYLLA-RELEASE-FILE
stays the same -- it still contains the original git sha1 from when it
was created. this could lead to confusion if a developer or even our
CI performs an incremental build using the same workspace and build
directory, as the built scylla executables always report the same
version number.
in this change, we always rebuild the said
SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE files, and instruct CMake
to regenerate `build.ninja` if any of these files is updated.
Fixes#17975
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17983
the codespell workflow checks for common misspellings.
it considers "raison" in "raison d'etre" (the accent
mark over the "e" is removed, so the commit message can be encoded in
ASCII) to be a misspelling of "reason" or "raisin". apparently, the
dictionary it uses does not include the most commonly used French
words.
so, in this change, let's ignore "raison" for this very use case,
before we start the l10n support of the document.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17985
since we already have `format_as()` for `cql3_type::raw`, there is no
need to provide a `fmt::formatter` for `cql3_type::raw` if the tree is compiled with {fmt} >= 10,
otherwise the compiler is not able to figure out which one to match, see the
error at the end of this commit message. so, in this change, we only
provide the specialized `fmt::formatter` for `cql3_type::raw` when
{fmt} < 10. this should address the FTBFS with {fmt} >= 10.
```
/usr/lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/type_traits:1040:25: error: ambiguous partial specializations of 'formatter<cql3::cql3_type::raw>'
1040 | = __bool_constant<__is_constructible(_Tp, _Args...)>;
| ^
/usr/lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/type_traits:1046:16: note: in instantiation of template type alias '__is_constructible_impl' requested here
1046 | : public __is_constructible_impl<_Tp, _Args...>
| ^
/usr/include/fmt/core.h:1420:13: note: in instantiation of template class 'std::is_constructible<fmt::formatter<cql3::cql3_type::raw>>' requested here
1420 | !has_formatter<T, Context>::value))>
| ^
/usr/include/fmt/core.h:1421:22: note: while substituting prior template arguments into non-type template parameter [with T = cql3::cql3_type::raw]
1421 | FMT_CONSTEXPR auto map(const T&) -> unformattable_pointer {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1422 | return {};
| ~~~~~~~~~~
1423 | }
| ~
```
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17986
When opening a backport PR, add a reference to the original PR.
This will be used later for updating the original PR/issue once the
backport is done (with a different label).
Closesscylladb/scylladb#17973
The loader is writing to the pending replica even when the write selector is set
to previous. If the migration is reverted, then the writes won't be rolled
back, as it assumes pending replicas weren't written to yet. That can
cause data resurrection if the tablet is later migrated back into the same
replica.
NOTE: write selector is handled correctly when set to next, because
get_natural_endpoints() will return the next replica set, and none
of the replicas will be considered leaving. And of course, selector
set to both is also handled correctly.
Fixes#17892.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#17902
To cause the stale topology exception the test reads
the version from the last bootstrapped host and assigns its decremented
value to version and fence_version fields of system.topology.
The test assumes that version == fence_version here; if version
is greater than fence_version we won't get a stale topology
exception in this setup. The tablet balancer can break
this -- it may increment the version after the last node is
bootstrapped.
Fix this by disabling the tablet balancer earlier.
fixes scylladb/scylladb#17807
Closes scylladb/scylladb#17940
This patch introduces raft-based service levels.
The difference to the current method of working is:
- service levels are stored in `system.service_levels_v2`
- reads are executed with `LOCAL_ONE`
- writes are done via raft group0 operation
Service levels are migrated to v2 in topology upgrade.
After the service levels are migrated, `key: service_level_v2_status; value: data_migrated` is written to the `system.scylla_local` table. If this row is present, the raft data accessor is created from the beginning and it handles the recovery mode procedure (service levels will be read from the v2 table even if consistent topology is disabled then).
Fixes #17926
Closes scylladb/scylladb#16585
* github.com:scylladb/scylladb:
test: test service levels v2 works in recovery mode
test: add test for service levels migration
test: add test for service levels snapshot
test:topology: extract `trigger_snapshot` to utils
main: create raft dda if sl data was migrated
service:qos: store information about sl data migration
service:qos: service levels migration
main: assign standard service level DDA before starting group0
service:qos: fix `is_v2()` method
service:qos: add a method to upgrade data accessor
test: add unit_test_raft_service_levels_accessor
service:storage_service: add support for service levels raft snapshot
service:qos: add abort_source for group0 operations
service:qos: raft service level distributed data accessor
service:qos: use group0_guard in data accessor
cql3:statements: run service level statements on shard0 with raft guard
test: fix overrides in unit_test_service_levels_accessor
service:qos: fix indentation
service:qos: coroutinize some of the methods
db:system_keyspace: add `SERVICE_LEVELS_V2` table
service:qos: extract common service levels' table functions
In this PR, we ensure unpublished CDC generation's data is
never removed, which was theoretically possible. If it happened,
it could cause problems. CDC generation publisher would then try
to publish the generation with its data removed. In particular, the
precondition of calling `_sys_ks.read_cdc_generation` wouldn't be
satisfied.
We also add a test that passes only after the fix. However, this test
needs to block execution of the CDC generation publisher's loop
twice. Currently, error injections with handlers do not allow it
because handlers always share received messages. Apart from the
first created handler, all handlers would be instantly unblocked by
a message from the past that has already unblocked the first
handler. This seems like a general limitation that could cause
problems in the future, so in this PR, we extend injections with
handlers to solve it once and for all. We add the `share_messages`
parameter to the `inject` (with handler) function. Depending on its
value, handlers will share messages (as before) or not.
Fixes scylladb/scylladb#17497
Closes scylladb/scylladb#17934
* github.com:scylladb/scylladb:
topology_coordinator: clean_obsolete_cdc_generations: fix log
topology_coordinator: do not clear unpublished CDC generation's data
topology_coordinator: cdc_generation_publisher_fiber injection: make handlers share messages
error_injection: allow injection handlers to not share messages
This change adds the missing Cassandra compaction option unchecked_tombstone_compaction.
Setting this option to true causes the compaction to ignore tombstone_threshold,
and decide whether to do a compaction based only on the value of tombstone_compaction_interval.
In this PR we add timeout support to the raft groups registry. We introduce
the `raft_server_with_timeouts` class, which wraps the `raft::server`
and exposes its interface with an additional `raft_timeout` parameter. If
it's set, the wrapper cancels the `abort_source` after a certain amount of
time. The value of the timeout can be specified either in the
`raft_timeout` parameter, or the default value can be set in the
`raft_server_with_timeouts` class constructor.
The `raft_group_registry` interface is extended with
`group0_with_timeouts()` method. It returns an instance of
`raft_server_with_timeouts` for group0 raft server. The timeout value
for it is configured in `create_server_for_group0`. It's one minute by
default and can be overridden for tests with
`group0-raft-op-timeout-in-ms` parameter.
The new API allows the client to decide whether to use timeouts or not.
In this PR we review all the group0 call sites and add
`raft_timeout` if that makes sense. The general principle is that if the
code is handling a client request and the client expects a potential
error, we use timeouts. We don't use timeouts for background fibers
(such as topology coordinator), since they wouldn't add much value. The
only thing the background fiber can do with a timeout is to retry, and
this will have the same end effect as not having a timeout at all.
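A minimal, hypothetical sketch of the wrapping idea (names and signatures are illustrative, not the actual raft_group_registry API): arm a timer that aborts an abort_source, and run the wrapped operation against that abort_source:
```c++
#include <seastar/core/abort_source.hh>
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <seastar/core/lowres_clock.hh>
#include <seastar/core/timer.hh>
#include <optional>

struct raft_timeout_like {                                 // stand-in for raft_timeout
    std::optional<seastar::lowres_clock::duration> value;  // empty means "use the default"
};

// Op is a callable taking seastar::abort_source& and returning seastar::future<>
template <typename Op>
seastar::future<> run_with_timeout(Op op, std::optional<raft_timeout_like> timeout,
                                   seastar::lowres_clock::duration default_timeout) {
    seastar::abort_source as;
    seastar::timer<seastar::lowres_clock> t([&as] { as.request_abort(); });
    if (timeout) {
        // the caller asked for a timeout: abort the operation once it expires
        t.arm(timeout->value.value_or(default_timeout));
    }
    co_await op(as);  // the wrapped raft call observes `as` and fails once aborted
}
```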
Fixes scylladb/scylladb#16604
Closes scylladb/scylladb#17590
* github.com:scylladb/scylladb:
migration_manager: use raft_timeout{}
storage_service::join_node_response_handler: use raft_timeout{}
storage_service::start_upgrade_to_raft_topology: use raft_timeout{}
storage_service::set_tablet_balancing_enabled: use raft_timeout{}
storage_service::move_tablet: use raft_timeout{}
raft_check_and_repair_cdc_streams: use raft_timeout{}
raft_timeout: test that node operations fail properly
raft_rebuild: use raft_timeout{}
do_cluster_cleanup: use raft_timeout{}
raft_initialize_discovery_leader: use raft_timeout{}
update_topology_with_local_metadata: use with_timeout{}
raft_decommission: use raft_timeout{}
raft_removenode: use raft_timeout{}
join_node_request_handler: add raft_timeout to make_nonvoters and add_entry
raft_group0: make_raft_config_nonvoter: add raft_timeout parameter
raft_group0: make_raft_config_nonvoter: add abort_source parameter
manager_client: server_add with start=false shouldn't call driver_connect
scylla_cluster: add seeds parameter to the add_server and servers_add
raft_server_with_timeouts: report the lost quorum
join_node_request_handler: add raft_timeout{} for start_operation
skip_mode: add platform_key
auth: use raft_timeout{}
raft_group0_client: add raft_timeout parameter
raft_group_registry: add group0_with_timeouts
utils: add composite_abort_source.hh
error_injection: move api registration to set_server_init
error_injection: add inject_parameter method
error_injection: move injection_name string into injection_shared_data
error_injection: pass injection parameters at startup
Reduce the sprawl of sstables::test_env in .cc and .hh files, to ease
maintenance and reduce recompilations.
Closesscylladb/scylladb#17965
* github.com:scylladb/scylladb:
test: sstables::test_env: complete pimplification
test/lib: test_env: move test_env::reusable_sst() to test_services.cc
* rename `sync_labels.yaml` to `sync-labels.yaml`
* use more descriptive name for workflow
Closesscylladb/scylladb#17971
* github.com:scylladb/scylladb:
github: sync-labels: use more descriptive name for workflow
github: sync_labels: rename sync_labels to sync-labels
When a keyspace uses tablets, then effective ownership
can be obtained per table. If the user passes only a
keyspace, then /storage_service/ownership/{keyspace}
returns an error.
This change:
- adds an additional positional parameter to 'status'
command that allows a user to query status for a table
in a keyspace
- makes usage of /storage_service/ownership/{keyspace}
optional to avoid errors when user tries to obtain
effective ownership of a keyspace that uses tablets
- implements new frontend tests in 'test_status.py'
that verify the new logic
Refs: scylladb#17405
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#17827
"label-sync" is not very helpful for developers to understand what
this workflow is for.
the "name" field of a job shows in the webpage on github of the
pull request against which the job is performed, so if the author
or reviewer checks the status of the pull request, he/she would
notice these names aside of the workflow's name. for this very
job, what we have now is:
```
Sync labels / label-sync
```
after this change it will be:
```
Sync labels / Synchronize labels between PR and the issue(s) fixed by it
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Create `raft_service_levels_distributed_data_accessor` if service levels
were migrated to v2 table.
This supports raft recovery mode, as service levels will be read from v2
table in the mode.
Save information whether service levels data was migrated to v2 table.
The information is stored in `system.scylla_local` table. It's
written with raft command and included in raft snapshot.
Migrate data from `system_distributed.service_levels` to
`system.service_levels_v2` during raft topology upgrade.
Migration process reads data from old table with CL ALL
and inserts the data to the new table via raft.
`raft_service_level_distributed_data_accessor` works this way:
- on read path it reads service levels from `SYSTEM.SERVICE_LEVELS_V2`
table with CL = LOCAL_ONE
- on write path it starts group0 operation and it makes the change
using raft command
Adjust service_level_controller and
service_level_controller::service_level_distributed_data_accessor
interfaces to take `group0_guard` while adding/altering/dropping a
service level.
To migrate service levels to be raft managed, obtain `group0_guard` to
be able to pass it to service_level_controller's methods.
Using this mechanism also automatically provides retries in case of
concurrent group0 operation.
The table has the same schema as `system_distributed.service_levels`.
However, it's created entirely at once (unlike the old table, which creates
the base table first and then adds other columns) because `system` tables
are local to the node.
Getting a service level (or all of them) is done the same way in raft-based
service levels as in standard service levels, so those
functions are extracted to be reused.
sstables::test_env uses the pimpl idiom, but incompletely. This
prevents reaping some of the benefits.
Complete the pimplification:
- the `impl` nested struct is moved out-of-line
- all non-template member functions are moved out-of-line
- a destructor is declared and defined out-of-line
- the move constructor is also defined (necessary after the destructor is
defined)
After this, we can forward-declare more components.
test_env implementation is scattered around two .cc, concentrate it
in test_services.cc, which happens to be the file that doesn't cause
link errors.
Move toc_filename with it, as it is its only caller and it is static.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter. but fortunately, fmt v10 brings the builtin
formatter for classes derived from `std::exception`. but before
switching to {fmt} v10, and after dropping `FMT_DEPRECATED_OSTREAM`
macro, we need to print out `std::runtime_error`. so far, we don't
have a shared place for formatter for `std::runtime_error`. so we
are addressing the needs on a case-by-case basis.
in this change, we just print it using `e.what()`. its behavior
is identical to what we have now.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17954
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, `fmt::formatter` is added for following types for backward compatibility with {fmt} < 10:
* `utils::bad_exception_container_access`
* `cdc::no_generation_data_exception`
* classes derived from `sstables::malformed_sstable_exception`
* classes derived from `cassandra_exception`
Refs https://github.com/scylladb/scylladb/issues/13245
Closes scylladb/scylladb#17944
* github.com:scylladb/scylladb:
cdc: add fmt::formatter for exception types in data_dictionary.hh
utils: add fmt::formatter for utils::bad_exception_container_access
sstables: add fmt::formatter for classes derived from sstables::malformed_sstable_exception
exceptions: add fmt::formatter for classes derived from cassandra_exception
cdc: add fmt::formatter for cdc::no_generation_data_exception
Fixes #16912
By default, ScyllaDB stores the maintenance socket in the workdir. Test.py by default uses testlog/{mode}/scylla-# as the location for the ScyllaDB workdir. The usual location for cloning the repo is the user's home folder. In some cases, this can lead to the socket path being too long, and the test will start to fail. The simple fix is to move the maintenance socket to the /tmp folder to eliminate such a possibility.
Closesscylladb/scylladb#17941
In this commit, we ensure unpublished CDC generation's data is
never removed, which was theoretically possible. If it happened,
it could cause problems. CDC generation publisher would then try
to publish the generation with its data removed. In particular, the
precondition of calling `_sys_ks.read_cdc_generation` wouldn't be
satisfied.
We also add a test that passes only after the fix.
In the following commit, we add a test that needs to block the CDC
generation publisher's loop twice. We allow it in this commit by
making handlers of the `cdc_generation_publisher_fiber` injection
share messages. From now on, unblocking every step of the loop will
require sending a new message from the test.
This change breaks the test already using the
`cdc_generation_publisher_fiber` injection, so we adjust the test.
For a single injection, all created injection handlers share all
received messages. In particular, it means that one received message
unblocks all handlers waiting for the first message. This behavior
is often desired, for example, if multiple fibers execute the
injected code and we want to unblock them all with a single message.
However, there is a problem if we want to block every execution
of the injected code. Apart from the first created handler, all
handlers will be instantly unblocked by messages from the past that
have already unblocked the first handler.
In one of the following commits, we add a test that needs to block
the CDC generation publisher's loop twice. Since it looks like there
are no good workarounds for this arguably general problem, we extend
injections with handlers in a way that solves it. We introduce the
new `share_messages` parameter. Depending on its value, handlers
will share messages or not. The details are described in the new
comments in `error_injection.hh`.
We also add some basic unit tests for the new functionality.
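A rough sketch of the difference between the two unblocking policies (hypothetical types; the real mechanism is documented in the new comments in `error_injection.hh` and operates on the injection's shared data, not plain counters):
```c++
#include <condition_variable>
#include <cstddef>
#include <mutex>

// With share_messages, every handler is released by any message ever received;
// without it, each message releases exactly one waiting handler.
class injection_messages {
    std::mutex _mtx;
    std::condition_variable _cv;
    std::size_t _received = 0;     // messages received so far
    std::size_t _consumed = 0;     // messages already claimed by non-sharing handlers
public:
    void receive_message() {
        std::lock_guard lk(_mtx);
        ++_received;
        _cv.notify_all();
    }

    void wait_for_message(bool share_messages) {
        std::unique_lock lk(_mtx);
        if (share_messages) {
            // a message from the past is enough to unblock this handler
            _cv.wait(lk, [&] { return _received > 0; });
        } else {
            // claim one message exclusively; the next handler needs a new one
            _cv.wait(lk, [&] { return _received > _consumed; });
            ++_consumed;
        }
    }
};
```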
Checking all the call sites of the migration manager shows
that all of them are initiated by user requests,
not background activities. Therefore, we add a global
raft_timeout{} here.
This function is called as part of a node startup
procedure, so a timeout may be useful.
As outlined in the comment, there is no valid way
we can lose quorum here, but some subsystems may
just become unreasonably slow for various reasons,
so we nonetheless use raft_timeout{} here.
We also add a specific test_quorum_lost_during_node_join. It
exercises the case when the quorum is lost after start_operation
but before these methods are called.
If the server is not started there is no point
in starting the driver; it would fail because there
are no nodes to connect to. On the other hand, we
should connect the driver in server_start()
if it's not connected yet.
If this parameter is set, we use its value for
the scylla.yaml of the new node, otherwise we
use IPs of all running nodes as before.
We'll need this parameter in subsequent commits to
restrict the communication between nodes.
We remove default values for _create_server_add_data parameters
since they are redundant - in the two call sites we pass all
of them.
In this commit we extend the timeout error message with
additional context - if we see that there is no quorum of
available nodes, we report this as the most likely
cause of the error.
We adjust the test by adding this new information to the
expected_error. We need raft-group-registry-fd-threshold-in-ms
to make _direct_fd threshold less than
group0-raft-op-timeout-in-ms.
In the test, we use the group0-raft-op-timeout-in-ms parameter to
reduce the timeout to one second so as not to waste time.
The join_node_request_handler method contains other group0 calls
which should have timeouts (make_nonvoters and add_entry). They
will be handled in a separate commit.
In subsequent commits we are going to add test.py
tests for raft_timeout{} feature. The problem is that
aarch/debug configuration is infamously slow. Timeout
settings used in tests work for all platforms but aarch/debug.
In this commit we extend the skip_mode attribute with the
platform_key property. We'll use @skip_mode('debug', platform_key='aarch64')
to skip the tests for this specific configuration.
The tests will still be run for aarch64/release.
The only place where we don't need raft_timeout{}
is migrate_to_auth_v2 since it's called from
topology_coordinator fiber. All other places are
called from user context, so raft_timeout{} is used.
In this commit we add raft_timeout parameter to
start_operation and add_entry method.
We fix compilation in default_authorizer.cc, since
bind_front doesn't account for default parameter
values. We should use raft_timeout{} here, but this
is for another commit.
In this commit we add timeouts support to raft groups
registry. We introduce the raft_server_with_timeouts
class, which wraps the raft::server and exposes its
interface with an additional raft_timeout parameter.
If it's set, the wrapper cancels the abort_source
after a certain amount of time. The value of the timeout
can be specified in the raft_timeout parameter,
or the default value can be set in the raft_server_with_timeouts
class constructor.
The raft_group_registry interface is extended with
get_server_with_timeouts(group_id) and group0_with_timeouts()
methods. They return an instance of raft_server_with_timeouts for
a specified group id or for group0. The timeout value for it is configured in
create_server_for_group0. It's one minute by default, can be overridden
for tests with group0-raft-op-timeout-in-ms parameter.
The new api allows the client to decide whether to use timeouts or not.
In subsequent commits we are going to review all group0 call sites
and add raft_timeout if that makes sense. The general principle is that
if the code is handling a client request and the client expects
a potential error, we use timeouts. We don't use timeouts for
background fibers (such as topology coordinator), since they won't
add much value. The only thing the background fiber can do
with a timeout is to retry, and this will have the same effect
as not having a timeout at all.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, `fmt::formatter<>` is added for following classes:
* `cql3::cql3_type`
* `cql3::cql3_type::raw`
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17945
There were two more things missing:
* Allow global options to be positioned before the operation/command option (https://github.com/scylladb/scylladb/issues/16695)
* Ignore JVM args (https://github.com/scylladb/scylladb/issues/16696)
This PR fixes both. With this, hopefully we are fully compatible with nodetool as far as command line parsing is concerned.
After this PR goes in, we will need another fix to tools/java/bin/nodetool-wrapper, to allow users to benefit from this fix. Namely, after this PR, we can just try to invoke scylla-nodetool first with all the command-line args as-is. If it returns with exit-code 100, we fall back to nodetool. We will not need the current trick with `--help $1`. In fact, this trick doesn't work currently, because `$1` is not guaranteed to be the command in the first place.
In addition to the above, this PR also introduces a new option, to help us in the switching process. This is `--rest-api-port`, which can also be provided as `-Dcom.scylladb.apiPort`. When provided, this option takes precedence over `--port|-p`. This is intended as a bridge for `scylla-ccm`, which currently provides the JMX port as `--port`. With this change, it can also provide the REST API port as `-Dcom.scylladb.apiPort`. The legacy nodetool will ignore this, while the native nodetool will use it to connect to the correct REST API address. After the switch we can ditch these options.
Fixes: https://github.com/scylladb/scylladb/issues/16695
Fixes: https://github.com/scylladb/scylladb/issues/16696
Refs: https://github.com/scylladb/scylladb/issues/16679
Refs: https://github.com/scylladb/scylladb/issues/15588
Closes scylladb/scylladb#17168
* github.com:scylladb/scylladb:
tools/scylla-nodetool: add --rest-api-port option
tools/scylla-nodetool: ignore JVM args
tools/utils: make finding the operation command line option more flexible
tools/utils: get_selected_operation(): remove alias param
tools: add constant with current help command-line arguments
The recently added test_tablets_migration dominates with its run-time (10
minutes). Also update other tests, e.g. test_read_repair is not in the top-7
for any mode, and test_replace and test_raft_recovery_majority_loss are both
not notably slower than most other tests (~40 sec each). On the other
hand, test_raft_recovery_basic and test_group0_schema_versioning both
take 1+ minute.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#17927
It is allowed to change a snitch after the cluster is already running.
Changing a snitch may cause dc and/or rack names to be changed and
gossiper handles it by gossiping new names on restart. The patch changes
raft mode to update the names on restart as well.
The initial version used a redundant method, and it did not cover all
cases. That led to flakiness in the test that used this method.
Switching to the cluster_con() method removes the flakiness since it's
written more robustly.
Fixes scylladb/scylladb#17914
Closes scylladb/scylladb#17932
The group0 state machine calls `merge_topology_snapshot` from
`transfer_snapshot`. It feeds it with `raft_topology_snapshot` returned
from `raft_pull_topology_snapshot`. This snapshot includes the entire
`system.cdc_generations_v3` table. It can be huge and break the
commitlog `max_record_size` limit.
The `system.cdc_generations_v3` is a single-partition table, so all the
data is contained in one mutation object. To fit the commitlog limit we
split this mutation into many smaller ones and apply them in separate
`database::apply` calls. That means we give up the atomicity guarantee,
but we actually don't need it for `system.cdc_generations_v3` and
`system.topology_requests`.
This PR fixes the dtest
`update_cluster_layout_tests.py::TestLargeScaleCluster::test_add_many_nodes_under_load`
Fixes scylladb/scylladb#17545
Closes scylladb/scylladb#17632
* github.com:scylladb/scylladb:
test_cdc_generation_data: test snapshot transfer
storage_service::merge_topology_snapshot: handle big cdc_generations_v3 mutations
mutation: add split_mutation function
storage_service::merge_topology_snapshot: fix indentation
sstables::test_env is intended for sstable unit tests, but to satisfy its
dependency of an sstables_registry we instantiate an entire database.
Remove the dependency by having a mock implementation of sstables_registry
and using that instead.
Closesscylladb/scylladb#17895
If there is a bug in the tablet scheduler which makes it never
converge for a given state of topology, rebalance_tablets() will never
complete and will generate a huge amount of logs. This patch adds a
sanity limit so that we fail earlier.
This was observed in one of the test_load_balancing_with_random_load runs in CI.
Fixes scylladb/scylladb#17894.
Closes scylladb/scylladb#17916
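A hypothetical sketch of such a sanity limit (the round cap and the callbacks are illustrative, not the actual rebalance_tablets() implementation):
```c++
#include <stdexcept>

// Bound the number of balancing rounds so a scheduler that never converges
// fails fast instead of spinning forever and flooding the logs.
void rebalance_until_converged(auto make_migration_plan, auto execute_plan) {
    constexpr unsigned max_rounds = 1000;   // illustrative limit
    for (unsigned round = 0; round < max_rounds; ++round) {
        auto plan = make_migration_plan();
        if (plan.empty()) {
            return;                         // converged: nothing left to migrate
        }
        execute_plan(plan);
    }
    throw std::runtime_error("tablet load balancer failed to converge");
}
```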
The series marks nodes as non-expiring in the address map earlier, when
they are placed in the topology.
Fixes: scylladb/scylladb#16849
* 'gleb/16849-fix-v2' of github.com:scylladb/scylla-dev:
test: add test to check that address cannot expire between join request placemen and its processing
topology_coordinator: set address map entry to nonexpiring when a node is added to the topology
raft_group0: add modifiable_address_map() function
There is no need to map this node's inet_address to host_id. The
storage_service can easily just pass the local host_id. While at it, get
the other node's host_id directly from their endpoint_state instead of
looking it up yet again in the gossiper, using the nodes' address.
Refs #12283
Closes scylladb/scylladb#17919
* github.com:scylladb/scylladb:
cdc: should_propose_first_generation: get my_host_id from caller
storage_service: add my_host_id
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* raft_call
* raft_read
* network_majority_grudge
* reconfiguration
* stop_crash
* operation::thread_id
* append_seq
* AppendReg::append
* AppendReg::ret
* operation::either_of<Ops...>
* operation::exceptional_result<Op>
* operation::completion<Op>
* operation::invocable<Op>
and drop their operator<<:s.
in which,
* `operator<<` for append_entry is never used. so it is removed.
* `operator<<` for `std::monostate` and `std::variant` are dropped. as we are now using their counterparts in {fmt}.
* stop_crash::result_type 's `fmt::formatter` is not added, as we cannot define a partial specialization of `fmt::formatter` for a nested class for a template class. we will tackle this struct in another change.
Refs #13245
Closes scylladb/scylladb#17884
* github.com:scylladb/scylladb:
test: raft: generator: add fmt::formatter:s
test: randomized_nemesis_test: add fmt::formatter for some types
test: randomized_nemesis_test: add fmt::formatter for seastar::timed_out_error
raft: add fmt::formatter for error classes
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, `fmt::formatter<>` is added for following classes for
backward compatibility with {fmt} < 10:
* `data_dictionary::no_such_keyspace`
* `data_dictionary::no_such_column_family`
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, `fmt::formatter<utils::bad_exception_container_access>` is
added for backward compatibility with {fmt} < 10.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, `fmt::formatter<T>` is added for classes derived from
`malformed_sstable_exception`, where `T` is the class type derived from
`malformed_sstable_exception`.
this change is implemented to be backward compatible with {fmt} < 10.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, `fmt::formatter<T>` is added for classes derived from
`cassandra_exception`, where `T` is the class type derived from
`cassandra_exception`.
this change is implemented to be backward compatible with {fmt} < 10.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, `fmt::formatter<cdc::no_generation_data_exception>` is
added for backward compatibility with {fmt} < 10.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The test only looked at the initial CDC
generation. It made the changes bigger to go
past the raft max_command_size limit.
It then made sure this large mutation set is saved
in several raft commands.
In this commit we enhance the test to check that the
mutations are properly handled during snapshot transfer.
The problem is that the entire system.cdc_generations_v3
table is read into the topology_snapshot and its total
size can exceed the commitlog max_record_size limit.
We need a separate injection since the compaction
could nullify the effects of the previous injection.
The test fails without the fix from the previous commit.
The group0 state machine calls merge_topology_snapshot
from transfer_snapshot. It feeds it with raft_topology_snapshot
returned from raft_pull_topology_snapshot. This snapshot
includes the entire system.cdc_generations_v3 table.
It can be huge and break the commitlog max_record_size limit.
The system.cdc_generations_v3 is a single-partition table,
so all the data is contained in one mutation object. To
fit the commitlog limit we split this mutation into several
smaller ones and apply them in separate database::apply calls.
That means we give up the atomicity guarantee, but we
actually don't need it for system.cdc_generations_v3.
The cdc_generations_v3 data is not used in any way until
it's referenced from the topology table. By applying the
cdc_generations_v3 mutations before topology mutations
we ensure that the lack of atomicity isn't a problem here.
The database::apply method takes frozen_mutation parameter by
const reference, so we need to keep them alive until
all the futures are complete.
Fixes #17545
The function splits the source mutation into multiple
mutations so that their size does not exceed the
max_size limit. The size of a mutation is calculated
as the sum of the memory_usage() of its constituent
mutation_fragments.
The implementation is taken from view_updating_consumer.
We use mutation_rebuilder_v2 to reconstruct mutations from
a stream of mutation fragments and recreate the output
mutation whenever we reach the limit.
We'll need this function in the next commit.
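A simplified sketch of the chunking logic, using a toy `fragment` type instead of real mutation fragments (the actual function rebuilds proper mutation objects with `mutation_rebuilder_v2`):
```c++
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for a mutation fragment: only its memory footprint matters here.
struct fragment {
    std::string data;
    std::size_t memory_usage() const { return data.size(); }
};

// Walk the fragment stream and start a new output "mutation" whenever adding
// the next fragment would push the current one past max_size.
std::vector<std::vector<fragment>> split_by_size(const std::vector<fragment>& frags,
                                                 std::size_t max_size) {
    std::vector<std::vector<fragment>> out;
    out.emplace_back();
    std::size_t current = 0;
    for (const auto& f : frags) {
        if (!out.back().empty() && current + f.memory_usage() > max_size) {
            out.emplace_back();              // current chunk full: start a new one
            current = 0;
        }
        out.back().push_back(f);
        current += f.memory_usage();
    }
    return out;
}
```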
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* operation::either_of<Ops...>
* operation::exceptional_result<Op>
* operation::completion<Op>
* operation::invocable<Op>
and drop their operator<<:s.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* raft_call
* raft_read
* network_majority_grudge
* reconfiguration
* stop_crash
* operation::thread_id
* append_seq
* append_entry
* AppendReg::append
* AppendReg::ret
and drop their operator<<:s.
in which,
* `operator<<` for `std::monostate` and `std::variant` are dropped.
as we are now using their counterparts in {fmt}.
* stop_crash::result_type 's `fmt::formatter` is not added, as we
cannot define a partial specialization of `fmt::formatter` for
a nested class for a template class. we will tackle this struct
in another change.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatter for `seastar::timed_out_error`,
which will be used by the `fmt::formatter` for `std::variant<...>`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatter for classes derived from
`raft::error`. since {fmt} v10 defines the formatter for all classes
derived from `std::exception`, the definition is provided only when
the tree is compiled with {fmt} < 10.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The token ring table is a virtual table (`system.token_ring`), which contains the ring information for all keyspaces in the system. This is essentially an alternative to `nodetool describering`, but since it is a virtual table, it allows for all the usual filtering/aggregation/etc. that CQL supports.
Up until now, this table only supported keyspaces which use vnodes. This PR adds support for tablet keyspaces. To accommodate these keyspaces a new `table_name` column is added, which is set to `ALL` for vnodes keyspaces. For tablet keyspaces, this contains the name of the table.
Simple sanity tests are added for this virtual table (it had none).
Fixes: #16850
Closes scylladb/scylladb#17351
* github.com:scylladb/scylladb:
test/cql-pytest: test_virtual_tables: add test for token_ring table
db/virtual_tables: token_ring_table: add tablet support
db/virtual_tables: token_ring_table: add table_name column
db/virtual_tables: token_ring_table: extract ring emit
service/storage_service: describe_ring_for_table(): use topology to map hostid to ip
There is no need to map this node's inet_address to host_id.
The storage_service can easily just pass the local host_id.
While at it, get the other node's host_id directly
from their endpoint_state instead of looking it up
yet again in the gossiper, using the nodes' address.
Refs #12283
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Shorthand for getting this node's host_id
from token_metadata.topology, similar to the
`get_broadcast_address` helper.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
In topology on raft, management of CDC generations is moved to the topology coordinator.
We need to verify that CDC keeps working correctly during the upgrade to topology on raft.
A similar change will be made in the topology recovery test. It will reuse
the `start_writes_to_cdc_table` function.
Ref #17409
Closes scylladb/scylladb#17828
This PR contains a few fixes and improvements seen during the
https://github.com/scylladb/scylladb/issues/15902 label addition.
When we add a label to an issue, we go through all PRs.
1) Set the PR base to `master` (release PRs are not relevant)
2) Since for each issue we have only one PR, end the search after a
match is found
3) Make sure to skip PRs with an empty body (mainly debug ones)
4) Set the backport label prefix to `backport/`
Closesscylladb/scylladb#17912
Introduces relative link support for individual properties listed on the configuration properties page. For instance, to link to a property from a different document, use the syntax :ref:`memtable_flush_static_shares <confprop_memtable_flush_static_shares>`.
Additionally, it also adds support for linking groups. For example, :ref:`Ungrouped properties <confgroup_ungrouped_properties>`.
Closesscylladb/scylladb#17753
> Revert "build: do not provide zlib as an ingredient"
> Fix reference to sstring type in tutorial about concurrency in coroutines
> Merge 'Adding a Metrics tester app' from Amnon Heiman
> cooking.sh: do not quote backtick in here document
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17887
Affects load-and-stream for tablets only.
The intention is that only this loop is responsible for detecting
exhausted sstables and then discarding them for next iterations:
while (sstable_it != _sstables.rend() && exhausted(*sstable_it)) {
sstable_it++;
}
But the loop which consumes non-exhausted sstables, on behalf of
each tablet, was incorrectly advancing the iterator, even though the
sstable wasn't considered exhausted.
Fixes #17733.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#17899
When we open a backport PR we should make sure the patch contains a ref to the issue it is supposed to fix, in order to have more accurate backport information.
This action will only be triggered when the base branch is `branch-*`.
If a `Fixes` reference is missing, this action will fail and notify the author.
Ref: https://github.com/scylladb/scylla-pkg/issues/3539
Closes scylladb/scylladb#17897
The test dtest materialized_views_test.py::TestMaterializedViews::
test_mv_populating_from_existing_data_during_truncate reproduces an
assertion failure, and crash, while doing a CREATE MATERIALIZED VIEW
during a TRUNCATE operation.
This patch fixes the crash by removing the assert() call for a view
(replacing it by a warning message) - we'll explain below why this is fine.
Also, for base tables we change the assertion to an on_internal_error
(Refs #7871).
This makes the test stop crashing Scylla, but it still fails due to
issue #17635.
Let's explain the crash, and the fix:
The test starts TRUNCATE on table that doesn't yet have a view.
truncate_table_on_all_shards() begins by disabling compaction on
the table and all its views (of which there are none, at this
point). At this point, the test creates a new view on this table.
The new view has, by default, compaction enabled. Later, TRUNCATE
calls discard_sstables() on this new view, asserts that it has
compaction disabled - and this assertion fails.
The fix in this patch is to not do the assert() for views. In other words,
we acknowledge that in this use case, the view *will* have compactions
enabled while being truncated. I claim that this is "good enough", if we
remember *why* we disable compaction in the first place: It's important
to disable compaction while truncating because truncating during compaction
can lead us to data resurrection when the old sstable is deleted during
truncation but the result of the compaction is written back. True,
this can now happen in a new view (a view created *DURING* the
truncation). But I claim that worse things can happen for this
new view: Notably, we may truncate a view and then the ongoing
view building (which happens in a new view) might copy data from
the base to the view and only then truncate the base - ending up
with an empty base and non-empty view. This problem - issue #17635 -
is more likely, and more serious, than the compaction problem, so
will need to be solved in a separate patch.
Fixes #17543.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#17634
When an sstable set is cloned, we don't want a change in the cloned set
propagating to the original one.
It happens today with partitioned_sstable_set::_all_runs, because
sets are sharing ownership of runs, which is wrong.
Preserve clone semantics by copying all_runs when cloning.
Doesn't affect data correctness as readers work directly with
sstables, which are properly cloned. Can result in a crash in ICS
when it is estimating pending tasks, but should be very rare in
practice.
Fixes #17878.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#17879
This option is an alternative to --port|-p and takes precedence over it.
This is meant to aid the switch from the legacy nodetool to the native
one. Users of the legacy nodetool pass the port of JMX to --port. We
need a way to provide both the JMX port (via --port) and also the REST
API port, which only the native nodetool will interpret. So we add this
new --rest-api-port, which, when provided, overrides the --port|-p
option. To ensure the legacy nodetool doesn't try to interpret this,
this option can also be provided as -Dcom.scylladb.apiPort (which is
substituted to --rest-api-port behind the scenes).
Legacy scripts and tests for nodetool, might pass JVM args like
-Dcom.sun.jndi.rmiURLParsing=legacy. Ignore these, by dropping anything
that starts with -D from the command line args.
Currently all scylla-tools assume that the operation/command is in
argv[1]. This is not very flexible, because most programs allow global
options (that are not dependent on the current operation/command) to be
passed before the operation name on the command line. Notably C*'s
nodetool is one such program and indeed scripts and tests using nodetool
do utilize this.
This patch makes this more flexible. Instead of looking at argv[1], do
an initial option parsing with boost::program_options to locate the
operation parameter. This initial parser knows about the global options,
and the operation positional argument. It allows for unrecognized
positional and non-positional arguments, but only after the command.
With this, any combination of global options + operation is allowed, in
any order.
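A sketch of the two-pass idea with boost::program_options; the option names here are only examples, not the actual scylla-nodetool option set. The first, permissive parse only needs to find the operation, wherever it sits among the global options:
```c++
#include <boost/program_options.hpp>
#include <string>
#include <vector>

namespace po = boost::program_options;

// Locate the operation name with an initial parse that knows the global
// options and the positional operation, and tolerates everything else.
std::string find_operation(const std::vector<std::string>& args) {
    po::options_description global("Global options");
    global.add_options()
        ("help,h", "show help")
        ("port,p", po::value<int>(), "REST API port")
        ("operation", po::value<std::string>(), "operation to run")
        ("subargs", po::value<std::vector<std::string>>(), "arguments for the operation");

    po::positional_options_description pos;
    pos.add("operation", 1).add("subargs", -1);

    auto parsed = po::command_line_parser(args)
            .options(global)
            .positional(pos)
            .allow_unregistered()   // operation-specific options are parsed later
            .run();

    po::variables_map vm;
    po::store(parsed, vm);
    return vm.count("operation") ? vm["operation"].as<std::string>() : "";
}
```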
Unfortunately, we have code in scylla-nodetool.cc which needs to know
what are the current help options available. Soon, there will be more
code like this in tools/utils.cc, so centralize this list in a const
static tool_app_template member.
The set_server_done function is called only
when a node is fully initialized. To allow error
injection to be used during initialization we
move the handler registration to set_server_init,
which is called as soon as the api http server
is started.
In this commit we extend the error_injector
with a new method, inject_parameter. It allows
passing parameters from tests to Scylla, e.g. to
lower timeouts or limits. A typical use case is
described in scylladb/scylladb#15571.
It's logically the same as inject_with_handler,
whose lambda reads the parameter named 'value'.
The only difference is that inject_parameter
doesn't return a future; it just reads the
parameter from the injection shared_data.
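A hypothetical sketch of that shape (the real injector's API and shared-data layout differ):
```c++
#include <optional>
#include <string>
#include <unordered_map>

// Look up a named parameter in the shared data of an enabled injection and
// return it synchronously, unlike inject_with_handler() which returns a future.
class error_injection_sketch {
    // injection name -> (parameter name -> value)
    std::unordered_map<std::string, std::unordered_map<std::string, std::string>> _enabled;
public:
    void enable(std::string name, std::unordered_map<std::string, std::string> params = {}) {
        _enabled.emplace(std::move(name), std::move(params));
    }

    std::optional<std::string> inject_parameter(const std::string& name,
                                                const std::string& param) const {
        auto it = _enabled.find(name);
        if (it == _enabled.end()) {
            return std::nullopt;             // injection not enabled: no override
        }
        if (auto p = it->second.find(param); p != it->second.end()) {
            return p->second;                // e.g. a lowered timeout supplied by the test
        }
        return std::nullopt;
    }
};
```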
In a subsequent commit we'll need the injection_name from inside
injection_shared_data, so in this commit we move it there.
Additionally, we address the TODO about switching the injections
dictionary from map to unordered_set; the unordered_map now contains
string_views pointing to the injection_name inside
injection_shared_data.
Injection parameters can be used in the lambda passed to
the inject_with_handler method to take some values from
the test. However, there was no way to set values for these
parameters on node startup, only through
the error injection REST api. Therefore, we couldn't rely
on this when inject_with_handler is used during
node startup; it could trigger before we call the api
from the test.
In this commit we solve this problem by allowing these
parameters to be assigned through the scylla.yaml config.
The defer.hh header was added to error_injection.hh to fix
compilation after adding error_injection.hh to config.hh,
since the defer function is used in error_injection.hh.
Fixes #17569
Tests are not closing file descriptors after they finish. This leads to an inability to continue tests, since the default limit for open files on Linux is 1024. The issue is easy to reproduce with the following command:
```
$ ./test.py --mode debug test_native_transport --repeat 1500
```
After the fix is applied, all tests pass with the following command:
```
$ ./test.py --mode debug test_native_transport --repeat 10000
```
Closesscylladb/scylladb#17798
Document the manual upgrade procedure that is required to enable
consistent cluster management in clusters that were upgraded from an
older version to ScyllaDB Open Source 6.0. This instruction is placed in
previously placeholder "Enable Raft-based Topology" page which is a part
of the upgrade instructions to ScyllaDB Open Source 6.0.
Add references to the new description in the "Raft Consensus Algorithm
in ScyllaDB" document in relevant places.
Extend the "Handling Node Failures" document so that it mentions steps
required during recovery of a ScyllaDB cluster running version 6.0.
Fixes: scylladb/scylladb#17341
Closes scylladb/scylladb#17624
Currently a node's address is set to nonexpiring in the address map when
the node is added to group0, but the node is added to the topology earlier
(during the join request) and the cluster must be able to communicate
with it (potentially) much later when the request will be processed.
The patch marks nodes that are in the topology, but not yet in group0, as
non-expiring, so they will not be dropped from the address map until their
join request is processed.
Fixes: scylladb/scylladb#16849
After merging https://github.com/scylladb/scylladb/pull/17365, all backport labels should be added to the PR (previously we added backport labels to the issues).
Adding a GitHub action which will be triggered in the following conditions only:
1) The base branch is `master` or `next`
2) Pull request events:
- opened: For every new PR that someone opens, we will sync all labels from the linked issue (if available)
- labeled: This rule only applies to labels with the `backport/` prefix. When we add a new backport label we will update the relevant issue or PR to keep them both in sync
- unlabeled: Same as `labeled`, only applies to the `backport/` prefix. When we remove a backport label we will update the relevant issue or PR
Closesscylladb/scylladb#17715
sstables_manager now depends on system_keyspace for access to the
system.sstables table, needed by object storage. This violates
modularity, since sstables_manager is a relatively low-level leaf
module while system_keyspace integrates large parts of the system
(including, indirectly, sstables_manager).
One area where this is grating is sstables::test_env, which has
to include the much higher level cql_test_env to accommodate it.
Fix this by having sstables_manager expose its dependency on
system_keyspace as an interface, sstables_registry, and have
system_keyspace implement the glue logic in
system_keyspace_sstables_manager.
Closesscylladb/scylladb#17868
This commit updates the Upgrade ScyllaDB Image page.
- It removes the incorrect information that updating underlying OS packages is mandatory.
- It adds information about the extended procedure for non-official images.
Closesscylladb/scylladb#17867
group0 operations are valid on shard 0 only. Assert that. We already do
that in the version of the function that gets an abort source.
Message-ID: <ZeCti70vrd7UFNim@scylladb.com>
Lots of BOOST_REQUIREs in this test require some integers to be in some
eq/gt/le relations to each other, and one place compares rack names
as strings. Using more verbose boost checkers is preferred in such cases.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#17866
this change is a follow up of ca7f7bf8e2, which changed the output path to build/$<CONFIG>/debian. but what dist/docker/debian/build_docker.sh expects is `build/dist/$config/debian/*.deb`, where `$config` is the normalized mode, when the debian packages are built using CMake generated rules, `$mode` is CMake configuration name, i.e., `$<CONFIG>`. so, ca7f7bf8e2 made a mistake, as it does not match the expectation of `build_docker.sh`.
in this change, this issue is addressed. so we use the same path in both `dist/CMakeLists.txt` and `dist/docker/debian/build_docker.sh`.
Closesscylladb/scylladb#17848
* github.com:scylladb/scylladb:
build: cmake: add dist-* targets to the default build target
build: cmake: put server deb packages under build/dist/$<CONFIG>/debian
This PR fixes a problem with replacing a node with tablets when
RF=N. Currently, this will fail because tablet replica allocation for
rebuild will not be able to find a viable destination, as the replacing node
is not considered to be a candidate. It cannot be a candidate because
replace rolls back on failure and we cannot roll back after tablets
were migrated.
The solution taken here is to not drain tablet replicas from replaced
node during topology request but leave it to happen later after the
replaced node is in left state and replacing node is in normal state.
The replacing node waits for this draining to be complete on boot
before the node is considered booted.
Fixes https://github.com/scylladb/scylladb/issues/17025
Nodes in the left state will be kept in tablet replica sets for a while after node
replace is done, until the new replica is rebuilt. So we need to know
about those node's location (dc, rack) for two reasons:
1) algorithms which work with replica sets filter nodes based on their location. For example materialized views code which pairs base replicas with view replicas filters by datacenter first.
2) tablet scheduler needs to identify each node's location in order to make decisions about new replica placement.
It's ok to not know the IP, and we don't keep it. Those nodes will not
be present in the IP-based replica sets, e.g. those returned by
get_natural_endpoints(), only in host_id-based replica
sets. storage_proxy request coordination is not affected.
Nodes in the left state are still not present in token ring, and not
considered to be members of the ring (datacenter endpoints exclude them).
In the future we could make the change even more transparent by only
loading locator::node* for those nodes and keeping node* in tablet replica sets.
Currently left nodes are never removed from topology, so will
accumulate in memory. We could garbage-collect them from topology
coordinator if a left node is absent in any replica set. That means we
need a new state - left_for_real.
Closesscylladb/scylladb#17388
* github.com:scylladb/scylladb:
test: py: Add test for view replica pairing after replace
raft, api: Add RESTful API to query current leader of a raft group
test: test_tablets_removenode: Verify replacing when there is no spare node
doc: topology-on-raft: Document replace behavior with tablets
tablets, raft topology: Rebuild tablets after replacing node is normal
tablets: load_balancer: Access node attributes via node struct
tablets: load_balancer: Extract ensure_node()
mv: Switch to using host_id-based replica set
effective_replication_map: Introduce host_id-based get_replicas()
raft topology: Keep nodes in the left state to topology
tablets: Introduce read_required_hosts()
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* test_data in two different tests
* row_cache_stress_test::reader_id
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17861
Clang > 12 starts to complain like
```
warning: '-fuse-ld=' taking a path is deprecated; use '--ld-path=' instead [-Wfuse-ld-path]'
```
this option is not supported by GCC yet. also instead of using
the generic driver's name, use the specific name. otherwise ld
fails like
```
lld is a generic driver.
Invoke ld.lld (Unix), ld64.lld (macOS), lld-link (Windows), wasm-ld (WebAssembly) instead
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17825
For now the test is incomplete in several ways:
1. It xfails, until #17116
2. It doesn't rebuild/repair tablets
3. It doesn't check that tablet data actually exists on replicas
refs: #17575
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#17808
When Mergify opens a backport PR and identifies conflicts, it adds the
`conflicts` label. Since GitHub can't identify conflicts in a PR, we set
a rule to move the PR to draft; this way we will not trigger CI.
Once the conflicts are resolved, the developer should mark the PR `ready for
review` (i.e. no longer a draft), and then CI will be triggered.
The `conflicts` label can also be removed.
Closesscylladb/scylladb#17834
Currently, when dividing memory tracked for a batch of updates
we do not take into account the overhead that we have for processing
every update. This patch adds the overhead for single updates
and joins the memory calculation path for batches and their parts
so that both use the same overhead.
Fixes #17854
Closes scylladb/scylladb#17855
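A toy sketch of the accounting idea (the overhead constant is illustrative; the real code charges the actual per-update bookkeeping cost, and batches and single updates share one code path):
```c++
#include <cstddef>
#include <vector>

// Hypothetical fixed cost charged for processing each update.
constexpr std::size_t per_update_overhead = 256;

std::size_t tracked_memory_for_update(std::size_t payload_size) {
    return payload_size + per_update_overhead;
}

std::size_t tracked_memory_for_batch(const std::vector<std::size_t>& payload_sizes) {
    std::size_t total = 0;
    for (auto s : payload_sizes) {
        total += tracked_memory_for_update(s);   // same per-update charge as standalone updates
    }
    return total;
}
```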
also, add a target of `dist-server`, which mirrors the structure
of the targets created by `configure.py`, and it is consistent
with the ones defined by `build_submodule()`.
so that they are built when our CI runs `ninja -C $build`. CI
expects all these rpm and deb packages to be built when
`ninja -C $build` finishes, so that it can continue with
building the container image. let's make it happen, so that
the CMake-based rules can work better with CI.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
It was observed that some use cases might append old data constantly to
memtable, blocking GC of expired tombstones.
That's because the timestamp of the memtable is unconditionally used for
calculating max purgeable, even when the memtable doesn't contain the
key of the tombstone we're trying to GC.
The idea is to treat memtable as we treat L0 sstables, i.e. it will
only prevent GC if it contains data that is possibly shadowed by the
expired tombstone (after checking for key presence and timestamp).
Memtable will usually have a small subset of keys in largest tier,
so after this change, a large fraction of keys containing expired
tombstones can be GCed when memtable contains old data.
Fixes #17599.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#17835
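A simplified sketch of that rule (toy types; the real max-purgeable computation works on actual memtables and sstables, not this struct):
```c++
#include <cstdint>
#include <initializer_list>
#include <limits>

using timestamp = int64_t;

// Simplified stand-in for a data source consulted during tombstone GC
// (a memtable or an sstable); only the two facts used by the rule matter here.
struct source_view {
    timestamp min_timestamp;     // oldest write held by this source
    bool may_contain_key;        // result of the key-presence check (filter / lookup)
};

// A source only lowers the max purgeable timestamp if it may hold the
// tombstone's key AND holds data older than the running minimum; otherwise
// the (possibly costly) key check is skipped entirely.
timestamp max_purgeable(std::initializer_list<source_view> sources) {
    timestamp min_ts = std::numeric_limits<timestamp>::max();
    for (const auto& s : sources) {
        if (s.min_timestamp >= min_ts) {
            continue;            // cannot lower the minimum, no need to check the key
        }
        if (s.may_contain_key) {
            min_ts = s.min_timestamp;
        }
    }
    return min_ts;               // tombstones older than this can be purged
}
```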
There is no point in checking `sst->filter_has_key(*hk)`
if the sstable contains no data older than the running
minimum timestamp, since even if it matches, it won't change
the minimum.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#17839
Fix writing cassandra-rackdc.properties with correct format data instead of yaml
Add a parameter to overwrite RF for specific DC
Add the possibility to connect cql to the specific node
In this PR 4 tests were added to test multi-DC functionality. One is taken from the initial commit where the multi-DC possibility was introduced; however, this test was not committed. Three of them are migrations from dtest that will later be deleted. To be able to execute the migrated tests, additional functionality is added: the ability to connect cql to a specific node in the cluster instead of a pooled connection, and the possibility to overwrite the replication factor for a specific DC. To be able to use multi-DC in test.py, the issue with the incorrect format of the properties file is fixed in this PR.
Closesscylladb/scylladb#17503
With large schemas, unfreezing can stall, especially as it requires
a lot of memory. Switch to a gentle version that will not stall.
As a preparation step, we add unfreeze_gently() for a span of mutations.
Fixes #17841
Closes scylladb/scylladb#17842
* github.com:scylladb/scylladb:
schema_tables: unfreeze frozen_mutation:s gently
frozen_mutation: add unfreeze_gently(span<frozen_mutation>)
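A minimal sketch of the "gentle" loop, assuming Seastar's coroutine::maybe_yield(); the frozen/unfrozen types here are placeholders rather than the real frozen_mutation API:
```c++
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <seastar/coroutine/maybe_yield.hh>
#include <span>
#include <string>
#include <vector>

// Toy stand-ins for the frozen/unfrozen mutation types; only the shape of the
// loop matters for this sketch.
struct frozen_blob { std::string bytes; };
struct unfrozen { std::string bytes; };

static unfrozen unfreeze_one(const frozen_blob& fm) {
    return unfrozen{fm.bytes};   // placeholder for the real (potentially large) unfreeze
}

// Unfreeze a span of frozen mutations one at a time, yielding to the reactor
// between items so a large schema does not cause a reactor stall.
seastar::future<std::vector<unfrozen>> unfreeze_gently(std::span<const frozen_blob> fms) {
    std::vector<unfrozen> out;
    out.reserve(fms.size());
    for (const auto& fm : fms) {
        out.push_back(unfreeze_one(fm));
        co_await seastar::coroutine::maybe_yield();   // preempt if the task quota was exceeded
    }
    co_return out;
}
```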
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `perf_result_with_aio_writes`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17849
Newly joining nodes may not have a host id yet. Handle this and print a
"?" for these nodes, instead of the host-id.
Extend the existing test for joining node case (also rename it and add
comment).
Closesscylladb/scylladb#17853
this change is a follow up of ca7f7bf8e2, which changed the output path
to build/$<CONFIG>/debian. but what dist/docker/debian/build_docker.sh
expects is `build/dist/$config/debian/*.deb`, where `$config` is the
normalized mode, when the debian packages are built using CMake
generated rules, `$mode` is CMake configuration name, i.e., `$<CONFIG>`.
so, ca7f7bf8e2 made a mistake, as it does not match the expectation of
`build_docker.sh`.
in this change, this issue is addressed. so we use the same path
in both `dist/CMakeLists.txt` and `dist/docker/debian/build_docker.sh`.
apply the same change to `dist-server-rpm`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
test/raft/replication.cc defines a symbol named `tlogger`, while
test/raft/randomized_nemesis_test.cc also defines a symbol with
the same name. when linking the test with mold, it identified the ODR
violation.
in this change, we extract test-raft-helper out, so that
randomized_nemesis_test can selectively only link against this library.
this also matches with the behavior of the rules generated by `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17836
in gossiping_property_file_snitch_test, we use
`BOOST_REQUIRE_EQUAL(dc_racks[i], dc_racks[0])` to check the equality
of two instances of `pair<sstring, sstring>`, like:
```c++
BOOST_REQUIRE_EQUAL(dc_racks[i], dc_racks[0])
```
since the standard library does not provide a formatter for printing
`std::pair<>`, we rely on the homebrew generic formatter to
print `std::pair<>`, which in turn uses operator<< to format the
elements in the `pair`, but we intend to remove this formatter
in the future, as the last step of #13245.
so in order to enable Boost.test to print out lhs and rhs when
`BOOST_REQUIRE_EQUAL` check fails, we are adding
`boost_test_print_type()` for `pair<sstring,sstring>`. the helper
function uses {fmt} to print the `pair<>`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17831
this change prepares for the fmt::formatter based formatter used by
tests, which will use {fmt} to print the elements in a container,
so we need to define the formatter using fmt::formatter for these
elements. the operator<< for service_level_options::workload_type is
preserved, as the tests are still using it.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17837
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
row_level_diff_detect_algorithm. please note, we already have
`format_as()` overload for this type, but we cannot use it as a
fallback of the proper `fmt::formatter<>` specialization before
{fmt} v10. so before we update our CI to a distro with {fmt} v10,
`fmt::formatter<row_level_diff_detect_algorithm>` is still
needed.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17824
The assert_that_failed(future) pair of helpers are templates over
variadic futures, but since variadic futures are gone in seastar, these
helpers should go from test/lib as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#17830
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `db::functions::function`.
please note, because we use `std::ostream` as the parameter of
the polymorphic implementation of `function::print()`,
without an intrusive change we have to use `fmt::ostream_formatter`,
or at least use a similar technique to format the `function` instance
into an instance of `ostream` first. so instead of implementing
a "native" `fmt::formatter`, in this change, we just use
`fmt::ostream_formatter`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17832
Empty histograms are missing some of the members that non-empty
histograms have. The code handling these histograms assumed all required
members are always present and thus errors out when receiving an empty
histogram.
Add tests for empty histograms and fix the code handling them to check
for the potentially missing members, instead of making assumptions.
Closes scylladb/scylladb#17816
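A small illustration of the defensive handling (a hypothetical DTO with optional members, not the actual histogram wire format):
```c++
#include <optional>
#include <vector>

// An empty histogram may omit its summary members, so the consumer checks
// for their presence instead of assuming they exist.
struct histogram_dto {
    std::optional<double> mean;
    std::optional<double> max;
    std::vector<double> buckets;
};

double mean_or_zero(const histogram_dto& h) {
    // Previously the value was read unconditionally and empty histograms
    // caused an error; treat a missing member as an empty histogram instead.
    return h.mean.value_or(0.0);
}
```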
The test is changed to be more strict. Verifies the case of replacing
when RF=N in which case tablet replicas have to be rebuilt using the
replacing node.
This would fail if tablets are drained as part of replace operation,
since replacing node is not yet a viable target for tablet migration.
This fixes a problem with replacing a node with tablets when
RF=N. Currently, this will fail because new tablet replica allocation
will not be able to find a viable destination, as the replacing node
is not considered a candidate. It cannot be a candidate because
replace rolls back on failure and we cannot roll back after tablets
were migrated.
The solution taken here is to not drain tablet replicas from replaced
node during topology request but leave it to happen later after the
replaced node is left and replacing node is normal.
The replacing node waits for this draining to be complete on boot
before the node is considered booted.
Fixes #17025
Fix aiohttp usage issue in python 3.12:
"Timeout context manager should be used inside a task"
This occurs because UnixRESTClient is created in one event loop (created
inside pytest) but used in another (created in the rewritten event_loop
fixture); it is now fixed by updating the UnixRESTClient object for every new
loop.
Closesscylladb/scylladb#17760
This is necessary to not break replica pairing between base and
view. After replacing a node, tablet replica set contains for a while
the replaced node which is in the left state. This node is not
returned by the IP-based get_natural_endpoints() so the replica
indexes would shift, changing the pairing with the view.
The host_id-based replica set always has stable indexes for replicas.
Those nodes will be kept in tablet replica sets for a while after node
replace is done, until the new replica is rebuilt. So we need to know
about those node's location (dc, rack) for two reasons:
1) algorithms which work with replica sets filter nodes based on
their location. For example materialized views code which pairs base
replicas with view replicas filters by datacenter first.
2) tablet scheduler needs to identify each node's location in order
to make decisions about new replica placement.
It's ok to not know the IP, and we don't keep it. Those nodes will not
be present in the IP-based replica sets, e.g. those returned by
get_natural_endpoints(), only in host_id-based replica
sets. storage_proxy request coordination is not affected.
Nodes in the left state are still not present in token ring, and not
considered to be members of the ring (datacenter endpoints exclude them).
In the future we could make the change even more transparent by only
loading locator::node* for those nodes and keeping node* in tablet
replica sets.
We load topology information only for left nodes which are actually
referenced by any tablet. To achieve that, topology loading code
queries system.tablet for the set of hosts. This set is then passed to
system.topology loading method which decides whether to load
replica_state for a left node or not.
Will be used by topology loading code to determine which hosts are
needed in topology, even if they're in the left state. We want to load
only left nodes if they are referenced by any tablet, which may happen
temporarily until the replacement replica is rebuilt.
As the first clustering column. For vnode keyspaces, this will always be
"ALL", for tablet keyspaces, this will contain the name of the described
table.
Into a separate method. For vnodes there is a single ring per keyspace,
but for tablets, there is a separate ring for each table in the
keyspace. To accommodate both, we move the code emitting the ring into a
separate method, so execute() can just call it once per keyspace or once
per table, whichever appropriate.
Do not use the internal host2ip() method. This relies on `_group0`, which
is only set on shard 0. Consequently, any call to this method, coming
from a shard other than shard 0, would crash ScyllaDB, as it
dereferences a nullptr.
Remove an unused function from test/cql-pytest/test_using_timeout.py.
Some linters can complain that this function used re.compile(), but
the "re" package was never imported. Since this function isn't used,
the right fix is to remove it - and not add the missing import.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17801
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `partition_entry::printer`,
and drop its operator<<.
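As an illustration of the pattern used throughout these commits, here is a
minimal sketch with a hypothetical `point` type (not from the ScyllaDB
sources):
```
#include <fmt/core.h>
#include <fmt/format.h>
#include <string_view>

// "point" is a stand-in type; the real patches do this for types such as
// partition_entry::printer.
struct point {
    int x, y;
};

// With fmt v10 the ostream-based fallback is gone, so instead of providing
// operator<< we specialize fmt::formatter; inheriting from the string_view
// formatter gives us a parse() that accepts an empty format spec.
template <>
struct fmt::formatter<point> : fmt::formatter<std::string_view> {
    auto format(const point& p, fmt::format_context& ctx) const {
        return fmt::format_to(ctx.out(), "{{x={}, y={}}}", p.x, p.y);
    }
};

int main() {
    fmt::print("{}\n", point{1, 2}); // prints {x=1, y=2}
}
```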
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17812
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `schema_mutations`,
and drop its operator<< .
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17815
before this change, we rely on the homebrew generic formatter to
print unordered_set<>, which in turn uses operator<< to format the
elements in the `unordered_set`, but we intend to remove this formatter
in the future, as the last step of #13245.
so, to enable Boost.Test to print out lhs and rhs when a `BOOST_REQUIRE_EQUAL`
check fails, we are adding `boost_test_print_type()` for
`unordered_set<fruit>`. the helper function uses {fmt} to print the
`unordered_set<>`, so we are adding a fmt::formatter for `fruit`, and the
operator<< for this type is dropped, as it is not used anymore.
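A minimal sketch of that pattern; the `fruit` definition here is illustrative
and not the actual test code:
```
#include <fmt/core.h>
#include <fmt/format.h>
#include <fmt/ranges.h>
#include <ostream>
#include <string_view>
#include <unordered_set>

// hypothetical element type; the real test defines its own `fruit`
enum class fruit { apple, pear };

// {fmt} formatter for the element type, replacing its operator<<
template <>
struct fmt::formatter<fruit> : fmt::formatter<std::string_view> {
    auto format(fruit f, fmt::format_context& ctx) const {
        return fmt::format_to(ctx.out(), "{}", f == fruit::apple ? "apple" : "pear");
    }
};

// Boost.Test picks this up (via ADL) when printing the operands of a failed
// BOOST_REQUIRE_EQUAL, so the set's contents appear in the failure message.
inline std::ostream& boost_test_print_type(std::ostream& os, const std::unordered_set<fruit>& s) {
    return os << fmt::format("{}", s);
}
```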
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17813
This series adds notification before dropping views and indices so that the
tablet_allocator can generate mutations to respectively drop all tablets associated with them from system.tablets.
Additional unit tests were added for these cases.
Note that one case is not yet tested: where a table is allowed to be dropped while having views that depend on it, when it is dropped from the alternator path.
This is tested indirectly by testing dropping a table with live secondary index as it follows the same notification path as views in this series.
Fixes #17627
Closes scylladb/scylladb#17773
* github.com:scylladb/scylladb:
migration_manager: notify before_drop_column_family when dropping indices
schema_tables: make_update_indices_mutations: use find_schema to lookup the view of dropped indices
migration_manager: notify before_drop_column_family before dropping views
cql-pytest: test_tablets: add test_tablets_are_dropped_when_dropping_table
tablet_allocator: on_before_drop_column_family: remove unused result variable
Loader was changed to quickly determine ownership after consuming
sharding metadata only. If it's not available, it falls back to
reading first and last keys from summary. The fallback is only there
for backward compatibility and it costs a lot more as we don't
skip to the end where keys are located in summary.
With tablets, sharding metadata is only the first and last keys, so
we can determine ownership without the sharder, and the loader will be able
to use it instead of looking up keys in the summary.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#17805
When jumping from streaming stage into cleanup_target, session must also
be cleared as pending replica may still process some incoming mutations
blocked in the pipeline. Deleting session prior to executing barrier
makes sure those mutations will not be applied.
fixes: #17682
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17800
When dropping indices, we don't need to go through
`create_view_for_index` in order to drop the index.
That actually creates a new schema for this view
which is used just for its metadata for generating mutations
dropping it.
Instead, use `find_schema` to lookup the current schema
for the dropped index.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Call the before_drop_column_family notifications
before dropping the views to allow the tablet_allocator
to delete the view's tablets.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Tablet transition would get stuck anyway for such nodes, so it's not worth trying
refs: #16372 (not fixes, because there's also repair transitions with same problem)
Closes scylladb/scylladb#17796
* github.com:scylladb/scylladb:
topology_coordinator: Skip dead nodes when balancing tablets
test: Add test for load_balancer skiplist
tablet_allocator: Add skiplist to load_balancer
The status command makes an extensive number of requests to the server. To
handle this more easily, the rest api mock server is refactored extensively to
be more flexible, accepting expected requests out-of-order. While at it, the
rest api mock server also moves away from a deprecated `aiohttp` feature:
providing a custom router argument to the `aiohttp` app. This forces us to
pre-register all API endpoints that any test currently uses, although due to
some templating support, this is not as bad as it sounds. Still, this is an
annoyance, but at this point we have implemented almost all commands, so this
won't be much of a problem going forward.
Refs: https://github.com/scylladb/scylladb/issues/15588
Closes scylladb/scylladb#17547
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the status command
test/nodetool: rest_api_mock.py: match requests out-of-order
test/nodetool: rest_api_mock.py: remove trailing / from request paths
test/nodetool: rest_api_mock.py: use static routes
test/nodetool: check only non-exhausted requests
tools/scylla-nodetool: repair: set the jobThreads request parameter
It's too late to call `remove_rpc_client_with_ignored_topology` on the messaging service when a node becomes normal. Data plane requests can be routed to the node much earlier, at least when topology switches to `write_both_read_new`. The `remove_rpc_client_with_ignored_topology` function shuts down sockets and causes such requests to time out.
In this PR we move the `remove_rpc_client_with_ignored_topology` call to the earliest point possible when a node first appears in `token_metadata.topology`.
From the topology coordinator perspective this happens when a joining node moves to `node_state::bootstrapping` and the topology moves to `transition_state::join_group0`. In `sync_raft_topology_nodes` the node should be contained in transition_nodes. The successful `wait_for_ip` before entering `transition_state::join_group0` ensures that update_topology should find a node's IP and put it into the topology. The barrier in `commit_cdc_generation` will ensure that all nodes in the cluster are using the proper connection parameters.
Only outgoing connections are tracked by `remove_rpc_client_with_ignored_topology`, those created by the current node. This means we need to call `remove_rpc_client_with_ignored_topology` on each node of the cluster.
fixes scylladb/scylladb#17445
Closes scylladb/scylladb#17757
* github.com:scylladb/scylladb:
test_remove_rpc_client_with_pending_requests: add a regression test
remove_rpc_client_with_ignored_topology: call it earlier
storage_service: decouple remove_rpc_client_with_ignored_topology from notify_joined
Since 4c767c379c we can reach a situation
where we know that we have admitted too many expensive view update
operations and the mechanism of dropping the following view updates
can be triggered in a wider range of scenarios. Ideally, we would
want to fail whole requests on the coordinator level, but for now, we
change the behavior to failing just the base writes. This allows us
to avoid creating inconsistencies between base replicas and views
at the cost of introducing inconsistencies between different base
replicas. This, however, can be fixed by repair, in contrast to
base-view inconsistencies which we don't have a good method of fixing.
Fixes #17795
Closes scylladb/scylladb#17777
Since 6b87778 regular compaction tasks are removed from task manager
immediately after they are finished.
test_regular_compaction_task lists compaction tasks and then requests
their statuses. Only one regular compaction task is guaranteed to still
be running at that time, the rest of them may finish before their status
is requested and so it will no longer be in task manager, causing the test
to fail.
Fix statuses check to consider the possibility of a regular compaction
task being removed from task manager.
Fixes: #17776.
Closes scylladb/scylladb#17784
Commit 0665d9c346 changed the gossiper
failure detector in the following way: when live endpoints change
and per-node failure detectors finish their loops, the main failure
detector calls gossiper::convict for those nodes which were alive when
the current iteration of the main FD started but now are not. This was
changed in order to make sure that nodes are marked as down, because
some other code in gossiper could concurrently remove nodes from
the live node lists without marking them properly.
This was committed around 3 years ago and the situation changed:
- After 75d1dd3a76
the `endpoint_state::_is_alive` field was removed and liveness
of a node is solely determined by its presence
in the `gossiper::_live_endpoints` field.
- Currently, all gossiper code which modifies `_live_endpoints`
takes care to trigger relevant callback. The only function which
modifies the field but does not trigger notifications
is `gossiper::evict_from_membership`, but it is either called
after `gossiper::remove_endpoint` which triggers callbacks
by itself, or when a node is already dead and there is no need
to trigger callbacks.
So, it looks like the reasons it was introduced for are not relevant
anymore. What's more important though is that it is involved in a bug
described in scylladb/scylladb#17515. In short, the following sequence
of events may happen:
1. Failure detector for some remote node X decides that it was dead
long enough and `convict`s it, causing live endpoints to be updated.
2. The gossiper main loop sends a successful echo to X and *decides*
to mark it as alive.
3. At the same time, failure detectors for all nodes other than X finish
and main failure detector continues; it notices that node X is
not alive (because it was convicted in point 1.) and *decides*
to convict it.
4. Actions planned in 2 and 3 run one after another, i.e. node is first
marked as alive and then immediately as dead.
This causes `on_alive` callbacks to run first and then `on_dead`. The
second one is problematic as it closes RPC connections to node X - in
particular, if X is in the process of replacing another node with the
same IP then it may cause the replace operation to fail.
In order to simplify the code and fix the bug - remove the piece
of logic in question.
Fixes: scylladb/scylladb#17515
Closes scylladb/scylladb#17754
Nodetool currently assumes that positional arguments are only keyspaces.
ks.tbl pairs are only provided when --kt-list or friends are used. This
is not the case however. So check positional args too, and if they look
like ks.tbl, handle them accordingly.
While at it, also make sure that alternator keyspace and tables names
are handled correctly.
Closes scylladb/scylladb#17480
The method in question can have a shorter name that matches all other injections in this class, and can be non-template
Closes scylladb/scylladb#17734
* github.com:scylladb/scylladb:
error_injection: De-template inject() with handler
error_injection: Overload inject() instead of inject_with_handler()
This test reproduces the problem from scylladb/scylladb#17445.
It fails quite reliably without the fix from the previous
commit.
The test just bootstraps a new node while bombarding the cluster
with read requests.
In this commit we move the remove_rpc_client_with_ignored_topology
call to the earliest point possible - when a node first appears
in token_metadata.topology.
From the topology coordinator perspective this happens when a joining
node moves to node_state::bootstrapping and the topology moves to
transition_state::join_group0. In sync_raft_topology_nodes
the node should be contained in transition_nodes. The successful
wait_for_ip before entering transition_state::join_group0 ensures
that update_topology should find a node's IP and put it into the topology.
The barrier in commit_cdc_generation will ensure that all nodes
in the cluster are using the proper connection parameters.
Only outgoing connections are tracked by remove_rpc_client_with_ignored_topology,
those created by the current node. This means we need to call
remove_rpc_client_with_ignored_topology on each node of the cluster.
fixes scylladb/scylladb#17445
notify_joined
It's too late to call remove_rpc_client_with_ignored_topology on
messaging service when a node becomes normal. Data
plane requests can be routed to the node much earlier,
at least when topology switches to write_both_read_new.
The remove_rpc_client_with_ignored_topology function
shuts down sockets and causes such requests to time out.
We intend to call remove_rpc_client_with_ignored_topology
as soon as a node becomes part of token_metadata topology.
In this preparatory commit we refactor
storage_service::notify_joined. We remove the
remove_rpc_client_with_ignored_topology call from it and
call it separately from the two call sites of notify_joined.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, since boost::program_options::options_description is
defined by the boost.program_options library and only provides the
operator<< overload, we're inclined not to specialize `fmt::formatter`
for it at this moment, because
* this class is not defined by the scylla project. we would have to
find a home for this formatter.
* we are not likely to reuse the formatter in multiple places
so, in this change we just print it using `fmt::streamed`.
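A minimal sketch of the `fmt::streamed` approach; the options set up below are
illustrative only:
```
#include <boost/program_options.hpp>
#include <fmt/core.h>
#include <fmt/ostream.h>

int main() {
    namespace bpo = boost::program_options;
    bpo::options_description desc("Options");
    desc.add_options()
        ("help", "show this help message");
    // options_description only has operator<<, so wrap it with fmt::streamed()
    // instead of writing a fmt::formatter specialization for a foreign type.
    fmt::print("{}\n", fmt::streamed(desc));
}
```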
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17791
The coordinator can find out which nodes are marked as DOWN, thus when
calling tablets balancer it can feed it a skiplist
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test is inspired by the test_load_balancing_with_empty_node one and
verifies that when a node is skiplisted, balancer doesn't put load on it
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently the load balancer skips nodes only based on their "administrative"
state, i.e. whether they are drained/decommissioned/removed/etc. There's no
way to exclude a node from the balancing decision based on anything else.
This patch adds this ability by adding a skiplist argument to the
balance_tablets() method. When a node is in it, it will not be
considered, as if it was removenode-d.
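A rough sketch of the idea, using stand-in types rather than the real
load_balancer code (host_id and node_info here are illustrative):
```
#include <cstdint>
#include <unordered_set>
#include <vector>

using host_id = uint64_t;   // stand-in for locator::host_id

struct node_info {
    host_id id;
    // load statistics, dc/rack, administrative state, ...
};

// Nodes on the skiplist are excluded from balancing decisions exactly as if
// they had been removed, on top of the usual administrative-state checks.
std::vector<node_info> balance_candidates(const std::vector<node_info>& nodes,
                                          const std::unordered_set<host_id>& skiplist) {
    std::vector<node_info> out;
    for (const auto& n : nodes) {
        if (!skiplist.contains(n.id)) {
            out.push_back(n);
        }
    }
    return out;
}
```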
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* db::commitlog::segment::cf_mark
* db::commitlog::segment_manager::named_file
* db::commitlog::segment_manager::dispose_mode
* db::commitlog::segment_manager::byte_flow<T>
please note, the formatter of `db::commitlog::segment` is not
included in this commit, as we are formatting it in the inline
definition of this class. so we cannot define the specialization
of `fmt::formatter` for this class before its callers -- we'd
either use `format_as()` provided by {fmt} v10, or use `fmt::streamed`.
either way, it's different from the theme of this commit, and we
will handle it in a separate commit.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17792
Contrary to Origin, the single-token case is not discriminated in the
native implementation, for two reasons:
* ScyllaDB doesn't ever run with a single token, it is even moving away
from vnodes.
* Origin implemented the logic to detect single-token with a mistake: it
compares the number of tokens to the number of DCs, not the number of
nodes.
Another difference is that the native implementation doesn't request
ownership information when a keyspace argument was not provided -- it is
not printed anyway.
In the previous patch, we made requests to different endpoints
be matched out-of-order. In this patch we go one step further and make
requests to the same endpoint match out-of-order too.
With this, tests can register the expected requests in any order, not in
the same order as the nodetool-under-test is expected to send them. This
makes testing more flexible. Also, how requests are ordered is not
interesting from a correctness POV anyway.
The legacy nodetool likes to append a "/" to the request paths every
now and then, but not consistently. Unfortunately, request path matching
in the mock rest server and in aiohttp is quite sensitive to this
currently. Reduce friction by removing trailing "/" from paths in the
mock api, allowing paths to match each other even if one has a trailing
"/" but the other doesn't.
Unfortunately there is nothing we can do about the aiohttp part, so some
API endpoints have to be registered with a trailing "/".
The mock server currently provides its own router to the aiohttp.web
app. The ability to provide custom routers however is deprecated and
can be removed at any point. So refactor the mock server to use the
built-in router. This requires some changes, because the built-in router
does not allow adding/removing routes once the server starts. However
the mock server only learns of the used routes when the tests run.
This unfortunately means that we have to statically register all
possible routes the tests will use. Fortunately, aiohttp has variable
route support (templated routes) and with this, we can get away with
just 9 statically registered routes, which is not too bad.
A (desired) side-effect of this refactoring is that now requests to
different routes do not have to arrive in order. This constraint of the
previous implementation proved to be not useful, and even made writing
certain tests awkward.
Refactor how the tests check for expected requests which were never
invoked. At the end of every test, the nodetool fixture requests all
unconsumed expected requests from the rest_api_mock.py and checks that
there is none. This mechanism has some interaction with requests which
have a "multiple" set: rest_api_mock.py allows registering requests with
different "multiple" requirements -- how many times a request is
expected to be invoked:
* ANY: [0, +inf)
* ONE: 1
* MULTIPLE: [1, +inf)
Requests are stored in a stack. When a request arrives, we pop off
requests from the top until we find a perfect match. We pop off
requests only if multiple == ANY, or multiple == MULTIPLE and the request
was hit at least once.
This works as long as we don't have a multiple=ANY request at the
bottom of the stack which is never invoked. Or a multiple=MULTIPLE one.
This will get worse once we refactor requests to be not stored in a
stack.
So in this patch, we filter requests when collecting unexhausted ones,
dropping those which would qualify to be popped from the stack.
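The pop-off rule above, expressed as a small predicate; this is a C++ sketch
of logic that actually lives in the Python mock server:
```
enum class multiplicity { ANY, ONE, MULTIPLE };

struct expected_request {
    multiplicity multiple;
    int hit_count = 0;  // how many times this expected request matched so far
};

// An expected request may be popped off (skipped while searching for a match,
// or dropped when collecting unexhausted requests) once its multiplicity
// requirement is already satisfied.
bool can_pop(const expected_request& r) {
    return r.multiple == multiplicity::ANY
        || (r.multiple == multiplicity::MULTIPLE && r.hit_count >= 1);
}
```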
Although ScyllaDB ignores this request parameter, the Java nodetool
sets it, so it is better to have the native one do the same for
symmetry. It makes testing easier.
Discovered with the more strict request matching introduced in the next
patches.
It is not supported currently.
If a user passes the option, the request will be rejected with:
The hosts option is not supported for tablet repair
The ignore_nodes option is not supported for tablet repair
This option is useful to select nodes to repair.
Fixes: #17742
Tests: repair_additional_test.py::TestRepairAdditional::test_repair_ignore_nodes
repair_additional_test.py::TestRepairAdditional::test_repair_ignore_nodes_errors
repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_dc_host
Closes scylladb/scylladb#17767
The new MX-native validator, which validates the index in tandem with the data file, was discovered to print false-positive errors, related to range-tombstones and promoted-index positions.
This series fixes that. But first, it refactors the scrub-related tests. These are currently dominated by boiler-plate code. They are hard to read and hard to write. In the first half of the series, a new `scrub_test` is introduced, which moves all the boiler-plate to a central place, allowing the tests to focus on just the aspect of scrub that is tested.
Then, all the found bugs in validate are fixed and finally a new test, checking validate with valid sstable is introduced.
Fixes: #16326
Closes scylladb/scylladb#16327
* github.com:scylladb/scylladb:
test/boost/sstable_compaction_test: add validation test with valid sstable
sstables/mx/reader: validate(): print trace message when finishing the PI block
sstables/mx/reader: validate(): make index-data PI position check message consistent
sstables/mx/reader: validate(): only load the next PI block if current is exhausted
sstables/mx/reader: validate(): reset the current PI block on partition-start
sstables/mx/reader: validate(): consume_range_tombstone(): check for finished clustering block
sstables/mx/reader: validate(): fix validator for range tombstone end bounds
test/boost/sstable_compaction_test: drop write_corrupt_sstable() helper
test/boost/sstable_compaction_test: fix indentation
test/boost/sstable_compaction_test: use scrub_test_framework in test_scrub_quarantine_mode_test
test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_segregate_mode_test
test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_skip_mode_test
test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_validate_mode_test
test/boost/sstable_compaction_test: introduce scrub_test_framework
test/lib/random_schema: add uncompatible_timestamp_generator()
This PR implements the following new nodetool commands:
* netstats
* tablehistograms/cfhistograms
* proxyhistograms
All commands come with tests and all tests pass with both the new and the current nodetool implementations.
Refs: https://github.com/scylladb/scylladb/issues/15588
Closes scylladb/scylladb#17651
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the proxyhistograms command
tools/scylla-nodetool: implement the tableshistograms command
tools/scylla-nodetool: introduce buffer_samples
utils/estimated_histogram: estimated_histogram: add constructor taking buckets
tools/scylla-nodetool: implement the netstats command
tools/scylla-nodetool: add correct units to file_size_printer
When the partition_index_cache is evicted, we yield for preemption between
pages, but not within a page.
Commit 3b2890e1db ("sstables: Switch index_list to chunked_vector
to avoid large allocations") recognized that index pages can be large enough
to overflow a 128k alignment block (this was before the index cache and
index entries were not stored in LSA then). However, it did not go as far as
to gently free individual entries; either the problem was not recognized
or wasn't as bad.
As the referenced issue shows, a fairly large stall can happen when freeing
the page. The workload had a large number of tombstones, so index selectivity
was poor.
Fix by evicting individual rows gently.
The fix ignores the case where rows are still referenced: it is unlikely
that all index pages will be referenced, and in any case skipping over
a referenced page takes an insignificant amount of time, compared to freeing
a page.
Fixes #17605
Closes scylladb/scylladb#17606
This is a speculative fix as the problem is observed only on CI.
When run_async is called right after driver_connect and get_cql
it fails with ConnectionException('Host has been marked down or
removed').
If the approach proves to be successful we can start to deprecate
base get_cql in favor of get_ready_cql. It's better to have robust
testing helper libraries than try to take care of it in every test
case separately.
Fixes #17713
Closes scylladb/scylladb#17772
Two repair test cases verify that repair generated enough rows in the
history table. Both use identical code for that, worth generalizing
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17761
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* mutation_partition_v2::printer
* frozen_mutation::printer
* mutation
their operator<<:s are dropped.
Refs #13245
Closes scylladb/scylladb#17769
* github.com:scylladb/scylladb:
mutation: add fmt::formatter for mutation
mutation: add fmt::formatter for frozen_mutation::printer
mutation: add fmt::formatter for mutation_partition_v2::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* column_definition
* column_mapping
* ordinal_column_id
* raw_view_info
* schema
* view_ptr
their operator<<:s are dropped. but operator<< for schema is preserved,
as we are still printing `seastar::lw_shared_ptr<const schema>` with
our homebrew generic formatter for `seastar::lw_shared_ptr<>`, which
uses operator<< to print the pointee.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17768
codespell reports that "Nees" should be "Needs", but "Nees" is the last
name of Georg Nees, so it is not a misspelling and should not be
fixed.
since the purpose of lolwut.cc is to display the Redis version and
print a piece of generative computer art (the one included in our version
was created by Georg Nees), and since the LOLWUT command does not contain
business logic connected with scylladb, we don't lose a lot if we skip
it when scanning for spelling errors. so, in this change, let's
skip it; this should silence one more warning from the github
codespell workflow.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17770
downloads.scylladb.com recently started redirecting from http to https
(via `301 Moved Permanently`).
This broke package downloading in open-coredump.sh.
To fix this, we have to instruct curl to follow redirects.
Closes scylladb/scylladb#17759
When printing human-readable file-sizes, the Java nodetool always uses
base-2 steps (1024) to arrive at the human-readable size, but it uses
the base-10 units (MB) and base-2 units (MiB) interchangeably.
Adapt file_size_printer to support both. Add a flag to control which is
used.
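A sketch of the intended behavior; the function name and signature are
illustrative, not the actual file_size_printer interface. The step is always
1024, and the flag only picks which unit suffix is printed:
```
#include <array>
#include <string>
#include <fmt/core.h>

// Always step by 1024, like the Java nodetool does; the flag only selects
// base-10 unit names (KB, MB, ...) vs base-2 unit names (KiB, MiB, ...).
std::string format_file_size(double bytes, bool base2_units) {
    static constexpr std::array<const char*, 5> b10{"bytes", "KB", "MB", "GB", "TB"};
    static constexpr std::array<const char*, 5> b2{"bytes", "KiB", "MiB", "GiB", "TiB"};
    size_t i = 0;
    while (bytes >= 1024.0 && i + 1 < b10.size()) {
        bytes /= 1024.0;
        ++i;
    }
    return fmt::format("{:.2f} {}", bytes, (base2_units ? b2 : b10)[i]);
}

int main() {
    fmt::print("{}\n", format_file_size(1536, false)); // 1.50 KB
    fmt::print("{}\n", format_file_size(1536, true));  // 1.50 KiB
}
```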
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for mutation. but its operator<<
is preserved, as we are still using our homebrew generic formatter
for printing `std::vector<mutation>`, and this formatter is using
operator<< for printing the elements in vector.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for frozen_mutation::printer,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for mutation_partition_v2::printer, and
drop its operator<<
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This patch adds the dc option support for table repair. The management
tool can use this option to select nodes in specific data centers to run
repair.
Fixes: #17550
Tests: repair_additional_test.py::TestRepairAdditional::test_repair_option_dc
Closes scylladb/scylladb#17571
Calling scylla-nodetool with the describering option and omitting the keyspace
name argument results in a boost exception with the following error message:
error running operation: boost::wrapexcept<boost::bad_any_cast> (boost::bad_any_cast: failed conversion using boost::any_cast)
This change checks for the missing keyspace and outputs a more sensible
error message:
error processing arguments: keyspace must be specified
Closes scylladb/scylladb#17741
Just a cleanup -- replace do_with_cql_env + async with do_with_cql_env_thread
Closes scylladb/scylladb#17758
* github.com:scylladb/scylladb:
test/storage_proxy: Restore indentation after previous patch
test/storage_proxy: Use do_with_cql_env_thread()
One of the test cases explicitly wraps itself into async, but there's a
convenience helper for that already.
Indentation is deliberately left broken
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The message says "index-data" but when printing the position, the data
position is printed first, causing confusion. Fix this and while at it,
also print the position of the partition start.
The validate() consumes the content of partitions in a consume-loop.
Every time the consumer asks for a "break", the next PI block is loaded
and set on the validator, so it can validate that further clustering
elements are indeed from this block.
This loop assumed the consumer would only request interruption when the
current clustering block is finished. This is wrong, the consumer can
also request interruption when yielding is needed. When this is the
case, the next PI block doesn't have to be loaded yet, the current one
is not exhausted yet. Check this condition, before loading the next PI
block, to prevent false positive errors, due to mismatched PI block
and clustering elements from the sstable.
It is possible that the next partition has no PI and thus there won't be
a new PI block to overwrite the old one. This will result in
false-positive messages about rows being outside of the finished PI
block.
Promoted index entries can be written on any clustering elements,
including range tombstones. So the validating consumer also has to check
whether the current expected clustering block is finished, when
consuming a range tombstone. If it is, consumption has to be
interrupted, so that the outer-loop can load up the next promoted index
block, before moving on to the next clustering element.
For range tombstone end-bounds, the validate_fragment_order() should be
passed a null tombstone, not a disengaged optional. The latter means no
change in the current tombstone. This caused the end bound of range
tombstones to not make it to the validator and the latter complained
later on partition-end that the partition has unclosed range tombstone.
The test becomes a lot shorter and it now uses random schema and random
data. The test is also split in two: one test for abort mode and one for
skip mode.
Indentation is left broken, to be fixed in a future patch.
Scrub tests require a lot of boilerplate code to work. This has a lot of
disadvantages:
* Tests are long
* The "meat" of the test is lost between all the boiler-plate, it is
hard to glean what a test actually does
* Tests are hard to write, so we have only a few of them and they test
multiple things.
* The boiler-plate differs slightly from test-to-test.
To solve this, this patch introduces a new class, `scrub_test_framework`,
which is a central place for all the boiler-plate code needed to write
scrub-related tests. In the next patches, we will migrate scrub related
tests to this class.
Builder works in "steps". Each step runs for a given base table; when a
new view is created it either initiates a step or appends to the currently
running step.
Running a step means reading mutations from the local sstables reader and
applying them to all views that have jumped into this step so far. When a
view is added to the step it remembers the current token value the step
is on. When the step receives end-of-stream it rewinds to the minimal token.
Rewinding is done by closing the current reader and creating a new one. Each
time the token is advanced, all the views that meet the new token value for
the second time (i.e. -- a full round was scanned) are marked as built and are
removed from the step. When no views are left on the step, it finishes.
The above machinery can break when rewinding the end-of-stream reader.
The trick is that a running step silently assumes that if the reader
once produced some token (and there can be a view that remembered this
token as its starting one), then after rewinding the reader would
generate the same token or greater. With tablets, however, that's not
the case. When a node is decommissioned, tablets are cleaned and all
sstables are removed. Rewinding a reader after that produces an empty reader
that yields no tokens from then on. Consequently, any build steps that
had captured tokens prior to cleanup would get stuck forever.
The fix is to check if the mutation consumer stepped at least one step
forward after the rewind, and if not -- complete all the attached views.
fixes: #17293
Similar thing should happen if the base table is truncated with views
being built from it. Testing it steps on compaction assertion elsewhere
and needs more research.
refs: #17543
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17548
summarize_tests() is only used to summarize boost tests, so reflect
this fact using its name. we will need to summarize the tests which
generate JUnit XML as well, so this change also prepares for a
follow-up change to implement a new summarize helper.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17746
Here are three endpoints in the api/cache_service that report "metrics"
for the row cache and the values they return
- entries: number of partitions
- size: number of partitions
- capacity: used space
The size and capacity seem very inaccurate.
The comment says that in C* the size should be weighted, but scylla doesn't
support weighting of entries in the cache. Also, capacity is configurable via
the row_cache_size_in_mb config option or the set_row_cache_capacity_in_mb API
call, but Scylla doesn't support either of these.
This patch suggests changing the return values of the size and capacity endpoints.
Although the row cache doesn't support weights, it's natural to return
used_space in bytes as the value, which is closer to what "size"
means than the number of entries.
The capacity may return the total memory size, because this is what
Scylla really does -- row cache growth is only limited by other memory
consumers, not by configured limits.
fixes: #9418
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17724
The test carries const std::string_view& around, but the type is a
lightweight class that can be copied around at the same cost as its
reference.
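For illustration, the preferred signature shape:
```
#include <cstddef>
#include <string_view>

// A string_view is just a pointer and a length, so taking it by value costs
// the same as passing a reference and avoids an extra indirection.
std::size_t name_length(std::string_view name) {
    return name.size();
}
```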
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17735
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `view_info`, its operator<<
is dropped.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17745
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* utils::human_readable_value
* std::strong_ordering
* std::weak_ordering
* std::partial_ordering
* utils::exception_container
Refs https://github.com/scylladb/scylladb/issues/13245
Closes scylladb/scylladb#17710
* github.com:scylladb/scylladb:
utils/exception_container: add fmt::formatter for exception_container
utils/human_readable: add fmt::formatter for human_readable_value
utils: add fmt::formatter for std::strong_ordering and friends
This PR addresses comments left on #17481, namely
- adds case selection to boost suite
- describes the case selection in documentation
Closes scylladb/scylladb#17721
* github.com:scylladb/scylladb:
docs: Add info about the ability to run specific test case
test.py: Support case selection for boost tests
the corresponding implementation of operator<< was dropped in
a40d3fc25b, so there is no need to
keep this friend declaration anymore.
also, drop `include <ostream>`, as this header does not reference
any of the ostream types with the change above.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17743
* seastar 5d3ee980...a71bd96d (51):
> util: add formatter for optimized_optional<>
> build: search protobuf using package config
> reactor: Move pieces of scollectd to scollectd
> reactor: Remove write-only task_queue._current
> Add missing include in tests/unit/rpc_test.cc
> doc/io_tester.md: include request_type::unlink in the docs
> doc/io-tester.md: update obsolete information in io_tester docs
> io_tester/conf.yaml: include an example of request_type::unlink job
> io_tester: implement request_type::unlink
> reactor: Print correct errno on io_submit failure
> src/core/reactor.cc: qualify metric function calls with "sm::"
> build: add shard_id.hh to seastar library
> thread: speed up thread creation in debug mode
> include: add missing modules.hh import to shard_id.hh
> prometheus: avoid ambiguity when calling MetricFamily.set_name()
> util/log: add formatter for log_level
> util/log: use string_view for log_level_names
> perf: Calculate length of name column in perf tests
> rpc_test: add a test for inter-compressor communication
> rpc: in multi_algo_compressor_factory, propagate send_empty_frame
> rpc: give compressors a way to send something over the connection
> rpc: allow (and skip) empty compressed frames
> metrics: change value_vector type to std::deque
> HACKING.md: remove doc related to test_dist
> test/unit: do not check if __cplusplus > 201703L
> json_elements: s/foramted/formatted/
> iostream: Refactor input_stream::read_exactly_part
> add unit test to verify str.starts_with(str), str.ends_with(str) return true.
> str.starts_with(str) and str.ends_with(str) should return true, just like std::string
> rpc: Remove FrameType::header_and_buffer_type
> rpc: Defuturize FrameType::return_type
> rpc: Kill FrameType::get_size()
> treewide: put std::invocable<> constraints in template param list
> include: do not include unuser headers
> rpc: fix a deadlock in connection::send()
> iostream: Replace recursion by iteration in input_stream::read_exactly_part
> core/bitops.hh: use std::integral when appropriate
> treewide: include <concepts> instead of seastar/util/concepts.hh
> abortable_fifo: fix the indent
> treewide: expand `SEASTAR_CONCEPT` macro
> util/concepts: always define SEASTAR_CONCEPT
> file: Remove unused thread-pool arg from directory lister
> seastar-json2code: collect required_query_params using a list
> seastar-json2code: reduce the indent level
> seastar-json2code: indent the enum and array elements
> seastar-json2code: generate code for enum type using Template
> seastar-json2code: extract add_operation() out
> reactor: Re-ifdef SIGSEGV sigaction installing
> reactor: Re-ifdef reactor::enable_timer()
> reactor: Re-ifdef task_histogram_add_task()
> reactor: Re-ifdef install_signal_handler_stack()
Closes scylladb/scylladb#17714
This small series improves the Alternator tests for metrics:
1. Improves some comments in the test.
2. Restores a test that was previously hidden by two tests having the same name.
3. Adds tests for latency histogram metrics.
Closes scylladb/scylladb#17623
* github.com:scylladb/scylladb:
test/alternator: tests for latency metrics
test/alternator: improve comments and unhide hidden test
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `exception_container<..>`
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for `utils::human_readable_value`,
and drop its operator<<
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* std::strong_ordering
* std::weak_ordering
* std::partial_ordering
and their operator<<:s are moved to test/lib/test_utils.{hh,cc}, as they
are only used by Boost.test.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
There are four stages left to handle: cleanup, cleanup_target, end_migration and revert_migration. All are handling removed nodes already, so the PR just extends the test.
fixes: #16527
Closes scylladb/scylladb#17684
* github.com:scylladb/scylladb:
test/tablets_migration: Test revert_migration failure handling
test/tablets_migration: Test end_migration failure handling
test/tablets_migration: Test cleanup_target failure handling
test/tablets_migration: Test cleanup failure handling
test/tablets_migration: Prepare for do_... stages
test/tablets_migration: Add ability to removenode via any other node
test/tablets_migration: Wrap migration stages failing code into a helper class
storage_service: Add failure injection to crash cleanup_tablet
Instead of a functor, for those metrics that just return the value of an
existing member variable. This is ever so slightly more efficient than a
functor.
Closes scylladb/scylladb#17726
In test/alternator/test_metrics.py we had tests for the operation-count
metrics for different Alternator API operations, but not for the latency
histograms for these same operations. So this patch adds the missing
tests (and removes a TODO asking to do that).
Note that only a subset of the operations - PutItem, GetItem, DeleteItem,
UpdateItem, and GetRecords - currently have a latency histogram, and this
test verifies this. We have an issue (Refs #17616) about adding latency
histograms for more operations - at which point we will be able to expand
this test for the additional operations.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The original goal of this patch was to improve comments in
test/alternator/test_metrics.py, but while doing that I discovered
that one of the test functions was hidden by a second test with
the same name! So this patch also renames the second test.
The test continues to work after this patch - the hidden test
was successful.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The recently renamed inject_with_handler() was a template, but it can be
symmetrical to its peer that accepts a void function as a callback, and
use std::function as its argument.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The inject_with_handler() method accepts a coroutine that can be called
with an injection_handler. With such a function as an argument, there's no
need for a distinctive inject_with_handler() name for the method; it can be
an overload of the existing inject()-s
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Before this change, when a user tried to utilize the
'storage_service/ownership/{keyspace}' API with a
keyspace parameter that uses tablets, an internal
error was thrown. The code was calling a function
that is intended for vnodes: get_vnode_effective_replication_map().
This commit introduces graceful handling of such a scenario and
extends the API to allow passing a 'cf' parameter that denotes
the table name.
Now, when a keyspace uses tablets and the cf parameter is not passed,
a descriptive error message is returned via BAD_REQUEST.
Users cannot query ownership for a keyspace that uses tablets,
but they can query ownership for a table in a given keyspace that uses tablets.
Also, new tests have been added to test/rest_api/test_storage_service.py and
to test/topology_experimental_raft/test_tablets.py in order to verify the behavior
with and without tablets enabled.
Fixes: https://github.com/scylladb/scylladb/issues/17342
Closes scylladb/scylladb#17405
* github.com:scylladb/scylladb:
storage_service/ownership: discard get_ownership() requests when tablets enabled
storage_service/ownership/{keyspace}: handle requests when tablets are enabled
locator/effective_replication_map: make 'get_ranges(inet_address ep)' virtual
locator/tablets: add tablet_map::get_sorted_tokens()
pylib/rest_client.py: add ownership API to ScyllaRESTAPIClient
rest_api/test_storage_service: add simplistic tests of ownership API for vnodes
Seastar removed `task_queue::_current` in
258b11220d343d8c7ae1a2ab056fb5e202723cc8 . let's adapt scylla-gdb.py
accordingly. despite the fact that `current_scheduling_group_ptr()` is an internal
API, it's been around for a while and is relatively stable. so let's use
it instead.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17720
The short series allows do_status_check to handle down nodes that don't have HOST_ID application state.
Fixes #16936
Closes scylladb/scylladb#17024
* github.com:scylladb/scylladb:
gossiper: do_status_check: fixup indentation
gossiper: do_status_check: allow evicting dead nodes from membership with no host_id
gossiper: print the host_id when endpoint state goes UP/DOWN
gossiper: get_host_id: differentiate between no endpoint_state and no application_state
gms: endpoint_state: add get_host_id
gossiper: do_status_check: continue loop after evicting FatClient
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for internal types in service/storage_proxy.cc.
please note, `service::storage_proxy::remote::read_verb` is extracted out of
the outer class, because the class's implementation formats `read_verb`, so
we have to put the formatter at a place where its callers can see it.
that's why it is moved up and out of `service::storage_proxy::remote`.
some of the operator<<:s are preserved, as they are still being used by
the existing formatters, for instance, the one for
`seastar::shared_ptr<>`, which is used to print
`seastar::shared_ptr<service::paxos_response_handler>`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17708
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `bound_kind` and `bound_view`,
and drop the latter's operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17706
Shard-level latencies generate a lot of metrics. This patch reduces
the number of latencies reported by Alternator while keeping the same
functionality.
On the shard level, summaries will be reported instead of histograms.
On the instance level, an aggregated histogram will be reported.
Summaries, histograms, and counters are marked with skip_when_empty.
Fixes #12230
Closes scylladb/scylladb#17581
This change introduces logic that is responsible
for checking if tablets are enabled for any of the
keyspaces when get_ownership() is invoked.
Without it, the result would be calculated
based solely on sorted_tokens(), which was
invalid.
Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Before this change, when a user tried to utilize the
'storage_service/ownership/{keyspace}' API with a
keyspace parameter that uses tablets, an internal
error was thrown. The code was calling a function
that is intended for vnodes: get_vnode_effective_replication_map().
This commit introduces graceful handling of such a scenario and
extends the API to allow passing a 'cf' parameter that denotes
the table name.
Now, when a keyspace uses tablets and the cf parameter is not passed,
a descriptive error message is returned via BAD_REQUEST.
Users cannot query ownership for a keyspace that uses tablets,
but they can query ownership for a table in a given keyspace that uses tablets.
Also, new tests have been added to test/rest_api/test_storage_service.py and
to test/topology_experimental_raft/test_tablets.py in order to verify the behavior
with and without tablets enabled.
Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Before this patch, the mentioned function was a specific
member of the vnode_effective_replication_map class.
To allow its usage also when tablets are enabled, it was
shifted to the base class - effective_replication_map -
and made pure virtual to force the derived classes to
implement it.
It is used by 'storage_service::get_ranges_for_endpoint()'
which is used in the calculation of effective ownership. Such a
calculation needs to be performed also when tablets are
enabled.
Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change introduces a new member function that
returns a vector of sorted tokens where each pair of adjacent
elements depicts a range of tokens that belongs to a tablet.
It will be used to produce the equivalent of sorted_tokens() of
vnodes when trying to use dht::describe_ownership() for tablets.
Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change adds a member function that can be used
to access 'storage_service/ownership' API.
It will be used by tests that need to access this API.
Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change is intended to introduce tests for vnodes for
the following API paths:
- 'storage_service/ownership'
- 'storage_service/ownership/{keyspace}'
In next patches the logic that is tested will be adjusted
to work correctly when tablets are enabled. This is a safety
net that ensures that the logic is not broken.
Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* reader_permit::state
* reader_resources
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17707
instead of using a fmt::runtime format string, use a compile-time
format string, so that we can have the compile-time format check provided
by {fmt}.
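A minimal illustration of the difference:
```
#include <string>
#include <fmt/core.h>

int main() {
    // Compile-time checked: a mismatch such as fmt::print("{} {}\n", 1)
    // would be rejected by the compiler.
    fmt::print("{} {}\n", 1, 2);

    // fmt::runtime() defers the check to run time; a bad format string
    // throws fmt::format_error instead of failing the build.
    std::string pattern = "{} {}\n";
    fmt::print(fmt::runtime(pattern), 1, 2);
}
```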
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17709
The test.py usage is documented, and the ability to run a specific test by
its name is described in the docs. Extend it with the new ability to run a
specific test case as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Boost tests support case-by-case execution, and test.py always turns it on --
when run, a boost test is split into parallel-running sub-tests, each with a
specific case name.
This patch tunes this, so that when a test is run like
test.py boost/testname::casename
no parallel execution happens; instead, just the needed case name is
run. Example of selection:
test.py --mode=${mode} boost/bptree_test::test_cookie_find
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This stage is also the error path that starts from write_both_read_old,
so we check this failure in two steps -- first fail the latter stage in one
of the nodes, then fail the former in another.
For that, one more node is needed in the cluster.
Also, to avoid name conflicts, the do_revert_migration pseudo stage name
is used.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This stage is a pure barrier. Barriers already take ignored nodes into
account, and so does the fail-injector, so just wire the stage name into the
test.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This stage is an error path, so in order to fail it we need to fail some
other stage prior to that. This leads to the testing sequence of
1. fail streaming via source node
2. stop and remove source node to let state machine proceed
3. fail cleanup_target on the destination node
4. stop and remove destination node
The first thing to note here is that the test doesn't fail the source node
for the cleanup_target stage, symmetrically to how it does for the cleanup stage.
Next, since we're removing two nodes, the cluster is equipped with more
nodes to keep raft quorum.
Finally, since remove of source node doesn't finish until tablet
migration finishes, it's impossible to remove destination node via the
same node-0, so the 2nd removenode happens via node-3.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The handling itself is already there -- if the leaving node is excluded
the cleanup stage resolves immediately. So just add code that
validates that.
Also, skip testing of pending replica failure during cleanup stage, as
it doesn't really participate in it any longer.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The tablets migration test is parametrized with a stage name to inject a
failure in. The internal node_failer class uses this parameter as-is when
injecting a failure into the scylla barrier handler.
The next patch will need to extend the test with a revert_migration value and
add handling of this name to the node_failer class. The node_failer class,
in turn, will want to instantiate two other instances of the same class
-- one to fail the write_both_read_old stage, and the other one to fail
the revert_migration barrier. So internally the class will need to tell
the revert_migration value as a full test parameter apart from
revert_migration as a barrier-only parameter.
This patch adds the ability to add a do_ prefix to the node_failer parameter
to tell a full test from a barrier-only one. When injecting a failure into
scylla, the do_ prefix needs to be cut off, since scylla still needs to fail
the barrier named revert_migration, not do_revert_migration.
Also split the long line while at it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently the test calls removenode via node-0 in the cluster, which is
always alive. Next test case will need to call removenode on some other
node (more details in that patch later).
refs: #17681
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
One of the next stages will need to use two of them at the same time and
it's going to be easier if the failing code is encapsulated.
No functional changes here, just large portions of code and local
variables are moved into class and its methods.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Will be needed by test that verifies how failures in tablets migration
stages are handled by state machine
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Be more permissive about the presence of host_id
application state for dead and expired nodes in release mode,
so do not throw runtime_error in this case, but
rather consider them as non-normal token owners.
Instead, call on_internal_error_noexcept that will
log the internal error and a backtrace, and will abort
if abort-on-internal-error is set.
This was seen when replacing dead nodes,
without https://github.com/scylladb/scylladb/pull/15788
Fixes #16936
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The host_id is now used in token_metadata
and in raft topology changes so print it
when the gossiper marks the node as UP/DOWN.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, we throw the same runtime_error:
`Host {} does not have HOST_ID application_state`
in both cases: when there is no endpoint_state
or when the endpoint_state has no HOST_ID
application state.
The latter case is unexpected, especially
after 8ba0decda5
(and also from the add_saved_endpoint path
after https://github.com/scylladb/scylladb/pull/15788
is merged), so throw a different error in each case
so we can tell them apart in the logs.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
A simpler getter to get the HOST_ID application state
from the endpoint_state.
Return a null host_id if the application state is not found.
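A rough sketch of the getter's shape, using stand-in types; the real
endpoint_state, application_state and host_id live in gms/ and locator/:
```
#include <map>
#include <string>

// Stand-in types for the purpose of the sketch.
enum class application_state { STATUS, HOST_ID };

struct host_id {
    std::string uuid;                          // empty means "null host_id"
    bool is_null() const { return uuid.empty(); }
};

struct endpoint_state {
    std::map<application_state, std::string> applications;

    // Return a null host_id when the HOST_ID application state is absent.
    host_id get_host_id() const {
        auto it = applications.find(application_state::HOST_ID);
        return it == applications.end() ? host_id{} : host_id{it->second};
    }
};
```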
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We're seeing cases like #16936:
```
INFO 2024-01-23 02:14:19,915 [shard 0:strm] gossip - failure_detector_loop: Mark node 127.0.23.4 as DOWN
INFO 2024-01-23 02:14:19,915 [shard 0:strm] gossip - InetAddress 127.0.23.4 is now DOWN, status = BOOT
INFO 2024-01-23 02:14:27,913 [shard 0: gms] gossip - FatClient 127.0.23.4 has been silent for 30000ms, removing from gossip
INFO 2024-01-23 02:14:27,915 [shard 0: gms] gossip - Removed endpoint 127.0.23.4
WARN 2024-01-23 02:14:27,916 [shard 0: gms] gossip - === Gossip round FAIL: std::runtime_error (Host 127.0.23.4 does not have HOST_ID application_state)
```
Since the FatClient timeout handling already evicts the endpoint
from membership, there is no need to check further if the
node is dead and expired, so just co_return.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* repair_hash
* read_strategy
* streaming::stream_summary
and drop their operator<<:s
Refs #13245
Closes scylladb/scylladb#17711
* github.com:scylladb/scylladb:
repair: add fmt::formatter for streaming::stream_summary
repair: add fmt::formatter for read_strategy
repair: add fmt::formatter for repair_hash
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for streaming::stream_summary, and
drop its operator<<
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for read_strategy, and drop its
operator<<
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for repair_hash.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
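The shape shared by these formatter changes is roughly the sketch below; `repair_hash` here is a simplified stand-in (a single 64-bit value), not the real ScyllaDB definition.
```c++
// Sketch of the operator<< -> fmt::formatter migration; repair_hash is a
// simplified stand-in, not the real definition from the repair code.
#include <fmt/core.h>
#include <cstdint>

struct repair_hash {
    uint64_t hash = 0;
};

// fmt v10 no longer generates a formatter from operator<<, so a formatter
// specialization is defined explicitly instead.
template <>
struct fmt::formatter<repair_hash> : fmt::formatter<uint64_t> {
    auto format(const repair_hash& h, fmt::format_context& ctx) const {
        return fmt::formatter<uint64_t>::format(h.hash, ctx);
    }
};

int main() {
    fmt::print("{}\n", repair_hash{42});   // prints "42"
}
```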
While measuring #17149 with this test some changes were applied, here they are
- keep initial_tablets number in output json's parameters section
- disable auto compaction
- add control over the amount of sstables generated for --bypass-cache case
Closes scylladb/scylladb#17473
* github.com:scylladb/scylladb:
perf_simple_query: Add --memtable-partitions option
perf_simple_query: Disable auto compaction
perf_simple_query: Keep number of initial tablets in output json
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* raft::election_tracker
* raft::votes
* raft::vote_result
and drop their operator<<:s.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17670
Before the change, when a test failed because of some error
in the `cql_test_env.cc`, we were getting:
```
error: boost/virtual_table_test: failed to parse XML output '/home/piotrs/src/scylla2/testlog/debug/xml/boost.virtual_table_test.test_system_config_table_read.1.xunit.xml': no element found: line 1, column 0
```
After the change we're getting:
```
error: boost/virtual_table_test: Empty testcase XML output, possibly caused by a crash in the cql_test_env.cc, details: '/home/piotrs/src/scylla2/testlog/debug/xml/boost.virtual_table_test.test_system_config_table_read.1.xunit.xml': no element found: line 1, column 0
```
Closes scylladb/scylladb#17679
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`partition_snapshot_row_cursor`, and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17669
before this change, "ring" subcommand has two issues:
1. `--resolve-ip` option accepts a boolean argument, but this option
should be a switch, which does not accept any argument at all
2. it always prints the endpoint no matter if `--resolve-ip` is
specified or not. but it should print the resolved name, instead
of an IP address if `--resolve-ip` is specified.
in this change, both issues are addressed. and the test is updated
accordingly to exercise the case where `--resolve-ip` is used.
Closes scylladb/scylladb#17553
* github.com:scylladb/scylladb:
tools/scylla-nodetool: print hostname if --resolve-ip is passed to "ring"
test/nodetool: calc max_width from all_hosts
test/nodetool: keep tokens as Host's member
test/nodetool: remove unused import
* tools/java 5e11ed17...e4878ae7 (2):
> nodetool: fix a typo in error message
> bin/cassandra-stress: Add extended version info
Closes scylladb/scylladb#17680
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for `clustering_interval_set`,
and its operator<< is dropped
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17593
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `tombstone_gc_mode`, and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17673
in `ScyllaServer::add_server()`, `self.create_server()` is called to
create a server, but if it raises, we would reference the local variable
`server`, which is not bound to any value, as `server` is not assigned
at that moment. if `ScyllaServer` is used by `ScyllaClusterManager`, we
would not be able to see the real exception, only an error like
```
cannot access local variable 'server' where it is not associated with a
value
```
which is just the error from the Python runtime.
in this change, `server` is always initialized, and we check it for None
before dereferencing it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17693
before this change, if `buildah` is not available in $PATH, this script
fails like:
```console
$ tools/toolchain/prepare --help
tools/toolchain/prepare: line 3: buildah: command not found
```
the error message never gets a chance to show up, as `set -e` on the
shebang line just lets bash quit.
after this change, we check for the existence of buildah, and bail out
if it is not available. so, on a machine without buildah around, we now
have:
```console
$ tools/toolchain/prepare --help
install buildah 1.19.3 or later
```
the same applies to "reg".
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17697
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* service::fencing_token
* service::topology::transition_state
* service::node_state
* service::topology_request
* service::global_topology_request
* service::raft_topology_cmd::command
* service::paxos::proposal
* service::paxos::promise
Refs https://github.com/scylladb/scylladb/issues/13245
Closes scylladb/scylladb#17692
* github.com:scylladb/scylladb:
service/paxos: add fmt::formatter for paxos::promise
service/paxos: add fmt::formatter for paxos::proposal
service: add fmt::formatter for topology_state_machine types
The test cases in this suite need to start scylla with custom config options, restart it, and call APIs on it. By the time the suite was created, none of this was possible with any library facility, so the suite carried its own managed_cluster class that piggy-backed on cql-pytest's scylla startup. Now test.py has a pretty flexible manager that provides all the scylla cluster management the object_store suite needs. This PR makes the suite use the manager client instead of the home-brew managed_cluster.
refs: #16006
fixes: #16268
Closes scylladb/scylladb#17292
* github.com:scylladb/scylladb:
test/object_store: Remove unused managed_cluster (and other stuff)
test/object_store: Use tmpdir fixture in flush-retry case
test/object_store: Turn flush-retry case to use ManagerClient
test/object_store: Turn "misconfigured" case to use ManagerClient
test/object_store: Turn garbage-collect case to use ManagerClient
test/object_store: Turn basic case to use ManagerClient
test/object_store: Prepare to work with ManagerClient
Today's test.py allows filtering the tests to run with the `test.py --options name` syntax. The "name" argument is then considered to be some prefix, and when iterating tests only those whose name starts with that prefix are collected and executed. This has two problems.
Minor: since it is prefix filtering, running e.g. topology_custom/test_tablets will run test_tablets _and_ test_tablets_migration from it. There's no way to exclude the latter from this selection. It's not common, but careful file name selection is welcome for a better ~~user~~ testing experience.
Major: most test files in the topology and python suites contain many cases, some extremely long. When the intent is to run a single, potentially fast, test case one needs to either wait or patch the test .py file by hand to somehow exclude unwanted test cases.
This PR adds the ability to run individual test case with test.py. The new syntax is `test.py --options name::case`. If the "::case" part is present two changes apply.
First, the test file selection is done by name match, not by prefix match. So running topology_custom/test_tablets will _not_ select test_tablets_migration from it.
Second, the "::case" part is appended to the pytest execution so that it collects and runs only the specified test case.
Closes scylladb/scylladb#17481
* github.com:scylladb/scylladb:
test.py: Add test-case splitting in 'name' selection
test.py: Add casename argument to PythonTest
These tests are inserting data into RF=3 tables, but used the default
consistency level which is taken from the default execution profile
which is set to LOCAL_QUORUM. The tests would then read with CL=ONE, so
we cannot give a guarantee that some of the data won't be missed. Fix
this by inserting the data with CL=ALL. (Do it for all RF cases for
simplicity.)
Fixes scylladb/scylladb#17695
Closes scylladb/scylladb#17700
Use co_await unfreeze_gently in the loop body
unfreezing each partition mutation to prevent
reactor stalls when building group0 snapshot
with lots of tablets.
Fixes #15303
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#17688
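A minimal sketch of that loop shape follows; `frozen_entry`, `entry` and `unfreeze_gently()` are placeholders standing in for the real frozen_mutation/mutation API, the point being the per-element co_await that yields to the reactor.
```c++
// Sketch only: frozen_entry, entry and unfreeze_gently() are placeholders,
// not the real ScyllaDB types.
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <vector>

struct frozen_entry {};
struct entry {};

// Placeholder for the real unfreeze_gently(): a coroutine that may yield
// while unfreezing, so large payloads do not monopolize the reactor.
seastar::future<entry> unfreeze_gently(const frozen_entry&) {
    co_return entry{};
}

seastar::future<std::vector<entry>>
build_snapshot(const std::vector<frozen_entry>& frozen) {
    std::vector<entry> out;
    out.reserve(frozen.size());
    for (const auto& f : frozen) {
        // co_await per element: each iteration is a preemption point, which
        // is what prevents reactor stalls with lots of tablets.
        out.push_back(co_await unfreeze_gently(f));
    }
    co_return out;
}
```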
This PR contains 2 fixes for the mergify config file:
1) When opening a backport PR, the base branch should be `branch-x.y`
2) Once a commit is promoted, we should add the label
`promoted-to-master`; in the 5.4 configuration we were using the wrong
label. Fix it.
Closes scylladb/scylladb#17698
The test is booting nodes, and then immediately starts shutting down
nodes and removing them from the cluster. The shutting down and
removing may happen before driver manages to connect to all nodes in the
cluster. In particular, the driver didn't yet connect to the last
bootstrapped node. Or it can even happen that the driver has connected,
but the control connection is established to the first node, and the
driver fetched topology from the first node when the first node didn't
yet consider the last node to be normal. So the driver decides to close
connection to the last node like this:
```
22:34:03.159 DEBUG> [control connection] Removing host not found in
peers metadata: <Host: 127.42.90.14:9042 datacenter1>
```
Eventually, at the end of the test, only the last node remains, all
other nodes have been removed or stopped. But the driver does not have a
connection to that last node.
Fix this problem by ensuring that:
- all nodes see each other as NORMAL,
- the driver has connected to all nodes
at the beginning of the test, before we start shutting down and removing
nodes.
Fixes scylladb/scylladb#16373
Closes scylladb/scylladb#17676
Scylla-ccm uses function `wait_for_binary_interface` that waits for
scylla logs to print "Starting listening for CQL clients". If this log
is printed far before the regular cql_controller is initialized,
scylla-ccm assumes too early that node is initialized.
It can result in timeouts that throw errors, for example in the function
`watch_rest_for_alive`.
Closes scylladb/scylladb#17496
The following incompatibilities were identified by `listsnapshots_test.py` in dtests:
* Command doesn't bail out when there are no snapshots, instead it prints meaningless empty report
* Formatting is incompatible
Both are fixed in this mini-series.
Closes scylladb/scylladb#17541
* github.com:scylladb/scylladb:
tools/scylla-nodetool: listsnapshots: make the formatting compatible with origin's
tools/scylla-nodetool: listsnapshots: bail out if there are no snapshots
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `service::paxos::promise`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `service::paxos::proposal`,
but its operator<< is preserved, as it is still used by our generic
formatter for std::tuple<> which uses operator<< for printing the
elements in it, so operator<< of this class is indirectly used.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* service::fencing_token
* service::topology::transition_state
* service::node_state
* service::topology_request
* service::global_topology_request
* service::raft_topology_cmd::command
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, "ring" subcommand has two issues:
1. `--resolve-ip` option accepts a boolean argument, but this option
should be a switch, which does not accept any argument at all
2. it always prints the endpoint no matter if `--resolve-ip` is
specified or not. but it should print the resolved name, instead
of an IP address if `--resolve-ip` is specified.
in this change, both issues are addressed. and the test is updated
accordingly to exercise the case where `--resolve-ip` is used.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
It might happen that multiple tablets co-habit the same shard, so we want load-and-stream to jump into a new streaming session for every tablet, such that the receiver will have the data properly segregated. That's a similar treatment we gave to repair. Today, load-and-stream fails due to sstables spanning more than 1 tablet in the receiver.
Synchronization with migration is done by taking the replication map, so migrations cannot advance while streaming new data. A bug was also fixed where data must be streamed to pending replicas as well, to handle the case where migration is ongoing and new data must reach both the old and new replica sets. A test was added stressing this synchronization path.
Another bug was fixed in sstable loading, which expected sharder to not be invalidated throughout the operation, but that breaks during migrations.
Fixes #17315.
Closes scylladb/scylladb#17449
* github.com:scylladb/scylladb:
test: test_tablets: Add load-and-stream test
sstables_loader: Stream to pending tablet replica if needed
sstables_loader: Implement tablet based load-and-stream
sstables_loader: Virtualize sstable_streamer for tablet
sstables_loader: Avoid reallocations in vector
sstable_loader: Decouple sstable streaming from selection
sstables_loader: Introduce sstable_streamer
Fix online SSTable loading with concurrent tablet migration
This one-line patch fixes a failure in the dtest
lwt_schema_modification_test.py::TestLWTSchemaModification
::test_table_alter_delete
where an update sometimes failed due to an internal server error, and the
log had the mysterious warning message:
"std::logic_error (Empty materialized view updated)"
We've also seen this log-message in the past in another user's log, and
never understood what it meant.
It turns out that the error message was generated (and warning printed)
while building view updates for a base-table mutation, and noticing that
the base mutation contains an *empty* row - a row with no cells or
tombstone or anything whatsoever. This case was deemed (8 years ago,
in d5a61a8c48) unexpected and nonsensical,
and we threw an exception. But this case actually *can* happen - here is
how it happened in test_table_alter_delete - which is a test involving
a strange combination of materialized views, LWT and schema changes:
1. A table has a materialized view, and also a regular column "int_col".
2. A background thread repeatedly drops and re-creates this column
int_col.
3. Another thread deletes rows with LWT ("IF EXISTS").
4. These LWT operations each reads the existing row, and because of
repeated drop-and-recreate of the "int_col" column, sometimes this
read notices that one node has a value for int_col and the other
doesn't, and creates a read-repair mutation setting int_col (the
difference between the two reads is just this column).
5. The node missing "int_col" receives this mutation which sets only
int_col. It upgrade()s this mutation to its most recent schema,
which doesn't have int_col, so it removes this column from the
mutation row - and is left with a completely empty mutation row.
This completely empty row is not useful, but upgrade() doesn't
remove it.
6. The view-update generation code sees this empty base-mutation row
and fails it with this std::logic_error.
7. The node which sent the read-repair mutation sees that the read
repair failed, so it fails the read and therefore fails the LWT
delete operation.
It is this LWT operation which failed in the test, and caused
the whole test to fail.
The fix is trivial: an empty base-table row mutation should simply be
*ignored* when generating view updates - it shouldn't cause any error.
Before this patch, test_table_alter_delete used to fail in roughly
20% of the runs on my laptop. After this patch, I ran it 100 times
without a single failure.
Fixes #15228
Fixes #17549
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17607
When no keyspace is provided, request all keyspaces from the server,
then scrub all of them. This is what the legacy nodetool does, for some
reason this was missed when re-implementing scrub.
Closes scylladb/scylladb#17495
There are 4 barrier-only stages when migrating a tablet, and the test needs to fail the pending/leaving replica that handles it in order to validate how the coordinator handles a dead node. Failing the barrier is done by suspending it with injection code and stopping the node without waking it up. The main difficulty here is how to tell one barrier RPC call from another, because they don't have anything onboard that could tell which stage the barrier is run for. This PR suggests that the barrier injection code looks directly into the system.tablets table for the transition stage; the stage is already there by the time the barrier is about to ack itself over RPC.
refs: #16527
Closes scylladb/scylladb#17450
* github.com:scylladb/scylladb:
topology.tablets_migration: Handle failed use_new
topology.tablets_migration: Handle failed write_both_read_new
topology.tablets_migration: Handle failed write_both_read_old
topology.tablets_migration: Handle failed allow_write_both_read_old
test/tablets_migration: Add conditional break-point into barrier handler
replica: Add helper to read tablet transition stage
topology_coordinator: Add action_failed() helper
The author (me) tried to be clever and fix the formatting, but then he
realized this just means a lot of unnecessary fighting with tests. So
this patch makes the formatting compatible with that of the legacy
nodetool:
* Use compatible rounding and precision formatting
* Use incorrect unit (KB instead of KiB)
* Align numbers to the left
* Add trailing white-space to "Snapshot Details: "
These two parameters are not used by the native nodetool, because
ScyllaDB itself doesn't support them. These should be just ignored and
indeed there was a unit test checking that this is the case. However,
due to a mistake in the unit test, this was not actually tested and
nodetool complained when seeing these params.
This patch fixes both the test and the native nodetool.
Closes scylladb/scylladb#17477
this changeset addresses some warnings raised by flake8, in the hope of improving the readability of this script in general.
Closes scylladb/scylladb#17668
* github.com:scylladb/scylladb:
scylla-gdb: s/if not foo is None/if foo is not None/
scylla-gdb.py: add space after keyword
scylla-gdb.py: remove extraneous spaces
scylla-gdb.py: use 2 empty lines between top-level funcs/classes
scylla-gdb.py: replace <tab> with 4 spaces
scylla-gdb: fix the indent
Currently, the github docs-pages workflow is triggered only when changes are merged to the master/enterprise branches, which means that in the case of changes to a release branch, for example, a fix to branch-5.4, or a branch-5.4 > branch-2024.1 merge, the docs-pages workflow is not triggered and therefore the documentation is not updated with the new change.
In this change, I added the `branch-**` pattern, so changes to release branches will trigger the workflow.
Closes scylladb/scylladb#17281
* github.com:scylladb/scylladb:
docs: always build from the default branch
docs: trigger the docs-pages workflow on release branches
* tools/cqlsh b8d86b76...e5f5eafd (2):
> dist/debian: fix the trailer line format
> `COPY TO STDOUT` shouldn't put None where a function is expected
Fixes: scylladb/scylladb#17451
Closes scylladb/scylladb#17447
key_view::explode() contains a blatant use-after-free:
unless the input is already linearized, it returns a view to a local temporary buffer.
This is rare, because partition keys are usually not large enough to be fragmented.
But for a sufficiently large key, this bug causes a corrupted partition_key down
the line.
Fixes #17625
Closes scylladb/scylladb#17626
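The hazard has the classic shape sketched below (simplified stand-ins, not the real key_view code): views are handed out over a linearization buffer that only lives inside the function.
```c++
// Simplified illustration of the hazard; fragmented_key stands in for the
// real key_view internals.
#include <string>
#include <string_view>
#include <vector>

struct fragmented_key {
    std::vector<std::string> fragments;   // key bytes split across buffers
};

// BUGGY: when the key is fragmented, `linearized` is a local temporary and
// the returned string_view dangles as soon as the function returns.
std::string_view explode_buggy(const fragmented_key& k) {
    if (k.fragments.size() == 1) {
        return std::string_view(k.fragments.front());   // fine: view into k
    }
    std::string linearized;
    for (const auto& f : k.fragments) {
        linearized += f;
    }
    return std::string_view(linearized);   // dangling view!
}

// FIXED (one option): return owning data when linearization was needed.
std::string explode_fixed(const fragmented_key& k) {
    std::string linearized;
    for (const auto& f : k.fragments) {
        linearized += f;
    }
    return linearized;
}
```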
instead of using the hand-crafted operator==, use the default-generated
one, which is equivalent to the former.
regarding the difference between global operator== and member operator==,
the default-generated operator in C++20 is now symmetric. so we don't
need to worry about the problem of `max_result_size` being lhs or rhs.
but neither do we need to worry about the implicit conversion, because
all constructors of `max_result_size` are marked explicit. so we don't
gain any advantage by making the operator== global instead of a member
operator.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17536
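For reference, the kind of change described, shown on a hypothetical value type (not the real `max_result_size` definition):
```c++
// Hypothetical value type illustrating the defaulted comparison; not the
// real max_result_size definition.
#include <cstdint>

class max_result_size {
public:
    constexpr explicit max_result_size(uint64_t soft, uint64_t hard)
        : _soft_limit(soft), _hard_limit(hard) {}

    // Replaces a hand-crafted global operator==: the defaulted member
    // operator is symmetric in C++20, and implicit conversions are not a
    // concern because the constructor is explicit.
    bool operator==(const max_result_size&) const = default;

private:
    uint64_t _soft_limit;
    uint64_t _hard_limit;
};

static_assert(max_result_size(1, 2) == max_result_size(1, 2));
static_assert(!(max_result_size(1, 2) == max_result_size(1, 3)));
```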
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* tablet_id
* tablet_replica
* tablet_metadata
* tablet_map
their operator<<:s are dropped
Refs scylladb/scylladb#13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17504
in 5202bb9d, we introduced repair/table_check.cc, but we didn't
update repair/CMakeLists.txt accordingly. but the symbols defined
by this compilation unit are referenced by other source files when
building scylla.
so, in this change, we add this table_check.cc to the "repair"
target.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17517
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* position_range
* mutation_fragment
* range_tombstone_stream
* mutation_fragment_v2::printer
Refs #13245
Closes scylladb/scylladb#17521
* github.com:scylladb/scylladb:
mutation: add fmt::formatter for position_range
mutation: add fmt::formatter for mutation_fragment and range_tombstone_stream
mutation: add fmt::formatter for mutation_fragment_v2::printer
This stage doesn't need any special treatment, because we cannot revert
to old replicas and should proceed normally. The barrier itself won't
get stuck, because it already handles excluded/ignored nodes.
Just make the test validate it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Two options here -- revert to old replicas by jumping into the
cleanup_target stage, or proceed normally. The choice depends on which
replica set has fewer dead nodes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
At this stage it can happen that target replica got some writes, so its
tablet needs to be cleaned up, so jump to cleanup_target stage.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are several transition stages that are executed by the topology
coordinator with the help of barrier-and-drain raft commands. For the
test to stop and remove a node while handling this stage it must inject
a break-point into barrier handler, wait for it to happen and then stop
the node without resuming the break-point. Then removenode from the
cluster.
The break-point suspends barrier handling when a specific tablet is in
specific transition stage. Tablet ID and desired stage are configured
via injector parameters.
With today's error-injection facilities the way to suspend code
execution is with injecting a lambda that waits for a message from the
injection engine.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
it'd be more pythonic to just put an expression after `assert`,
instead of wrapping it in a pair of parentheses. and there is no need
to add `;` after `break`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Make a specialized sstable_set for tablets
via tablet_storage_group_manager::make_sstable_set.
This sstable set takes a snapshot of the storage_groups
(compound) sstable_sets and maps the selected tokens
directly into the tablet compound_sstable_set.
This sstable_set provides much more efficient access
to the table's sstable sets as it takes advantage of the disjointness
of sstable sets between tablets/storage_groups, making it cheaper
than rebuilding a complete partitioned_sstable_set from all sstables in the table.
Fixes #16876
Cassandra-stress setup:
```
$ sudo cpupower frequency-set -g userspace
$ build/release/scylla (developer-mode options) --smp=16 --memory=8G --experimental-features=consistent-topology-changes --experimental-features=tablets
cqlsh> CREATE KEYSPACE keyspace1 WITH replication={'class':'NetworkTopologyStrategy', 'replication_factor':1} AND tablets={'initial':2048};
$ ./tools/java/tools/bin/cassandra-stress write no-warmup n=10000000 -pop 'seq=1...10000000' -rate threads=128
$ scylla-api-client system drop_sstable_caches POST
$ ./tools/java/tools/bin/cassandra-stress read no-warmup duration=60s -pop 'dist=uniform(1..10000000)' -rate threads=128
$ scylla-api-client system drop_sstable_caches POST
$ ./tools/java/tools/bin/cassandra-stress mixed no-warmup duration=60s -pop 'dist=uniform(1..10000000)' -rate threads=128
```
Baseline (0a7854ea4d) vs. fix (0c2c00f01b)
Throughput (op/s):
workload | baseline | fix
---------|----------|----------
write | 76,806 | 100,787
read | 34,330 | 106,099
mixed | 32,195 | 79,246
Closes scylladb/scylladb#17149
* github.com:scylladb/scylladb:
table: tablet_storage_group_manager: make tablet_sstable_set
storage_group_manager: add make_sstable_set
tablet_storage_group_manager: handle_tablet_split_completion: pre-calc new_tablet_count
table: tablet_storage_group_manager: storage_group_of: do not validate in release build mode
table: move compaction_group_list and storage_group_vector to storage_group_manager
compaction_group::table_state: get_group_id: become self-sufficient
compaction_group, table: make_compound_sstable_set: declare as const
tablet_storage_group_manager: precalculate my_host_id and _tablet_map
table: coroutinize update_effective_replication_map
instead of passing fmt string as a plain `const char*`, pass it as
a consteval type, so that `fmt::format()` can perform compile-time
format check against it and the formatted params.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17656
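The technique is roughly the generic sketch below (not the exact helper in the tree): taking `fmt::format_string<Args...>` instead of `const char*` lets fmt check the format string against the argument types at compile time.
```c++
// Generic sketch of passing a compile-time-checked format string through a
// wrapper function; `describe` is a hypothetical helper, not ScyllaDB code.
#include <fmt/core.h>
#include <string>
#include <utility>

template <typename... Args>
std::string describe(fmt::format_string<Args...> fmt, Args&&... args) {
    // Because the first parameter is a fmt::format_string, mismatches between
    // the format string and the arguments are rejected at compile time.
    return fmt::format(fmt, std::forward<Args>(args)...);
}

int main() {
    std::string ok = describe("tablet {} on shard {}", 3, 1);    // compiles
    // std::string bad = describe("tablet {} on shard {}", 3);   // compile error
    fmt::print("{}\n", ok);
}
```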
before this change, we used the lower-case CMake build configuration
name for the rust profile names. but this was wrong, because the
profiles are named with the scylla build mode.
in this change, we translate the `$<CONFIG>` to scylla build mode,
and use it for the profile name and for the output directory of
the built library.
Closes scylladb/scylladb#17648
* github.com:scylladb/scylladb:
build: cmake: use scylla build mode for rust profile name
build: cmake: define per-config build mode
Repairs have to obtain a permit to the reader concurrency semaphore on
each shard they have a presence on. This is prone to deadlocks:
node1 node2
repair1_master (takes permit) repair1_follower (waits on permit)
repair2_master (waits for permit) repair2_follower (takes permit)
In lieu of strong central coordination, we solved this by making permits
evictable: repair2 can evict repair1's permit so it can obtain one
and make progress. This is not efficient as evicting a permit usually
means discarding already done work, but it prevents the deadlocks.
We recently discovered that there is a window when deadlocks can still
happen. The permit is made evictable when the disk reader is created.
This reader is an evictable one, which effectively makes the permit
evictable. But the permit is obtained when the repair control
structure -- repair meta -- is created. Between creating the repair meta
and reading the first row from disk, the deadlock is still possible. And
we know that what is possible, will happen (and did happen). Fix by
making the permit evictable as soon as the repair meta is created. This
is very clunky and we should have a better API for this (refs #17644),
but for now we go with this simple patch, to make it easy to backport.
Refs: #17644
Fixes: #17591
Closes scylladb/scylladb#17646
This patch series makes all auth writes serialized via raft. Reads stay
eventually consistent for performance reasons. To make transition to new
code easier data is stored in a newly created keyspace: system_auth_v2.
Internally the difference is that instead of executing CQL directly for
writes we generate mutations and then announce them via raft group0. Per
commit descriptions provide more implementation details.
Refs https://github.com/scylladb/scylladb/issues/16970
Fixes https://github.com/scylladb/scylladb/issues/11157
Closes scylladb/scylladb#16578
* github.com:scylladb/scylladb:
test: extend auth-v2 migration test to catch stale static
test: add auth-v2 migration test
test: add auth-v2 snapshot transfer test
test: auth: add tests for lost quorum and command splitting
test: pylib: disconnect driver before re-connection
test: adjust tests for auth-v2
auth: implement auth-v2 migration
auth: remove static from queries on auth-v2 path
auth: coroutinize functions in password_authenticator
auth: coroutinize functions in standard_role_manager
auth: coroutinize functions in default_authorizer
storage_service: add support for auth-v2 raft snapshots
storage_service: extract getting mutations in raft snapshot to a common function
auth: service: capture string_view by value
alternator: add support for auth-v2
auth: add auth-v2 write paths
auth: add raft_group0_client as dependency
cql3: auth: add a way to create mutations without executing
cql3: run auth DML writes on shard 0 and with raft guard
service: don't lose service_level_controller when bouncing client_state
auth: put system_auth and users consts in legacy namespace
cql3: parametrize keyspace name in auth related statements
auth: parametrize keyspace name in roles metadata helpers
auth: parametrize keyspace name in password_authenticator
auth: parametrize keyspace name in standard_role_manager
auth: remove redundant consts auth::meta::*::qualified_name
auth: parametrize keyspace name in default_authorizer
db: make all system_auth_v2 tables use schema commitlog
db: add system_auth_v2 tables
db: add system_auth_v2 keyspace
When a tool application is invoked with an unknown operation, an error
message is printed, which includes all the known operations, with all
their aliases. This is collected in `std::vector<std::string_view>`. The
problem is that the vector containing alias names is returned by
value, so the code ends up creating views to temporaries.
Fix this by returning the alias vector by const&.
Fixes: #17584
Closes scylladb/scylladb#17586
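The essence of the fix, shown on simplified stand-in types (not the actual tool-application classes):
```c++
// Simplified stand-ins for the tool operation/alias types; illustrates why
// returning the alias vector by value created dangling string_views.
#include <string>
#include <string_view>
#include <vector>

struct operation {
    std::string name;
    std::vector<std::string> aliases;

    // BUGGY variant: returning a copy means string_views built over the
    // copy's elements dangle once the temporary vector is destroyed.
    // std::vector<std::string> get_aliases() const { return aliases; }

    // FIXED variant: return a reference, so views into the element strings
    // stay valid for as long as the operation object lives.
    const std::vector<std::string>& get_aliases() const { return aliases; }
};

std::vector<std::string_view> known_names(const operation& op) {
    std::vector<std::string_view> names{op.name};
    for (const auto& alias : op.get_aliases()) {   // safe with the const& getter
        names.emplace_back(alias);
    }
    return names;
}
```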
The helper in question is supposed to spawn a background fiber with a
tablet migration stage action and repeat it in case the action fails (until
operator intervention, but that's another story). In case the action fails,
a message with ERROR level is logged about the failure.
This error confuses some tests that scan scylla log messages for
ERROR-s at the end, treat most of them (if not all) as critical, and
fail. But this particular message is not in fact an error -- the topology
coordinator would re-execute this action anyway, so let's demote the
message to WARN instead.
refs: #17027
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17568
This series adds a Python script that searches the code for metrics definition and their description.
Because part of the code uses a nonstandard way of definition, it uses a configuration file to resolve parameter values.
The script supports the code that uses string format and string concatenation with variables.
The documentation team will use the results to both document the existing metrics and to get the metrics changes between releases.
Replaces #16328
Closes scylladb/scylladb#17479
* github.com:scylladb/scylladb:
Adding scripts/metrics-config.yml
Adding scripts/get_description.py to fetch metrics description
before this change, we failed to apply the filtering of the tablestats
command in the right way:
1. `table_filter` failed to check if the delimiter is npos before
extracting the cf component from the specified table name.
2. the stats should not include the keyspaces which are not
included by the filter.
3. the total number of tables in the stats report should count
all tables, no matter whether they are filtered or not.
in this change, all the problems above are addressed, and the tests
are updated to cover these use cases.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17468
Make a specialized sstable_set for tablets
via tablet_storage_group_manager::make_sstable_set.
This sstable set takes a snapshot of the storage_groups
(compound) sstable_sets and maps the selected tokens
directly into the tablet compound_sstable_set.
Refs #16876
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Move the responsibility for preparing the table_set
covering all sstables in the table to the storage_group_manager
so it can specialize the sstable_set.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Mini-cleanup of `new_tablet_count`, similar
to pre-calculating `old_tablet_count` once.
While at it, add some missing coding-style related spaces.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
No validation is really required in release build.
Add `#ifndef SCYLLA_BUILD_MODE_RELEASE` before
adding another term to the logic in the next patch
that adds support for sparse allocation in a cloned
tablet_storage_group_manager.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently in shard_repair_task_impl::repair_range table name is
retrieved with database::find_column_family and in case of exception,
we return from the function.
But the table name is already kept in table_info passed to repair_range
as an argument. Let's reuse it. If a table is dropped, we will find it
out almost immediately after calling repair_cf_range_row_level and
handle it more adequately.
Closes scylladb/scylladb#17245
For tables using tablet based replication strategies, the sstables should be reshaped only within the compaction groups they belong to. The shard_reshaping_compaction_task_impl now groups the sstables based on their compaction groups before reshaping them.
Fixes https://github.com/scylladb/scylladb/issues/16966
Closes scylladb/scylladb#17395
* github.com:scylladb/scylladb:
test/topology_custom: add testcase to verify reshape with tablets
test/pylib/rest_client: add get_sstable_info, enable/disable_autocompaction
replica/distributed_loader: enable reshape for sstables
compaction: reshape sstables within compaction groups
replica/table : add method to get compaction group id for an sstable
compaction: reshape: update total reshaped size only on success
compaction: simplify exception handling in shard_reshaping_compaction_task_impl::run
A couple of minor formatting fixes.
Closes scylladb/scylladb#17518
* github.com:scylladb/scylladb:
docs: remove leading space in table element
docs: remove space in words
Introduces collapsible dropdowns for images reference docs. With this update, only the latest version's details will be displayed open by default. Information about previous versions will be hidden under dropdowns, which users can expand as needed. This enhancement aims to make pages shorter and easier to navigate.
Closes scylladb/scylladb#17492
- use API endpoint of /storage_service/toppartition/
- only print out the specified samplings.
- print "\n" separator between samplings
Closes scylladb/scylladb#17574
* github.com:scylladb/scylladb:
tools/scylla-nodetool: print separator between samplings
tools/scylla-nodetool: only print the specified sampling
tools/scylla-nodetool: use /storage_service/toppartition/
This pull request adds dynamic substitutions for the following variables:
* `.. |CURRENT_VERSION| replace:: {current_version}`
* `.. |UBUNTU_SCYLLADB_LIST| replace:: scylla-{current_version}.list`
* `.. |CENTOS_SCYLLADB_REPO| replace:: scylla-{current_version}.repo`
As a result, it is no longer needed to update the "Installation on Linux" page manually after every new release.
Closes scylladb/scylladb#17544
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* node_ops_cmd
* node_ops_cmd_request
their operator<<:s are dropped
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17505
Printing the compaction_group group_id as "i/size"
where size is the total number of compaction_groups in
the table is convenient but it comes with a price
of a circular dependency on the table, as noted by
Aleksandra Martyniuk in c25827feb3 (r1511341251),
which can be triggered when hitting an error when adding the
compaction_group::table_state to the table's compaction_manager
within the table's constructor.
This patch just prints the _group_id member
resolving the dependency on the table.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
before this change, we assume that the debian packaging directory is
always located under `build/debian/debian`, which is hardwired by
`configure.py`. but this might not hold anymore, if we want to
have a self-contained build, in the sense that different builds do
not share the same build directory. this could be a waste for the
non-multi-config build, but `configure.py` uses a multi-config generator
when building with CMake. so in that case, all builds still share the
same $build_dir/debian/ directory.
in order to work with the out-of-source build, where the build
directory is not necessarily "build", a new option is added to
`create-relocatable-package.py`, this allows us to specify the directory
where "debian" artifacts are located.
Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17558
in 02de9f1833, we enable building Seastar testing for using the
testing facilities in scylla's own tests. but this brings in
Seastar's tests.
since scylladb's CI builds the "all" targets, and we are not
interested in running Seastar's tests when building scylladb,
let's exclude Seastar's tests from the "all" target.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17554
The node host_id never changes, so get it once,
when the object is constructed.
A pointer to the tablet_map is taken when constructed
using the effective_replication_map and it is
updated whenever the e_r_m changes, using a newly added
`update_effective_replication_map` method.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It's better to wait on deregistering the
old main compaction_groups:s in handle_tablet_split_completion
rather than leaving work in the background.
Especially since their respective storage_groups
are being destroyed by handle_tablet_split_completion.
handle_tablet_split_completion keeps a continuation chain
for all non-ready compaction_group stop fibers.
and returns it so that update_effective_replication_map
can await it, leaving no cleanup work in the background.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Print process id to the log at start.
It aids debugging/administering the instance if you have multiple
instances running on the same machine.
Closes scylladb/scylladb#17582
CMake generates debian packages under build/$<CONFIG>/debian instead of
build/$mode/debian. so let's translate $mode to $<CONFIG> if
build.ninja is found under the build/ directory, as configure.py puts
build.ninja under $top_srcdir, while CMake puts it under build/.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17592
- changes to use build/$<CONFIG> for build directory
- add ${CMAKE_BINARY_DIR}/debian as a dep
- generate deb packages under build/$<CONFIG>/debian
Closes scylladb/scylladb#17560
* github.com:scylladb/scylladb:
build: cmake: generate deb packages under build/$<CONFIG>/debian
build: cmake: add ${CMAKE_BINARY_DIR}/debian as a dep
build: cmake: use build/$<CONFIG>/ instead of build
build: cmake: always pass absolute path for add_stripped()
This commit removes the redundant
"Cluster membership changes and LWT consistency" page.
The page is no longer useful because the Raft algorithm
serializes topology operations, which results in
consistent topology updates.
Closes scylladb/scylladb#17523
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for cache_entry, and drop its
operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17594
This PR updates the procedures that changed as a result of introducing Raft-based topology.
Refs https://github.com/scylladb/scylladb/issues/15934
Applied the updates from https://docs.google.com/document/d/1BgZaYtKHs2GZKAxudBZv4G7uwaXcRt2jM6TK9dctRQg/edit
In addition, it adds a placeholder for the 5.4-to-6.0 upgrade guide, as a file included in that guide, Enable Raft topology, is referenced from other places in the docs.
Closes scylladb/scylladb#17500
* github.com:scylladb/scylladb:
doc: replace "Raft Topology" with "Consistent Topology"
doc: (Raft topology) update Removenode
doc: (Raft topology) update Upscale a Cluster
doc:(Raft topology)update Membership Change Failures
doc: doc: (Raft topology) update Replace Dead Node
doc: (Raft topology) update Remove a Node
doc: (Raft topology) update Add a New DC
doc: (Raft topology) update Add a New Node
doc: (Raft topology) update Create Cluster (EC2)
doc: (Raft topology) update Create Cluster (n-DC)
doc: (Raft topology) update Create Cluster (1DC)
doc: include the quorum requirement file
doc: add the quorum requirement file
doc: add placeholder for Enable Raft topology page
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* std::vector<data_type>
* column_identifier
* column_identifier_raw
* untyped_constant::type_class
and drop their operator<<:s
Refs #13245
Closes scylladb/scylladb#17538
* github.com:scylladb/scylladb:
cql3: add fmt::formatter for expression::printer
cql3: add fmt::formatter for raw_value{,_view}
cql3: add fmt::formatter for std::vector<data_type>
cql3: add fmt::formatter for untyped_constant::type_class
cql3: add fmt::formatter for column_identifier{,_row}
this "misspelling" was identified by codespell. actually, it's not
quite a misspelling, as "UPDATE" and "INSERT" are keywords in CQL.
so we intended to emaphasis them, so to make codespell more useful,
and to preserve the intention, let's quote the keywords with backticks.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17391
before this change, we used the lower-case CMake build configuration
name for the rust profile names. but this was wrong, because the
profiles are named with the scylla build mode.
in this change, we translate the $<CONFIG> to scylla build mode,
and use it for the profile name and for the output directory of
the built library.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
so that scylla_build_mode_$<CONFIG> can be referenced when necessary.
we use it for referencing the build mode in the build system instead
of the CMake configuration name.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This PR includes 3 commits:
- **[actions] Add a check for backport labels**: As part of the Automation of ScyllaDB backports project, each PR should get either a `backport/none` or `backport/X.Y` label. Based on this label we will automatically open a backport PR for the relevant OSS release.
In this commit, I am adding a GitHub action to verify if such a label was added. This only applies to PR with a based branch of `master` or `next`. For releases, we don't need this check
- **Add Mergify (https://mergify.com/) configuration file**: In this PR we introduce the `.mergify.yml` configuration file, which
include a set of rules that we will use for automating our backport
process.
For each supported OSS release (currently 5.2 and 5.4) we have an almost
identical configuration section which includes the four conditions before
we open a backport pr:
* PR should be closed
* PR should have the proper label. for example: backport/5.4 (we can
have multiple labels)
* Base branch should be `master`
* PR should be set with a `promoted` label - this condition will be set
automatically once the commits are promoted to the `master` branch (passed
gating)
Once all conditions are met, the verify bot will open a backport PR and
will assign it to the author of the original PR, then CI will start
running, and only after it passes do we merge.
- **[action] Add promoted label when commits are in master**: In Scylla, we don't merge our PRs but use `./script/pull_github_pr.sh` to close the pull request, adding a `closes scylladb/scylladb <PR number>` remark and pushing the changes to the `next` branch.
One of the conditions for opening a backport PR is that all relevant commits are in `master` (passed gating). In this GitHub action, we go through the list of commits once a push was made to `master`, identify the relevant PR, and add the `promoted` label to it. This allows Mergify to start the backporting process.
Closes scylladb/scylladb#17365
* github.com:scylladb/scylladb:
[action] Add promoted label when commits are in master
Add mergify (https://mergify.com/) configuration file
[actions] Add a check for backport labels
The semantics of the function was accidentally
modified in 6e79d64. The consequence of the change
was that we didn't limit memory consumption:
the function always returned false for any node
different from the local node. The returned value
is used by storage_proxy to decide whether it
is able to store a hint or not.
This commit fixes the problem by taking other
nodes into consideration again.
Fixes #17636
Closes scylladb/scylladb#17639
We need MAIN_BRANCH calculated earlier so we can use it
to checkout the right branch when cloning the src repo
(either `master` or `enterprise`, based on the detected `PRODUCT`)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#17647
The `buildah commit` command doesn't remove the working container. These
accumulate in ~/.local/container/storage until something bad happens.
Fix by adding the `--rm` flag to remove the container and volume.
Closes scylladb/scylladb#17546
before this change, we already have a `fmt::formatter` specialized for
`expression::printer`. but the formatter was implemented by
1. formatting the `printer` instance to an `ostringstream`, and
2. extracting a `std::string` from this `ostringstream`
3. formatting the `std::string` instance to the fmt context
this is convoluted and is not an optimal implementation. so,
in this change, it is reimplemented by formatting directly to
the context. its operator<< is also dropped in this change.
please note, to avoid adding the large chunk of code into the
.hh file, the implementation is put in the .cc file. but in order
to preserve the usage of `transformed(fmt::to_string<expression::printer>)`,
the `format()` function is defined as a template, and instantiated
explicitly for two use cases:
1. to format to `fmt::context`
2. to format using `fmt::to_string()`
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
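The difference between the two approaches, shown on a generic stand-in type (not the real cql3 `expression::printer`): instead of formatting into an ostringstream and then formatting the resulting string, the formatter writes straight into the format context.
```c++
// Generic illustration of formatting directly into the fmt context;
// point_printer is a stand-in type, not ScyllaDB code.
#include <fmt/core.h>

struct point_printer {
    int x;
    int y;
};

template <>
struct fmt::formatter<point_printer> : fmt::formatter<fmt::string_view> {
    auto format(const point_printer& p, fmt::format_context& ctx) const {
        // Writes straight into the output iterator; no intermediate
        // ostringstream or std::string is materialized.
        return fmt::format_to(ctx.out(), "({}, {})", p.x, p.y);
    }
};

int main() {
    fmt::print("{}\n", point_printer{1, 2});   // prints "(1, 2)"
}
```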
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* raw_value
* raw_value_view
`raw_value_view` 's operator<< is still being used by the generic
homebrew printer for vector<>, so it is preserved.
`raw_value` 's operator<< is still being used by the generic
homebrew printer for optional<>, so it's preserved as well.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
We decrease the server's request timeouts in topology tests so that
they are lower than the driver's timeout. Before, the driver could
time out its request before the server handled it successfully.
This problem caused scylladb/scylladb#15924.
Since scylladb/scylladb#15924 is the last issue mentioned in
scylladb/scylladb#15962, this PR also reenables background
writes in `test_topology_ops` with tablets disabled. The test
doesn't pass with tablets and background writes because of
scylladb/scylladb#17025. We will reenable background writes
with tablets after fixing that issue.
Fixes scylladb/scylladb#15924
Fixes scylladb/scylladb#15962
Closes scylladb/scylladb#17585
* github.com:scylladb/scylladb:
test: test_topology_ops: reenable background writes without tablets
test: test_topology_ops: run with and without tablets
test: topology: decrease the server's request timeouts
Tests that verify upgrading to the raft-based topology
(`test_topology_upgrade`, `test_topology_recovery_basic`,
`test_topology_recovery_majority_loss`) have flaky
`check_system_topology_and_cdc_generations_v3_consistency` calls.
`assert topo_results[0] == topo_res` can fail because of different
`unpublished_cdc_generations` on different nodes.
The upgrade procedure creates a new CDC generation, which is later
published by the CDC generation publisher. However, this can happen
after the upgrade procedure finishes. In tests, if publishing
happens just before querying `system.topology` in
`check_system_topology_and_cdc_generations_v3_consistency`, we can
observe different `unpublished_cdc_generations` on different nodes.
It is an expected and temporary inconsistency.
For the same reasons,
`check_system_topology_and_cdc_generations_v3_consistency` can
fail after adding a new node.
To make the tests not flaky, we wait until the CDC generation
publisher finishes its job. Then, all nodes should always have
equal (and empty) `unpublished_cdc_generations`.
Fixes scylladb/scylladb#17587
Fixes scylladb/scylladb#17600
Fixes scylladb/scylladb#17621
Closes scylladb/scylladb#17622
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for std::vector<data_type>,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for untyped_constant::type_class,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* column_identifier
* column_identifier_raw
and their operator<<:s are dropped.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
With auth-v2 we can log in even if quorum is lost. So the test
which checks that an error occurs in such a situation is deleted,
and the opposite test, which checks that logging in works, was
added.
During raft topology upgrade procedure data from
system_auth keyspace will be migrated to system_auth_v2.
Migration works mostly on top of CQL layer to minimize
amount of new code introduced, it mostly executes SELECTs
on old tables and then INSERTs on new tables. Writes are
not executed as usual but rather announced via raft.
Because the keyspace is part of the query, when we
migrate from v1 to v2 the query should change; otherwise
the code would operate on the old keyspace if those statics
were already initialized.
Likewise, the keyspace name can no longer be a class
field initialized in the constructor, as it can change
during the class lifetime.
Alternator doesn't do any writes to auth
tables so it's simply change of keyspace
name.
Docs will be updated later, when auth-v2
is enabled as default.
All auth modifications will go now via group0.
This is achieved by acquiring group0 guard,
creating mutations without executing and
then announcing them.
Actually the first guard is taken by the query processor;
it serves as a read barrier for query validations
(such as standard_role_manager::exists), otherwise
we could read older data. In principle this single
guard should be used for the entire query, but it's impossible
to achieve with the current code without a major refactor.
For read-before-write cases it's good to do the write with
the guard acquired before the read, so that no
modify operation would be allowed in between.
That said, not doing it doesn't make the implementation
worse than it currently is, so the most complex cases
were left with a FIXME.
To make table modifications go via raft we need to publish
mutations. Currently many system tables (especially auth) use
CQL to generate table modifications. Added function is a missing
link which will allow to do a seamless transition of certain
system tables to raft.
Because we'll be doing group0 operations we need to run on shard 0. Additional benefit
is that with needs_guard set query_processor will also do automatic retries in case of
concurrent group0 operations.
When bounce_to_shard happens we need to fill client_state with
sl_controller appropriate for destination shard.
Before the patch sl_controller was set to null after the bounce.
It was fine because it looks like it was never used in such a scenario.
With auth-v2 we need to bounce attach/detach service level statements
because they modify things via auth subsystem which needs to be called
on shard 0.
It's the same approach as done for default_authorizer in
earlier commit.
Note that only non-legacy paths were changed, in particular
legacy migrations and table creations won't be ever executed
in new keyspace as they will be managed by system_auth_keyspace
implementation.
For now we add keyspace name as class member because it's static
value anyway. But statics will be removed in future commits because
migration can occur and auth need to switch keyspace name in runtime.
Just follow the same pattern as in default_authorizer so
it's easy to track where system_auth keyspace is actually
used. It will also allow for easier parametrization.
When adding group0 replication for auth we will change only
write path and plan to reuse read path. To not copy the code
or make more complicated class hierarchy default_authorizer's
read code will remain unchanged except this parametrization,
it is needed as group0 implementation uses separate keyspace
(replication is defined on a keyspace level).
In subsequent commits legacy write path code will be separated
and new implementation placed in default_authorizer.
For now we add keyspace name as class member because it's static
value anyway. But statics will be removed in future commits because
migration can occur and auth need to switch keyspace name in runtime.
Changing config under the guard can cause a deadlock.
The guard holds _read_apply_mutex. The same lock is held by the group0
apply() function. It means that no entry can be applied while the guard
is held, and the raft apply fiber may even be sleeping, waiting for this lock
to be released. A configuration change OTOH waits for the config change
command to be committed before returning, but the way raft is implemented,
commit notifications are triggered from the apply fiber, which may
be stuck. Deadlock.
Drop and re-take the guard around configuration changes.
Fixes scylladb/scylladb#17186
The new keyspace is added similarly to the system_schema keyspace;
it's registered via system_keyspace::make, which calls
all_tables to build its schema.
A dummy table 'roles' is added, as keyspaces are currently
registered by walking through their tables. Full table schemas
will be added in subsequent commits.
Change can be observed via cqlsh:
cassandra@cqlsh> describe keyspaces;
system_auth_v2 system_schema system system_distributed_everywhere
system_auth system_distributed system_traces
cassandra@cqlsh> describe keyspace system_auth_v2;
CREATE KEYSPACE system_auth_v2 WITH replication = {'class': 'LocalStrategy'} AND durable_writes = true;
CREATE TABLE system_auth_v2.roles (
role text PRIMARY KEY
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
AND comment = 'comment'
AND compaction = {'class': 'SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 604800
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
After fixing scylladb/scylladb#15924 in one of the previous
patches, we reenable background writes in `test_topology_ops`.
We also start background writes a bit later after adding all nodes.
Without this change and with tablets, the test fails with:
```
> await cql.run_async(f"CREATE TABLE tbl (pk int PRIMARY KEY, v int)")
E cassandra.protocol.ConfigurationException: <Error from server: code=2300
[Query invalid because of configuration issue] message="Datacenter
datacenter1 doesn't have enough nodes for replication_factor=3">
```
The change above makes the test a bit weaker, but we don't have to
worry about it. If adding nodes is bugged, other tests should
detect it.
Unfortunately, the test still doesn't pass with tablets and
background writes because of scylladb/scylladb#17025, so we keep
background writes disabled with tablets and leave FIXME.
Fixes scylladb/scylladb#15962
We decrease the server's request timeouts in topology tests so that
they are lower than the driver's timeout. Before, the driver could
time out its request before the server handled it successfully.
This problem caused scylladb/scylladb#15924.
A high server's request timeout can slow down the topology tests
(see the new comment in `make_scylla_conf`). We make the timeout
dependent on the testing mode to not slow down tests for no reason.
We don't touch the driver's request timeout. Decreasing it in some
modes would require too much effort for almost no improvement.
Fixes scylladb/scylladb#15924
node.rs pointer can be freed while guard is released, so it cannot be
accessed during error processing. Save state locally.
Fixes #17577
Message-ID: <Zd9keSwiIC4v_EiF@scylladb.com>
The RPC is now used by group0, which is available only on shard 0.
Fixes scylladb/scylladb#17565
* 'gleb/migration-request-shard0' of github.com:scylladb/scylla-dev:
raft_group0_client: assert that hold_read_apply_mutex is called on shard 0
migration_manager: fix indentation after the previous patch.
messaging_service: process migration_request rpc on shard 0
This commit updates the Nodetool Removenode page
with reference to the Raft-related topology.
Specifically, it removes outdated warnings, and
adds the information about banning removed and ignored
nodes from the cluster.
This commit updates the Handling Cluster Membership Change Failures page
with reference to the Raft-related topology.
Specifically, it adds a note that the page only applies when
Raft-based topology is not enabled.
In addition, it removes the Raft-enabled option.
This commit updates the Replace a Dead Node page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to replace the nodes one by one and the requirement to ensure
that the replaced node will never come back to the cluster.
In addition, a warning is added to indicate the limitations
when Raft-based topology is not enabled upon upgrade from 5.4.
This commit updates the Remove a Node page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to remove the nodes one by one and the requirement to ensure
that the removed node will never come back to the cluster.
In addition, a warning is added to indicate the limitations
when Raft-based topology is not enabled upon upgrade from 5.4.
This commit updates the Add a New DC page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to bootstrap the nodes one by one.
In addition, a warning is added to indicate the limitations
when Raft-based topology is not enabled upon upgrade from 5.4.
This commit updates the Add a New Node (Out Scale) page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to bootstrap the nodes one by one.
In addition, a warning is added to indicate the limitations
when Raft-based topology is not enabled upon upgrade from 5.4.
This commit updates the Create Cluster (EC2) page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to bootstrap the nodes one by one.
In addition, it updates the concept of the seed node.
This commit updates the Create Cluster (Multi DC) page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to bootstrap the nodes one by one.
In addition, it updates the concept of the seed node.
This commit updates the Create Cluster (Single DC) page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to bootstrap the nodes one by one.
In addition, it updates the concept of the seed node.
Commit 0c376043eb added access to the group0
semaphore, which can be done on shard 0 only. Unlike all other group0 RPCs
(which are already always forwarded to shard 0), migration_request is not,
since it is an RPC that was reused from pre-raft days. The patch adds
the missing jump to shard 0 before executing the RPC.
Calling notify_left for old ip on topology change in raft mode
was a regression. In gossiper mode it didn't occur. In gossiper
mode the function handle_state_normal was responsible for spotting
IP addresses that weren't managing any parts of the data, and
it would then initiate their removal by calling remove_endpoint.
This removal process did not include calling notify_left.
Actually, notify_left was only supposed to be called (via excise) by
the 'real' removal procedures - removenode and decommission.
The redundant notify_left caused trouble in the scylla python driver.
The driver could receive REMOVED_NODE and NEW_NODE notifications
at the same time and their handling routines could race with each other.
In this commit we fix the problem by not calling notify_left if
the remove_ip lambda was called from the ip change code path.
Also, we add a test which verifies that the driver log doesn't
mention the REMOVED_NODE notification.
Fixes scylladb/scylladb#17444
Closes scylladb/scylladb#17561
instead of printing it out after samplings, we should print it
in between them. as toppartitions_test.py in dtest splits the
samplings using "\n\n". without this change, dtest would consider
the empty line as another sampling and then fail the test. as
the empty sampling does not match with the expected regular expressions.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we print all samplings returned by the API,
but this is not cassandra nodetool's behavior, which only
prints out the specified one. and the toppartitions_test.py
in dtest actually expects that the number of samplings
matches the one specified on the command line.
so, in this change, we only print out the specified samplings.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
instead of using the endpoint of /storage_service/toppartition,
use /storage_service/toppartition/. otherwise the API server refuses
to return the expected result, as it does not match any API endpoint.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Memtables are fickle, they can be flushed when there is memory pressure,
if there is too much commitlog or if there is too much data in them.
The tests in test_select_from_mutation_fragments.py currently assume
data written is in the memtable. This is true most of the time, but we
have seen some odd test failures that couldn't be understood.
To make the tests more robust, flush the data to the disk and read it
from the sstables. This means that some range scans need to filter to
read from just a single mutation source, but this does not influence
the tests.
create-relocatable-package.py packages debian packaging as well,
so we have to add it as a dependency for the targets which
use `create-relocatable-package.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
with multi-config generator, the generated artifacts are located
under ${CMAKE_BINARY_DIR}/$<CONFIG>/ instead of ${CMAKE_BINARY_DIR}.
so update the paths referencing the built executables. and update
the `--build-dir` option of `create-relocatable-package.py` accordingly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we assumed that the $<TARGET_FILE:${name}
is the path to the parameter passed to this function, but this was
wrong. it actually refers the `TARGET` argument of the keyword
of this function. also, the path to the generated files should
be located under path like "build/Debug" instead of "build" if
multi-config generator is used. as multi-config builds share
the same `${CMAKE_BINARY_DIR}`.
in this change, instead of acccepting a CMake target, we always
accept an absolute path. and use ""${CMAKE_BINARY_DIR}/$<CONFIG>"
for the directory of the executable, this should work for
multi-config generator which it is used by `configure.py`, when
CMake is used to build the tree.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Even though taking erm blocks migration, it cannot prevent
load-and-stream from starting while a migration is going on; erm
only prevents the migration from advancing.
With tablets, new data will be streamed to pending replica too if
the write replica selector, in transition metadata, is set to both.
If migration is at a later stage where only new replica is written
to, then data is streamed only to new replica as selector is set
to next (== new replica set).
primary_replica_only flag is handled by only streaming to pending
if the primary replica is the one leaving through migration.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This test needed a lot of data to ensure multiple pages when doing the read repair. This change modifies two key configuration items, allowing for a drastic reduction of the data size and consequently a large reduction in run-time.
* Changes query-tombstone-page-limit 1000 -> 10. Before f068d1a6fa, reducing this to a too small value would start killing internal queries. Now, after said commit, this is no longer a concern, as this limit no longer affects unpaged queries.
* Sets (the new) query-page-size-in-bytes 1MB (default) -> 1KB.
The latter configuration is a new one, added by the first patches of this series. It allows configuring the page-size in bytes, after which pages are cut. Previously this was a hard-coded constant: 1MB. This forced any tests which wanted to check paging, with pages cut on size, to work with large datasets. This was especially pronounced in the tests fixed in this PR, because this test works with tombstones which are tiny and a lot of them were needed to trigger paging based on the size.
With these two changes, we can reduce the data size:
* total_rows: 20000 -> 100
* max_live_rows: 32 -> 8
The runtime of the test consequently drops from 62 seconds to 13.5 seconds (dev mode, on my build machine).
Fixes: https://github.com/scylladb/scylladb/issues/15425
Fixes: https://github.com/scylladb/scylladb/issues/16899
Closes scylladb/scylladb#17529
* github.com:scylladb/scylladb:
test/topology_custom: test_read_repair.py: reduce run-time
replica/database: get_query_max_result_size(): use query_page_size_in_bytes
replica/database: use include page-size in max-result-size
query-request: max_result_size: add without_page_limit()
db/config: introduce query_page_size_in_bytes
Fix test_storage_service.py to work with tablets.
- test_describe_ring was failing because in storage_service/describe_ring
table must be specified for keyspaces with tablets.
Do not check the status if tablets are enabled. Add checks for
specified table;
- test_storage_service_keyspace_cleanup_with_no_owned_ranges
was failing because cleanup is disabled on keyspaces with tablets.
Use test_keyspace_vnodes fixture to use keyspace with tablet disabled;
- test_storage_service_get_natural_endpoints required
some minor type-related fixes.
Injection set in test_repair_task_progress didn't consider the case
when repair::shard_repair_task_impl::ranges_size() == 1 which is
true for tablets.
Move the injection so that it is triggered before the number of complete
ranges is increased.
Fix test_compaction_task.py to work with tablets.
Currently the tests fail because cleanup on keyspaces with tablets is
disabled, and reshape and reshard of keyspace with tablets uses
load_and_stream which isn't covered by tasks.
Use test_keyspace_vnodes for these tests to have a keyspace with
tablets disabled.
Load-and-stream is given similar treatment to repair.
It jumps into a new streaming session for every tablet, so we guarantee
data will be segregated into tablets co-habiting the same shard.
Fixes #17315.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This patch fixes a UBSAN-reported integer overflow during one of our
existing tests,
test_native_functions.py::test_mintimeuuid_extreme_from_totimestamp
when attempting to convert an extreme "date" value, millions of years
in the past, into a "timestamp" value. When UBSAN crashing is enabled,
this test crashes before this patch, and succeeds after this patch.
The "date" CQL type is 32-bit count of *days* from the epoch, which can
span 2^31 days (5 million years) before or after the epoch. Meanwhile,
the "timestamp" type measures the number of milliseconds from the same
epoch, in 64 bits. Luckily (or intentionally), every "date", however
extreme, can be converted into a "timestamp": This is because 2^31 days
is 1.85e17 milliseconds, well below timestamp's limit of 2^63 milliseconds
(9.2e18).
But it turns out that our conversion function, date_to_time_point(),
used some boost::gregorian library code, which carried out these
calculations in **microsecond** resolution. The extra conversion to
microseconds wasn't just wasteful, it also caused an integer overflow
in the extreme case: 2^31 days is 1.85e20 microseconds, which does NOT
fit in a 64-bit integer. UBSAN notices this overflow, and complains
(plus, the conversion is incorrect).
The fix is to do the trivial conversion on our own (a day is, by
convention, exactly 86400 seconds - no fancy library is needed),
without the grace of Boost. The result is simpler, faster, correct
for the Pliocene-age dates, and fixes the UBSAN crash in the test.
Fixes#17516
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17527
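A minimal sketch of the arithmetic described above (illustrative, not the exact date_to_time_point() code): convert days to milliseconds directly in 64-bit integers, skipping the microsecond intermediate that overflowed.
```
#include <cstdint>

// 86400 seconds per day by convention, 1000 ms per second.
constexpr int64_t ms_per_day = int64_t(86400) * 1000;

int64_t date_days_to_timestamp_ms(int32_t days_from_epoch) {
    // Even the extreme +/- 2^31 days stays around 1.85e17 ms,
    // well within the int64_t range (~9.2e18).
    return int64_t(days_from_epoch) * ms_per_day;
}
```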
load-and-stream is currently the only method -- for tablets -- that
can load SSTables while the node is online.
Today, sstable_directory relies on replication map (erm) not being
invalidated during loading, and the assumption is broken with
concurrent tablet migration.
It causes load-and-stream to segfault.
The sstable loader needs the sharder from erm in order to compute
the owning shard.
To fix, let's use auto_refreshing_sharder, which refreshes the sharder
every time the table has its replication map updated. This way we guarantee
that any user of the sharder will find it alive throughout the lifetime of
sstable_directory.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Group0 state machine access atomicity is guaranteed by a mutex in the group0
client. Code that reads or writes the state needs to hold the lock. To
transfer the schema part of the snapshot we used the existing "migration
request" verb, which did not follow the rule. Fix the code to take the group0
lock before accessing the schema when the verb is called as part of a
group0 snapshot transfer.
Fixes scylladb/scylladb#16821
This test needed a lot of data to ensure multiple pages when doing the
read repair. This change modifies two key configuration items, allowing
for a drastic reduction of the data size and consequently a large
reduction in run-time.
* Changes query-tombstone-page-limit 1000 -> 10. Before f068d1a6fa,
reducing this to a too small value would start killing internal
queries. Now, after said commit, this is no longer a concern, as this
limit no longer affects unpaged queries.
* Sets (the new) query-page-size-in-bytes 1MB (default) -> 1KB.
With these two changes, we can reduce the data size:
* total_rows: 20000 -> 100
* max_live_rows: 32 -> 8
The runtime of the test consequently drops from 62 seconds to 13.5
seconds (dev mode, on my build machine).
This patch changes get_unlimited_query_max_result_size():
* Also set the page-size field, not just the soft/hard limits
* Renames it to get_query_max_result_size()
* Update callers, specifically storage_proxy::get_max_result_size(),
which now has a much simpler common return path and has to drop the
page size on one rare return path.
This is a purely mechanical change, no behaviour is changed.
Returns an instance with the page_limit reset to 0. This converts a
max_results_size which is usable only with the
"page_size_and_safety_limit" feature, to one which can be used before
this feature.
To be used in the next patch.
Regulates the page size in bytes via config, instead of the currently
used hard-coded constant. Allows tests to configure lower limits so they
can work with smaller data-sets when testing paging related
functionality.
Not wired yet.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `position_range`, and the
helpers for printing related types are dropped.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
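for reference, such a port typically takes the following shape (a sketch with a hypothetical stand-in type, not the actual Scylla definition):
```
#include <string_view>

#include <fmt/format.h>

// hypothetical stand-in for the real position_range type
struct position_range_like {
    int start;
    int end;
};

template <>
struct fmt::formatter<position_range_like> : fmt::formatter<std::string_view> {
    // parse() is inherited from formatter<string_view>; only format() is defined.
    auto format(const position_range_like& r, fmt::format_context& ctx) const {
        return fmt::format_to(ctx.out(), "{{{}, {}}}", r.start, r.end);
    }
};
```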
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* mutation_fragment
* range_tombstone_stream
their operator<<:s are dropped
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
In order to publish the docs-pages from release branches (see the other
commit), we need to make sure that docs is always built from the default
branch which contains the updated conf.py
Ref https://github.com/scylladb/scylladb/pull/17281
Currently, the github docs-pages workflow is triggered only when changes
are merged to the master/enterprise branches, which means that in the
case of changes to a release branch, for example, a fix to branch-5.4,
or a branch-5.4>branch-2024.1 merge, the docs-pages workflow is not triggered and
therefore the documentation is not updated with the new change.
In this change, I added the `branch-**` pattern, so changes to release
branches will trigger the workflow.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for mutation_fragment_v2::printer
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
otherwise sphinx would consider "Within which Data Center the"
as the "term" part of an entry in a definition list, and
"node is located" as the definition part of this entry.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
* remove space in "Exceptions", otherwise it renders like "Except"
"tions", which does not look right.
* remove space in "applicable".
* remove space in "Transport".
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The function `gms::version_generator::get_next_version()` can only be called from shard 0 as it uses a global, unsynchronized counter to issue versions. Notably, the function is used as a default argument for the constructor of `gms::versioned_value` which is used from shorthand constructors such as `versioned_value::cache_hitrates`, `versioned_value::schema` etc.
The `cache_hitrate_calculator` service runs a periodic job which updates the `CACHE_HITRATES` application state in the local gossiper state. Each time the job is scheduled, it runs on the next shard (it goes through shards in a round-robin fashion). The job uses the `versioned_value::cache_hitrates` shorthand to create a `versioned_value`, therefore risking a data race if it is not currently executing on shard 0.
The PR fixes the race by moving the call to `versioned_value::cache_hitrates` to shard 0. Additionally, in order to help detect similar issues in the future, a check is introduced to `get_next_version` which aborts the process if the function was called on a shard other than 0.
There is a possibility that it is a fix for #17493. Because `get_next_version` uses a simple incrementation to advance the global counter, a data race can occur if two shards call it concurrently and it may result in shard 0 returning the same or smaller value when called two times in a row. The following sequence of events is suspected to occur on node A:
1. Shard 1 calls `get_next_version()`, loads version `v - 1` from the global counter and stores in a register; the thread then is preempted,
2. Shard 0 executes `add_local_application_state()` which internally calls `get_next_version()`, loads `v - 1` then stores `v` and uses version `v` to update the application state,
3. Shard 0 executes `add_local_application_state()` again, increments version to `v + 1` and uses it to update the application state,
4. Gossip message handler runs, exchanging application states with node B. It sends its application state to B. Note that the max version of any of the local application states is `v + 1`,
5. Shard 1 resumes and stores version `v` in the global counter,
6. Shard 0 executes `add_local_application_state()` and updates the application state - again - with version `v + 1`.
7. After that, node B will never learn about the application state introduced in point 6. as gossip exchange only sends endpoint states with version larger than the previous observed max version, which was `v + 1` in point 4.
Note that the above scenario was _not_ reproduced. However, I managed to observe a race condition by:
1. modifying Scylla to run update of `CACHE_HITRATES` much more frequently than usual,
2. putting an assertion in `add_local_application_state` which fails if the version returned by `get_next_version` was not larger than the previous returned value,
3. running a test which performs schema changes in a loop.
The assertion from the second point was triggered. While it's hard to tell how likely it is to occur without making updates of cache hitrates more frequent - not to mention the full theorized scenario - for now this is the best lead that we have, and the data race being fixed here is a real bug anyway.
Refs: #17493
Closes scylladb/scylladb#17499
* github.com:scylladb/scylladb:
version_generator: check that get_next_version is called on shard 0
misc_services: fix data race from bad usage of get_next_version
Since commit f1bbf70, many compaction types can do cleanup work, but it turns out
we forgot to invalidate the cache on their completion.
So if a node regains ownership of a token that had a partition deleted by its
previous owner (and the tombstone is already gone), data can be resurrected.
Tablets are not affected, as they explicitly invalidate the cache during the
migration cleanup stage.
Scylla 5.4 is affected.
Fixes #17501.
Fixes #17452.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#17502
In Scylla, we don't merge our PRs but use `./script/pull_github_pr.sh` to close the pull request, adding a `closes scylladb/scylladb` remark and pushing the changes to the `next` branch.
One of the conditions for opening a backport PR is that all relevant commits are in master (passed gating). In this GitHub action, we go through the list of commits once a push is made to master, identify the relevant PR, and add the `promoted` label to it. This will allow Mergify to start the process of backporting.
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* canonical_mutation
* atomic_cell_view
* atomic_cell
* atomic_cell_or_collection::printer
Refs #13245
Closes scylladb/scylladb#17506
* github.com:scylladb/scylladb:
mutation: add fmt::formatter for canonical_mutation
mutation: add fmt::formatter for atomic_cell_view and atomic_cell
mutation: add fmt::formatter for atomic_cell_or_collection::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for canonical_mutation
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* atomic_cell_view
* atomic_cell
and drop their operator<<:s.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`atomic_cell_or_collection::printer`, and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
For tables using tablet based replication strategies, the sstables
should be reshaped only within the compaction groups they belong to.
Updated shard_reshaping_compaction_task_impl to group the sstables based
on their compaction groups before reshaping them within the groups.
Fixes #16966
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
The get_next_version function can only be safely called from shard 0,
but this constraint is not enforced in any way. As evidenced in the
previous commit, it is easy to accidentally call it from a non-zero
shard.
Introduce a runtime check in get_next_version which calls
on_fatal_internal_error if it detects that the function was called from
the wrong shard. This will let us detect cross-shard use issues at
runtime.
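A minimal sketch of such a guard (hypothetical names; the real check lives in gms::version_generator and reports via on_fatal_internal_error):
```
#include <cstdint>
#include <cstdlib>

#include <seastar/core/smp.hh>

static int64_t version_counter = 0;  // stand-in for the global version counter

int64_t get_next_version_sketch() {
    if (seastar::this_shard_id() != 0) {
        // The real code calls on_fatal_internal_error(); abort() is used here
        // only to keep the sketch self-contained.
        std::abort();
    }
    return ++version_counter;
}
```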
The function `gms::version_generator::get_next_version()` can only be
called from shard 0 as it uses a global, unsynchronized counter to
issue versions. Notably, the function is used as a default argument for
the constructor of `gms::versioned_value` which is used from shorthand
constructors such as `versioned_value::cache_hitrates`,
`versioned_value::schema` etc.
The `cache_hitrate_calculator` service runs a periodic job which
updates the `CACHE_HITRATES` application state in the local gossiper
state. Each time the job is scheduled, it runs on the next shard (it
goes through shards in a round-robin fashion). The job uses the
`versioned_value::cache_hitrates` shorthand to create a
`versioned_value`, therefore risking a data race if it is not currently
executing on shard 0.
Fix the race by constructing the versioned value on shard 0.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* wrapping_interval
* interval
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17488
Unfortunately, fmt v10 dropped support for operator<< formatters,
forcing us to replace the huge number of operator<< implementations
in our code by uglier and templated fmt::formatter implementations
to get Scylla to compile on modern distros (such as Fedora 39) :-(
Kefu has already started doing this migration, here is my small
contribution - the formatter for mutation_fragment_v2::kind.
This patch is needed to compile, for example,
build/dev/mutation/mutation_fragment_stream_validator.o.
I can't remove the old operator<< because it's still used by
the implementation of other operator<< functions. We can remove
all of them when we're done with this conversion. In the meantime,
I replaced the original implementation of operator<< by a trivial
implementation just passing the work to the new fmt::print support.
Refs #13245
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17432
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* bound_kind_m
* sstable_state
* indexable_element
* deletion_time
drop their operator<<:s
Refs #13245
Closes scylladb/scylladb#17490
* github.com:scylladb/scylladb:
sstables: add fmt::formatter for deletion_time
sstable: add fmt::formatter for indexable_element
sstables: add fmt::foramtter for sstable_state
sstables: add fmt::formatter for sstables::bound_kind_m
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for some types used in testing.
Refs #13245
Closes scylladb/scylladb#17485
* github.com:scylladb/scylladb:
test/unit: add fmt::formatter for tree_test_key_base
test: add printer for type for BOOST_REQUIRE_EQUAL
test: add fmt::formatters
test/perf: add fmt::formatters for scheduling_latency_measurer and perf_result
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* managed_bytes
* managed_bytes_view
* managed_bytes_opt
* occupancy_stats
and drop their operator<<:s
Refs https://github.com/scylladb/scylladb/issues/13245
Closes scylladb/scylladb#17462
* github.com:scylladb/scylladb:
utils/managed_bytes: add fmt::formatters for managed_bytes and friends
utils/logalloc: add fmt::formatter for occupancy_stats
If index_reader isn't closed before it is destroyed, then ongoing
sstable reads won't be awaited and an assertion will be triggered.
Close index_reader in has_partition_key before destroying it.
Fixes: #17232.
Closes scylladb/scylladb#17355
* github.com:scylladb/scylladb:
test: add test to check if reader is closed
sstables: close index_reader in has_partition_key
the "keyspace" argument of the "ring" command is optional. but before
this change, we considered it a mandatory option. it was wrong.
so, in this change, we make it optional, and print out the warning
message if the keyspace is not specified.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17472
* tighten the param check for toppartitions
* add an extra empty line inbetween reports
Closes scylladb/scylladb#17486
* github.com:scylladb/scylladb:
tools/scylla-nodetool: add an extra empty line inbetween reports
tools/scylla-nodetool: tighten the param check for toppartitions
RPC calls lose information about the type of the returned exception.
Thus, if a table is dropped on the receiver node, but it still exists
on the sender node and the sender streams the table's data, then
the whole operation fails.
To prevent that, add a method which synchronizes schema and then
checks if the exception was caused by a table drop. If so,
the exception is swallowed.
Use the method in streaming and repair to continue them when
the table is dropped in the meantime.
Fixes: #17028.
Fixes: #15370.
Fixes: #15598.
Closes scylladb/scylladb#17231
* github.com:scylladb/scylladb:
repair: handle no_such_column_family from remote node gracefully
test: test drop table on receiver side during streaming
streaming: fix indentation
streaming: handle no_such_column_family from remote node gracefully
repair: add methods to skip dropped table
it would be more helpful if the matcher could print out the unmatched
output on test failure. so, in this change, both stdout and stderr
are printed if they fail to match with the expected error.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17489
simpler this way.
Closes scylladb/scylladb#17437
* github.com:scylladb/scylladb:
tools/scylla-nodetool: use {yaml,json}_writers in compactionhistory_operation
tools/scylla-nodetool: add {json,yaml}_writer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `sstables::deletion_time`,
drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `sstables::indexable_element`,
drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `sstables::sstable_state`,
drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `sstables::bound_kind_m`,
drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, `toppartitions` does not print an empty line
after an empty sampling warning message. but
dtest/toppartitions_test.py actually split sampling reports with
two newlines, so let's appease it. the output also looks better
this way, as the samplings for READS and WRITES are always visually
separated with an empty line.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
the test cases of `test_any_of_required_parameters_is_missing`
consider that we should either pass all positional arguments or
none of them, otherwise nodetool should fail. but `scylla nodetool`
supported partial positional arguments.
to be more consistent with the expected behavior, in this change,
we enforce the sanity check so that we only accept either all
positional args or none of them. the corresponding test is added.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* managed_bytes
* managed_bytes_view
* managed_bytes_opt
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `occupancy_stats`, and
drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for the classes derived from `tree_test_key_base`
(this change was extracted from a larger change at #15599)
Refs #13245
after dropping the operator<< for vector, we would not be able to
use BOOST_REQUIRE_EQUAL to compare vector<>. to be prepared for this,
let's define the printer for Boost.test
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
the operator<< for `cql3::expr::test_utils::mutation_column_value` is
preserved, as it is used by test/lib/expr_test_utils.cc, which prints
std::map<sstring, cql3::expr::test_utils::mutation_column_value> using
the homebrew generic formatter for std::map<>. and the formatter uses
operator<< for printing the elements in map.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* scheduling_latency_measurer
* perf_result
and drop their operator<<:s
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The total reshaped size should only be updated on reshape success and
not after reshape has failed due to some exception.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Catch and handle the exceptions directly instead of rethrowing and
catching again.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
When filtering a test by 'name' consider that name can be in a
'test::case' format. If so, get the left part to be the filter and the
right part to be the case name to be passed down to test itself.
Later, when pytest starts, it appends the case name (if not
None) to the pytest execution, thus making it run only the specified
test case, not the whole test file.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
And propagate it from the add_test() helper. For now keep it None; the next
patch will bring more sense to this place.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The scripts/metrics-config.yml is a configuration file used by
get_description.py. It covers the places in the code that use a
non-standard way of defining metrics.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
The get_description script parses a C++ file and searches for metrics
declarations and their descriptions.
It creates a pipe-delimited file with the metric name, metric family
name, description and location in the file.
To find all description in all files:
find . -name "*.cc" -exec grep -l '::description' {} \; | xargs -i ./get_description.py {}
While many of the metrics are defined in the form of
_metrics.add_group("hints_manager", {
sm::make_gauge("size_of_hints_in_progress", _stats.size_of_hints_in_progress,
sm::description("Size of hinted mutations that are scheduled to be written.")),
Some metric declarations use variables and string formatting.
The script uses a configuration file to translate parameters and
concatenations to the actual names.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This commit adds a placeholder for the Enable Raft-based Topology page
in the 5.4-to-6.0 upgrade guide.
This page needs to be referenced from other pages in the docs.
Now all test cases use the pylib manager client to manipulate the cluster.
While at it -- drop more unused bits from suite .py files.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `alternator::parsed::path`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17458
Now that the test case in question is not using ManagerCluster, there's
no point in using test_tempdir either, and the temporary object-store
config can be generated in a generic temporary directory.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In the middle, this test case needs to force the scylla server to reload its
configs. Currently the manager API requires that some existing config option
is provided as an argument, but in this test case scylla.yaml remains
intact. So it satisfies the API with a non-changing option.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This case is a bit tricky, as it needs to know where scylla's workdir
is, so it replaces the use of test_tempdir with the call to manager to
get server's workdir.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* row_tombstone
* row_marker
* deletable_row::printer
* row::printer
* clustering_row::printer
* static_row::printer
* partition_start
* partition_end
* mutation_fragment::printer
and drop their operator<<:s
Refs #13245
Closes scylladb/scylladb#17461
* github.com:scylladb/scylladb:
mutation: add fmt::formatter for clustering_row and friends
mutation: add fmt::formatter for row_tombstone and friends
This includes
- marking the suite as Topology
- import needed fixtures and options from topology conftest
- configuring the zero initial cluster size and anonymous auth
- marking all test cases as skipped, as they no longer work after above
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
If index_reader isn't closed before it is destroyed, then ongoing
sstables reads won't be awaited and assertion will be triggered.
Close index_reader in has_partition_key before destroying it.
In this PR we introduce the .mergify.yml configuration file, which
includes a set of rules that we will use for automating our backport
process.
For each supported OSS release (currently 5.2 and 5.4) we have an almost
identical configuration section which includes the four conditions required
before we open a backport PR:
* PR should be closed
* PR should have the proper label. for example: backport/5.4 (we can
have multiple labels)
* Base branch should be master
* PR should be set with a promoted label - this condition will be set
automatically once the commits are promoted to the master branch (passed
gating)
Once all conditions are met, the verify bot will open a backport PR and
assign it to the author of the original PR. Then CI will start
running, and only after it passes do we merge.
Our interval template started life as `range`, and supported wrapping to follow Cassandra's convention of wrapping around the maximum token.
We later recognized that an interval type should usually be non-wrapping and split it into wrapping_range and nonwrapping_range, with `range` aliasing wrapping_range to preserve compatibility.
Even later, we realized the name was already taken by C++ ranges and so renamed it to `interval`. Given that intervals are usually non-wrapping, the default `interval` type is non-wrapping.
We can now simplify it further, recognizing that everyone assumes that an interval is non-wrapping and so doesn't need the nonwrapping_interval designation. We just rename nonwrapping_interval to `interval` and remove the type alias.
Closes scylladb/scylladb#17455
* github.com:scylladb/scylladb:
interval: rename nonwrapping_interval to interval
interval: rename interval_test to wrapping_interval_test
in af2553e8, we added formatters for cdc::image_mode and
cdc::delta_mode. but in that change, we failed to qualify `string_view`
with the `std::` prefix. even if it compiles, it depends on a `using
std::string_view` or a more error-prone `using namespace std`,
neither of which should be relied on. so, in this change, we
add the `std::` prefix to `string_view`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17459
There's the --partitions one that specifies how many partitions the test
would generate before measuring. When the --bypass-cache option is in use,
thus making the test always engage sstables readers, it makes sense to
add some control over sstables granularity. With the new option,
during the population phase the memtable gets flushed every $this-number
of partitions, not just once at the end (and an unknown number of times in
the middle because of the dirty memory limit).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Usually a perf test doesn't expect some activity to run in the
background without control. Compaction is one such activity, so it makes
sense to keep it off while running the measurement.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When producing the output json file, keep how many initial tablets were
requested (if at all) next to other workload parameters
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* `streaming::stream_request`,
* `stream_session_state`
and drop their operator<<:s
Refs #13245
Closes scylladb/scylladb#17464
* github.com:scylladb/scylladb:
streaming: add fmt::formatter for streaming::stream_request
streaming: add fmt::formatter for stream_session_state
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* `sstables::compaction_type`
* `sstables::compaction_type_options::scrub::mode`
* `sstables::compaction_type_options::scrub::quarantine_mode`
and drop their operator<<:s.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17441
The cluster_status_table virtual table has a status field for each node. In
gossiper mode the status is taken from the gossiper, but with raft the
states are different and are stored in the topology state machine. The
series fixes the code to check the current mode and take the status from the
correct place.
Refs scylladb/scylladb#16984
* 'gleb/cluster_status_table-v1' of github.com:scylladb/scylla-dev:
gossiper: remove unused REMOVAL_COORDINATOR state
virtual_tables: take node state from raft for cluster_status_table table if topology over raft is enabled
virtual_tables: create result for cluster_status_table read on shard 0
When we create a CDC generation and ring-delay is non-zero, the
timestamp of the new generation is in the future. Hence, we can
have multiple generations that can be written to. However, if we
add a new node to the cluster with the Raft-based topology, it
receives only the last committed generation. So, this node will
be rejecting writes considered correct by the other nodes until
the last committed generation starts operating.
In scylladb/scylladb#17134, we have allowed sending writes to the
previous CDC generations. So, the situation became even more
complicated. This PR adjusts the Raft-based topology
to ensure all required generations are loaded into memory and their
data isn't cleared too early.
To load all required generations into memory, we replace
`current_cdc_generation_{uuid, timestamp}` with the set containing
IDs of all committed generations - `committed_cdc_generations`.
To ensure this set doesn't grow endlessly, we remove an entry from
this set together with the data in CDC_GENERATIONS_V3.
Currently, we may clear a CDC generation's data from
CDC_GENERATIONS_V3 if it is not the last committed generation
and it is at least 24 hours old (according to the topology
coordinator's clock). However, after allowing writes to the
previous CDC generations, this condition became incorrect. We
might clear data of a generation that could still be written to.
The new solution introduced in this PR is to clear data of the
generations that finished operating more than 24 hours ago.
Apart from the changes mentioned above, this PR hardens
`test_cdc_generation_clearing.py`.
Fixes scylladb/scylladb#16916
Fixes scylladb/scylladb#17184
Fixes scylladb/scylladb#17288
Closes scylladb/scylladb#17374
* github.com:scylladb/scylladb:
test: harden test_cdc_generation_clearing
test: test clean-up of committed_cdc_generations
raft topology: clean committed_cdc_generations
raft topology: clean only obsolete CDC generations' data
storage_service: topology_state_load: load all committed CDC generations
system_keyspace: load_topology_state: fix indentation
raft topology: store committed CDC generations' IDs in the topology
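A simplified sketch of the retention rule described above (hypothetical types, not the actual topology coordinator code): a generation may be dropped only once its successor has been operating for more than 24 hours, i.e. once it itself stopped operating more than 24 hours ago.
```
#include <chrono>
#include <cstdint>
#include <iterator>
#include <map>

// generation timestamp -> generation id (hypothetical, simplified representation)
using sys_clock = std::chrono::system_clock;
using generations_t = std::map<sys_clock::time_point, int64_t>;

void prune_obsolete(generations_t& committed, sys_clock::time_point now) {
    using namespace std::chrono_literals;
    // A generation stops operating when its successor's timestamp is reached,
    // so it may be dropped only once that successor became active > 24h ago.
    while (committed.size() > 1) {
        auto successor = std::next(committed.begin());
        if (successor->first + 24h <= now) {
            committed.erase(committed.begin());
        } else {
            break;
        }
    }
}
```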
removenode --force is an unsafe operation and does not even make sense with
topology over raft. This patch disables it if raft is enabled and prints
a deprecation note otherwise. We already have a PR to remove it
(https://github.com/scylladb/scylladb/pull/15834), but it was decided
there that a deprecation period is needed for legacy use case.
Fixes: scylladb/scylladb#16293
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* clustering_row::printer
* static_row::printer
* partition_start
* partition_end
* mutation_fragment::printer
and drop their operator<<:s
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
when '\' does not start an escape sequence, Python complains at seeing
it. but it continues anyway by considering '\' as a separate char.
but the warning message is still annoying:
```
scylla-gdb.py: 2417: SyntaxWarning: invalid escape sequence '\-'
branches = (r" |-- ", " \-- ")
```
when sourcing this script.
so, let's mark these strings as raw strings.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17466
we already have generic operator<< based formatter for sequence-alike
ranges defined in `utils/to_string.hh`, but as a part of efforts to
address #13245, we will eventually drop the formatter.
to prepare for this change, we should create/find the alternatives
where the operator<< for printing the ranges is still used.
Boost::program_options is one of them. it prints the options' default
values using operator<< in its error message or usage. so in order
to keep it working, we define operator<< for `vector<sstring>` here.
if more types are required, we will need to generalize
this formatter. if there is more need from different compilation
units, we might need to extract this helper into, for instance,
`utils/to_string.hh`, but we should do this only after removing the generic formatter there.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17413
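roughly, such a helper has the following shape (a sketch using std::string in place of sstring):
```
#include <cstddef>
#include <ostream>
#include <string>
#include <vector>

// print a vector of strings in a braces-and-commas form so Boost.Program_options
// can show default values in its usage/error messages
std::ostream& operator<<(std::ostream& os, const std::vector<std::string>& v) {
    os << "{";
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i) {
            os << ", ";
        }
        os << v[i];
    }
    return os << "}";
}
```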
instead of materializing the `managed_bytes_view` into a string and
printing it, print it directly to stdout. this change helps to deprecate
the `to_hex()` helpers; we should materialize a string only when necessary.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17463
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `streaming::stream_request`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`streaming::stream_session_state`, and drop its operator<<
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* row_tombstone
* row_marker
* deletable_row::printer
* row::printer
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Our interval template started life as `range`, and supported
wrapping to follow Cassandra's convention of wrapping around the
maximum token.
We later recognized that an interval type should usually be non-wrapping
and split it into wrapping_range and nonwrapping_range, with `range`
aliasing wrapping_range to preserve compatibility.
Even later, we realized the name was already taken by C++ ranges and
so renamed it to `interval`. Given that intervals are usually non-wrapping,
the default `interval` type is non-wrapping.
We can now simplify it further, recognizing that everyone assumes
that an interval is non-wrapping and so doesn't need the
nonwrapping_interval designation. We just rename nonwrapping_interval
to `interval` and remove the type alias.
Those that collect vectors with ks/cf names can reserve the vectors in advance. Also, one of them can use a range loop for shorter code.
Closes scylladb/scylladb#17433
* github.com:scylladb/scylladb:
api: Reserve vectors in advance
api: Use range-loop to iterate keyspaces
When the topology barrier is blocked for longer than the configured threshold
(2s), stale versions are marked as stalled, and when they get released
they report a backtrace to the logs. This should help identify what
was holding the token metadata pointer for too long.
Example log:
token_metadata - topology version 30 held for 299.159 [s] past expiry, released at: 0x2397ae1 0x23a36b6 ...
Closes scylladb/scylladb#17427
When reading a list of ranges with tablets, we don't need a multishard reader. Instead, we intersect the range list with the local node's tablet ranges, then read each range from the respective shard.
The individual ranges are read sequentially, with database::query[_mutations](), merging the results into a single
instance. This makes the code simple. For tablets multishard_mutation_query.cc is no longer on the hot paths, range scans
on tables with tablets fork off to a different code-path in the coordinator. The only code using multishard_mutation_query.cc are forced, replica-local scans, like those used by SELECT * FROM MUTATION_FRAGMENTS(). These are mainly used for diagnostics and tests, so we optimize for simplicity, not performance.
Fixes: #16484
Closes scylladb/scylladb#16802
* github.com:scylladb/scylladb:
test/cql-pytest: remove skip_with_tablets fixture
test/cql-pytest: test_select_from_mutation_fragments.py parameterize tests
test/cql-pytest: test_select_from_mutation_fragments.py: remove skip_with_tablets
multishard_mutation_query: add tablets support
multishard_mutation_query: remove compaction-state from result-builder factory
multishard_mutation_query: do_query(): return foreign_ptr<lw_shared_ptr<result>>
mutation_query: reconcilable_result: add merge_disjoint()
locator: introduce tablet_range_spliter
dht/i_partitioner: to_partition_range(): don't assume input is fully inclusive
interval: add before() overload which takes another interval
The default AIO backend requires AIO blocks. On production systems, all
available AIO blocks could have been already taken by ScyllaDB. Even
though the tools only require a single unit, we have seen cases where
not even that is available, ScyllaDB having siphoned all of the available
blocks.
We could try to ensure all deployments have some spare blocks, but it is
just less friction to not have to deal with this problem at all, by just
using the epoll backend. We don't care about performance in the case of
the tools anyway, so long as they are not unreasonably slow. And since
these tools are replacing legacy tools written in Java, the bar is low.
Closes scylladb/scylladb#17438
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for following types
* query::result::printer
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for following types
* query::result_set
* query::result_set_row
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for following types
* query::specific_ranges
* query::partition_slice
* query::read_command
* query::forward_request
* query::forward_request::reduction_type
* query::forward_request::aggregation_info
* query::forward_result::printer
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `formatted_sstables_list`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* `sstables::compaction_type`
* `sstables::compaction_type_options::scrub::mode`
* `sstables::compaction_type_options::scrub::quarantine_mode`
and drop their operator<<:s.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
so that we have less repetition when dumping the metrics. the repetition
is error-prone and not maintainable. also move them out into a separate
header, to keep this source file fit -- it's now 3000 LOC. also,
by moving them out, we can reuse them in other subcommands without
moving them to the top of this source file.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
To run with both vnodes and tablets. For this functionality, both
replication methods should be covered with tests, because it uses
different ways to produce partition lists, depending on the replication
method.
Also add scylla_only to those tests that were missing this fixture
before. All tests in this suite are scylla-only and with the
parameterization, this is even more apparent.
When reading a list of ranges with tablets, we don't need a multishard
reader. Instead, we intersect the range list with the local node's tablet
ranges, then read each range from the respective shard.
The individual ranges are read sequentially, with
database::query[_mutations](), merging the results into a single
instance. This makes the code simple. For tablets,
multishard_mutation_query.cc is no longer on the hot path: range scans
on tables with tablets fork off to a different code-path in the
coordinator. The only remaining users of multishard_mutation_query.cc are
forced, replica-local scans, like those used by SELECT * FROM
MUTATION_FRAGMENTS(). These are mainly used for diagnostics and tests,
so we optimize for simplicity, not performance.
This param was used by the query-result builder, to set the
last-position on end-of-stream. Instead, do this via a new ResultBuilder
method, maybe_set_last_position(), which is called from read_page(),
which has access to the compaction-state.
With this, the ResultBuilder can be created without a compaction-state
at hand. This will be important in the next patch.
Given a list of partition-ranges, yields the intersection of this
range-list with the ranges of the tablets located on the
given host.
This will be used in multishard_mutation_query.cc, to obtain the ranges
to read from the local node: given the read ranges, obtain the ranges
belonging to tablets that have replicas on the local node.
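A rough, self-contained sketch of the idea (simplified to closed integer token ranges; these are not the actual ScyllaDB types nor the new helper's signature): intersect the requested ranges with the ranges of tablets that have a replica on the local host.
```cpp
#include <algorithm>
#include <optional>
#include <vector>

// Simplified closed token range [first, last], used only for this illustration.
struct token_range {
    long first;
    long last;
};

// Intersect one requested range with one tablet range, if they overlap.
std::optional<token_range> intersect(const token_range& a, const token_range& b) {
    long first = std::max(a.first, b.first);
    long last = std::min(a.last, b.last);
    if (first > last) {
        return std::nullopt;
    }
    return token_range{first, last};
}

// Given the ranges requested by the read and the token ranges of tablets that
// have a replica on this host, produce the ranges this host should read locally.
std::vector<token_range> restrict_to_local_tablets(
        const std::vector<token_range>& requested,
        const std::vector<token_range>& local_tablet_ranges) {
    std::vector<token_range> result;
    for (const auto& r : requested) {
        for (const auto& t : local_tablet_ranges) {
            if (auto x = intersect(r, t)) {
                result.push_back(*x);
            }
        }
    }
    return result;
}
```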
Consider the inclusiveness of the token-range's start and end bounds and
copy the flag to the output bounds, instead of assuming they are always
inclusive.
The current point variant cannot take inclusiveness into account, when
said point comes from another interval bound.
This method had no tests at all, so add tests covering both overloads.
range.hh was deprecated in bd794629f9 (2020) since its names
conflict with the C++ library concept of an iterator range. The name
::range also mapped to the dangerous wrapping_interval rather than
nonwrapping_interval.
Complete the deprecation by removing range.hh and replacing all the
aliases by the names they point to from the interval library. Note
this now exposes uses of wrapping intervals as they are now explicit.
The unit tests are renamed and range.hh is deleted.
Closes scylladb/scylladb#17428
In order to avoid running out of memory, we can't
underestimate the memory used when processing a view
update. Particularly, we need to handle the remote
view updates well, because we may create many of them
at the same time in contrast to local updates which
are processed synchronously.
After investigating a coredump generated in a crash
caused by running out of memory due to these remote
view updates, we found that the current estimation
is much lower than what we observed in practice; we
identified overhead of up to 2288 bytes for each
remote view update. The overhead consists of:
- 512 bytes - a write_response_handler
- less than 512 bytes - excessive memory allocation
for the mutation in bytes_ostream
- 448 bytes - the apply_to_remote_endpoints coroutine
started in mutate_MV()
- 192 bytes - a continuation to the coroutine above
- 320 bytes - the coroutine in result_parallel_for_each
started in mutate_begin()
- 112 bytes - a continuation to the coroutine above
- 192 bytes - 5 unspecified allocations of 32, 32, 32,
48 and 48 bytes
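For reference, the itemized allocations above (taking the "less than 512 bytes" item at its 512-byte upper bound) add up exactly to the new per-update estimate; a trivial check:
```cpp
// Upper bound on the per-remote-view-update overhead, summed from the items above.
constexpr int remote_view_update_overhead =
      512   // write_response_handler
    + 512   // excessive bytes_ostream allocation (upper bound of "less than 512")
    + 448   // apply_to_remote_endpoints coroutine frame
    + 192   // its continuation
    + 320   // result_parallel_for_each coroutine frame
    + 112   // its continuation
    + 192;  // five small allocations: 32 + 32 + 32 + 48 + 48
static_assert(remote_view_update_overhead == 2288);
```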
This patch changes the previous overhead estimate
of 256 bytes to 2288 bytes, which should take into
account all allocations in the current version of the
code. It's worth noting that changes in the related
pieces of code may result in a different overhead.
The allocations seem to be mostly captures for the
background tasks. Coroutines seem to allocate extra,
however testing shows that replacing a coroutine with
continuations may result in generating a few smaller
futures/continuations with a larger total size.
Besides that, considering that we're waiting for
a response for each remote view update, we need the
relatively large write_response_handler, which also
includes the mutation in case we needed to reuse it.
The change should not majorly affect workloads with many
local updates because we don't keep many of them at
the same time anyway, and an added benefit of correct
memory utilization estimation is avoiding evictions
of other memory that would be otherwise necessary
to handle the excessive memory used by view updates.
Fixes #17364
Closes scylladb/scylladb#17420
It can happen that a node is lost during a tablet migration involving that node. The migration will be stuck, blocking the topology state machine. To recover from this, the current procedure is for the admin to execute nodetool removenode or to replace the node. This marks the node as "ignored" and the tablet state machine can pick this up and abort the migration.
This PR implements the handling for the streaming stage only and adds a test for it. Checking other stages needs more work with failure injection to inject failures into specific barriers.
To handle streaming failure two new stages are introduced -- cleanup_target and revert_migration. The former cleans up the pending replica that could have received some data by the time streaming stopped working; the latter is like end_migration, but doesn't commit the new_replicas into the replicas field.
refs: #16527
Closes scylladb/scylladb#17360
* github.com:scylladb/scylladb:
test/topology: Add checking error paths for failed migration
topology.tablets_migration: Handle failed streaming
topology.tablets_migration: Add cleanup_target transition stage
topology.tablets_migration: Add revert_migration transition stage
storage_service: Rewrap cleanup stage checking in cleanup_tablet()
test/topology: Move helpers to get tablet replicas to pylib
This PR removes information about outdated versions, including disclaimers and information when a given feature was added.
Now that the documentation is versioned, information about outdated versions is unnecessary (and makes the docs harder to read).
Fixes https://github.com/scylladb/scylladb/issues/12110
Closes scylladb/scylladb#17430
Some endpoints in api/column_family fill vectors with data obtained from
the database and return them. Since the amount of data is known in
advance, it's good to reserve the vectors.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Set filesystem permissions for the maintenance socket to 660 (previously it was 755) to allow the scyllaadm group to connect.
Split the logic of creating sockets into two separate functions, one for each case: the regular CQL controller and the maintenance_socket.
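A minimal POSIX-level illustration of the 660 semantics (ScyllaDB plumbs the permissions through its own socket-creation path rather than calling chmod like this; the function name here is made up):
```cpp
#include <sys/stat.h>
#include <cstdio>

// Restrict a unix-domain socket to its owner and group (rw-rw----),
// so that members of the owning group (e.g. scyllaadm) can connect
// while other users cannot. Returns 0 on success, -1 on error.
int restrict_socket_to_owner_and_group(const char* socket_path) {
    if (::chmod(socket_path, 0660) != 0) {
        std::perror("chmod");
        return -1;
    }
    return 0;
}
```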
Fixes https://github.com/scylladb/scylladb/issues/16487.
Closes scylladb/scylladb#17113
* github.com:scylladb/scylladb:
maintenance_socket: add option to set owning group
transport/controller: get rid of magic number for socket path's maximal length
transport/controller: set unix_domain_socket_permissions for maintenance_socket
transport/controller: pass unix_domain_socket_permissions to generic_server::listen
transport/controller: split configuring sockets into separate functions
Using `parallel_for_each_table` instead of `for_each_table_gently` in
`repair_service::load_history`, to reduce bootstrap time.
Using uuid_xor_to_uint32 in repair load_history to dispatch to shards.
Ref: https://github.com/scylladb/scylladb/issues/16774
Closes scylladb/scylladb#16927
* github.com:scylladb/scylladb:
repair: resolve load_history shard load skew
repair: accelerate repair load_history time
in da53854b66, we added a formatter for printing a `node*`, and switched
to this formatter when printing `node*`. but we failed to update some
call sites when migrating to the new formatter, where a
`unique_ptr<node>` is printed instead. this is not the behavior before
the change, and is not expected.
so, in this change, we explicitly instantiate `node_printer` instances
with the pointer held by `unique_ptr<node>`, to restore the behavior
before da53854b66.
this issue was identified when compiling the tree using {fmt} v10 with
the compile-time format-string check enabled, which is not yet upstreamed to
Seastar.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17418
In one of the previous patches, we fixed scylladb/scylladb#16916 as
a side effect. We removed
`system_keyspace::get_cdc_generations_cleanup_candidate`, which
contained the bug causing the issue.
Even though we didn't have to fix this issue directly, it showed us
that `test_cdc_generation_clearing` was too weak. If something went
wrong during/after the only clearing, the test still could pass
because the clearing was the last action in the test. In
scylladb/scylladb#16916, the CDC generation publisher was stuck
after the clearing because of a recurring error. The test wouldn't
detect it. Therefore, we harden the test by expecting two clearings
instead of one. If something goes wrong during the first clearing,
there is a high chance that the second clearing will fail. The new
test version wouldn't pass with the old bug in the code.
We extend `test_cdc_generation_clearing`. Now, it also tests the
clean-up of `TOPOLOGY.committed_cdc_generations` added in the
previous patch.
In the implementation, we harden the already existing
`check_system_topology_and_cdc_generations_v3_consistency`. After
the previous patch, data of every generation present in
`committed_cdc_generations` should be present in CDC_GENERATIONS_V3.
In other words, `committed_cdc_generations` should always be a
subset of a set containing generations in CDC_GENERATIONS_V3.
Before the previous patch, this wasn't true after the clearing, so
the new version of `test_cdc_generation_clearing` wouldn't pass
back then.
We clean `TOPOLOGY.committed_cdc_generations` from obsolete
generations to ensure this set doesn't grow endlessly. After this
patch, the following invariant will be true: if a generation is in
`committed_cdc_generation`, its data is in CDC_GENERATIONS_V3.
Currently, we may clear a CDC generation's data from
CDC_GENERATIONS_V3 if it is not the last committed generation
and it is at least 24 hours old (according to the topology
coordinator's clock). However, after allowing writes to the
previous CDC generations, this condition became incorrect. We
might clear data of a generation that could still be written to.
The new solution is to clear data of the generations that
finished operating more than 24 hours ago. The rationale behind
it is in the new comment in
`topology_coordinator::clean_obsolete_cdc_generations`.
The previous solution used the clean-up candidate. After
introducing `committed_cdc_generations`, it became unneeded.
The last obsolete generation can be computed in
`topology_coordinator::clean_obsolete_cdc_generations`. Therefore,
we remove all the code that handles the clean-up candidate.
After changing how we clear CDC generations' data,
`test_current_cdc_generation_is_not_removed` became obsolete.
The tested feature is not present in the code anymore.
`test_dependency_on_timestamps` became the only test case covering
the CDC generation's data clearing. We adjust it after the changes.
We load all committed CDC generations into `cdc::metadata`. Since
we have allowed sending writes to the previous generations in
scylladb/scylladb#17134, the committed generations may be necessary
to handle a correct request.
When we create a CDC generation and ring-delay is non-zero, the
timestamp of the new generation is in the future. Hence, we can
have multiple generations that can be written to. However, if we
add a new node to the cluster with the Raft-based topology, it
receives only the last committed generation. So, this node will
be rejecting writes considered correct by the other nodes until
the last committed generation starts operating.
In scylladb/scylladb#17134, we have allowed sending writes to the
previous CDC generations. So, the situation became even more
complicated. We need to adjust the Raft-based topology to ensure
all required generations are loaded into memory and their data
isn't cleared too early.
This patch is the first step of the adjustment. We replace
`current_cdc_generation_{uuid, timestamp}` with the set containing
IDs of all committed generations - `committed_cdc_generations`.
This set is sorted by timestamps, just like
`unpublished_cdc_generations`.
This patch is mostly refactoring. The last generation in
`committed_cdc_generations` is the equivalent of the previous
`current_cdc_generation_{uuid, timestamp}`. The other generations
are irrelevant for now. They will be used in the following patches.
After introducing `committed_cdc_generations`, a newly committed
generation is also unpublished (it was current and unpublished
before the patch). We introduce `add_new_committed_cdc_generation`,
which updates both sets of generations so that we don't have to
call `add_committed_cdc_generation` and
`add_unpublished_cdc_generation` together. It's easy to forget
that both of them are necessary. Before this patch, there was
no call to `add_unpublished_cdc_generation` in
`topology_coordinator::build_coordinator_state`. It was a bug
reported in scylladb/scylladb#17288. This patch fixes it.
This patch also removes "the current generation" notion from the
Raft-based topology. For the Raft-based topology, the current
generation was the last committed generation. However, for the
`cdc::metadata`, it was the generation operating now. These two
generations could be different, which was confusing. For the
`cdc::metadata`, the current generation is relevant as it is
handled differently, but for the Raft-based topology, it isn't.
Therefore, we change only the Raft-based topology. The generation
called "current" is called "the last committed" from now.
To allow filtering the returned keyspaces based on the replication they
use: tablets or vnodes.
The filter can be disabled by omitting the parameter or passing "all".
The default is "all".
Fixes: #16509
Closes scylladb/scylladb#17319
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `raft::fsm`, and drop its
operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17414
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `mutation_partition::printer`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17419
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`cached_promoted_index::promoted_index_block`, and drop its
operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17415
so we exercise the cases where state and status are not "normal" and "up".
turns out the MBean is able to cache some objects. so the requests retrieving datacenter and rack are now marked `ANY`.
* filter out the requests whose `multiple` is `ANY`
* include the unconsumed requests in the raised `AssertionError`. this
should help with debugging.
Fixes #17401
Closes scylladb/scylladb#17417
* github.com:scylladb/scylladb:
test/nodetool: parameterize test_ring
test/nodetool: fail a test only with leftover expected requests
For now only fail streaming stage and check that migration doesn't get
stuck and doesn't make tablet appear on dead node.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In case pending or leaving replica is marked as ignored by operator,
streaming cannot be retried and should jump to "cleanup_target" stage
after a barrier.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The new stage will be used to revert migration that fails at some
stages. The goal is to cleanup the pending replica, which may already
received some writes by doing the cleanup RPC to the pending replica,
then jumping to "revert_migration" stage introduced earlier.
If pending node is dead, the call to cleanup RPC is skipped.
Coordinators use old replicas.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's like end_migration, but old replicas intact just removing the
transition (including new replicas).
Coordinators use old replicas.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patch will need to teach this code to handle new cleanup_target
stage, this change prepares this place for smoother patching
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
so we exercise the cases where state and status are not "normal" and "up".
turns out the MBean is able to cache some objects. so the requests
retrieving datacenter and rack are now marked `ANY`.
Fixes #17401
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
if there are unconsumed requests whose `multiple` is -1, we should
not consider them required; the test can consume them or not. but if
it does not, we should not consider the test a failure just because
these requests are sitting at the end of the queue.
so, in this change, we
* filter out the requests whose `multiple` is `ANY`
* include the unconsumed requests in the raised `AssertionError`. this
should help with debugging.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The mentioned test failed on CI. It sets up two nodes and performs
operations related to creation and dropping of tables as well as
moving tablets. Locally, the issue was not visible - also, the test
was passing on CI in the majority of cases.
One of the steps in the test case is intended to select the shard that
has some tablets on host_0 and then move them to (host_1, shard_3).
It also contains a precondition that requires the tablet count to
be greater than zero - to ensure that the move_tablets operation really
moves tablets.
The error message in the failed CI run comes from the precondition
related to the tablet count on (host0, src_shard) - it was zero.
This indicated that there were no tablets on the entire host_0.
The following commit removes the assumption about the existence of
tablets on host_0. In case there are no tablets there, the
procedure is rerun for host_1.
Now the logic is as follows:
- find shard that has some tablets on host_0
- if such shard does not exist, then find such shard on host_1
- depending on the result of the search, set the src/dest nodes
- verify that reported tablet count metric is changed when
move_tablet operation finishes
Refs: scylladb#17386
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#17398
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`attribute_path_map_node<update_expression::action>`, and drop its
operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17270
Option `maintenance-socket-group` sets the owning group of the maintenance socket.
If not set, the group will be the same as the user running the scylla node.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`read_context::dismantle_buffer_stats`, and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17389
`-Wno-unused-command-line-argument` is used to disable the warning of
`-Wunused-command-line-argument`, which in turn emits
warnings if any of the command line arguments passed to the compiler
driver is not used. see
https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-command-line-argument
but it seems we are not passing unused command line arguments to
the compiler anymore. so let's drop this option.
this change helps to
* reduce the discrepancies between the compile options used by
CMake-generated rules and those generated directly using
`configure.py`
* re-enable the warning so we are aware if any of the options
is not used by the compiler. this could be a sign that the option fails
to serve its purpose.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17195
There's a bunch of debug- and trace-level logging of locator::node-s that also includes current_backtrace(). Printing a node is done via the debug_format() helper that generates and returns an sstring to print. Backtrace printing is not very lightweight on its own because of the backtrace collection. To avoid slowing things down at the info log level, which is the default, all such prints are wrapped with explicit if-s checking whether the log level is enabled.
This PR removes those level checks by introducing a lazy_backtrace() helper and by providing a formatter for nodes that also results in lazy node format-string calculation.
Closes scylladb/scylladb#17235
* github.com:scylladb/scylladb:
topology: Restore indentation after previous patch
topology: Drop if_enabled checks for logging
topology: Add lazy_backtrace() helper
topology: Add printer wrapper for node* and formatter for it
topology: Expand formatter<locator::node>
With this commit:
- The information about ScyllaDB Enterprise OS support
is removed from the Open Source documentation.
- The information about ScyllaDB Open Source OS support
is moved to the os-support-info file in the _common folder.
- The os-support-info file is included in the os-support page
using the scylladb_include_flag directive.
This update employs the solution we added with
https://github.com/scylladb/scylladb/pull/16753.
It allows dynamically adding content to a page
depending on the opensource/enterprise flag.
Refs https://github.com/scylladb/scylladb/issues/15484
Closes scylladb/scylladb#17310
Before the patch, if a decommissioned node tries
to restart, it calls _group0->discover_group0 first
in join_cluster, which hangs since decommissioned
nodes are banned and other nodes don't respond
to their discovery requests.
We fix the problem by checking the was_decommissioned()
flag before calling discover_group0.
fixes scylladb/scylladb#17282
Closes scylladb/scylladb#17358
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `gms::versioned_value`. its
operator<< is preserved, as it's still being used by the homebrew
generic formatter for std::unordered_map<gms::application_state,
gms::versioned_value>, which is in turn used in gms/gossiper.cc.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17366
This patch wires up tombstone_gc repair with tablet repair. The flush
hints logic from the vnode table repair is reused. The way to mark the
finish of the repair is also adjusted for tablet repair because it only
has one shard per tablet token range instead of smp::count shards.
Fixes: #17046
Tests: test_tablet_repair_history
Closes scylladb/scylladb#17047
* github.com:scylladb/scylladb:
repair: Update repair history for tablet repair
repair: Extract flush hints code
ldflags are passed to ld (the linker), while cxxflags are passed to the
C++ compiler. the compiler does not understand the ldflags. if we
pass ldflags to it, it complains if `-Wunused-command-line-argument` is
enabled.
in this change, we do not include the ldflags in cxxflags; this helps
us enable the warning option of `-Wunused-command-line-argument`,
so we don't need to disable it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17328
As usual, the new command is covered with tests, which pass with both the legacy and the new native implementation.
Refs: #15588
Closes scylladb/scylladb#17368
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the repair command
test/nodetool: utils: add check_nodetool_fails_with_error_contains()
test/nodetool: util: replace flags with custom matcher
This is mostly a refactoring commit to make the test
more readable, as a byproduct of
scylladb/scylladb#17369 investigation.
We add a check for the specific type of exception that
can be thrown (bad_property_file_error).
We also fix a potential race - the test may write
to res from multiple cores with no locks.
Closes scylladb/scylladb#17371
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
- gms::gossip_digest
- gms::gossip_digest_ack
- gms::gossip_digest_syn
and drop their operator<<:s
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17379
This API endpoint was failing when tablets were enabled
because of usage of get_vnode_effective_replication_map().
Moreover, it was providing an error message that was not
user-friendly.
This change extends the handler to properly service the incoming requests.
Furthermore, it introduces two new test cases that verify the behavior of
storage_service/range_to_endpoint_map API. It also adjusts the test case
of this endpoint for vnodes to succeed when tablets are enabled by default.
The new logic is as follows:
- when tablets are disabled then users may query endpoints
for a keyspace or for a given table in a keyspace
- when tablets are enabled then users have to provide
table name, because effective replication map is per-table
When the user does not provide a table name and tablets are enabled
for a given keyspace, BAD_REQUEST is returned with a
meaningful error message.
Fixes: scylladb#17343
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#17372
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
cdc::image_mode and cdc::delta_mode, and drop their operator<<:s.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17381
Before this PR, writes to the previous CDC generations would
always be rejected. After this PR, they will be accepted if the
write's timestamp is greater than `now - generation_leeway`.
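A minimal sketch of the acceptance rule described above (illustrative only; the names and types are placeholders, not the actual cdc::metadata interface):
```cpp
#include <chrono>

using clock_type = std::chrono::system_clock;

// A write routed to an already-superseded CDC generation is no longer rejected
// outright: it is accepted as long as its timestamp is recent enough.
bool accept_write_to_previous_generation(clock_type::time_point write_ts,
                                         clock_type::time_point now,
                                         std::chrono::seconds generation_leeway) {
    return write_ts > now - generation_leeway;
}
```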
This change was proposed around 3 years ago. The motivation was
to improve user experience. If a client generates timestamps by
itself and its clock is desynchronized with the clock of the node
the client is connected to, there could be a period during
generation switching when writes fail. We didn't consider this
problem critical because the client could simply retry a failed
write with a higher timestamp. Eventually, it would succeed. This
approach is safe because these failed writes cannot have any side
effects. However, it can be inconvenient. Writing to previous
generations was proposed to improve it.
The idea was rejected 3 years ago. Recently, it turned out that
there is a case when the client cannot retry a write with the
increased timestamp. It happens when a table uses CDC and LWT,
which makes timestamps permanent. Once Paxos commits an entry
with a given timestamp, Scylla will keep trying to apply that entry
until it succeeds, with the same timestamp. Applying the entry
involves writing to the CDC log table. If it fails, we get stuck.
It's a major bug with an unknown perfect solution.
Allowing writes to previous generations for `generation_leeway` is
a probabilistic fix that should solve the problem in practice.
Apart from this change, this PR adds tests for it and updates
the documentation.
This PR is sufficient to enable writes to the previous generations
only in the gossiper-based topology. The Raft-based topology
needs some adjustments in loading and cleaning CDC generations.
These changes won't interfere with the changes introduced in this
PR, so they are left for a follow-up.
Fixes scylladb/scylladb#7251
Fixes scylladb/scylladb#15260
Closes scylladb/scylladb#17134
* github.com:scylladb/scylladb:
docs: using-scylla: cdc: remove info about failing writes to old generations
docs: dev: cdc: document writing to previous CDC generations
test: add test_writes_to_previous_cdc_generations
cdc: generation: allow increasing generation_leeway through error injection
cdc: metadata: allow sending writes to the previous generations
This patch wires up tombstone_gc repair with tablet repair. The flush
hints logic from the vnode table repair is reused. The way to mark the
finish of the repair is also adjusted for tablet repair because it only
has one shard per tablet token range instead of smp::count shards.
Fixes: #17046
Tests: test_tablet_repair_history
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `hints::host_filter`. its
operator<< is preserved as it's still used by the homebrew generic
formatter for vector<>, which is in turn used by db/config.cc.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17347
This commit adds the missing redirections
to the pages whose source files were
previously stored in the install-scylla folder
and were moved to another location.
Closes scylladb/scylladb#17367
this change introduces a new exception which carries the status code
so that an operation can return a non-zero exit code without printing
any errors. this mimics the behavior of the "viewbuildstatus" command of
C* nodetool.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17359
_do_check_nodetool_fails_with() currently has a `match_all` flag to
control how the match is checked. Now we need yet another way to control
how matching is done. Instead of adding yet another flag (and who knows
how many more), just replace the flag and the errors input with a matcher
functor, which gets the stdout and stderr and is delegated to do any
checks it wants. This method will scale much better going forward.
As part of the Automation of ScyllaDB backports project, each PR should get either a backport/none or backport/X.Y label.
Based on this label we will automatically open a backport PR for the relevant OSS release.
In this commit, I am adding a GitHub action to verify if such a label was added.
This only applies to PRs with a base branch of master or next. For releases, we don't need this check.
Tables in keyspaces governed by a replication strategy that uses tablets have separate effective_replication_maps. Update the upgrade compaction task to handle this when getting owned key ranges for a keyspace.
Fixes #16848
Closes scylladb/scylladb#17335
* github.com:scylladb/scylladb:
compaction: upgrade: handle keyspaces that use tablets
replica/database: add an optional variant to get_keyspace_local_ranges
-Wunused-parameter, -Wmissing-field-initializers and -Wdeprecated-copy
warning options are enabled by -Wextra. the tree fails to build with
these options enabled; until we address them and determine whether the
warnings are genuine problems, let's disable them.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17352
When a node changes IP address we need to remove its old IP from `system.peers` and gossiper.
We do this in `sync_raft_topology_nodes` when the new IP is saved into `system.peers` to avoid losing the mapping if the node crashes between deleting and saving the new IP. We also handle the possible duplicates in this case by dropping them on the read path when the node is restarted.
The PR also fixes the problem with old IPs getting resurrected when a node changes its IP address.
The following scenario is possible: a node `A` changes its IP from `ip1` to `ip2` with restart, and other nodes are not yet aware of `ip2` so they keep gossiping `ip1`. After restart, `A` receives `ip1` in a gossip message and calls `handle_major_state_change` since it considers it a new node. Then the `on_join` event is called on the gossiper notification handlers; we receive this event in `raft_ip_address_updater`, which reverts the IP of node `A` back to `ip1`.
To fix this we ensure that the new gossiper generation number is used when a node registers its IP address in `raft_address_map` at startup.
The `test_change_ip` is adjusted to ensure that the old IPs are properly removed in all cases, even if the node crashes.
Fixes #16886
Fixes #16691
Fixes #17199
Closes scylladb/scylladb#17162
* github.com:scylladb/scylladb:
test_change_ip: improve the test
raft_ip_address_updater: remove stale IPs from gossiper
raft_address_map: add my ip with the new generation
system_keyspace::update_peer_info: check ep and host_id are not empty
system_keyspace::update_peer_info: make host_id an explicit parameter
system_keyspace::update_peer_info: remove any_set flag optimisation
system_keyspace: remove duplicate ips for host_id
system_keyspace: peers table: use coroutines
storage_service::raft_ip_address_updater: log gossiper event name
raft topology: ip change: purge old IP
on_endpoint_change: coroutinize the lambda around sync_raft_topology_nodes
The CDC feature is not supported on a table that uses tablets
(Refs #16317), so if a user creates a keyspace with tablets enabled
they may be surprised later (perhaps much later) when they try to enable
CDC on the table and can't.
The LWT feature has always had issue #5251 (Refs #5251), but it has become potentially
more common with tablets.
So it was proposed that as long as we have missing features (like CDC or
LWT), every time a keyspace is created with tablets it should output a
warning (a bona-fide CQL warning, not a log message) that some features
are missing, and if you need them you should consider re-creating the
keyspace without tablets.
This patch does this. It was surprisingly hard and ugly to find a place
in the code that can check the tablet-ness of a keyspace while it is
still being created, but I think I found a reasonable solution.
The warning text in this patch is the following (obviously, it can
be improved later, as we perhaps find more missing features):
"Tables in this keyspace will be replicated using tablets, and will
not support the CDC feature (issue #16317) and LWT may suffer from
issue #5251 more often. If you want to use CDC or LWT, please drop
this keyspace and re-create it without tablets, by adding AND TABLETS
= {'enabled': false} to the CREATE KEYSPACE statement."
This patch also includes a test that checks that this warning is
indeed generated when a keyspace is created with tablets (either by default
or explicitly), and not generated if the keyspace is created without
tablets.
Obviously, this entire patch - the warning and its test - can be reverted
as soon as we support CDC (and all other features) on tablets.
Fixes #16807
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The guardrail tests check that certain guardrails enable and disable
certain warnings.
These tests currently check for the *number* of warnings returned by a
request, assuming that without the guardrail there would be no warning.
But in the following patch we plan to add an additional warning on
keyspace creation (that warns about tablets missing some features).
So the tests should check for whether or not a *specific* warning is
returned - not the count.
I only modified tests which the change in the next patch will break.
Tests which use SimpleStrategy and will not get the extra warning,
are unmodified and continue to use the old approach of counting
warnings.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Tables in keyspaces governed by a replication strategy that uses tablets have
separate effective_replication_maps. Update the upgrade compaction task to
handle this when getting owned key ranges for a keyspace.
Fixes #16848
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Add a new method database::maybe_get_keyspace_local_ranges that
optionally returns the owned ranges for the given keyspace if it has an
effective_replication_map for the entire keyspace.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
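A hedged sketch of the shape such an accessor could take (placeholder types; the real signature in replica/database may differ): return keyspace-wide owned ranges only when a single effective_replication_map covers the whole keyspace, i.e. when the keyspace doesn't use tablets.
```cpp
#include <optional>
#include <vector>

struct owned_range {};  // placeholder for the real owned-range type

struct keyspace_view {  // placeholder keyspace abstraction
    bool uses_tablets;
    std::vector<owned_range> vnode_owned_ranges;
};

// With tablets, replication maps are per-table, so there is no single
// keyspace-wide answer; signal that by returning std::nullopt.
std::optional<std::vector<owned_range>>
maybe_get_keyspace_local_ranges(const keyspace_view& ks) {
    if (ks.uses_tablets) {
        return std::nullopt;
    }
    return ks.vnode_owned_ranges;
}
```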
I wrote these scripts to identify sstables with too large keys for a
recent investigation. I think they could be useful in the future,
certainly as further examples of how to write lua scripts for
scylla-sstable script.
Closes scylladb/scylladb#17000
This API endpoint currently returns status 500 when called for a table which uses tablets. This series adds tablet support. No change in usage semantics is required; the endpoint already has a table parameter.
This endpoint is the backend of `nodetool getendpoints` which should now work, after this PR.
Fixes: #17313
Closes scylladb/scylladb#17316
* github.com:scylladb/scylladb:
service/storage_service: get_natural_endpoints(): add tablets support
replica/database: keyspace: add uses_tablets()
service/storage_service: remove token overload of get_natural_endpoints()
this change addresses the regression introduced by 5e0b3671, which
falls back to local cleanup in cleanup_all. but 5e0b3671 failed to
pass the keyspace to the `shard_cleanup_keyspace_compaction_task_impl`
as its constructor parameter; that's why the test fails like
```
error executing POST request to http://localhost:10000/storage_service/cleanup_all with parameters {}: remote replied with status code 400 Bad Request:
Can't find a keyspace
```
where the string after "Can't find a keyspace" is empty.
in this change, the name of the keyspace to be cleaned is passed to
`shard_cleanup_keyspace_compaction_task_impl`.
we always enable the topology coordinator when testing,
that's why this issue does not pop up until the longevity test.
Fixes #17302
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17320
If no_such_column_family is thrown on a remote node, the repair
operation fails, as the type of the exception cannot be determined.
Use repair::with_table_drop_silenced in repair to continue the operation
if a table was dropped.
If no_such_column_family is thrown on a remote node, the streaming
operation fails, as the type of the exception cannot be determined.
Use repair::with_table_drop_silenced in streaming to continue the
operation if a table was dropped.
Schema propagation is async, so one node can see the table while on
the other node it is already dropped. So, if the nodes stream
the table data, the latter node throws no_such_column_family.
The exception is propagated to the other node, but its type is lost,
so the operation fails on the other node.
Add a method which waits until all raft changes are applied and then
checks whether a given table exists.
Add a function which uses the above to determine whether an operation
failed because of a dropped table (e.g. on the remote node, where the exact
exception type is unknown). If so, the exception isn't rethrown.
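A rough sketch of the silencing pattern (synchronous and with placeholder names; the real repair::with_table_drop_silenced is asynchronous): run the operation, and on failure check whether the table still exists once pending raft schema changes are applied; if the table is gone, swallow the error.
```cpp
#include <functional>

// Stub for illustration: the real check waits for pending raft schema changes
// to be applied and then looks the table up in the local schema.
static bool table_still_exists_after_raft_sync() {
    return false;
}

void with_table_drop_silenced(const std::function<void()>& op) {
    try {
        op();
    } catch (...) {
        if (!table_still_exists_after_raft_sync()) {
            // The table was dropped concurrently (possibly observed only on the
            // remote node, where the exact exception type is lost); treat the
            // failure as benign and continue.
            return;
        }
        throw;  // a genuine failure unrelated to a dropped table; rethrow
    }
}
```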
In this commit we refactor test_change_ip to improve
it in several ways:
* We inject failure before old IP is removed and verify
that after restart the node sees the proper peers - the
new IP for node2 and old IP for node3, which is not restarted
yet.
* We introduce the lambda wait_proper_ips, which checks not only the
system.peers table, but also gossiper and token_metadata.
* We call this lambda for all nodes, not only the first node;
this allows validating that the node that has changed its
IP has its own proper IP in the data structures above.
Note that we need to inject an additional delay ip-change-raft-sync-delay
before the old IP is removed. Otherwise the problem stops reproducing - other
nodes remove the old IP before it's sent back to the just-restarted node.
In the scenario described in the previous commit,
on_endpoint_change could be called with our previous IP.
We can easily detect this case - after add_or_update_entry
the IP for a given id in address_map hasn't changed. We
remove such IP from gossiper since it's not needed, and
makes the test in the next commit more natural - all old
IPs are removed from all subsystems.
The following scenario is possible: a node A changes its IP
from ip1 to ip2 with restart, other nodes are not yet aware of ip2
so they keep gossiping ip1, after restart A receives
ip1 in a gossip message and calls handle_major_state_change
since it considers it a new node. Then the on_join event is
called on the gossiper notification handlers; we receive
such an event in raft_ip_address_updater, which reverts the IP
of the node A back to ip1.
The essence of the problem is that we don't pass the proper
generation when we add ip2 as a local IP during initialization
when node A restarts, so the zero generation is used
in raft_address_map::add_or_update_entry and the gossiper
message overwrites ip2 with ip1.
In this commit we fix this problem by passing the new generation.
To do that we move the increment_and_get_generation call
from join_token_ring to scylla_main, so that we have a new generation
value before init_address_map is called.
Also we remove the load_initial_raft_address_map function from
raft_group0 since it's redundant. The comment above its call site
says that it's needed to not miss gossiper updates, but
the function storage_service::init_address_map, where raft_address_map
is now initialized, is called before the gossiper is started. This
function does both - it loads the previously persisted host_id<->IP
mappings from system.local and subscribes to gossiper notifications,
so there is no room for races.
Note that this problem is less likely to reproduce with the
'raft topology: ip change: purge old IP' commit - other
nodes remove the old IP before it's sent back to the
just-restarted node. This is also the reason why this
problem doesn't occur in gossiper mode.
fixes scylladb/scylladb#17199
The host_id field should always be set, so it's more
appropriate to pass it as a separate parameter.
The function storage_service::get_peer_info_for_update
is updated. It shouldn't look for the host_id app
state in the passed map; instead, the callers should
get the host_id on their own.
This optimization never worked -- there were four usages of
the update_peer_info function and in all of them some of
the peer_info fields were set or should be set:
* sync_raft_topology_nodes/process_normal_node: e.g. tokens is set
* sync_raft_topology_nodes/process_transition_node: host_id is set
* handle_state_normal: tokens is set
* storage_service::on_change: get_peer_info_for_update could potentially
return a peer_info with all fields set to empty, but this shouldn't
be possible, host_id should always be set.
Moreover, there is a bug here: we extract host_id from the
states_ parameter, which represents the gossiper application
states that have been changed. This parameter contains host_id
only if a node changes its IP address; in all other cases host_id
is unset. This means we could end up with a record with an empty
host_id, if it wasn't previously set by some other means.
We are going to fix this bug in the next commit.
When a node changes IP we call sync_raft_topology_nodes
from raft_ip_address_updater::on_endpoint_change with
the old IP value in prev_ip parameter.
It's possible that the node crashes right after
we insert a new IP for the host_id, but before we
remove the old IP. In this commit we fix the
possible inconsistency by removing the system.peers
record with the old timestamp. This is what the new
peers_table_read_fixup function is responsible for.
We call this function in all system_keyspace methods
that read the system.peers table. The function
loads the table in memory, decides if some rows
are stale by comparing their timestamps and
removes them.
The new function also removes the records with no
host_id, so we no longer need the get_host_id function.
We'll add a test for the problem this commit fixes
in the next commit.
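A simplified sketch of the de-duplication idea behind peers_table_read_fixup (illustrative types only, not the actual system_keyspace code): for each host_id keep only the row with the newest write timestamp, and drop rows without a host_id.
```cpp
#include <map>
#include <string>
#include <vector>

struct peer_row {
    std::string host_id;   // empty in legacy/broken rows, which are dropped
    std::string ip;
    long write_timestamp;  // timestamp of the row's last write
};

// Returns the rows that should survive; everything else is considered a stale
// leftover (an old IP left behind by a crash between inserting the new IP and
// deleting the old one).
std::vector<peer_row> fixup_peer_rows(const std::vector<peer_row>& rows) {
    std::map<std::string, peer_row> latest;
    for (const auto& r : rows) {
        if (r.host_id.empty()) {
            continue;  // rows without a host_id are removed as well
        }
        auto it = latest.find(r.host_id);
        if (it == latest.end() || it->second.write_timestamp < r.write_timestamp) {
            latest[r.host_id] = r;  // keep the newest row per host_id
        }
    }
    std::vector<peer_row> result;
    for (const auto& kv : latest) {
        result.push_back(kv.second);
    }
    return result;
}
```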
This is a refactoring commit with no observable
changes in behaviour.
We switch the functions to coroutines; it'll
be easier to work with them this way in the
next commit. Also, we add more const-s
along the way.
When a node changes IP address we need to
remove its old IP from system.peers and
gossiper.
We do this in sync_raft_topology_nodes when
the new IP is saved into system.peers to avoid
losing the mapping if the node crashes
between deleting and saving the new IP. In the
next commit we handle the possible duplicates
in this case by dropping them on the read path.
In subsequent commits, test_change_ip will be
adjusted to ensure that old IPs are removed.
fixes scylladb/scylladb#16886
fixes scylladb/scylladb#16691
We introduce the helper 'ensure_alive' which takes a
coroutine lambda and returns a wrapper which
ensures the proper lifetime for it.
It works by moving the input lambda onto the heap and
keeping the ptr alive until the resulting future
is resolved.
We also move the holder acquired from _async_gate
to the 'then' lambda closure, since now these closures
will be kept alive during the lambda coroutine execution.
We'll be adding more code to this lambda in the subsequent
commits; it's easier to work with coroutines.
Alternator TTL doesn't yet work on tables using tablets (this is
issue #16567). Before this patch, it can be enabled on a table with
tablets, and the result is a lot of log spam and nothing will get expired.
So let's make the attempt to enable TTL on a table that uses tablets
into a clear error. The error message points to the issue, and also
suggests how to create a table that uses vnodes, not tablets.
This patch also adds a test that verifies that trying to enable TTL
with tablets is an error. Obviously, this test should be removed
once the issue is solved and TTL begins working with tablets.
Refs #16567
Refs #16807
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17306
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`partition_range_view` and `i_partition`, and drop their operator<<:s.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17331
During shutdown, as all system tables are closed in parallel, there is a
possibility of a race condition between compaction stoppage and the
closure of the compaction_history table. So, quiesce all the compaction
tasks before attempting to close the tables.
Fixes #15721
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#17218
A user asked on the ScyllaDB forum several questions on whether
tombstone_gc works on materialized views. This patch includes two
tests that confirm the following:
1. The tombstone_gc may be set on a view - either during its creation
with CREATE MATERIALIZED VIEW or later with ALTER MATERIALIZED VIEW.
2. The tombstone_gc setting is correctly shown - for both base tables
and views - by the "DESC" statement.
3. The tombstone_gc setting is NOT inherited from a base table to a new
view - if you want this option on a view, you need to set it
separately.
Unfortunately, this test could not be a single-node cql-pytest because
we forbid tombstone_gc=repair when RF=1, and since recently, we forbid
setting RF>1 on a single-node setup. So the new tests are written in
the test/topology framework - which may run multiple tests against
a single three-node cluster.
To write tests over a shared cluster, we need functions which create
temporary keyspaces, tables and views, which are deleted automatically
as soon as a test ends. The test/topology framework was lacking such
functions, so this test includes them - currently inside the test
file, but if other people find them useful they can be moved to a more
central location.
The new functions, new_test_keyspace(), new_test_table() and
new_materialized_view() are inspired by the identically-named
functions in test/cql-pytest/util.py, but the implementation is
different: Importantly, the new functions here are *async*
context managers, used via "async with", to fit with the rest
of the asynchronous code used in the topology test framework.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17345
`-fno-sanitize-address-use-after-scope` is used to disable the check for
stack-use-after-scope bugs, but this check is only performed when ASan
is enabled. if we pass this option when ASan is not enabled, we'd have
the following warning, so let's apply it only when ASan is enabled.
```
clang-16: error: argument unused during compilation:
'-fno-sanitize-address-use-after-scope' [-Werror,-Wunused-command-line-argument]
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17329
Mirroring table::uses_tablets(), provides a convenient and -- more
importantly -- easily discoverable way to determine whether the keyspace
uses tablets or not.
This information is of course already available via the abstract
replication strategy, but as seen in a few examples, this is not easily
discoverable and sometimes people resorted to enumerating the keyspace's
tables to be able to invoke table::uses_tablets().
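The helper is presumably a thin delegation to the keyspace's replication strategy; a hedged sketch with placeholder types (not the real classes):
```cpp
// Placeholder replication-strategy interface, for illustration only.
struct replication_strategy {
    virtual ~replication_strategy() = default;
    virtual bool uses_tablets() const = 0;
};

// Placeholder keyspace wrapper: the convenience mirror of table::uses_tablets()
// simply asks the keyspace's replication strategy, so callers no longer need to
// reach into the strategy (or enumerate the keyspace's tables) themselves.
struct keyspace_view {
    const replication_strategy& strategy;

    bool uses_tablets() const {
        return strategy.uses_tablets();
    }
};
```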
This overload does not work with tablets because it only has keyspace
and token parameters. The only caller is the other overload, which also
has a table parameter, so it can be made to work with tablets. Inline
this overload into the other and remove it, in preparation for fixing
this method for tablets.
This commit updates the Handling Node Failures page
to specify that the quorum requirement refers to both
schema and topology updates.
Closes scylladb/scylladb#17321
This series makes several changes to how the ignored nodes list is treated
by the topology coordinator. First, the series makes it global and not
part of a single topology operation; second, it extends the list at the
time of removenode/replace invocation; and third, it bans all nodes in
the list from contacting the cluster ever again.
The main motivation is to have a way to unblock tablet migration in case
of a node failure. Tablet migration knows how to avoid nodes in the ignored
nodes list, and this patch series provides a way to extend it without
performing any topology operation (which is not possible while tablet
migration runs).
Fixes scylladb/scylladb#16108
* 'gleb/ignore-nodes-handling-v2' of github.com:scylladb/scylla-dev:
test: add test for the new ignore nodes behaviour
topology coordinator: cleanup node_state::decommissioning state handling code
topology coordinator: ban ignored nodes just like we ban nodes that are left
storage_service: topology coordinator: validate ignore dead nodes parameters in removenode/replace
topology coordinator: add removed/replaced nodes to ignored_nodes list at the request invocation time
topology coordinator: make ignored_nodes list global and permanent
topology_coordinator: do not cancel rebuild just because some other nodes are dead
topology coordinator: throw more specific error from wait_for_ip() function in case of a timeout
raft_group0: add make_nonvoters function that can make multiple node non voters simultaneously
In a follow-up patch abort_source will be used
inside those methods. The current pattern is that abort_source
is passed everywhere as non-const, so it needs to be
used in a non-const context.
Closes scylladb/scylladb#17312
The test checks that once a node is specified in the ignored node list by
one topology operation the information is carried over to the next
operation as well.
The code is shared between decommission and removenode and it has
scattered 'ifs' for different behaviours between those. Change it to
have only one 'if'.
Since a node that was at one point marked as dead, either via
the --ignore-dead-nodes parameter or by being a target for removenode or
replace, can no longer be made "undead", we need to make sure that it
cannot rejoin the cluster any longer. Do that by banning it on the
messaging layer just like we do for nodes that have left.
Note that the removenode failure test had to be altered since it restarted
a node after removenode failure (which now will not work). Also, since
the check for liveness was removed from the topology coordinator (because
the node is already banned by then), the test case that triggers the
removed code is removed as well.
the quote "The minimum block size for crc enabled filesystems is
1024" comes from the output of mkfs.xfs; let's quote the source for
better maintainability.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17094
The true motivation for this patch is a certain problem with configure.py
in scylla-enterprise, which can only be solved by moving the `extra_cxxflags`
lines before configure_seastar(). This patch does that by hoisting
get_extra_cxxflags() up to create_build_system().
But this patch makes sense even if we disregard the real motivation.
It's weird that a function called `write_build_file()` adds additional
build flags on its own.
Closes scylladb/scylladb#17189
This commit renames keyspace::get_effective_replication_map()
to keyspace::get_vnode_effective_replication_map(). This change
is required to ease the analysis of the usage of this function.
When tablets are enabled, this function shall not be used.
Instead of per-keyspace, per-table replication map should be used.
The rename was performed to distinguish between those two calls.
The next step will be an audit of usages of
keyspace::get_vnode_effective_replication_map().
Refs: scylladb#16626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#17314
All cql-pytest tests use one node, and unsurprisingly most use RF=1.
By default, as part of the "guardrails" feature, we print a warning
when creating a keyspace with RF=1. This warning gets printed on
every cql-pytest run, which creates a "boy who cried wolf" effect
whereby developers get used to seeing these warnings, and won't care
if new warnings start appearing.
The fix is easy - in run.py start Scylla with minimum-replication-factor-
warn-threshold set to -1 instead of the default 3.
Note that we do have cql-pytest tests for this guardrail, but those don't
rely on the default setting of this variable (they can't, cql-pytest
tests can also be run on a Scylla instance run manually by a developer).
Those tests temporarily set the threshold during the test.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17274
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`collection_mutation_view::printer`, and drop its
operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17300
Adds a test reproducing https://github.com/scylladb/scylladb/issues/16759, and the instrumentation needed for it.
Closes scylladb/scylladb#17208
* github.com:scylladb/scylladb:
row_cache_test: test cache consistency during memtable-to-cache merge
row_cache: use preemption_source in update()
utils: preempt: add preemption_source
codespell reports that "statics" could be the misspelling of
"statistics". but "static" here means the static column(s). so
replace "static" with more specific wording.
Refs #589
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17216
when we just want to perform read access to `http_context`, there
is no need to use a non-const reference. so let's add the `const` specifier
to make this explicit. this should help with the readability and
maintainability.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17219
Support skipping multiple patterns by allowing them to be passed via
multiple '--skip' arguments to test.py.
Example : `test.py --skip=topology --skip=sstables`
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#17220
"fro" is the short of "from" but the value is an
`optimized_optional<flat_mutation_reader_v2>`. codespell considers
it a misspelling of "for" or "from". neither of them makes sense,
so let's change it to "reader" for better readability, also for
silencing the warning. so that the geniune warning can stands out,
this would help to make the codespell more useful.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17221
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `raft::log`, and drop its
operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17301
This commit removes the Open Source vs. Enterprise matrix
from the Open Source documentation.
In addition, a redirection is added to prevent 404 in the OSS docs,
and the removed page is replaced with a link to the same page
in the Enterprise docs.
This commit must be reverted in enterprise.git, because
we want to keep the Matrix in the Enterprise docs.
Fixes https://github.com/scylladb/scylladb/issues/17289
Closes scylladb/scylladb#17295
Alternator Streams doesn't yet work on tables using tablets (this is
issue #16317). Before this patch, an attempt to enable it results in
an unsightly InternalServerError, which isn't terrible - but we can
do better.
So in this patch, we make the attempt to enable Streams and tablets
together into a clear error. The error message points to the open issue,
and also suggests how to create a table that uses vnodes, not tablets.
Unfortunately, there are two slightly different code paths and error
messages for two cases: One case is the creation of a new table (where
the validation happens before the keyspace is actually created), and
the other case is an attempt to enable streams on an existing table
with an existing keyspace (which already might or might not be using
tablets).
This patch also adds a test that verifies that trying to enable Streams
with tablets is an error - in both cases (table creation and update).
Obviously, this test - and the validation code - should be removed once
the issue is solved and Alternator Streams begins working with tablets.
Fixes#16497
Refs #16807
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#17311
With tablets, we don't use vnode-oriented sstable cleanup.
So let's just remove unused code and bail out silently if sharding is
tablet based. The reason for silence is that we don't want to break
tests that might be reused for tablets, and it's not a problem for
sstable cleanup to be ignored with tablets.
This approach is actually already used in the higher level code,
implementing the cleanup API.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#17296
To unblock tablet migration in case of a node failure we need a way to
dynamically extend a list of ignored_nodes while the migration is
happening. This patch does it by piggybacking on existing topology
operations that assume their target node is already dead. It adds the
target node to the now-global ignored_nodes list when the request is issued and,
for better HA, makes the nodes in ignored_nodes non-voters.
Currently the ignored_nodes list is part of a request (removenode or
replace) and exists only while the request is handled. This patch
changes it to be global and to exist outside of any request. Nodes stay
in the list until they are eventually removed and moved to the "left" state.
If a node is specified in the ignore-dead-nodes option for any command
it will be ignored for all other operations that support ignored_nodes
(like tablet migration).
In sync_raft_topology_nodes we execute a system keyspace
update query for each node of the cluster. The system keyspace
tables use schema commitlog which by default enables use_o_dsync.
This means that each write to the commitlog is accompanied by fsync.
For large clusters this can incur hundreds of writes with fsyncs, which
is very expensive. For example, in #17039 for a moderate size cluster
of 50 nodes sync_raft_topology_nodes took almost 5 seconds.
In this commit we solve this problem by running all such update
queries in parallel. The commitlog should batch them and issue
only one write syscall to the OS.
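A minimal Seastar sketch of the approach, with hypothetical `node_info`/`update_topology_row` stand-ins for the real per-node system-table update:
```cpp
#include <seastar/core/future.hh>
#include <seastar/core/loop.hh>
#include <vector>

struct node_info { int id; };

// Stand-in for the per-node system table update; the real one writes through
// the schema commitlog (and therefore fsyncs).
seastar::future<> update_topology_row(const node_info&) {
    return seastar::make_ready_future<>();
}

// Issue all per-node updates concurrently instead of one after another, so
// the commitlog can batch them into a single write syscall.
seastar::future<> sync_all_nodes(const std::vector<node_info>& nodes) {
    return seastar::parallel_for_each(nodes, [] (const node_info& n) {
        return update_topology_row(n);
    });
}
```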
Closesscylladb/scylladb#17243
Unfortunately, scylladb/python-driver#230 is not fixed yet, so it is
necessary for the sake of our CI's stability to re-create the driver
session after all nodes in the cluster are restarted.
There is one place in test_topology_recovery_basic where all nodes are
restarted but the driver session is not re-created. Even though nodes
are not restarted at once but rather sequentially, we observed a failure
with similar symptoms in a CI run for scylla-enterprise.
Add the missing driver reconnect as a workaround for the issue.
Fixes: scylladb/scylladb#17277
Closes scylladb/scylladb#17278
Instead of adding an asterisk next to "liveness" linking to the glossary, we will temporarily replace them with a hyperlink pending the implementation of tooltip functionality.
Closesscylladb/scylladb#17244
Commit 904bafd069 consolidated the two
existing for_each_tablet() overloads into the one which takes a
future<>-returning callback. It also added yields to the bodies of said
callbacks. This is unnecessary: the loop in for_each_tablet() already
has a yield per tablet, which should be enough to prevent stalls.
This patch is a follow-up to #17118
Closes scylladb/scylladb#17284
RPC is not ready yet at this point, so we should not set this application state yet.
Also, simplify add_local_application_state as it contains dead code
that will never generate an internal error after 1d07a596bf.
Fixes #16932
Closes scylladb/scylladb#17263
* github.com:scylladb/scylladb:
gossiper: add_local_application_state: drop internae error
transport: controller: do_start_server: do not set_cql_read for maintenance port
this change is a follow-up of 637dd730. the goal is to use
std::filesystem::path for manipulating paths, and to avoid the
converting between sstring and fs::path back and forth.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17257
in the same spirit of e84a0991, let's switch the callers who expect
std::string to fmt::format(). to minimize the impact and to reduce
the risk, the switch will be performed piecemeal.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17253
When a node is decommissioned, all tablet replicas need to be moved away
from it. In some cases this may not be possible. If the number of nodes in
the cluster equals the keyspace RF, one cannot decommission any node
because it's not possible to find nodes for every replica.
The new test case validates this constraint is satisfied.
refs: #16195
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#17248
A materialized view in CQL allows AT MOST ONE view key column that
wasn't a key column in the base table. This is because if there were
two or more of those, the "liveness" (timestamp, ttl) of these different
columns can change at every update, and it's not possible to pick what
liveness to use for the view row we create.
We made an exception for this rule for Alternator: DynamoDB's API allows
creating a GSI whose partition key and range key are both regular columns
in the base table, and we must support this. We claim that the fact that
Alternator allows neither TTL (Alternator's "TTL" is a different feature)
nor user-defined timestamps, does allow picking the liveness for the view
row we create. But we did it wrong!
We claimed in a comment - and implemented in the code before this patch -
that in Alternator we can assume that both GSI key columns will have the
*same* liveness, and in particular timestamp. But this is only true if
one modifies both columns together! In fact, in general it is not true:
We can have two non-key attributes 'a' and 'b' which are the GSI's key
columns, and we can modify *only* b, without modifying a, in which case
the timestamp of the view modification should be b's newer timestamp,
not a's older one. The existing code took a's timestamp, assuming it
will be the same as b's, which is incorrect. The result was that if
we repeatedly modify only b, all view updates will receive the same
timestamp (a's old timestamp), and a deletion will always win over
all the modifications. This patch includes a reproducing test written by
a user (@Zak-Kent) that demonstrates how after a view row is deleted
it doesn't get recreated - because all the modifications use the same
timestamp.
The fix is, as suggested above, to use the *higher* of the two
timestamps of both base-regular-column GSI key columns as the timestamp
for the new view rows or view row deletions. The reproducer that
failed before this patch passes with it. As usual, the reproducer
passes on AWS DynamoDB as well, proving that the test is correct and
should really work.
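A tiny sketch of the rule the fix implements (a hypothetical helper; the real logic sits in the view-update generation code):
```cpp
#include <algorithm>
#include <cstdint>

using api_timestamp = int64_t;

// When both GSI key columns are regular columns of the base table, the view
// row must carry the newer of the two write timestamps, not the timestamp of
// whichever column happened to be inspected first.
api_timestamp view_update_timestamp(api_timestamp key_a_ts, api_timestamp key_b_ts) {
    return std::max(key_a_ts, key_b_ts);
}
```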
Fixes#17119
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#17172
This patch adds a simple reproducer for a regression in Scylla 5.4 caused
by commit 432cb02, breaking LIMIT support in GROUP BY.
Refs #17237
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#17275
PEP 632 deprecates the distutils module, and it is removed from Python 3.12.
we are actually using the one vendored by setuptools if we are on
3.12. so let's use shutil for finding the ninja executable.
see https://peps.python.org/pep-0632/
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17271
Reimplements stop/start sequence using rolling_restart() which is safe
with regards to UP status propagation and not prone to sudden
connection drop which may cause later CQL queries to time out. It also
ensures that CQL is up on all the remaining nodes when the with_down
callback is executed.
The test was observed to fail in CI like this:
```
cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.157.135.26:9042 datacenter1>: ConnectionException('Pool for 127.157.135.26:9042 is shutdown')})
...
@pytest.mark.repair
@pytest.mark.asyncio
async def test_tablet_missing_data_repair(manager: ManagerClient):
...
for idx in range(0,3):
s = servers[idx].server_id
await manager.server_stop_gracefully(s, timeout=120)
> await check()
```
Hopefully: Fixes #17107
Closes scylladb/scylladb#17252
* github.com:scylladb/scylladb:
test: py: tablets: Fix flakiness of test_tablet_missing_data_repair
test: pylib: manager_client: Wait for driver to catch up in rolling_restart()
test: pylib: manager_client: Accept callback in rolling_restart() to execute with node down
The reason we introduced the tombstone-limit
(query_tombstone_page_limit) was to allow paged queries to return
incomplete/empty pages in the face of large tombstone spans. This works
by cutting the page after the tombstone-limit amount of tombstones has been
processed. If the read is unpaged, it is killed instead. This was a
mistake. First, it doesn't really make sense: the reason we introduced
the tombstone limit was to allow paged queries to process large
tombstone spans without timing out. It does not help unpaged queries.
Furthermore, the tombstone-limit can kill internal queries done on
behalf of user queries, because all our internal queries are unpaged.
This can cause denial of service.
So in this patch we disable the tombstone-limit for unpaged queries
altogether, they are allowed to continue even after having processed the
configured limit of tombstones.
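Expressed as a small sketch (with hypothetical `read_state`/`should_cut_page` names), the resulting rule is that only paged reads are ever cut by the limit:
```cpp
#include <cstdint>

struct read_state {
    bool paged;
    uint64_t tombstones_processed;
};

// Unpaged reads (including internal reads done on behalf of user queries) are
// never cut or killed by the tombstone limit; paged reads get an early page.
bool should_cut_page(const read_state& rs, uint64_t tombstone_page_limit) {
    return rs.paged && rs.tombstones_processed >= tombstone_page_limit;
}
```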
Fixes: #17241
Closes scylladb/scylladb#17242
we recently added -Wextra to configure.py, and this option enables
a bunch of warning options, including `-Wignored-qualifiers`. so
there is no need to enable this specific warning anymore. this change
removes this option from both `configure.py` and the CMake build system.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17272
Because Seastar now defaults to C++23, we downgrade it explicitly to
C++20.
* seastar 289ad5e593...5d3ee98073 (10):
> Update supported C++ standards to C++23 and C++20 (dropping C++17)
> docker: install clang-tools-18
> http: add handler_base::verify_mandatory_params()
> coroutine/exception: document return_exception_ptr()
> http: use structured-binding when appropriate
> test/http: Read full server response before sending next
> doc/lambda-coroutine-fiasco: fix a syntax error
> util/source_location-compat: use __cpp_consteval
> Fix incorrect class name in documentation.
> Add support for missing HTTP PATCH method.
Closesscylladb/scylladb#17268
This change introduces a new test that verifies the
functionality related to tablet_count metric.
It checks if tablet_count metric is correctly reported
and updated when new tables are created, when tables
are dropped and when `move_tablet` is executed.
Refs: scylladb#16131
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#17165
In one of the previous patches, we have allowed writing to the
previous CDC generations for `generation_leeway`. This change has
made the information about failing writes to the previous
generation and the "rejecting writes to an old generation" example
obsolete so we remove them.
After the change, a write can only fail if its timestamp is distant
from the node's timestamp. We add the information about it.
Before this patch, writes to the previous CDC generations would
always be rejected. After this patch, they will be accepted if
the write's timestamp is greater than `now - generation_leeway`.
This change was proposed around 3 years ago. The motivation was
to improve user experience. If a client generates timestamps by
itself and its clock is desynchronized with the clock of the node
the client is connected to, there could be a period during
generation switching when writes fail. We didn't consider this
problem critical because the client could simply retry a failed
write with a higher timestamp. Eventually, it would succeed. This
approach is safe because these failed writes cannot have any side
effects. However, it can be inconvenient. Writing to previous
generations was proposed to improve it.
The idea was rejected 3 years ago. Recently, it turned out that
there is a case when the client cannot retry a write with the
increased timestamp. It happens when a table uses CDC and LWT,
which makes timestamps permanent. Once Paxos commits an entry with
a given timestamp, Scylla will keep trying to apply that entry
until it succeeds, with the same timestamp. Applying the entry
involves writing to the CDC log table. If it fails, we get stuck.
It's a major bug with an unknown perfect solution.
Allowing writes to previous generations for `generation_leeway` is
a probabilistic fix that should solve the problem in practice.
Note that allowing writes only to the previous generation might
not be enough. With the Raft-based topology, it is possible to
add multiple nodes concurrently. Moreover, tablets make streaming
instant, which allows the topology coordinator to add multiple nodes
very quickly. So, creating generations with almost identical
timestamps is possible. Then, we could encounter the same bug but,
for example, for a generation before the previous generation.
In a mixed cluster (5.4.1-20231231.3d22f42cf9c3 and
5.5.0~dev-20240119.b1ba904c4977), in the rolling upgrade test, we saw
repair never finishing.
The following was observed:
rpc - client 127.0.0.2:65273 msg_id 5524: caught exception while
processing a message: std::out_of_range (deserialization buffer
underflow)
It turns out the repair rpc message was not compatible between the two
versions. Even with a rpc stream verb, the new rpc parameters must come
after the rpc::source<> parameter. The rpc::source<> parameter is not
special in the sense that it must be the last parameter.
For example, it should be:
void register_repair_get_row_diff_with_rpc_stream(
std::function<future<rpc::sink<repair_row_on_wire_with_cmd>> (
const rpc::client_info& cinfo, uint32_t repair_meta_id,
rpc::source<repair_hash_with_cmd> source, rpc::optional<shard_id> dst_cpu_id_opt)>&& func);
not:
void register_repair_get_row_diff_with_rpc_stream(
std::function<future<rpc::sink<repair_row_on_wire_with_cmd>> (
const rpc::client_info& cinfo, uint32_t repair_meta_id,
rpc::optional<shard_id> dst_cpu_id_opt, rpc::source<repair_hash_with_cmd> source)>&& func);
Fixes #16941
Closes scylladb/scylladb#17156
Recently we added a trick to allow running cql-pytests either with or
without tablets. A single fixture, test_keyspace, uses one of two separate
fixtures, test_keyspace_tablets or test_keyspace_vnodes, as requested.
The problem is that even if test_keyspace doesn't use its
test_keyspace_tablets fixture (it doesn't, if the test isn't
parameterized to ask for tablets explicitly), it's still a fixture,
and it causes the test to be skipped. This causes every test to be
skipped when running on Cassandra or old Scylla which doesn't support
tablets.
The fix is simple - the internal fixture test_keyspace_tablets should
yield None instead of skipping. It is the caller, test_keyspace, which
now skips the test if tablets are requested but test_keyspace_tablets
is None.
Fixes#17266
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#17267
managed_bytes is implemented as chain of blob_storage objects.
Each blob_storage contains 24 bytes of metadata. But in the most
common case -- when there is only a single element in the chain --
16 bytes of this metadata is trivial/unused.
This is regrettable waste because managed_bytes is used for every
database cell in the memtables and cache. It means that every value
of size >= 7 bytes (smaller ones fit in the inline storage of
managed_bytes) receives 16 bytes of useless overhead.
To correct that, this series adds to managed_bytes an alternative storage
layout -- used for buffers small enough to fit in one fragment -- which only
stores the necessary minimum of metadata. (That is: a pointer to the parent,
to facilitate moving the storage during memory defragmentation).
This saves 16 bytes on every cell greater than 15 bytes. Which includes e.g.
every live cell with value bigger than 6 bytes, which likely applies to most cells.
Before:
```
$ build/release/scylla perf-simple-query --duration 10
median 218692.88 tps ( 61.1 allocs/op, 13.1 tasks/op, 41762 insns/op, 0 errors)
$ build/release/scylla perf-simple-query --duration 10 --write
median 173511.46 tps ( 58.3 allocs/op, 13.2 tasks/op, 53258 insns/op, 0 errors)
$ build/release/test/perf/mutation_footprint_test -c1 --row-count=20 --partition-count=100 --data-size=8 --column-count=16
- in cache: 2580222
- in memtable: 2549852
```
After:
```
$ build/release/scylla perf-simple-query --duration 10
median 218780.89 tps ( 61.1 allocs/op, 13.1 tasks/op, 41763 insns/op, 0 errors)
$ build/release/scylla perf-simple-query --duration 10 --write
median 173105.78 tps ( 58.3 allocs/op, 13.2 tasks/op, 52913 insns/op, 0 errors)
$ build/release/test/perf/mutation_footprint_test -c1 --row-count=20 --partition-count=100 --data-size=8 --column-count=16
- in cache: 2068238
- in memtable: 2037696
```
Closesscylladb/scylladb#14263
* github.com:scylladb/scylladb:
utils: managed_bytes: optimize memory usage for small buffers
utils: managed_bytes: rewrite managed_bytes methods in terms of managed_bytes_view
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `gc_clock::time_point`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17254
In particular, `inet_address(const sstring& addr)` is
dangerous, since a function like
`topology::get_datacenter(inet_address ep)`
might accidentally convert a `sstring` argument
into an `inet_address` (which would most likely
throw an obscure std::invalid_argument if the datacenter
name does not look like an inet_address).
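A simplified illustration of the pitfall (stand-in types, not the real gms/seastar classes): with an implicit converting constructor the mistaken call compiles and only fails at runtime, while marking the constructor `explicit` turns it into a compile error:
```cpp
#include <stdexcept>
#include <string>

struct inet_address {
    // The implicit string constructor is the source of accidental conversions;
    // marking it explicit makes the mistaken call below fail to compile.
    /* explicit */ inet_address(const std::string& addr) {
        if (addr.find('.') == std::string::npos) {
            throw std::invalid_argument("not an address: " + addr);
        }
    }
};

std::string get_datacenter(inet_address) { return "dc1"; }

void example() {
    std::string dc_name = "dc1";
    // Intended to pass a datacenter name to a string-taking overload, but the
    // implicit constructor silently builds an inet_address and this throws
    // std::invalid_argument at runtime.
    get_datacenter(dc_name);
}
```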
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#17260
Currently, since the data_value(bool) ctor
is implicit, pointers of any kind are implicitly
convertible to data_value via intermediate conversion
to `bool`.
This is error prone, since it allows unsafe comparison
between e.g. an `sstring` with `some*` by implicit
conversion of both sides to `data_value`.
For example:
```
sstring name = "dc1";
struct X {
sstring s;
};
X x(name);
auto p = &x;
if (name == p) {}
```
Refs #17261
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#17262
After 1d07a596bf that
dropped before_change notifications there is no sense
in getting the local endpoint_state_ptr twice: before
and after the notifications and call on_internal_error
if the state isn't found after the notifications.
Just throw the runtime_error if the endpoint state is not
found, otherwise, use it.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
RPC is not ready yet at this point, so we should not
set this application state yet.
This is indicated by the following warning from
`gossiper::add_local_application_state`:
```
WARN 2024-01-22 23:40:53,978 [shard 0:stmt] gossip - Fail to apply application_state: std::runtime_error (endpoint_state_map does not contain endpoint = 127.227.191.13, application_states = {{RPC_READY -> Value(1,1)}})
```
That should really be an internal error, but
it can't because of this bug.
Fixes#16932
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `alternator::calculate_value_caller`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17259
managed_bytes is implemented as chain of blob_storage objects.
Each blob_storage contains 24 bytes of metadata. But in the most
common case -- when there is only a single element in the chain --
16 bytes of this metadata is trivial/unused.
This is regrettable waste because managed_bytes is used for every
database cell in the memtables and cache. It means that every value
of size >= 7 bytes (smaller ones fit in the inline storage of
managed_bytes) receives 16 bytes of useless overhead.
To correct that, this patch adds to managed_bytes an alternative storage
layout -- used for buffers small enough to fit in one contiguous
fragment -- which only stores the necessary minimum of metadata.
(That is: a pointer to the parent, to facilitate moving the storage during
memory defragmentation).
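A rough structural sketch of the idea (simplified field names, not the actual `managed_bytes` internals): the chained header stays for fragmented values, while single-fragment values get a header that keeps only what the allocator needs to relocate the buffer:
```cpp
#include <cstdint>

// Full header, needed when the value is split across multiple fragments.
struct blob_storage {
    blob_storage** backref;   // updated by the allocator when the object moves
    blob_storage* next;       // next fragment in the chain
    uint32_t size;            // total size of the value
    uint32_t frag_size;       // size of this fragment
    char data[1];             // payload follows (simplified)
};

// Slim header for the common, single-fragment case: only the backref needed
// for memory defragmentation is kept; the size lives in the owning object.
struct single_fragment_storage {
    single_fragment_storage** backref;
    char data[1];             // payload follows (simplified)
};
```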
Reimplement stop/start sequence using rolling_restart() which is safe
with regards to UP status propagation and not prone to sudden
connection drop which may cause later CQL queries to time out. It also
ensures that CQL is up on all the remaining nodes when the with_down
callback is executed.
Hopefully: Fixes#17107
This patch adds a reproducer test for an issue #16382.
See scylladb/seastar#2044 for details of the problem.
The test is enabled only in dev mode since it requires
error injection mechanism. The patch adds a new injection
into storage_proxy::handle_read to simulate the problem
scenario - the node is shutting down and there are some
unfinished pending replica requests.
Closesscylladb/scylladb#16776
Some methods of managed_bytes contain the logic needed to read/write the
contents of managed_bytes, even though this logic is already present in
managed_bytes_{,mutable}_view.
Reimplementing those methods by using the views as intermediates allows us to
remove some code and makes the responsibilities cleaner -- after the change,
managed_bytes contains the logic of allocating and freeing the storage,
while views provide read/write access to the storage.
This change will simplify the next patch which changes the internals of
managed_bytes.
row_level_repair and repair_meta keep a reference to a table.
If the table is dropped during repair, its object is destructed, leaving
a dangling reference.
Delete {row_level_repair,repair_meta}::_cf and replace their usages.
Fixes: #17233.
Closesscylladb/scylladb#17234
* github.com:scylladb/scylladb:
repair: delete _cf from repair_meta
repair: delete _cf from row_level_repair
This PR implements a procedure that upgrades existing clusters to use
raft-based topology operations. The procedure does not start
automatically, it must be triggered manually by the administrator after
making sure that no topology operations are currently running.
Upgrade is triggered by sending `POST
/storage_service/raft_topology/upgrade` request. This causes the
topology coordinator to start, which then drives the rest of the process: it
builds the `system.topology` state based on information observed in
gossip and tells all nodes to switch to raft mode. Then, topology
coordinator runs normally.
Upgrade progress is tracked in a new static column `upgrade_state` in
`system.topology`.
The procedure also serves as an extension to the current recovery
procedure on raft. The current recovery procedure requires restarting
nodes in a special mode which disables raft, performing `nodetool
removenode` on the dead nodes, cleaning up some state on the nodes and
restarting them so that they automatically rebuild group 0. Raft
topology fits into the existing procedure by falling back to legacy topology
operations after disabling raft. After rebuilding the group 0, upgrade
needs to be triggered again.
Because upgrade is manual and it might not be convenient for
administrators to run it right after upgrading the cluster, we allow the
cluster to operate in legacy topology operations mode until upgrade,
which includes allowing new nodes to join. In order to allow it, nodes
now ask the cluster about the mode they should use to join before
proceeding by using a new `JOIN_NODE_QUERY` RPC.
The procedure is explained in more detail in `topology-over-raft.md`.
Fixes: https://github.com/scylladb/scylladb/issues/15008
Closes scylladb/scylladb#17077
* github.com:scylladb/scylladb:
test/topology_custom: upgrade/recovery tests for topology on raft
cdc/generation_service: in legacy mode, fall back to raft tables
system_keyspace: add read_cdc_generation_opt
cdc/generation_service: turn off gossip notifications in raft topo mode
cql_test_env: move raft_topology_change_enabled var earlier
group0_state_machine: pull snapshot after raft topology feature enabled
storage_service: disable persistent feature enabler on upgrade
storage_service: replicate raft features to system.peers
storage_service: gossip tokens and cdc generation in raft topology mode
API: add api for triggering and monitoring topology-on-raft upgrade
storage_service: infer which topology operations to use on startup
storage_service: set the topology kind value based on group 0 state
raft_group0: expose link to the upgrade doc in the header
feature_service: fall back to checking legacy features on startup
storage_service: add fiber for tracking the topology upgrade progress
gms: feature_service: add SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
topology_coordinator: implement core upgrade logic
topology_coordinator: extract top-level error handling logic
storage_service: initialize discovery leader's state earlier
topology_coordinator: allow for custom sharding info in prepare_and_broadcast_cdc_generation_data
topology_coordinator: allow for custom sharding info in prepare_new_cdc_generation_data
topology_coordinator: remove outdated fixme in prepare_new_cdc_generation_data
topology_state_machine: introduce upgrade_state
storage_service: disallow topology ops when upgrade is in progress
raft_group0_client: add in_recovery method
storage_service: introduce join_node_query verb
raft_group0: make discover_group0 public
raft_group0: filter current node's IP in discover_group0
raft_group0: remove my_id arg from discover_group0
storage_service: make _raft_topology_change_enabled more advanced
docs: document raft topology upgrade and recovery
per its description, "`/storage_service/describe_ring/`" returns the
token ranges of an arbitrary keyspace. actually, it returns the
first keyspace which is of non-local-vnode-based-strategy. this API
is not used by nodetool, neither is it exercised in dtest.
scylla-manager has a wrapper for this API though, but that wrapper
is not used anywhere.
in this change, this API is dropped.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17197
Now all the logged arguments are lazily evaluated (node* format string
and backtrace) so the preliminary log-level checks are not needed.
indentation is deliberately left broken
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This helper returns lazy_eval-ed current_backtrace(), so it will be
generated and printed only if logger is really going to do it with its
current log-level.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently to print node information there's a debug_format(node*) helper
function that returns back an sstring object. Here's the formatter
that's more flexible and convenient, and a node_printer wrapper, since
formatters cannot format non-void pointers.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Equip it with :v specifier that turns verbose mode on and prints much
more data about the node. Main user will appear in the next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
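A hedged sketch of what such a formatter can look like (hypothetical `node`/`node_printer` types): the `parse` hook consumes an optional `v`, so `"{:v}"` switches the output to verbose mode:
```cpp
#include <fmt/format.h>

struct node { int id; int shard_count; };
struct node_printer { const node* n; };

template <>
struct fmt::formatter<node_printer> {
    bool verbose = false;
    constexpr auto parse(fmt::format_parse_context& ctx) {
        auto it = ctx.begin();
        if (it != ctx.end() && *it == 'v') {   // "{:v}" turns on verbose mode
            verbose = true;
            ++it;
        }
        return it;
    }
    auto format(const node_printer& np, fmt::format_context& ctx) const {
        if (verbose) {
            return fmt::format_to(ctx.out(), "node({}, shards={})",
                                  np.n->id, np.n->shard_count);
        }
        return fmt::format_to(ctx.out(), "node({})", np.n->id);
    }
};
```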
Currently it waits for topology state machine to be idle, so it allows
one tablet to be moved at a time. We should allow it to start migration
if the current transition state is
- topology::transition_state::tablet_migration or
- topology::transition_state::tablet_draining
to allow starting parallel tablet movement. That will be useful when
scripting a custom rebalancing algorithm.
in this change, we wait until the topology state machine is idle or
it is at either of the above two states.
Fixes#16437
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17203
Adds three tests for the new upgrade procedure:
- test_topology_upgrade - upgrades a cluster operating in legacy mode to
use raft topology operations,
- test_topology_recovery_basic - performs recovery on a three-node
cluster, no node removal is done,
- test_topology_majority_loss - simulates a majority loss scenario, i.e.
removed two nodes out of three, performs recovery to rebuild the
raft topology state and re-add two nodes back.
When a node enters recovery after being in raft topology mode, topology
operations switch back to legacy mode. We want CDC to keep working when
that happens, so we need the legacy code to be able to access
generations created back in raft mode - so that the node can still
properly serve writes to CDC log tables.
In order to make this possible, modify the legacy logic to also look for
a cdc generation in raft tables, if it is not found in legacy tables.
The `system_keyspace::read_cdc_generation` loads a cdc generation from
the system tables. One of its preconditions is that the generation
exists - this precondition is quite easy to satisfy in raft mode, and
the function was designed to be used solely in that mode.
In legacy mode however, in case when we revert from raft mode through
recovery, it might be necessary to use generations created in raft mode
for some time. In order to make the function useful as a fallback in
case lookup of a generation in legacy mode fails, introduce a relaxed
variant of `read_cdc_generation` which returns std::nullopt if the
generation does not exist.
In raft topology mode CDC information is propagated through group 0.
Prevent the generation service from reacting to gossiper notifications
after we made the switch to raft mode.
Pulling a snapshot of the raft topology is done via new rpc verb
(RAFT_PULL_TOPOLOGY_SNAPSHOT). If the recipient runs an older version of
scylla and does not understand the verb, sending it will result in an
error. We usually use cluster features to avoid such situations, but in
the case when a node joins the cluster, it doesn't have access to
features yet. Therefore, we need to enable pulling snapshots in two
situations:
- when the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES feature becomes enabled,
- in case when starting group 0 server when joining a cluster that uses
raft-based topology.
When starting in legacy mode, a gossip event listener called persistent
feature enabler is registered. This listener marks a feature as enabled
when it notices, in gossip, that all nodes declare support for the
feature.
With raft-based topology, features are managed in group 0 instead and do
not rely on the persistent feature enabler at all. Make the listener
look at the raft_topology_change_enabled() method and prevent it from
enabling more features after that method starts returning true.
This is necessary for cluster features to work after we switch from raft
topology mode to legacy topology mode during recovery, because
information in system.peers is used during legacy cluster feature check
and when enabling features.
A mixed raft/legacy cluster can happen when entering recovery mode, i.e.
when the group 0 upgrade state is set to 0 and a rolling restart is
performed. Legacy nodes expect at least information about tokens,
otherwise an internal error occurs in the handle_state_normal function.
Therefore, make nodes that use raft topology behave well with respect to
other nodes.
Implements the /storage_service/raft_topology/upgrade route. The route
supports two methods: POST, which triggers the cluster-wide upgrade to
topology-on-raft, and GET which reports the status of the upgrade.
Adds a piece of logic to storage_service::join_cluster which chooses the
mode in which it will boot.
If the experimental raft topology flag is disabled, it will fall back to
legacy node operations.
When the node starts for the first time, it will perform group 0
discovery. If the node creates a cluster, it will start it in raft
topology mode. If it joins an existing one, it will ask the node chosen
by the discovery algorithm about which joining method to use.
If the node is already a part of the cluster, it will base its decision
on the group0 state.
When booting for the first time, the node determines whether to use raft
mode or not by asking the cluster, or by going straight to raft mode
when it creates a new cluster by itself. This happens before joining
group 0. However, right after joining group 0, the `upgrade_state`
column from `system.topology` is supposed to control which operations
the node is supposed to be using.
In order to have a single source of control over the flag (either
storage_service code or group 0 code), the
`_manage_topology_change_kind_from_group0` flag is added which controls
whether the `_topology_change_kind_enabled` flag is controlled from
group 0 or not.
When checking features on startup (i.e. whether support for any feature
was revoked in an unsafe way), it might happen that upgrade to raft
topology didn't finish yet. In that case, instead of loading an empty
set of features - which supposedly represents the set of features that
were enabled until last boot - we should fall back to loading the set
from the legacy `enabled_features` key in `system.scylla_local`.
The topology coordinator fiber is not started if a node starts in legacy
topology mode. We need to start the raft state monitor fiber after all
preconditions for starting upgrade to raft topology are met.
Add a fiber which is spawned only in legacy mode that will wait until:
- The schema-on-raft upgrade finishes,
- The SUPPORTS_CONSISTENT_CLUSTER_MANAGEMENT feature is enabled,
- The upgrade is triggered by the user.
and, after that, will spawn the raft state monitor fiber.
All nodes being capable of support for raft topology is a prerequisite
for starting upgrade to raft topology. The newly introduced feature will
track this prerequisite.
codespell reports the following warnings:
```
Error: ./docs/kb/flamegraph.svg:1: writen ==> written
Error: ./docs/kb/flamegraph.svg:1: writen ==> written
Error: ./docs/kb/flamegraph.svg:1: storag ==> storage
Error: ./docs/kb/flamegraph.svg:1: storag ==> storage
```
these misspellings come from the flamegraph, which can be viewed
at https://opensource.docs.scylladb.com/master/kb/flamegraph.html
they are very likely truncated function names displayed
in the frames. the author of the article is not responsible for
the spelling of these names, nor can we change them in a
meaningful way. so add the file to the skip list.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17215
we have three format()s in our arsenal:
* seastar::format()
* fmt::format()
* std::format()
the first one is used most frequently. but it has two limitations:
1. it returns seastar::sstring instead of std::string. under some
circumstances, the caller of the format() function actually
expects std::string, in that case a deep copy is performed to
construct an instance of std::string. this incurs unnecessary
performance overhead. but this limitation is a by-design behavior.
2. it does not do compile-time format check. this can be improved
at the Seastar's end.
to address these two problems, we switch the callers who expect
std::string to fmt::format(). to minimize the impact and to reduce
the risk, the switch will be performed piecemeal.
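For callers that need a `std::string`, the switch is as small as the sketch below; `fmt::format()` returns `std::string` directly and checks the format string at compile time:
```cpp
#include <fmt/format.h>
#include <string>

std::string describe(int rows) {
    // No sstring -> std::string copy, and the placeholder count is validated
    // at compile time.
    return fmt::format("processed {} rows", rows);
}
```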
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17212
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `error_injection_at_startup`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17211
Move it before the topology coordinator is started. This way, the
topology coordinator will see non-empty state when it is started and it
will allow us to assert that the topology coordinator is never started
for an empty system.topology table.
Extend the prepare_and_broadcast_cdc_generation_data function like we
did in the case of prepare_new_cdc_generation_data - the topology
coordinator state building process not only has to create a new
generation, but also broadcast it.
During topology coordinator state build phase a new cdc generation will
be generated. We can reuse prepare_new_cdc_generation_data for that.
Currently, it always takes sharding information (shard count + ignore
msb) from the topology state machine - which won't be available yet at
the point of building the topology, so extend the function so that it
can accept a custom source of sharding information.
The FIXME mentions that token metadata should return host ID for given
token (instead of, presumably, an IP) - but that is already the case, so
let's remove the fixme.
Forbid starting new topology changes while upgrade to topology on raft
is in progress. While this does not take into account any ongoing
topology operations, it makes sure that at the end of the upgrade no
node will try to perform any legacy topology operations.
On top of the capabilities of the java-nodetool command, tablet support is also implemented: in addition to the existing keyspace parameter, an optional table parameter is also accepted and forwarded to the REST API. For tablet keyspaces this is required to get a ring description.
The command comes with tests and all tests pass with both the new and the current nodetool implementations.
Refs: https://github.com/scylladb/scylladb/issues/15588
Refs: https://github.com/scylladb/scylladb/issues/16846
Closes scylladb/scylladb#17163
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement describering
tools/scylla-nodetool.cc: handle API request failures gracefully
test/nodetool: util.py: add check_nodetool_fails_with_all()
It holds back global token metadata barrier during streaming, which
limits parallelism of load balancing.
Topology transition is protected by the means of topology_guard.
Closesscylladb/scylladb#17230
repair_meta keeps a reference to a table. If the table is dropped
during repair, its object is destructed, leaving a dangling reference.
Delete repair_meta::_cf and replace its usages with appropriate
methods.
row_level_repair keeps a reference to a table. If the table is dropped
during repair, its object is destructed, leaving a dangling reference.
Delete row_level_repair::_cf and replace its usages with appropriate
methods.
Also implementing tablet support, which basically just means that a new
table parameter is also accepted and forwarded to the API, in addition
to the existing keyspace one.
Currently, error handling is done via catching
http::unexpected_status_error and re-throwing an std::runtime_error.
Turns out this no longer works, because this error will only be thrown
by the http client, if the request had an expected reply code set.
The scylla_rest_client doesn't set an expected reply code, so this
exception was never thrown for some time now.
Furthermore, even when the above worked, it was not too user-friendly as
the error message only included the reply-code, but not the reply
itself.
So in this patch this is fixed:
* The handling of http::unexpected_status_error is removed, we don't
want to use this mechanism, because it yields very terse error
messages.
* Instead, the status code of the request is checked explicitly and all
cases where it is not 200 are handled.
* A new api_request_failed exception is added, which is thrown for all
non-200 statuses with the extracted error message from the server (if
any).
* This exception is caught by main, the error message is printed and
scylla-nodetool returns with a new distinct error-code: 4.
With this, all cases where the request fails on ScyllaDB are handled and
we shouldn't hit cases where a nodetool command fails with some
obscure JSON parsing error, because the error reply has different JSON
schema than the expected happy-path reply.
Similar to the existing check_nodetool_fails_with() but checks that all
error messages from expected_errors are contained in stderr.
While at it, use list as the typing hint, instead of typing.List.
both of its callers are passing parent_path() and filename() to
it. so let the callee do this; it's simpler this way.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17225
before this change, when performing `stream_transfer_task`, if an
exception is raised, we check if the table being streamed is still
around; if it is missing, we just skip the table, as it must have been
dropped during streaming; otherwise we consider it a failure and
report it back to the peer. this behavior was introduced by
953af382.
but we perform the streaming on all shards in parallel, and if any
of the shards fail because of the dropped table, the exception is
thrown. and the current shard is not necessarily the one which
throws the exception. actually, the current shard might still be
waiting for a write lock to remove the table from the database's
table metadata. in that case, we consider the streaming RPC call a
failure even if the table is already removed on some shard(s), and
the peer would fail to bootstrap because of the streaming failure.
in this change, before catching all exceptions, we handle
`no_such_column_family`, and do not fail the streaming in that case.
please note, we don't touch other tables, so we can just assume that
`no_such_column_family` is thrown only if the table to be transferred
is missing. that's why `assert()` is added.
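A minimal sketch of the control flow (hypothetical names; the real code lives in the streaming path): a dropped table is swallowed and treated as a successfully skipped transfer, while every other exception still fails the task:
```cpp
#include <exception>
#include <functional>

// Stand-in for the exception thrown when a table has been dropped.
struct no_such_column_family : std::exception {};

void stream_one_table(const std::function<void()>& send_table_mutations) {
    try {
        send_table_mutations();
    } catch (const no_such_column_family&) {
        // The table was dropped (possibly observed on another shard first);
        // there is nothing left to transfer, so don't fail the whole RPC.
        return;
    }
    // Any other exception propagates and fails the transfer as before.
}
```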
Fixes#15370
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17160
Due to the potentially large number of per-table metrics, ScyllaDB uses
configuration to determine what metrics will be reported. The user can
decide if they want per-table-per-shard metrics, per-table-per-instance
metrics, or none.
This patch uses the same logic for tablet metrics registration.
It adds a new metrics group tablets with one metric inside it - count.
So, scylla_tablets_count will report the number of tablets per shard.
The existing per-table metrics will be reported aggregated or not like
the other per-table metrics.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Closesscylladb/scylladb#17182
Use `parallel_for_each_table` instead of `for_each_table_gently` in
`repair_service::load_history`, with a parallelism of 16 on each shard,
to reduce bootstrap time.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `gms::gossip_digest_ack2`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17153
this change is a follow-up of 637dd730. the goal is to use
std::filesystem::path for manipulating paths, and to avoid the
converting between sstring and fs::path back and forth.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17214
state_to_dir(sstable_state) translates the enum to the corresponding
directory component, and it returns a `seastar::sstring`. not all
the callers of this function expect a full-blown sstring instance,
on the contrary, quite a few of them just want a string-alike object
which represents the directory component, so they can use it, for
instance to compose a path, or just format the given `state` enum.
so to avoid the overhead of creating/destroying the `seastar::sstring`
instance, let's switch to `std::string_view`. with this change, we
will be able to implement the fmt::formatter for `sstable_state`
without the help of the formatter of sstring.
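A minimal sketch of the shape of such a function (the enum values and directory names here are illustrative, not the exact Scylla ones):
```cpp
#include <string_view>

enum class sstable_state { normal, staging, upload, quarantine };

// Returning a std::string_view over string literals avoids constructing a
// temporary sstring for callers that only compose a path or format the value.
constexpr std::string_view state_to_dir(sstable_state s) {
    switch (s) {
    case sstable_state::normal:     return "";
    case sstable_state::staging:    return "staging";
    case sstable_state::upload:     return "upload";
    case sstable_state::quarantine: return "quarantine";
    }
    return "";
}
```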
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17213
To facilitate testing the state of cache after the update is preempted
at various points, pass a preemption_source& to update() instead of
calling the reactor directly.
In release builds, the calls to preemption_source methods should compile
to the same direct reactor calls as today. Only in dev mode they should
add an extra branch. (However, the `preemption_source&` argument has
to be shoveled in any mode).
While `preemption_check` can be passed to functions to control
their preemption points, there is no way to inspect the
state of the system after the preemption results in a yield.
`preemption_source` is a superset of `preemption_check`,
which also allows for customizing the yield, not just the preemption
check. An implementation passed by a test can hook the yield to
put the tested function to sleep, run some code, and then wake the
function up.
We use the preprocessor to minimize the impact on release builds.
Only dev-mode preemption_source is hookable. When it's used in other
modes, it should compile to direct reactor calls, as if it wasn't used.
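Conceptually (this is illustrative only; as noted above, the real implementation compiles down to direct reactor calls outside dev mode), a hookable preemption source can be pictured as:
```cpp
#include <functional>
#include <utility>

// Production code asks "should I preempt?" and then "yield"; a test build can
// hook the yield to run arbitrary code (e.g. mutate the cache) at that exact
// preemption point before the preempted function resumes.
class preemption_source {
    std::function<bool()> _need_preempt = [] { return false; };
    std::function<void()> _on_yield = [] {};
public:
    preemption_source() = default;
    preemption_source(std::function<bool()> need_preempt, std::function<void()> on_yield)
        : _need_preempt(std::move(need_preempt)), _on_yield(std::move(on_yield)) {}
    bool should_preempt() const { return _need_preempt(); }
    void yield() { _on_yield(); }
};
```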
It tells whether the current node currently operates in recovery mode or
not. It will be vital for storage_service in determining which topology
operations to use at startup.
The `discover_group0` function returns only after it either finds a node
that belongs to some group 0, or learns that the current node is
supposed to create a new one. It will be very helpful to storage_service
in determining which topology mode to use.
This was previously done by `setup_group0`, which always was an
(indirect) caller of `discover_group0`. As we want to make
`discover_group0` public, it's more convenient for the callers if the
called method takes care of sanitizing the argument.
The goal is to make `discover_group0` public. The `my_id` argument was
always set to `this->load_my_id()`, so we can get rid of it and it will
make it more convenient to call `discover_group0` from the outside.
Currently, nodes either operate in the topology-on-raft mode or legacy
mode, depending on whether the experimental topology on raft flag is
enabled. This also affects the way nodes join the cluster, as both modes
have different procedures.
We want to allow joining nodes in legacy mode until the cluster is
upgraded. Nodes should automatically choose the best method. Therefore,
the existing boolean _raft_topology_change_enabled flag is extended into
an enum with the following variants:
- unknown - the node still didn't decide in which mode it will operate
- legacy - the node uses legacy topology operations
- upgrading_to_raft - the node is upgrading to use raft topology
operations
- raft - the node uses raft topology operations
Currently, only the `legacy` and `raft` variants are utilized, but this
will change in the commits that follow.
Additionally, the `_raft_experimental_topology` bool flag is introduced
which retains the meaning of the old `_raft_topology_change_enabled` but
has a more fitting name. It is explicitly needed in
`topology_state_load`.
The table query param is added to get the describe_ring result for a
given table.
Both vnode table and tablet table can use this table param, so it is
easier for users to use.
If the table param is not provided by user and the keyspace contains
tablet table, the request will be rejected.
E.g.,
curl "http://127.0.0.1:10000/storage_service/describe_ring/system_auth?table=roles"
curl "http://127.0.0.1:10000/storage_service/describe_ring/ks1?table=standard1"
Refs #16509
Closes scylladb/scylladb#17118
* github.com:scylladb/scylladb:
tablets: Convert to use the new version of for_each_tablet
storage_service: Add describe_ring support for tablet table
storage_service: Mark host2ip as const
tablets: Add for_each_tablet_gently
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `dht::ring_position`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17194
It uses only compile-time constants to produce the value, so deserves
this marking
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#17181
Reduces footprint from hundreds of MB to a very few MB.
Issue could be reproduced with:
./build/dev/test/boost/mutation_writer_test --run_test=test_token_group_based_splitting_mutation_writer -- -m 500M --smp 1 --random-seed 1848215131
Fixes#17076.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#17187
When creating a keyspace, scylla allows setting an RF value bigger than the number of nodes in the DC. With vnodes, when new nodes are bootstrapped, new tokens are inserted, thus catching up with the RF. With tablets, that's not the case, as the replica set remains unchanged.
With tablets it's a good chance not to mimic the vnodes behavior and to require as many nodes to be up and running as the requested RF. This patch implements this in a lazy manner -- when creating a keyspace the RF can be anything, but when a new table is created the topology should meet the RF requirements. If they are not met, the user can bootstrap new nodes or ALTER KEYSPACE.
closes: #16529
Closes scylladb/scylladb#17079
* github.com:scylladb/scylladb:
tablets: Make sure topology has enough endpoints for RF
cql-pytest: Disable tablets when RF > nodes-in-DC
test: Remove test that configures RF larger than the number of nodes
keyspace_metadata: Include tablets property in DESCRIBE
despite that "welp" is more emotional expressive, it is considered
a misspelling of "whelp" by codespell. that's why this comment stands
out. but from a non-native speaker's point of view, probably we can
use more descriptive words to explain what "welp" is for in plain
English.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17183
In this PR we add the tests for two scenarios, related to the use of IPs in raft topology.
* When the replaced node transitions to the `LEFT` state we used to
remove the IP of such node from gossiper. If we replace with same IP,
this caused the IP of the new node to be removed from gossiper. This
problem was fixed by #16820, this PR adds a regression test for it.
* When a node is restarted after decommissioning some other node, the
restarting node tries to apply the raft log, this log contains a
record about the decommissioned node, and we got stuck trying to resolve
its IP. This was fixed by #16639 - we excluded IPs from the Raft log
application code and moved it entirely to host_id-s. This PR adds a
regression test for this case.
Closes scylladb/scylladb#15967
Closes scylladb/scylladb#14803
Closes scylladb/scylladb#17180
* github.com:scylladb/scylladb:
test_topology_ops: check node restart after decommission
test_replace_reuse_ip: check other servers see the IP
For efficiency, if a base-table update generates many view updates that
go to the same partition, they are collected as one mutation. If this
mutation grows too big it can lead to memory exhaustion, so since
commit 7d214800d0 we split the output
mutation to mutations no longer than 100 rows (max_rows_for_view_updates)
each.
This patch fixes a bug where this split was done incorrectly when
the update involved range tombstones, a bug which was discovered by
a user in a real use case (#17117).
Range tombstones are read in two parts, a beginning and an end, and the
code could split the processing between these two parts, with the result
that some of the range tombstones in the update could be missed - and the
view could miss some deletions that happened in the base table.
This patch fixes the code in two places to avoid breaking up the
processing between range tombstones:
1. The counter "_op_count" that decides where to break the output mutation
should only be incremented when adding rows to this output mutation.
The existing code strangely incremented it on every read (!?) which
resulted in the counter being incremented on every *input* fragment,
and in particular could reach the limit 100 between two range
tombstone pieces.
2. Moreover, the length of output was checked in the wrong place...
The existing code could get to 100 rows, not check at that point,
read the next input - half a range tombstone - and only *then*
check that we reached 100 rows and stop. The fix is to calculate
the number of rows in the right place - exactly when it's needed,
not before the step.
The first change needs more justification: The old code, that incremented
_op_count on every input fragment and not just output fragments did not
fit the stated goal of its introduction - to avoid large allocations.
In one test it resulted in breaking up the output mutation to chunks of
25 rows instead of the intended 100 rows. But, maybe there was another
goal, to stop the iteration after 100 *input* rows and avoid the possibility
of stalls if there are no output rows? It turns out the answer is no -
we don't need this _op_count increment to avoid stalls: The function
build_some() uses `co_await on_results()` to run one step of processing
one input fragment - and `co_await` always checks for preemption.
I verified that indeed no stalls happen by using the existing test
test_long_skipped_view_update_delete_with_timestamp. It generates a
very long base update where all the view updates go to the same partition,
but all but the last few updates don't generate any view updates.
I confirmed that the fixed code loops over all these input rows without
increasing _op_count and without generating any view update yet, but it
does NOT stall.
This patch also includes two tests reproducing this bug and confirming
it's fixed, and also two additional tests for breaking up long deletions
that I wanted to make sure doesn't fail after this patch (it doesn't).
By the way, this fix would have also fixed issue #12297 - which we
fixed a year ago in a different way. That issue happened when the code
went through 100 input rows without generating *any* output rows,
and incorrectly concluding that there's no view update to send.
With this fix, the code no longer stops generating the view
update just because it saw 100 input rows - it would have waited
until it generated 100 output rows in the view update (or the
input is really done).
Fixes#17117
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#17164
The new rpc::optional parameter must come after any existing parameters,
including the rpc::source parameters, otherwise it will break
compatibility.
The regression was introduced in:
```
commit fd3c089ccc
Author: Tomasz Grabiec <tgrabiec@scylladb.com>
Date: Thu Oct 26 00:35:19 2023 +0200
service: range_streamer: Propagate topology_guard to receivers
```
We need to backport this patch ASAP before we release anything that
contains commit fd3c089ccc.
Refs: #16941
Fixes: #17175
Closes scylladb/scylladb#17176
RF values appear as strings, and the strategy classes convert them to integers. This PR removes some duplication of effort in the conversion code.
Closesscylladb/scylladb#17132
* github.com:scylladb/scylladb:
network_topology_strategy: Do not walk list of datacenters twice
replication_strategy: Do not convert string RF into int twise
abstract_replication_strategy: Make validate_replication_factor return value
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `data_dictionary::user_types_metadata`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17140
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `exceptions::exception_code`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17151
C++20 introduced a new overload of std::ostringstream::str() that is selected when the member function is called on an r-value.
The new overload returns a string that is move-constructed from the underlying string instead of being copy-constructed.
This change applies std::move() to stringstream objects before calling the str() member function, to avoid copying the underlying buffer.
It also removes a helper function `inet_addr_type_impl::to_sstring()` - it was used only in two places. It was replaced with `fmt::to_string()`.
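For reference, the pattern looks like this sketch:
```cpp
#include <sstream>
#include <string>
#include <utility>

std::string render(int x) {
    std::ostringstream oss;
    oss << "value=" << x;
    // C++20: str() called on an r-value moves the underlying buffer out
    // instead of copying it.
    return std::move(oss).str();
}
```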
Closesscylladb/scylladb#16991
* github.com:scylladb/scylladb:
use fmt::to_string() for seastar::net::inet_address
types/types.cc: move stringstream content instead of copying it
so we can tighten our dependencies a little bit. there are only three places where we are using the `date` library. also, there is no need to reinvent the wheel if there are ready-to-use ones.
Closes scylladb/scylladb#17177
* github.com:scylladb/scylladb:
types: use {fmt} to format boolean
types: use {fmt} to format time
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `dht::sharder`, and drop
its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17178
* seastar 85359b28...289ad5e5 (19):
> net/dpdk: use user-defined literal when appropriate
> io_tester: Allow running on non-XFS fs
> io: Apply rate-factor early
> circular_buffer: make iterator default constructible
> net/posix: add a way to change file permissions of unix domain socket
> resource: move includes to the top of the source file
> treewide: replace calls to future::get0() by calls to future::get()
> core/future: add as_ready_future utility
> build: do not expose -Wno-error=#warnings
> coroutine: remove remnants of variadic futures
> build: prevent gcc -Wstringop-overflow workaround from affecting clang
> util/spinlock: use #warning instead of #warn
> io_tester: encapsulate code into allocate_and_fill_buffer()
> io_tester: make maybe_remove_file a function
> future: remove tuples from get0_return_type
> circular_buffer_fixed_capacity: use std::uninitialized_move() instead of open-coding
> rpc/rpc_types: do not use integer literal in preprocessor macro
> future: use "(T(...))" instead of "{T(...)}" in uninitialized_set()
> net/posix: include used header
Closes scylladb/scylladb#17179
Adds a missing logging import in the file scylladb_common_images extension, which prevents the enterprise build from building.
Additionally, it standardizes logging handling across the extensions and removes "ami" references in Azure and GCP extensions.
Closes scylladb/scylladb#17137
also disable some more warnings which are failing the build after `-Wextra` is enabled. we can fix them on a case-by-case basis, if they are genuine issues. but before that, we just disable them.
the goal of this change is to reduce the discrepancies between the compile options used by CMake and those used by configure.py. the side effect is that we enable some more warnings enabled by `-Wextra`; for instance, `-Wsign-compare` is enabled now. for the full list of the enabled warnings when building with Clang, please see https://clang.llvm.org/docs/DiagnosticsReference.html#wextra.
Closes scylladb/scylladb#17131
* github.com:scylladb/scylladb:
configure.py: add -Wextra to cflags
test/tablets: do not compare signed and unsigned
There used to be a problem with restarting a node after
decommissioning some other node - the restarting node
tries to apply the raft log, this log contains a record
about the decommissioned node, and we got stuck trying
to resolve its IP.
This was fixed in #16639 - we excluded IPs from
the Raft log application code and moved it entirely
to host_ids.
In this commit we add a regression test
for this case. We move the decommission_node
call before server_stop/server_start. We need
to add one more server to retain majority when
the node is decommissioned, otherwise the topology
coordinator won't migrate from the stopped node
before replacing it, and we'll get an error.
closes #14803
The replaced node transitions to LEFT state, and
we used to remove the IPs of such nodes from gossiper.
If we replace with same IP, this caused the IP of the
new node to be removed from gossiper.
This problem was fixed by #16820, this commit
adds a regression test for it.
closes #15967
This PR removes the following pages:
- ScyllaDB Open Source Features
- ScyllaDB Enterprise Features
They were outdated, incomplete, and misleading. They were also redundant, as the per-release updates are added as Release Notes.
With this update, the features listed on the removed pages are added under the common page: ScyllaDB Features.
In addition, a reference to the Enterprise-only Features section is added.
Note: No redirections are added because no file paths or URLs are changed with this PR.
Fixes https://github.com/scylladb/scylladb/issues/13485
Refs https://github.com/scylladb/scylladb/issues/16496
(nobackport)
Closes scylladb/scylladb#17150
* github.com:scylladb/scylladb:
Update docs/using-scylla/features.rst
doc: remove the OSS and Enterprise Features pages
This PR:
- Adds the upgrade guide from ScyllaDB Open Source 5.4 to ScyllaDB Enterprise 2024.1. Note: The need to include the "Restore system tables" step in rollback has been confirmed; see https://github.com/scylladb/scylladb/issues/11907#issuecomment-1842657959.
- Removes the 5.1-to-2022.2 upgrade guide (unsupported versions).
Fixes https://github.com/scylladb/scylladb/issues/16445
Closes scylladb/scylladb#16887
* github.com:scylladb/scylladb:
doc: fix the OSS version number
doc: metric updates between 2024.1. and 5.4
doc: remove the 5.1-to-2022.2 upgrade guide
doc: add the 5.4-to-2024.1 upgrade guide
instead of filtering the keyspaces manually, let's reuse
`database::get_non_local_strategy_keyspaces_erms()`. less
repetition and more future-proof this way.
Fixes #16974
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17121
Validate replication strategy constraints in /storage_service/tablets/move API:
- replicas are not on the same node
- replicas don't move across DC (violates RF in each DC)
- availability is not reduced due to rack overloading
Add flag to force tablet move even though dc/rack constraints aren't fulfilled.
Test for the change: https://github.com/scylladb/scylla-dtest/pull/3911.
Fixes: #16379.
Closes scylladb/scylladb#16648
* github.com:scylladb/scylladb:
api: service: add force param to move_tablet api
service: validate replication strategy constraints
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `dht::ring_position_ext` and
`dht::ring_position_view`, and drop their operator<<.
Refs #13245
Closes scylladb/scylladb#17128
* github.com:scylladb/scylladb:
db: add formatter for dht::ring_position_ext
db: add formatter for dht::ring_position_view
This change removes inet_addr_type_impl::to_sstring()
and replaces its usages with fmt::to_string().
The removed helper performed unneeded copying via
std::ostringstream::str().
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
C++20 introduced a new overload of std::ostringstream::str()
that is selected when the mentioned member function is called
on r-value.
The new overload returns a string, that is move-constructed
from the underlying string instead of being copy-constructed.
This change applies std::move() on stringstream objects before
calling str() member function to avoid copying of the underlying
buffer.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
After changing `left_token_ring` from a node state to a transition
state in scylladb/scylladb#17009, we do the same for
`rollback_to_normal`. `rollback_to_normal` was created as a node
state because `left_token_ring` was a node state.
This change will allow us to distinguish a failed removenode from
a failed decommission in the `rollback_to_normal` handler.
Currently, we use the same logic for both of them, so it's not
required. However, this might change, as it has happened with the
decommission and the failed bootstrap/replace in the
`left_token_ring` state (scylladb/scylladb#16797). We are making
this change now because it would be much harder after branching.
Fixes scylladb/scylladb#17032
Closes scylladb/scylladb#17136
* github.com:scylladb/scylladb:
docs: dev: topology-over-raft: align indentation
docs: dev: topology-over-raft: document the rollback_to_normal state
topology_coordinator: improve logs in rollback_to_normal handler
raft topology: make rollback_to_normal a transition state
TCP sockets and unix domain sockets don't share any common listen options
except `socket_address`. For unix domain sockets, available options will be
expanded to cover also filesystem permissions and owner for the socket.
Storing listen options for both types of sockets in one structure would become messy.
For now, both use `listen_cfg`.
In a singular cql controller, only sockets of one type are created, thus it
can be easily split into two cases.
Isolate maintenance socket from `listen_cfg`.
In a previous PR (https://github.com/scylladb/scylladb/pull/16840), we enabled tablets by default when running the cql-pytest suite. To handle tests which were failing with tablets enabled, we used a new fixture, `xfail_tablets`, to mark these as xfail. This means that we effectively lost test coverage, as these tests can now freely fail and no one will notice if this is due to a new regression. To restore test coverage, this PR re-enables all the previously disabled tests, by parametrizing each one of them to run with both vnodes and tablets, and marking as xfail only the tablet variant. After these tests are fixed with tablets (or the underlying functionality they test is fixed to work with tablets), we will keep running them with both vnodes and tablets, because these tests apparently *do* care which replication method is used.
Together with https://github.com/scylladb/scylladb/pull/16802, this means all previously disabled tests are re-enabled and no coverage is lost.
Closes scylladb/scylladb#16945
* github.com:scylladb/scylladb:
test/cql-pytest: conftest.py: remove xfail_tablets fixture
test/cql-pytest: test_tombstone_limit.py: re-enable disabled tests
test/cql-pytest: test_describe.py: re-enable disabled tests
test/cql-pytest: test_cdc.py: re-enable disabled tests
test/cql-pytest: add parameter support to test_keyspace
When creating a keyspace, scylla allows setting an RF value larger than
the number of nodes in the DC. With vnodes, when new nodes are bootstrapped,
new tokens are inserted, thus catching up with the RF. With tablets, that's not
the case, as the replica set remains unchanged.
With tablets it's a good chance not to mimic the vnodes behavior and to
require as many nodes to be up and running as the requested RF. This
patch implements this in a lazy manner -- when creating a keyspace the RF
can be anything, but when a new table is created the topology should meet the RF
requirements. If not met, the user can bootstrap new nodes or ALTER KEYSPACE.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All the cql-pytests run against a single scylla node, but the
new_random_keyspace() helper may request RF in the range of 1 through 6,
so tablets need to be explicitly disabled when the RF is too large
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When tablets are enabled and a keyspace being described has them
explicitly disabled, or has a non-automatic initial value of zero, include this
in the returned describe statement too
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Since `t.parallel_foreach_table_state` may yield,
we should not access `type` through the calling lambda's
capture when calling `stop_compaction`, since that capture
gets lost when the lambda returns, if
`parallel_foreach_table_state` returns an unavailable
future.
Instead, change all captures to `[&]` so we can access
the `type` variable held by the coroutine frame.
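A heavily simplified sketch of the lifetime issue (the helpers below are hypothetical stand-ins for `parallel_foreach_table_state` and `stop_compaction`; only the capture pattern matters):
```
#include <functional>
#include <vector>
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>

struct table_state {};
enum class compaction_type { regular, cleanup };

// Hypothetical helpers standing in for the real ones touched by this patch.
seastar::future<> stop_compaction(table_state&, compaction_type);
seastar::future<> parallel_foreach(std::vector<table_state>&,
                                   std::function<seastar::future<>(table_state&)>);

seastar::future<> stop_all(std::vector<table_state>& states, compaction_type type) {
    // `type` lives in this coroutine's frame for as long as the co_await below
    // is pending, so the callback may refer to it via [&]. A copy captured by
    // value in an intermediate lambda would be destroyed as soon as that
    // lambda returned a not-yet-resolved future.
    co_await parallel_foreach(states, [&] (table_state& ts) {
        return stop_compaction(ts, type);
    });
}
```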
Fixes #16975
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#17143
This commit removes the following pages:
- ScyllaDB Open Source Features
- ScyllaDB Enterprise Features
They were outdated, incomplete, and misleading.
They were also redundant, as the per-release
updates are added as Release Notes.
With this update, the features listed on the removed
pages are added under the common page: ScyllaDB Features.
Note: No redirections are added, because no file paths
or URLs are changed with this commit.
Fixes https://github.com/scylladb/scylladb/issues/13485
Refs https://github.com/scylladb/scylladb/issues/16496
get0() dates back to the days when Seastar futures carried tuples, and get0() was a way to get the first (and usually only) element. Now it's a distraction, and Seastar is likely to deprecate and remove it.
Replace with seastar::future::get(), which does the same thing.
Closes scylladb/scylladb#17130
* github.com:scylladb/scylladb:
treewide: replace seastar::future::get0() with seastar::future::get()
sstable: capture return value of get0() using auto
utils: result_loop: define result_type with decayed type
[avi: add another one that snuck in while this was cooking]
Commit e81fc1f095 accidentally broke the control
flow of row_cache::do_update().
Before that commit, the body of the loop was wrapped in a lambda.
Thus, to break out of the loop, `return` was used.
The bad commit removed the lambda, but didn't update the `return` accordingly.
Thus, since the commit, the statement doesn't just break out of the loop as
intended, but also skips the code after the loop, which updates `_prev_snapshot_pos`
to reflect the work done by the loop.
As a result, whenever `apply_to_incomplete()` (the `updater`) is preempted,
`do_update()` fails to update `_prev_snapshot_pos`. It remains in a
stale state, until `do_update()` runs again and either finishes or
is preempted outside of `updater`.
If we read a partition processed by `do_update()` but not covered by
`_prev_snapshot_pos`, we will read stale data (from the previous snapshot),
which will be remembered in the cache as the current data.
This results in outdated data being returned by the replica.
(And perhaps in something worse if range tombstones are involved.
I didn't investigate this possibility in depth).
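A minimal, generic illustration of this control-flow pitfall (not the actual row_cache code):
```
#include <vector>

int process(const std::vector<int>& work, int& progress) {
    int done = 0;
    for (int item : work) {
        if (item < 0) {
            // correct: leave the loop but still run the bookkeeping below
            break;
            // the bug: a `return done;` here would skip updating `progress`,
            // analogous to skipping the _prev_snapshot_pos update
        }
        ++done;
    }
    progress = done;
    return done;
}
```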
Note: for queries with CL>1, occurrences of this bug are likely to be hidden
by reconciliation, because the reconciled query will only see stale data if
the queried partition is affected by the bug on *all* queried replicas
at the time of the query.
Fixes #16759
Closes scylladb/scylladb#17138
Check whether tablet move meets replication strategy constraints, i.e.
replicas aren't on the same node, replicas don't move across DCs
or HA isn't reduced due to rack overloading. Throw if constraints
are broken.
The amount of standard Lua libraries loaded for the sstable-script was
limited, due to fears that some libraries (like the io library) could
expose methods which, if used from the script, could interfere with
seastar's asynchronous architecture. So initially only the table and
string libraries were loaded.
This patch adds two more libraries to be loaded: math and os. The
former is self-explanatory and the latter contains methods to work with
date/time, obtain the values of environment variables, as well as launch
external processes. None of these should interfere with seastar; on the
other hand the facilities they provide can come in very handy for sstable
scripts.
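For illustration, a minimal sketch of selectively loading Lua standard libraries through the plain Lua C API (not the actual scylla-sstable setup code):
```
#include <lua.hpp>

// Open only the libraries considered safe for the sstable script environment.
void open_allowed_libs(lua_State* l) {
    const luaL_Reg allowed[] = {
        {LUA_TABLIBNAME,  luaopen_table},
        {LUA_STRLIBNAME,  luaopen_string},
        {LUA_MATHLIBNAME, luaopen_math},  // added by this patch
        {LUA_OSLIBNAME,   luaopen_os},    // added by this patch
    };
    for (const auto& lib : allowed) {
        luaL_requiref(l, lib.name, lib.func, 1);
        lua_pop(l, 1);  // luaL_requiref leaves the module table on the stack
    }
}
```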
Closes scylladb/scylladb#17126
In one of the previous patches, we changed the `rollback_to_normal`
state from a node state to a transition state. We document it
in this patch. The node state wasn't documented, so there is
nothing to remove.
After making `rollback_to_normal` a transition state, we can
distinguish a failed decommission from a failed bootstrap in the
`rollback_to_normal` handler. We use it to make logs more
descriptive.
After changing `left_token_ring` from a node state to a transition
state in scylladb/scylladb#17009, we do the same for
`rollback_to_normal`. `rollback_to_normal` was created as a node
state because `left_token_ring` was a node state.
This change will allow us to distinguish a failed removenode from
a failed decommission in the `rollback_to_normal` handler.
Currently, we use the same logic for both of them, so it's not
required. However, this might change, as it has happened with the
decommission and the failed bootstrap/replace in the
`left_token_ring` state (scylladb/scylladb#16797). We are making
this change now because it would be much harder after branching.
The change also simplifies the code in
`topology_coordinator::rollback_current_topology_op`.
Moving the `rollback_to_normal` handler from
`handle_node_transition` to `handle_topology_transition` created a
large diff. There is only one change - adding
`auto node = get_node_to_work_on(std::move(guard));`.
get0() dates back to the days when Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.
Replace with seastar::future::get(), which does the same thing.
instead of capturing the return value of `get0()` with a reference
type, use a plain type. as `get0()` returns a plain `T` while `get()`
returns a `T&&`, to avoid the value referenced by `T&&` being destroyed
after the expression, let's use a plain `auto` instead of `auto&&`.
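A generic illustration of the reasoning (not Seastar code): when an accessor returns `T&&` referring to state the caller doesn't own, binding the result to `auto&&` can leave a dangling reference, while plain `auto` owns the value.
```
#include <utility>

struct holder {
    int value = 42;
    int&& take() { return std::move(value); }  // returns T&&, like get()
};

int use() {
    auto v = holder{}.take();       // safe: the int is moved into `v`
    // auto&& r = holder{}.take();  // dangles: the temporary holder is gone
    //                              // at the end of the full expression
    return v;
}
```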
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change prepares for replacing `seastar::future::get0()` with
`seastar::future::get()`. the former's return type is a plain `T`,
while the latter is `T&&`. in this case `T` is
`boost::outcome::result<..>`. in order to extract its `error_type`,
we need to get its decayed type. since `std::remove_reference_t<T>`
also returns `T`, let's use it so it works with both `get0()` and `get()`.
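A tiny illustration of why `std::remove_reference_t` covers both cases (plain standard C++):
```
#include <type_traits>

// get0() yields a plain T, get() yields T&&; remove_reference_t maps both
// to the same decayed type.
static_assert(std::is_same_v<std::remove_reference_t<int>,   int>);
static_assert(std::is_same_v<std::remove_reference_t<int&&>, int>);
```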
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
also disable some more warnings which are failing the build after
`-Wextra` is enabled. we can fix them on a case-by-case basis, if
they are genuine issues. but before that, we just disable them.
the goal of this change is to reduce the discrepancies between
the compile options used by CMake and those used by configure.py.
the side effect is that we enable some more warnings enabled by
`-Wextra`; for instance, `-Wsign-compare` is enabled now. for
the full list of the enabled warnings when building with Clang,
please see https://clang.llvm.org/docs/DiagnosticsReference.html#wextra.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change should silence following warning:
```
test/boost/tablets_test.cc:1600:27: error: comparison of integers of different signs: 'int' and 'unsigned int' [-Werror,-Wsign-compare]
19:47:04 for (int i = 0; i < smp::count * 20; i++) {
19:47:04 ~ ^ ~~~~~~~~~~~~~~~
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Construct of that class walks the provided options to get per-DC
replication factors. It does it twice -- first to populate the dc:rf
map, second to calculate the sum of provided RF values. The latter loop
can be optimized away.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are two replication strategy classes that validate string RF and
then convert it into an integer. Since the validation helper returns the parsed
value, it can simply be used, avoiding the second conversion.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The helper in question checks if string RF is indeed an integer. Make
this helper return the "checked" integer value, because it does this
conversion. And rename it to parse_... to reflect what it now does. Next
patches will make use of this change.
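A minimal sketch of the reshaped helper (illustrative signature and error text, not the exact Scylla code): validation and parsing happen once, and the caller reuses the returned value.
```
#include <stdexcept>
#include <string>

long parse_replication_factor(const std::string& rf) {
    try {
        size_t pos = 0;
        long parsed = std::stol(rf, &pos);
        if (pos != rf.size() || parsed < 0) {
            throw std::invalid_argument(rf);
        }
        return parsed;  // callers use this value instead of converting again
    } catch (...) {
        throw std::invalid_argument(
            "replication factor must be a non-negative integer: " + rf);
    }
}
```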
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `dht::ring_position_ext`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `dht::ring_position_view`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Other than being fmt v10 compatible, it's also shorter and easier to
read, thanks to fmt::join() helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17115
This kills three birds with one stone
1. fixes broken indentation
2. re-uses new_options local variable
3. stops using string literal to check storage type
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17111
This is a follow-up of "storage_service: Run stream_ranges cmd in streaming group" to fix indentation and drop an unnecessary co_return.
Refs: #17090
Closes scylladb/scylladb#17114
* github.com:scylladb/scylladb:
storage_service: Drop unnecessary co_return in raft_topology_cmd_handler
storage_service: Fix indentation for stream_ranges
this comment has already served its purpose when rewriting
C* in C++. since we've re-implemented it, there is no need to keep it
around.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17120
The latter suite is now tablets-aware, and the tablets cases from the former one can happily work with a single shared scylla instance
Closes scylladb/scylladb#17101
* github.com:scylladb/scylladb:
test/topology_custom: Remove test_tablets.py
test/topology: Move test_tablet_change_initial_tablets
test/topology: Move test_tablet_explicit_disabling
test/topology: Move test_tablet_default_initialization
test/topology: Move test_tablet_change_replication_strategy
test/topology: Move test_tablet_change_replication_vnode_to_tablets
cql-pytest: Add skip_without_tablets fixture
At the end of the test, we wait until a restarted node receives a
snapshot from the leader, and then verify that the log has been
truncated.
To check the snapshot, the test used the `system.raft_snapshots` table,
while the log is stored in `system.raft`.
Unfortunately, the two tables are not updated atomically when Raft
persists a snapshot (scylladb/scylladb#9603). We first update
`system.raft_snapshots`, then `system.raft` (see
`raft_sys_table_storage::store_snapshot_descriptor`). So after the wait
finishes, there's no guarantee the log has been truncated yet -- there's
a race between the test's last check and Scylla doing that last delete.
But we can check the snapshot using `system.raft` instead of
`system.raft_snapshots`, as `system.raft` has the latest ID. And since
1640f83fdc, storing that ID and truncating
the log in `system.raft` happens atomically.
Closes scylladb/scylladb#17106
`#warning` is a preprocessor directive in C/C++, while `#warn` is not. the
reason we haven't run into the build failure caused by this is likely
that we are only building on amd64/aarch64 with libstdc++ at the time
of writing.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17074
according to the document "nodetool cleanup"
> Triggers removal of data that the node no longer owns
currently, scylla performs cleanup by rewriting the sstables. but
commitlog segments may still contain the mutations to the tables
which are dropped during sstable rewriting. when the scylla server
restarts, the dirty mutations are replayed to the memtable. if
any of these dirty mutations touches the tables that were cleaned up, the
stale data is reapplied. this would lead to data resurrection.
so, in this change we follow the same model as major compaction,
where we
1. force a new active segment,
2. flush the tables being cleaned up,
3. perform cleanup using compaction.
Fixes #4734
Closes scylladb/scylladb#16757
* github.com:scylladb/scylladb:
storage_service: fall back to local cleanup in cleanup_all
compaction: format flush_mode without the helper
compaction_manager: flush all tables before cleanup
replica: table: pass do_flush to table::perform_cleanup_compaction()
api, compaction: promote flush_mode
Otherwise it will inherit the rpc verb's scheduling group, which is gossip. As a result, streaming runs in the wrong scheduling group.
Fixes #17090
Closes scylladb/scylladb#17097
* github.com:scylladb/scylladb:
streaming: Verify stream consumer runs inside streaming group
storage_service: Run stream_ranges cmd in streaming group
During startup, the contents of the data directory are verified to ensure that they have the right owner and permissions. Verifying all the contents together - which includes files that will be read and closed immediately, and files that will be held open for longer durations - can lead to memory fragmentation in the dentry/inode cache.
Mitigate this by updating the verification in such a way that these two sets of files are verified separately, ensuring their separation in the dentry/inode cache.
Fixes https://github.com/scylladb/scylladb/issues/14506
Closes scylladb/scylladb#16952
* github.com:scylladb/scylladb:
directories: prevent inode cache fragmentation by orderly verifying data directory contents
directories: skip verifying data directory contents during startup
directories: co-routinize create_and_verify
This PR fixes a bug where certain calls to the `mintimeuuid()` CQL function with large negative timestamps could crash Scylla. It turns out we already had protections in place against very large positive timestamps, but very negative timestamps could still cause bugs.
The actual fix in this series is just a few lines, but the bigger effort was improving the test coverage in this area. I added tests for the "date" type (the original reproducer for this bug used totimestamp(), which takes a date parameter), and also reproducers for this bug directly, without the totimestamp() function, and one with that function.
Finally, this PR also replaces the assert(), which made this molehill of a bug into a mountain, with a throw.
Fixes #17035
Closes scylladb/scylladb#17073
* github.com:scylladb/scylladb:
utils: replace assert() by on_internal_error()
utils: add on_internal_error with common logger
utils: add a timeuuid minimum, like we had maximum
test/cql-pytest: tests for "date" type
This change introduces a new metric called tablet_count
that is recalculated during construction of the table object
and on each call to table::update_effective_replication_map().
To get the count of tablets on the current shard, the tablet map
is traversed and for each tablet_id tablet_map::get_shard()
is called. Its return value is compared with this_shard_id().
The new metric is maintained and exposed only for tables
that use tablets.
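A heavily simplified sketch of that recalculation (the tablet-map accessors here are hypothetical stand-ins for the real tablet_map interface):
```
#include <cstdint>
#include <seastar/core/smp.hh>

// Count tablets whose replica on this node maps to the current shard.
template <typename TabletMap, typename HostId>
uint64_t count_local_tablets(const TabletMap& tmap, HostId my_host) {
    uint64_t count = 0;
    for (auto tid : tmap.tablet_ids()) {  // hypothetical accessor
        if (tmap.get_shard(tid, my_host) == seastar::this_shard_id()) {
            ++count;
        }
    }
    return count;
}
```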
Refs: scylladb#16131
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#17056
this series addresses a couple of `-Wsign-compare` warnings surfaced in the tree.
Closes scylladb/scylladb#17091
* github.com:scylladb/scylladb:
tablet_allocator: do not compare signed and unsigned
replica: table: do not compare signed with unsigned
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `db::write_type`, and drop
its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17093
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `gms::application_state`,
but its operator<< is preserved, as it is still used by the generic
homebrew formatter for `std::unordered_map<>`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17096
During startup, the contents of the data directory are verified to ensure
that they have the right owner and permissions. Verifying all the
contents together - which includes files that will be read and closed
immediately, and files that will be held open for longer durations - can
lead to memory fragmentation in the dentry/inode cache.
Prevent this by updating the verification in such a way that these two
sets of files are verified separately, ensuring their separation in
the dentry/inode cache.
Fixes #14506
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
This is in preparation for a subsequent patch that will verify the
contents of the data directory in a specific order.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
before this change, if no keyspaces are specified,
scylla-nodetool just enumerates all non-local keyspaces, and
calls "/storage_service/keyspace_cleanup" on them one after another.
this is not quite efficient, as each such RESTful API call
forces a new active commitlog segment, and flushes all tables.
so, if the target node of this command has N non-local keyspaces,
it would repeat the steps above N times. this is not necessary.
and after a topology change, we would like to run a global
"nodetool cleanup" without specifying the keyspace, so this
is a typical use case which we do care about.
to address this performance issue, in this change, we improve
an existing RESTful API call, "/storage_service/cleanup_all", so
if the topology coordinator is not enabled, we fall back to
a local cleanup to clean up all non-local keyspaces.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
since flush_mode is moved out of major_compaction_task_impl, let's
drop the helper hosted in that class as well, and implement the
formatter without it.
please note, the `__builtin_unreachable()` is dropped. it should
not change the behavior of the formatter. we don't put it in a
`default` branch, in the hope that `-Wswitch` can warn us in case
another enumerator of `flush_mode` is added but we fail to handle
it somehow.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
according to the document "nodetool cleanup"
> Triggers removal of data that the node no longer owns
currently, scylla performs cleanup by rewriting the sstables. but
commitlog segments may still contain the mutations to the tables
which are dropped during sstable rewriting. when the scylla server
restarts, the dirty mutations are replayed to the memtable. if
any of these dirty mutations touches the tables that were cleaned up, the
stale data is reapplied. this would lead to data resurrection.
so, in this change we follow the same model as major compaction:
1. force new active segment,
2. flush all tables
3. perform cleanup using compaction, which rewrites the sstables
of specified tables
because we already `flush()` all tables in
`cleanup_keyspace_compaction_task_impl::run()`, there is no need to
call `flush()` again, in `table::perform_cleanup_compaction()`, so
the `flush()` call is dropped in this function, and the tests using
this function are updated to call `flush()` manually to preserve
the existing behavior.
there are two callers of `cleanup_keyspace_compaction_task_impl`,
* one is `storage_service::sstable_cleanup_fiber()`, which listens
for the events fired by topology_state_machine, which is in turn
driven by, for instance, "/storage_service/cleanup_all" API.
which cleans up all keyspaces one after another.
* another is "/storage_service/keyspace_cleanup", which cleans up
the specified keyspace.
in the first use case, we can force a new active segment for a single
time, so another parameter to the ctor of
`cleanup_keyspace_compaction_task_impl` is introduced to specify if
the `db.flush_all_tables()` call should be skipped.
please note, there are two possible optimizations:
1. force a new active segment only if the mutations in it touch the
tables being cleaned up
2. after forcing a new active segment, only flush the (mem)tables
mutated by the non-active segments
but let's leave them for follow-up changes. this change is a
minimal fix for the data resurrection issue.
Fixes #16757
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this parameter defaults to do_flush::yes, so the existing behavior is
preserved. and this change prepares for a change which flushes all
tables before performing cleanup on the tables on demand.
please note, we cannot pass compaction::flush_mode to this function,
as it is used by compaction/task_manager_module.hh; if we want to
share it between database.hh and compaction/task_manager_module.hh,
we would have to find it a new home. so the `table::do_flush` boolean
tag is reused instead.
Refs #16757
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
`available_shards` could be negative when `resize_plan` is empty, and
the loop to build `resize_plan` stops at the next iteration after
`available_shards` is assigned with a negative number. so, instead of
making it an `unsigned`, let's just compare it using `std::cmp_less()`.
this change should silence following warning:
```
/home/kefu/.local/bin/clang++ -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -g -O0 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wignored-qualifiers -Wno-c++11-narrowing -Wno-mismatched-tags -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -Wno-missing-field-initializers -Wno-deprecated-copy -Wno-ignored-qualifiers -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -U_FORTIFY_SOURCE -Werror=unused-result "-Wno-error=#warnings" -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT service/CMakeFiles/service.dir/Debug/tablet_allocator.cc.o -MF service/CMakeFiles/service.dir/Debug/tablet_allocator.cc.o.d -o service/CMakeFiles/service.dir/Debug/tablet_allocator.cc.o -c /home/kefu/dev/scylladb/service/tablet_allocator.cc
/home/kefu/dev/scylladb/service/tablet_allocator.cc:529:60: error: comparison of integers of different signs: 'long' and 'const size_t' (aka 'const unsigned long') [-Werror,-Wsign-compare]
529 | if (resize_plan.size() > 0 && available_shards < size_desc.shard_count) {
| ~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~
```
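For reference, the standard C++20 utility used here (plain standard C++): `std::cmp_less` compares a signed and an unsigned value with the mathematically correct result, without changing either operand's type.
```
#include <cstddef>
#include <utility>

bool needs_more_shards(long available_shards, size_t shard_count) {
    // correct even when available_shards is negative
    return std::cmp_less(available_shards, shard_count);
}
```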
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change helps to silence follow warning:
```
/home/kefu/dev/scylladb/replica/table.cc:1952:26: error: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Werror,-Wsign-compare]
1952 | for (auto id = 0; id < _storage_groups.size(); id++) {
| ~~ ^ ~~~~~~~~~~~~~~~~~~~~~~
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Otherwise it will inherit the rpc verb's scheduling group, which is
gossip. As a result, streaming runs in the wrong scheduling
group.
Fixes #17090
Instead of std::out_of_range(). Accessing a non-existing column is a
serious bug and the backtrace coming with `on_internal_error()` can be
very valuable when debugging it. As can be the coredump that is possible
to trigger with `--abort-on-internal-error`.
This change follows another similar change to `schema::column_at()`.
This should help us get to the bottom of the mysterious repair failures
caused by invalid column access, seen in
https://github.com/scylladb/scylladb/issues/16821.
Refs: https://github.com/scylladb/scylladb/issues/16821
Closes scylladb/scylladb#17080
* github.com:scylladb/scylladb:
schema: column_mapping::{static,regular}_column_at(): use on_internal_error()
schema: column_mapping: move column accessors out-of-line
In issue #17035 we had a situation where a certain input timestamp
could result in the create_time() utility function getting called on
a timestamp that cannot be represented as timeuuid, and this resulted
in an *assertion failure*, and a crash.
I guess we used an assertion because we believed that callers try to
avoid calling this function on excessively large timestamps, but
evidently, they didn't try hard enough and we got a crash.
The code in UUID_gen.hh changed a lot over the years and has become
very convoluted, and it is almost impossible to understand all the
code paths that could lead to this assertion failure. So it's better
to replace this assertion with an on_internal_error, which by default
is just an exception - and also logs the backtrace of the failure.
Issue #17035 would have been much less serious if we had an exception
instead of an assert.
Refs #17035
Refs #7871, Refs #13970 (removes an assert)
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Seastar's on_internal_error() is a useful replacement for assert(),
but it requires each caller to supply a logger - which is often
inconvenient, especially when the caller is a header file.
So in this patch we introduce a utils::on_internal_error() function
which is the same as seastar::on_internal_error() (the former calls
the latter), except it uses a single logger instead of asking the caller
to pass a logger.
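A minimal sketch of such a wrapper (the logger name and exact placement are illustrative, not the actual utils/ implementation):
```
#include <string_view>
#include <seastar/util/log.hh>

namespace utils {

inline seastar::logger& internal_error_logger() {
    static seastar::logger l("internal_error");  // illustrative logger name
    return l;
}

// Forwards to seastar::on_internal_error with one shared logger, so callers
// (including header-only code) don't need to own a logger themselves.
[[noreturn]] inline void on_internal_error(std::string_view msg) {
    seastar::on_internal_error(internal_error_logger(), msg);
}

} // namespace utils
```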
Refs #7871
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
C++20 introduced a new overload of std::stringstream::str()
that is selected when the mentioned member function is called
on r-value.
The new overload returns a string, that is move-constructed
from the underlying string instead of being copy-constructed.
This change applies std::move() on stringstream objects before
calling str() member function to avoid copying of the underlying
buffer.
Moreover, it introduces usage of std::stringstream::view() when
checking if the stream contains some characters. It skips another
copy of the underlying string, because std::string_view is returned.
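A small illustration combining both techniques (plain C++20, not Scylla code):
```
#include <sstream>
#include <string>
#include <utility>

std::string build_report(bool verbose) {
    std::ostringstream ss;
    if (verbose) {
        ss << "details...";
    }
    if (ss.view().empty()) {     // C++20: inspect via string_view, no copy
        return "(empty)";
    }
    return std::move(ss).str();  // C++20: move the underlying buffer out
}
```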
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#17084
This reverts commit 370fbd346c, reversing
changes made to 0912d2a2c6.
The reverted change makes scylla-manager misinterpret the data_file_directories
somehow; see issue #17078.
The motivation for tablet resizing is that we want to keep the average tablet size reasonable, such that load rebalancing can remain efficient. Too large a tablet makes migration inefficient, therefore slowing down the balancer.
If the avg size grows beyond the upper bound (split threshold), then the balancer decides to split. A split spans all tablets of a table, due to the power-of-two constraint.
Likewise, if the avg size decreases below the lower bound (merge threshold), then a merge takes place in order to grow the avg size. Merge is not implemented yet, although this series lays the foundation for it to be implemented later on.
A resize decision can be revoked if the avg size changes and the decision is no longer needed. For example, let's say table is being split and avg size drops below the target size (which is 50% of split threshold and 100% of merge one). That means after split, the avg size would drop below the merge threshold, causing a merge after split, which is wasteful, so it's better to just cancel the split.
Tablet metadata gains 2 new fields for managing this:
resize_type: resize decision type, can be either of "merge", "split", or "none".
resize_seq_number: a sequence number that works as the global identifier of the decision (monotonically increasing, increased by 1 on every new decision emitted by the coordinator).
A new RPC was implemented to pull stats from each table replica, such that the load balancer can calculate the avg tablet size and know the "split status" for a given table. The avg size is aggregated carefully while taking the RF of each DC into account (which might differ).
When a table is done splitting its storage, it mirrors the resize_seq_number from tablet metadata into its local state (in other words, "my split status is ready"). If a table is split-ready, the coordinator will see that the table's seq number is the same as the one in tablet metadata. This helps to distinguish stale decisions from the latest one (in case decisions are revoked and re-emitted later on). Also, it's aggregated carefully, by taking the minimum among all replicas, so the coordinator will only update topology when all replicas are ready.
When the load balancer emits a split decision, replicas will notice the need to split with a "split monitor" that is awakened once a table has its replication metadata updated and detects the need for a split (i.e. the resize_type field is "split").
The split monitor will start splitting of compaction groups (using mechanism introduced here: 081f30d149) for the table. And once splitting work is completed, the table updates its local state as having completed split.
When coordinator pulls the split status of all replicas for a table via RPC, the balancer can see whether that table is ready for "finalizing" the decision, which is about updating tablet metadata to split each tablet into two. Once table replicas have their replication metadata updated with the new tablet count, they can update appropriately their set of compaction groups (that were previously split in the preparation step).
Fixes #16536.
Closes scylladb/scylladb#16580
* github.com:scylladb/scylladb:
test/topology_experimental_raft: Add tablet split test
replica: Bypass reshape on boot with tablets temporarily
replica: Fix table::compaction_group_for_sstable() for tablet streaming
test/topology_experimental_raft: Disable load balancer in test fencing
replica: Remap compaction groups when tablet split is finalized
service: Split tablet map when split request is finalized
replica: Update table split status if completed split compaction work
storage_service: Implement split monitor
topology_cordinator: Generate updates for resize decisions made by balancer
load_balancer: Introduce metrics for resize decisions
db: Make target tablet size a live-updateable config option
load_balancer: Implement resize decisions
service: Wire table_resize_plan into migration_plan
service: Introduce table_resize_plan
tablet_mutation_builder: Add set_resize_decision()
topology_coordinator: Wire load stats into load balancer
storage_service: Allow tablet split and migration to happen concurrently
topology_coordinator: Periodically retrieve table_load_stats
locator: Introduce topology::get_datacenter_nodes()
storage_service: Implement table_load_stats RPC
replica: Expose table_load_stats in table
replica: Introduce storage_group::live_disk_space_used()
locator: Introduce table_load_stats
tablets: Add resize decision metadata to tablet metadata
locator: Introduce resize_decision
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `tracing::span_id`, and drop
its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17058
this change is a cleanup.
so it only returns tests, to be more symmetric with `junit_tests()`.
this allows us to drop the dummy `get_test_case()` in `PythonTestSuite`.
as only the BoostTest will be asked for `get_test_case()` after this
change.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16961
The persisted snapshot index may be 0 if the snapshot was created in an
older version of Scylla, which means snapshot transfer won't be
triggered to a bootstrapping node. Commands present in the log may not
cover all schema changes --- group 0 might have been created through the
upgrade procedure, on a cluster with existing schema. So a
deployment with an index=0 snapshot is broken and we need to fix it. We can
use the new `raft::server::trigger_snapshot` API for that.
Also add a test.
Fixes scylladb/scylladb#16683
Closes scylladb/scylladb#17072
* github.com:scylladb/scylladb:
test: add test for fixing a broken group 0 snapshot
raft_group0: trigger snapshot if existing snapshot index is 0
we add `-DBOOST_TEST_DYN_LINK` to the cflags when `--static-boost` is
not passed to `configure.py`. but we never pass this option to
`configure.py` in our CI/CD. also, we don't install `boost-static` in
`install-dependencies.sh`, so the linker always uses the boost shared
libraries when building scylla and other executables in this project.
this fact has been verified with the latest master HEAD, after building
scylla from `build.ninja` which was in turn created using `configure.py`.
Seastar::seastar_testing exposes `Boost::dynamic_linking` in its public
interface, and `Boost::dynamic_linking` exposes `-DBOOST_ALL_DYN_LINK`
as one of its cflags.
so, when building tests using CMake, they are compiled with
`-DBOOST_ALL_DYN_LINK`, while when building tests using `configure.py`,
they are compiled with `-DBOOST_TEST_DYN_LINK`. the former is exposed
by `Boost::dynamic_linking`, the latter is hardwired using
`configure.py`. but the net results are identical. it would be better
to use identical cflags on these two building systems. so, let's use
`-DBOOST_ALL_DYN_LINK` in `configure.py` also. furthermore, this is what
non-static-boost implies.
please note, we don't consume the cflags exposed by
`seastar-testing.pc`, so they don't override the ones we set using
`configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17070
Instead of std::out_of_range(). Accessing a non-existing column is a
serious bug and the backtrace coming with on_internal_error() can be
very valuable when debugging it. As can be the coredump that is possible
to trigger with --abort-on-internal-error.
This change follows another similar change to schema::column_at().
Our time-handling code in UUID_gen.hh is very fragile for very large
timestamps, because the different types - such as the Cassandra "timestamp"
and timeuuid - use very different resolutions and ranges.
In issue #17035 we discovered a situation where a certain CQL
"timestamp"-type value could cause an assertion-failure and a crash
in the create_time() function that creates a timeuuid - because that
timestamp didn't fit the place we have in timeuuid.
We already added in the past a limit, UUID_UNIXTIME_MAX, beyond which
we refuse timestamps, to avoid these assertions failure. However, we
missed the possibility of *negative* timestamps (which are allowed in
CQL), and indeed a negative timestamp (or a timestamp which was "wrapped"
to a negative value) is what caused issue #17035.
So this patch adds a second limit, UUID_UNIXTIME_MIN - limiting the
most negative timestamp that we support to well below the area which
causes problems, and adds tests that reproduce #17035 and that we
didn't break anything else (e.g., negative timestamps are still
allowed - just not extremely negative timestamps).
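A minimal sketch of the kind of guard added (the bounds and names are illustrative, not the actual UUID_UNIXTIME_MIN/MAX constants from UUID_gen.hh):
```
#include <cstdint>
#include <stdexcept>

// Illustrative bounds on the milliseconds-since-epoch values that can be
// represented in a timeuuid; the real constants live in UUID_gen.hh.
constexpr int64_t timeuuid_unixtime_min_ms = -12'219'292'800'000;
constexpr int64_t timeuuid_unixtime_max_ms = int64_t(1) << 59;

int64_t checked_unixtime_ms(int64_t ms) {
    if (ms < timeuuid_unixtime_min_ms || ms > timeuuid_unixtime_max_ms) {
        // reject instead of asserting, so a bad value cannot crash the node
        throw std::runtime_error("timestamp out of timeuuid range");
    }
    return ms;
}
```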
Fixes #17035.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Add error handling to rebuild instead of retrying it until it succeeds.
* 'gleb/rebuild-fail-v2' of github.com:scylladb/scylla-dev:
test: add test for rebuild failure
test: add expected_error to rebuild_node operation
topology_coordinator: Propagate rebuild failure to the initiator
This patch adds a few simple tests for the values of the "date" column
type, how it can be initialized from strings or integers, and what
those values mean.
Two of the tests reproduce issue #17066, where validation is missing
for values that don't fit in a 32-bit unsigned integer.
Refs #17066
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
C++20 introduced a new overload of std::stringstream::str()
that is selected when the mentioned member function is called
on r-value.
The new overload returns a string, that is move-constructed
from the underlying string instead of being copy-constructed.
This change applies std::move() on stringstream objects before
calling str() member function to avoid copying of the underlying
buffer.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#17064
Add workaround for scylladb/python-driver#295.
Also, an assertion made at the end of the test was false; it is fixed, with
an appropriate comment added.
Closes scylladb/scylladb#17071
* github.com:scylladb/scylladb:
test_raft_snapshot_request: fix flakiness
test: topology/util: update comment for `reconnect_driver`
It's pretty hairy in its future-promises form; with coroutines it's
much easier to read.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17052
In a cluster with group 0 with snapshot at index 0 (such group 0 might
be established in a 5.2 cluster, then preserved once it upgrades to 5.4
or later), no snapshot transfer will be triggered when a node is
bootstrapped. This way the new node might not obtain the full schema, or
might obtain an incorrect schema, like in scylladb/scylladb#16683.
Simulate this scenario in a test case using the RECOVERY mode and error
injections. Check that the newly added logic for creating a new snapshot
if such situation is detected helps in this case.
The persisted snapshot index may be 0 if the snapshot was created in an
older version of Scylla, which means snapshot transfer won't be
triggered to a bootstrapping node. Commands present in the log may not
cover all schema changes --- group 0 might have been created through the
upgrade procedure, on a cluster with existing schema. So a
deployment with an index=0 snapshot is broken and we need to fix it. We can
use the new `raft::server::trigger_snapshot` API for that.
Fixes scylladb/scylladb#16683
The issues mentioned in the comment before are already fixed.
Unfortunately, there is another, opposite issue which this function can
be used for. The previous issue was about the existing driver session
not reconnecting. The current issue is about the existing driver session
reconnecting too much... (and in the middle of queries.)
Waiting for CQL connections is not enough. For the queries to succeed,
nodes must see each other. We have to wait for this, otherwise the test
will be flaky.
Fixes #17029
Closes scylladb/scylladb#17040
We do not support tablet resharding yet. All tablet-related code assumes that the (host_id, shard) tablet replica is always valid. Violating this leads to undefined behaviour: errors in the tablet load balancer and potential crashes.
Avoid this by refusing to start if the need to reshard is detected. Be as lenient as possible: check all tablets with a replica on this node, and only refuse startup if at least one tablet has an invalid replica shard.
Startup will fail as:
ERROR 2024-01-26 07:03:06,931 [shard 0:main] init - Startup failed: std::runtime_error (Detected a tablet with invalid replica shard, reducing shard count with tablet-enabled tables is not yet supported. Replace the node instead.)
Refs: #16739
Fixes: #16843
Closes scylladb/scylladb#17008
* github.com:scylladb/scylladb:
test/topolgy_experimental_raft: test_tablets.py: add test for resharding
test/pylib: manager[_client]: add update_cmdline()
main: refuse startup when tablet resharding is required
locator: tablets: add check_tablet_replica_shards()
`db::config` is a class that is used in many places across the code base. When it is changed, its clients' code needs to be recompiled. It represents the configuration of the database. Some fields of the configuration that describe the location of directories may be empty. In such cases the `db::config::setup_directories()` function is called - it modifies the provided configuration. Such modification is not good - it is better to keep `db::config` intact.
This PR:
- extends the public interface of utils::directories class to provide required directory paths to the users
- removes 'db::config::setup_directories()' to avoid altering the fields of configuration object
- replaces usages of db::config object with utils::directories object in places that require obtaining paths to dirs
Fixes: scylladb#5626
Closes scylladb/scylladb#16787
* github.com:scylladb/scylladb:
utils/directories: make utils::directories::set an internal type
db::config: keep dir paths unchanged
cql_transport/controler: use utils::directories to get paths of dirs
service/storage_proxy: use utils::directories to get paths of dirs
api/storage_service.cc: use utils::directories to get paths of dirs
tools/scylla-sstable.cc: use utils::directories to get paths
db/commitlog: do not use db::config to get dirs
Use utils::directories to get dirs paths in replica::database
Allow utils::directories to provide paths to dirs
Clean-up of utils::directories
When a node is in the `left_token_ring` state, we don't know how
it has ended up in this state. We cannot distinguish a node that
has finished decommissioning from a node that has failed bootstrap.
The main problem it causes is that we incorrectly send the
`barrier_and_drain` command to a node that has failed
bootstrapping or replacing. We must do it for a node that has
finished decommissioning because it could still coordinate
requests. However, since we cannot distinguish nodes in the
`left_token_ring` state, we must send the command to all of them.
This issue appeared in scylladb/scylladb#16797 and this PR is
a follow-up that fixes it.
The solution is changing `left_token_ring` from a node state
to a transition state.
Fixes scylladb/scylladb#16944
Closes scylladb/scylladb#17009
* github.com:scylladb/scylladb:
docs: dev: topology-over-raft: document the left_token_ring state
topology_coordinator: adjust reason string in left_token_ring handler
raft topology: make left_token_ring a transition state
topology_coordinator: rollback_current_topology_op: remove unused exclude_nodes
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `db::read_repair_decision`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17033
This allows the user of `raft::server` to cause it to create a snapshot
and truncate the Raft log (leaving no trailing entries; in the future we
may extend the API to specify number of trailing entries left if
needed). In a later commit we'll add a REST endpoint to Scylla to
trigger group 0 snapshots.
One use case for this API is to create group 0 snapshots in Scylla
deployments which upgraded to Raft in version 5.2 and started with an
empty Raft log with no snapshot at the beginning. This causes problems,
e.g. when a new node bootstraps to the cluster, it will not receive a
snapshot that would contain both schema and group 0 history, which would
then lead to inconsistent schema state and trigger assertion failures as
observed in scylladb/scylladb#16683.
In 5.4 the logic of initial group 0 setup was changed to start the Raft
log with a snapshot at index 1 (ff386e7a44)
but a problem remains with these existing deployments coming from 5.2,
we need a way to trigger a snapshot in them (other than performing 1000
arbitrary schema changes).
Another potential use case in the future would be to trigger snapshots
based on external memory pressure in tablet Raft groups (for strongly
consistent tables).
The PR adds the API to `raft::server` and a HTTP endpoint that uses it.
In a follow-up PR, we plan to modify group 0 server startup logic to automatically
call this API if it sees that no snapshot is present yet (to automatically
fix the aforementioned 5.2 deployments once they upgrade.)
Closes scylladb/scylladb#16816
* github.com:scylladb/scylladb:
raft: remove `empty()` from `fsm_output`
test: add test for manual triggering of Raft snapshots
api: add HTTP endpoint to trigger Raft snapshots
raft: server: add `trigger_snapshot` API
raft: server: track last persisted snapshot descriptor index
raft: server: framework for handling server requests
raft: server: inline `poll_fsm_output`
raft: server: fix indentation
raft: server: move `io_fiber`'s processing of `batch` to a separate function
raft: move `poll_output()` from `fsm` to `server`
raft: move `_sm_events` from `fsm` to `server`
raft: fsm: remove constructor used only in tests
raft: fsm: move trace message from `poll_output` to `has_output`
raft: fsm: extract `has_output()`
raft: pass `max_trailing_entries` through `fsm_output` to `store_snapshot_descriptor`
raft: server: pass `*_aborted` to `set_exception` call
these words are either
* shortened words: strategy => strat, read_from_primary => fro
* or acronyms: node_or_data => nd
before we rename them with better names, let's just add them to the
ignore word list.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17002
Previously, utils::directories::set could have been used by
clients of utils::directories class to provide dirs for creation.
Due to moving the responsibility for providing paths of dirs from
db::config to utils::directories, such usage is no longer the case.
This change:
- defines utils::directories::set in utils/directories.cc to disallow
its usage by the clients of utils::directories
- makes utils::directories::create_and_verify() member function
private; now it is used only by the internals of the class
- introduces a new member function to utils::directories called
create_and_verify_sharded_directory() to limit the functionality
provided to clients
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change is intended to ensure that
db::config fields related to directories
are not changed. To achieve that, the member
function setup_directories() is removed.
The responsibility for directory paths
has been moved to utils::directories,
which may generate default paths if the
configuration does not provide a specific
value.
Fixes: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change replaces usage of db::config with
usage of utils::directories to get paths of
directories in cql_transport/controller.
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change replaces usage of db::config with
usage of utils::directories to get paths of
directories in service/storage_proxy.
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change replaces usage of db::config with usage
of utils::directories in api/storage_service.cc in
order to get the paths of directories.
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change replaces usage of db::config with usage
of utils::directories to get paths of directories
in tools/scylla-sstable.cc.
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change removes usage of db::config to
get path of commitlog_directory. Instead, it
introduces a new parameter to directly pass
the path to db::commitlog::config::from_db_config().
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change replaces the usage of db::config with
usage of utils::directories to get dirs paths in
replica::database class.
Moreover, it adjusts tests that require construction
of replica::database - its constructor has been
changed to accept utils::directories object.
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change extends utils::directories class in
the following way:
- adds new member variables that correspond to
fields from db::config that describe paths
of directories
- introduces a public interface to retrieve the
values of the new members
- allows construction of utils::directories
object based on db::config to setup internal
member variables related to paths to dirs
The new members of utils::directories are overridden
when the provided values are empty. The way of setting
paths is taken from db::config.
To ensure that the new logic works correctly
`utils_directories_test` has been created.
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change is intended to clean-up files in which
utils::directories class is defined to ease further
extensions.
The preparation consists of:
- removal of `using namespace` from directories.hh to
avoid namespace pollution in files that include this
header
- explicit inclusion of headers that were missing or
were implicitly included, to ensure that directories.hh
is self-sufficient
- defining directories::set class outside of its parent
to improve readability
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Similar to the existing update_config(). Updates the command-line
arguments of the specified nodes, merging the new options into the
existing ones. Needs a restart to take effect.
We do not support tablet resharding yet. All tablet-related code assumes
that the (host_id, shard) tablet replica is always valid. Violating this
leads to undefined behaviour: errors in the tablet load balancer and
potential crashes.
Avoid this by refusing to start if the need for resharding is detected.
Be as lenient as possible: check all tablets with a replica on this node,
and only refuse startup if at least one tablet has an invalid replica
shard.
Startup will fail as:
ERROR 2024-01-26 07:03:06,931 [shard 0:main] init - Startup failed: std::runtime_error (Detected a tablet with invalid replica shard, reducing shard count with tablet-enabled tables is not yet supported. Replace the node instead.)
Checks that all tablets with a replica on this node have a valid
replica shard (< smp::count).
It will be used to check whether the node can start up with the current
shard count.
In one of the previous patches, we changed the `left_token_ring`
state from a node state to a transition state. We document it
in this patch. The node state wasn't documented, so there is
nothing to remove.
A node can be in the `left_token_ring` state after:
- a finished decommission,
- a failed bootstrap,
- a failed replace.
When a node is in the `left_token_ring` state, we don't know how
it has ended up in this state. We cannot distinguish a node that
has finished decommissioning from a node that has failed bootstrap.
The main problem it causes is that we incorrectly send the
`barrier_and_drain` command to a node that has failed
bootstrapping or replacing. We must do it for a node that has
finished decommissioning because it could still coordinate
requests. However, since we cannot distinguish nodes in the
`left_token_ring` state, we must send the command to all of them.
This issue appeared in scylladb/scylladb#16797 and this patch is
a follow-up that fixes it.
The solution is changing `left_token_ring` from a node state
to a transition state.
Regarding implementation, most of the changes are simple
refactoring. The less obvious are:
- Before this patch, in `system_keyspace::left_topology_state`, we
had to keep the ignored nodes' IDs for replace to ensure that the
replacing node will have access to them after moving to the
`left_token_ring` state, which happens when replace fails. We
don't need this workaround anymore. When we enter the new
`left_token_ring` transition state, the new node will still be in
the `decommissioning` state, so it won't lose its request param.
- Before this patch, a decommissioning node lost its tokens
while moving to the `left_token_ring` state. After the patch, it
loses tokens while still being in the `decommissioning` state. We
ensure that all `decommissioning` handlers correctly handle a node
that lost its tokens.
Moving the `left_token_ring` handler from `handle_node_transition`
to `handle_topology_transition` created a large diff. There are
only three changes:
- adding `auto node = get_node_to_work_on(std::move(guard));`,
- adding `builder.del_transition_state()`,
- changing error logged when `global_token_metadata_barrier` fails.
The `exclude_nodes` variable was unused, but it wasn't a bug.
The `left_token_ring` and `rollback_to_normal` handlers correctly
compute excluded nodes on their own.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for dht::decorated_key and
repair_sync_boundary.
please note, before this change, repair_sync_boundary was using
the operator<< based formatter of `dht::decorated_key`, so we are
updating both of them in a single commit.
because we still use the homebrew generic formatter for vector<>
to format vector<repair_sync_boundary> and vector<dht::decorated_key>,
their operator<< overloads are preserved.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16994
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
cassandra::ConsistencyLevel::type.
please note, the operator<< for `cassandra::ConsistencyLevel::type`
is generated using the `thrift` command line tool, which does not emit
a specialization of fmt::formatter yet, so we need to use
`fmt::ostream_formatter` to implement the formatter for this type.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17013
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `db::replay_position`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17014
This series does a similar change to Alternator as was done recently to CQL:
1. If the "tablets" experimental feature in enabled, new Alternator tables will use tablets automatically, without requiring an option on each new table. A default choice of initial_tablets is used. These choices can still be overridden per-table if the user wants to.
3. In particular, all test/alternator tests will also automatically run with tablets enabled
4. However, some tests will fail on tablets because they use features that haven't yet been implemented with tablets - namely Alternator Streams (Refs #16317) and Alternator TTL (Refs #16567). These tests will - until those features are implemented with tablets - continue to be run without tablets.
5. An option is added to the test/alternator/run to allow developers to manually run tests without tablets enabled, if they wish to (this option will be useful in the short term, and can be removed later).
Fixes #16355
Closes scylladb/scylladb#16900
* github.com:scylladb/scylladb:
test/alternator: add "--vnodes" option to run script
alternator: use tablets by default, if available
test/alternator: run some tests without tablets
in general, the user should save the output of `DESC foo.bar` to a file,
and pass the path to the file as the argument of the `--schema-file`
option of `scylla sstable` commands. the CQL statement generated
by the `DESC` command always includes the keyspace name of the table.
but if the user creates the CQL statement manually and misses
the keyspace name, they would hit the following assertion failure
```
scylla: cql3/statements/cf_statement.cc:49: virtual const sstring &cql3::statements::raw::cf_statement::keyspace() const: Assertion `_cf_name->has_keyspace()' failed.
```
this is not a great user experience.
so, in this change, we check for the existence of the keyspace before
looking it up, and throw a runtime error with a better error message.
when the CQL statement does not have the keyspace name, the new
error message looks like:
```
error processing arguments: could not load schema via schema-file: std::runtime_error (tools::do_load_schemas(): CQL statement does not have keyspace specified)
```
since this check is only performed by `do_load_schemas()`, which
cares about the existence of the keyspace and only expects
CQL statements that create a table/keyspace/type, we just override the
new `has_keyspace()` method of the corresponding types derived
from `cf_statement`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16981
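A hypothetical sketch of the idea, using made-up names (parsed_statement, keyspace_or_throw) rather than the real cql3 classes: ask the statement whether a keyspace was specified before dereferencing it, and turn the failure into a descriptive runtime error instead of an assertion.
```
#include <stdexcept>
#include <string>

// Stand-in for a parsed CQL statement; the real change overrides
// has_keyspace() in the cf_statement-derived types mentioned above.
struct parsed_statement {
    virtual ~parsed_statement() = default;
    virtual bool has_keyspace() const = 0;
    virtual std::string keyspace() const = 0;
};

std::string keyspace_or_throw(const parsed_statement& stmt) {
    if (!stmt.has_keyspace()) {
        // Report a clear error instead of tripping an assert deep inside the parser.
        throw std::runtime_error("CQL statement does not have keyspace specified");
    }
    return stmt.keyspace();
}
```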
In this commit, we postpone the start-up
of the hint manager until we obtain information
about other nodes in the cluster.
When we start the hint managers, one of the
things that happen is creating endpoint
managers -- structures managed by
db::hints::manager. Whether we create
an instance of endpoint manager depends on
the value returned by host_filter::can_hint_for,
which, in turn, may depend on the current state
of locator::topology.
If locator::topology is incomplete, some endpoint
managers may not be started even though they
should (because the target node IS part of the
cluster and we SHOULD send hints to it if there
are some).
A situation like that can happen because we
start the hint managers too early. This commit
aims to solve that problem. We only start
the hint managers when we've gathered information
about the other nodes in the cluster and created
the locator::topology using it.
Hinted Handoff is not negatively affected by these
changes since in between the previous point of
starting the hint managers and the current one,
all of the mutations performed by
service::storage_proxy target the local node, so
no hints would need to be generated anyway.
Fixes scylladb/scylladb#11870
Closes scylladb/scylladb#16511
Anonymous namespace implies internal linkage for its members.
When it is defined in a header, then each translation unit,
which includes such header defines its own unique instance
of members of the unnamed namespace that are ODR-used within
that translation unit.
This can lead to unexpected results including code bloat
or undefined behavior due to ODR violations.
This PR removes unnamed namespaces from header files.
References:
- [CppCoreGuidelines: "SF.21: Don’t use an unnamed (anonymous) namespace in a header"](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#sf21-dont-use-an-unnamed-anonymous-namespace-in-a-header)
- [SEI CERT C++: "DCL59-CPP. Do not define an unnamed namespace in a header file"](https://wiki.sei.cmu.edu/confluence/display/cplusplus/DCL59-CPP.+Do+not+define+an+unnamed+namespace+in+a+header+file)
Closes scylladb/scylladb#16998
* github.com:scylladb/scylladb:
utils/config_file_impl.hh: remove anonymous namespace from header
mutation/mutation.hh: remove anonymous namespace from header
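A minimal sketch of the problem fixed by this PR, with made-up names: an unnamed namespace in a header gives every including translation unit its own copy of the entities, which bloats the binary and can violate the ODR; moving the definition into a single source file avoids that.
```
// util.hh -- before: each translation unit including this header gets its
// own copy of `counter` and `bump()`, which is code bloat at best and an
// ODR trap at worst if an inline function or template ODR-uses them.
//
// namespace {
//     int counter = 0;
//     int bump() { return ++counter; }
// }

// util.hh -- after: the header only declares the interface...
int bump();

// util.cc -- ...and the definition (plus any internal helpers) lives in
// exactly one translation unit, where internal linkage is harmless.
namespace {
    int counter = 0;
}
int bump() { return ++counter; }
```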
this RESTful API is a scylla specific extension and is only used
by scylla-nodetool. currently, the java-based nodetool does not use
it at all, so mark it with "scylla_only".
one can verify this change with:
```
pytest --mode=debug --nodetool=cassandra test_cleanup.py::test_cleanup
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17001
we should allow users to run nodetool tests without `test.py`. but there
is a good chance that the host could be reused by multiple tests or
multiple users who could be using port 12345. by randomizing the IP and
port, they would have a better chance of completing the test without running
into a used-port problem.
Closes scylladb/scylladb#16996
* github.com:scylladb/scylladb:
test/nodetool: return a randomized address if not running with unshare
test/nodetool: return an address from loopback_network fixture
In this mode, the node is not reachable from the outside, i.e.
* it refuses all incoming RPC connections,
* it does not join the cluster, thus
* all group0 operations are disabled (e.g. schema changes),
* all cluster-wide operations are disabled for this node (e.g. repair),
* other nodes see this node as dead,
* it cannot read or write data from/to other nodes,
* it does not open Alternator and Redis transport ports and the TCP CQL port.
The only way to make CQL queries is to use the maintenance socket. The node serves only local data.
To start the node in maintenance mode, use the `--maintenance-mode true` flag or set `maintenance_mode: true` in the configuration file.
REST API works as usual, but some routes are disabled:
* authorization_cache
* failure_detector
* hinted_hand_off_manager
This PR also updates the maintenance socket documentation:
* add cqlsh usage to the documentation
* update the documentation to use `WhiteListRoundRobinPolicy`
Fixes #5489.
Closes scylladb/scylladb#15346
* github.com:scylladb/scylladb:
test.py: add test for maintenance mode
test.py: generalize usage of cluster_con
test.py: when connecting to node in maintenance mode use maintenance socket
docs: add maintenance mode documentation
main: add maintenance mode
main: move some REST routes initialization before joining group0
message_service: add sanity check that rpc connections are not created in the maintenance mode
raft_group0_client: disable group0 operations in the maintenance mode
service/storage_service: add start_maintenance_mode() method
storage_service: add MAINTENANCE option to mode enum
service/maintenance_mode: add maintenance_mode_enabled bool class
service/maintenance_mode: move maintenance_socket_enabled definition to separate file
db/config: add maintenance mode flag
docs: add cqlsh usage to maintenance socket documentation
docs: update maintenance socket documentation to use WhiteListRoundRobinPolicy
The tests in this file that are related to partition scans are failing
with tablets, and were hence disabled with xfail_tablets. This means we
are losing test coverage, so parametrize these tests to run with both
vnodes and tablets, and mark them as xfail only when running with
tablets.
This test file has two tests disabled:
* test_desc_cluster - due to #16789
* test_whitespaces_in_table_options - due to #16317
They are disabled via xfail, because they do not work with tablets. This
means we lose test coverage of the respective functionality.
This patch re-enables the two tests, by parametrizing them to run with
both vnodes and tablets:
* test_desc_cluster - when run with tablets, endpoint info is not
validated. The test is still useful because it checks that DESC
CLUSTER doesn't break with tablets. A FIXME with a link to #16789
is left.
* test_whitespaces_in_table_options - marked xfail when run with
tablets, but not when run with vnodes, thus we re-gain the test
coverage.
The tests in this file are currently all marked with xfail_tablets,
because tablets are not enabled by default in the cql-pytest suite and
CDC doesn't currently work with tablets at all. This however means that
the CDC functionality loses test coverage. So instead of a blanket
xfail, parametrize these tests to run with both vnodes and tablets, and
add a targeted xfail for the tablets parameter. This way no coverage
is lost, the tests are still running with vnodes (and will fail if
regressions are introduced), and they are allowed to xfail with tablets
enabled.
We could simply make these tests only run with vnodes for now. But
looking forward, after the CDC functionality is fixed to work with
tablets, we want to verify that it works with both vnodes and tablets.
So we run the test with both and leave the xfail as a reminder that a
fix is required.
Tests can now request to be run against both tablets and vnodes, via:
@pytest.mark.parametrize("test_keyspace", ["tablets", "vnodes"], indirect=True)
This will set request.param for the test_keyspace fixture, which can
create the keyspace according to the requested parameter. This way,
tests can conveniently opt-in to be run against both replication
methods.
When not parameterized like this, the test_keyspace fixture will create
a keyspace as before -- with tablets, if support is enabled.
they are directories, and we are concatenating strings to build the paths
to the sstable components. so it would be more elegant to use fs::path
for manipulating paths.
this change was inspired by the discussion on passing a relative
path of an sstable to `scylla sstable`, where we use
`path::parent_path()` as the dir of the sstable, and then concatenate
it with the filename component. but if the `parent_path()` method
returns an empty string, we end up with a path like
"/me-42-big-TOC.txt", which is not reachable. what we should be
reading is "me-42-big-TOC.txt". so we would be better off either
using `fs::path` or enforcing an absolute path.
since we are already using "/" as the separator and concatenating strings,
this is an opportunity to switch over to `fs::path` to address
the problem and to avoid the string concatenation.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16982
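A small, self-contained illustration of the pitfall (not the actual `scylla sstable` code), assuming a relative SSTable path on the command line:
```
#include <filesystem>
#include <iostream>

namespace fs = std::filesystem;

int main() {
    fs::path data("me-42-big-Data.db");          // relative path from the command line
    // Naive string concatenation reproduces the bug: parent_path() is empty,
    // so we end up looking for "/me-42-big-TOC.txt" at the filesystem root.
    std::cout << data.parent_path().string() + "/me-42-big-TOC.txt" << '\n';
    // Making the path absolute first (or composing with fs::path::operator/)
    // keeps the lookup inside the SSTable's real directory.
    fs::path abs = fs::absolute(data);           // e.g. $PWD/me-42-big-Data.db
    std::cout << (abs.parent_path() / "me-42-big-TOC.txt") << '\n';
}
```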
Anonymous namespace implies internal linkage for its members.
When it is defined in a header, then each translation unit,
which includes such header defines its own unique instance
of members of the unnamed namespace that are ODR-used within
that translation unit.
This can lead to unexpected results including code bloat
or undefined behavior due to ODR violations.
This change aligns the code with the following guidelines:
- CppCoreGuidelines: "SF.21: Don’t use an unnamed (anonymous)
namespace in a header"
- SEI CERT C++: "DCL59-CPP. Do not define an unnamed namespace
in a header file"
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
C++20 introduced a new overload of std::ostringstream::str()
that is selected when the mentioned member function is called
on an r-value.
The new overload returns a string that is move-constructed
from the underlying string instead of being copy-constructed.
This change applies std::move() on stringstream objects before
calling the str() member function to avoid copying the underlying
buffer.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#16990
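A minimal sketch of the pattern, independent of the Scylla code it was applied to:
```
#include <sstream>
#include <string>

std::string render(int id) {
    std::ostringstream oss;
    oss << "id=" << id;
    // C++20: calling str() on an rvalue selects the && overload, which
    // move-constructs the result from the internal buffer instead of copying it.
    return std::move(oss).str();
}
```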
Anonymous namespace implies internal linkage for its members.
When it is defined in a header, then each translation unit,
which includes such header defines its own unique instance
of members of the unnamed namespace that are ODR-used within
that translation unit.
This can lead to unexpected results including code bloat
or undefined behavior due to ODR violations.
This change aligns the code with the following guidelines:
- CppCoreGuidelines: "SF.21: Don’t use an unnamed (anonymous)
namespace in a header"
- SEI CERT C++: "DCL59-CPP. Do not define an unnamed namespace
in a header file"
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
we should allow users to run nodetool tests without `test.py`. but there
is a good chance that the host could be reused by multiple tests or
multiple users who could be using port 12345. by randomizing the IP and
port, they would have a better chance of completing the test without running
into a used-port problem.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
* rename "maybe_setup_loopback_network" to "server_address"
* return an address from the fixture
this change prepares for bringing back the randomized IP and port:
in case users run this test without test.py, randomizing the
IP and port gives them a better chance of completing the test
without running into a used-port problem.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Without it, table loading fails as reshape mixes sstables from
different tablets together, and now we have a guard for that:
Unable to load SSTable ...-big-Data.db that belongs to tablets 1 and 31,
The fix is about making reshape compaction group aware.
It will be fixed, but not now.
Refs #16966.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
It might happen that an sstable being streamed during migration is not
split yet, therefore it should be added to the main compaction group,
allowing the streaming stage to start split work on it, and not
fool the coordinator into thinking it can proceed with split execution,
which would cause problems.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This is easier to reproduce after the changes in the load balancer to
emit resize decisions, which in turn result in the topology version
being incremented, and that might race with fencing tests that
manipulate the topology version manually.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
When the coordinator executes a split, i.e. commits the new tablet map with
each tablet split into two, all replicas must then proceed with
remapping of compaction groups that were previously split.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
When the load balancer emits a finalize request, the coordinator will
now react to it by splitting each tablet in the current tablet
map and then committing the new map.
There can be no active migration while we do it.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The table replica tells the coordinator that its split status
is ready by loading the sequence number from tablet metadata
into its local state, which is pulled periodically by the
coordinator via RPC.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This implements the ability of the load balancer to emit split or merge
requests, cancel ongoing ones if they're no longer needed, and
also finalize those that are ready for the topology changes.
That's all based on the average tablet size, collected by the coordinator
from all nodes, and the split and merge thresholds.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Lack of synchronization could lead the coordinator to think that a
pending replica in migration has split-ready status, when in reality
it escaped the check: this can happen if the leaving replica escapes
the split-ready check after the status has already been pulled at the
destination by the coordinator.
Example:
1) Coordinator pulls split status (ready) from destination replica
2) Migration sends a non-split tablet into destination
3) Coordinator pulls split status (ready) from source after
transition stage of migration moved to cleanup (so there's no
longer a leaving replica in it).
4) Migration completes, but compaction group is not split yet.
Coordinator thinks destination is ready.
To solve it, streaming now guarantees that the pending replica is
split before returning, so migration can only advance to the next
stage after the pending replica is split, if and only if
there's a split request emitted.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This implements the fiber that aggregates per-table stats that will
be fed into the load balancer to make resize decisions (split,
merge, or revoke ongoing ones).
Initially, the stats will be refreshed every 60s, but the idea
is that eventually we make the refresh frequency table-based, where
the size of each table is taken into account.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This implements the RPC for collecting table stats.
Since both the leaving and the pending replica can be accounted during
tablet migration, the RPC handler will look at the tablet transition
info and account only either the leaving or the pending replica based on the
tablet migration stage. Replicas that are not leaving or
pending, of course, don't contribute to any anomaly in the
reported size.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This is the table replica state that coordinator will aggregate
from all nodes and feed into the load balancer.
A tablet filter is added to not double account migrating tablets,
so only one of pending or leaving tablet replica will be accounted
based on the current migration stage. More details can be found in
the patch that will implement the filter.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
These are per-table stats that will be aggregated from all nodes by
the coordinator, in order to help the load balancer make resize
decisions.
size_in_bytes is the total aggregated table size, so the coordinator
becomes responsible for taking into account the RF of each DC and
also the tablet count when computing an accurate average size.
split_ready_seq_number is the minimum sequence number among all
replicas. If the coordinator sees that all replicas store the seq number
of the current split, then it knows all replicas are ready for the
next stage in the split process.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The new metadata describes the ongoing resize operation (which can be
merge, split, or none) that spans the tablets of a given table.
It's managed by group0, so down nodes will be able to see the
decision when they come back up and see the changes to the
metadata.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
resize_decision is the metadata that says whether the tablets of a table
need a split, a merge, or neither. That will be recorded in tablet metadata,
and therefore stored in group0.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
To avoid data resurrection, mutations deleted by cleanup operations should be skipped during commitlog replay.
This series implements the above for tablet cleanups, by using a new system table which holds records of cleanup operations.
Fixes #16752
Closes scylladb/scylladb#16888
* github.com:scylladb/scylladb:
test: test_tablets: add a test for cleanup after migration
test: pylib: add ScyllaCluster.wipe_sstables
test: boost: add commitlog_cleanup_test
db: commitlog_replayer: ignore mutations affected by (tablet) cleanups
replica: table: garbage-collect irrelevant system.commitlog_cleanups records
db: commitlog: add min_position()
replica: table: populate system.commitlog_cleanups on tablet cleanup
db: system_keyspace: add system.commitlog_cleanups
replica: table: refresh compound sstable set after tablet cleanup
db::schema_tables::all_table_names() returns std::vector<sstring>.
Usage of range-for loop without reference results in copying each
of the elements of the traversed container. Such copying is redundant.
This change introduces usage of const reference to avoid copying.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#16983
We didn't send the `barrier_and_drain` command to a
decommissioning node that could still be coordinating requests. It
could happen that a decommissioning node sent a request with an
old topology version after normal nodes received the new fence
version. Then, the request would fail on replicas with the stale
topology exception.
This PR fixes this problem by modifying `exec_global_command`.
From now on, it sends `barrier_and_drain` to a decommissioning
node.
We also stop filtering stale topology exceptions in
`test_topology_ops`. We added this filter after detecting the bug
fixed by this PR.
Fixes scylladb/scylladb#15804
Fixes scylladb/scylladb#16579
Fixes scylladb/scylladb#16642
Closes scylladb/scylladb#16797
* github.com:scylladb/scylladb:
test: test_topology_ops: remove failed mutations filter
raft topology: send barrier_and_drain to a decommissioning node
raft topology: ensure at most one transitioning node
before this change, we used a random address when launching the
rest_api_mock server, but there is a chance that the randomly
picked address conflicts with an already-used address on the
host. the subprocess then fails right away with a return code of
1, but we just continue on and check the readiness
of the already-dead server. actually, we've seen test failures
caused by the EADDRINUSE failure, and when we checked the readiness
of rest_api_mock by sending an HTTP request and reading the
response, what we got was not a JSON-encoded response but a webpage,
which was likely the one returned by a minio server.
in this change, we
* set the "launcher" option of the nodetool
test suite to "unshare", so that all its tests are launched
in separate namespaces.
* do not use a random address for the mock server, as the network
namespaces are separated.
Fixes #16542
Closes scylladb/scylladb#16773
* github.com:scylladb/scylladb:
test/nodetool: run nodetool tests using "unshare"
test.py: add "launcher" option support
The test checks that in maintenance mode server A is not available to other
nodes and to clients. It is possible to connect through the maintenance socket
to server A and perform local CQL operations.
A node in maintenance mode doesn't have the regular CQL port open.
To connect to the node, the scylla cluster needs to use the node's maintenance socket.
In maintenance mode:
* Group0 doesn't start and the node doesn't join the token ring to behave as a dead
node to others,
* Group0 operations are disabled and result in an error,
* Only the maintenance socket listens for CQL requests,
* The storage service initialises token_metadata with the local node as the only node
on the token ring.
Maintenance mode is enabled by passing the --maintenance-mode flag.
Maintenance mode starts before the group0 is initialised.
Move the REST endpoints that don't need a connection with other nodes to before joining group0.
This way, they can be initialized in maintenance mode.
Move `snapshot_ctl` along with routes because of snapshots API and tasks API.
Its constructor is a noop, so it is safe to move it.
In maintenance mode, the node doesn't communicate with other nodes, so it doesn't
start or apply group0 operations. Users can still try to start one, e.g. change
the schema, and the node must not allow it.
Init _upgrade_state with recovery in maintenance mode.
Throw an error if a group0 operation is started in maintenance mode.
In maintenance mode, other nodes won't be available, thus we disable joining
the token ring, and the token metadata won't be populated with the local node's endpoint.
When a CQL query is executed, it checks the `token_metadata` structure and fails if it is empty.
Add a method that initialises `token_metadata` with the local node as the only node in the token ring.
join_cluster and start_maintenance_mode are incompatible.
To make sure that only one is called when the node starts, add the MAINTENANCE option.
start_maintenance_mode sets _operation_mode to MAINTENANCE.
join_cluster sets _operation_mode to STARTING.
set_mode will result in an internal error if:
* it tries to set MAINTENANCE mode when the _operation_mode is other than NONE,
i.e. start_maintenance_mode is called after join_cluster (or it is called during
the drain, but it also shouldn't happen).
* it tries to set STARTING mode when the mode is set to MAINTENANCE,
i.e. join_cluster is called after start_maintenance_mode.
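A minimal sketch of those rules with illustrative names (not the actual storage_service code):
```
#include <stdexcept>

enum class operation_mode { NONE, STARTING, MAINTENANCE /* , ... */ };

struct mode_tracker {
    operation_mode mode = operation_mode::NONE;

    void set_mode(operation_mode m) {
        // MAINTENANCE may only be entered from NONE, i.e. before join_cluster ran.
        if (m == operation_mode::MAINTENANCE && mode != operation_mode::NONE) {
            throw std::runtime_error("internal error: maintenance mode after startup");
        }
        // STARTING (join_cluster) is not allowed once we are in MAINTENANCE.
        if (m == operation_mode::STARTING && mode == operation_mode::MAINTENANCE) {
            throw std::runtime_error("internal error: join_cluster in maintenance mode");
        }
        mode = m;
    }
};
```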
While the cleanup is ongoing. Otherwise, a concurrent table drop might
trigger a use-after-free, as we have seen in dtests recently.
Fixes: #16770
Closes scylladb/scylladb#16874
before this change, we always cast the wait duration to milliseconds,
even if it could be using a higher resolution. actually
`std::chrono::steady_clock` uses nanoseconds for its duration,
so if we inject a deadline using `steady_clock`, we could be woken
earlier due to the narrowing of the duration type caused by the
duration_cast.
in this change, we just use the duration as it is. this should allow
the caller to use the resolution provided by Seastar without losing
precision. the tests are updated to print the time duration
instead of the count to provide information with a higher resolution.
Fixes #15902
Closes scylladb/scylladb#16264
* github.com:scylladb/scylladb:
tests: utils: error injection: print time duration instead of count
error_injection: do not cast to milliseconds when injecting timeout
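A small illustration of the narrowing problem, independent of the error-injection code itself:
```
#include <chrono>
#include <iostream>

int main() {
    using namespace std::chrono;
    // A sub-millisecond deadline expressed in steady_clock's native resolution.
    auto wait = nanoseconds(23839);
    // Casting to milliseconds truncates towards zero: the requested ~24us wait
    // becomes 0ms, so a sleep armed with it wakes up (almost) immediately.
    auto truncated = duration_cast<milliseconds>(wait);
    std::cout << truncated.count() << "ms\n";   // prints 0ms
    // Keeping the duration as-is preserves the caller-provided resolution.
    std::cout << wait.count() << "ns\n";        // prints 23839ns
}
```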
New tablet replicas are allocated and rebuilt synchronously with node
operations. They are safely rebuilt from all existing replicas.
The list of ignored nodes passed to node operations is respected.
The tablet scheduler is responsible for scheduling the tablet rebuilding transition, which
changes the replica set. The infrastructure for handling decommission
in the tablet scheduler is reused for this.
Scheduling is done incrementally, respecting per-shard load
limits. Rebuilding transitions are recognized by the load calculation as
affecting all tablet replicas.
A new kind of tablet transition called "rebuild" is introduced, which
adds a new tablet replica and rebuilds it from existing replicas. Other
than that, the transition goes through the same stages as a regular
migration to ensure safe synchronization with request coordinators.
In this PR we simply stream from all tablet replicas. Later we should
switch to calling repair to avoid sending excessive amounts of data.
Fixes https://github.com/scylladb/scylladb/issues/16690.
Closes scylladb/scylladb#16894
* github.com:scylladb/scylladb:
tests: tablets: Add tests for removenode and replace
tablets: Add support for removenode and replace handling
topology_coordinator: tablets: Do not fail in a tight loop
topology_coordinator: tablets: Avoid warnings about ignored failured future
storage_service, topology: Track excluded state in locator::topology
raft topology: Introduce param-less topology::get_excluded_nodes()
raft topology: Move get_excluded_nodes() to topology
tablets: load_balancer: Generalize load tracking
tablets: Introduce get_migration_streaming_info() which works on migration request
tablets: Move migration_to_transition_info() to tablets.hh
tablets: Extract get_new_replicas() which works on migraiton request
tablets: Move tablet_migration_info to tablets.hh
tablets: Store transition kind per tablet
We added this filter after detecting a bug in the Raft-based
topology. We weren't sending `barrier_and_drain` commands to a
decommissioning node that could still be coordinating requests.
It could cause stale topology exceptions on replicas if the
decommissioning node sent a request with an old topology version
after normal nodes received the new fence version.
This bug has been fixed in the previous commit, so we remove the
filter.
Before this patch, we didn't send the `barrier_and_drain` command
to a decommissioning node that could still be coordinating
requests. It could happen that a decommissioning node sent
a request with an old topology version after normal nodes received
the new fence version. Then, the request would fail on replicas
with the stale topology exception.
We fix this problem by modifying `exec_global_command`. From now
on, it sends `barrier_and_drain` to a decommissioning node, which
can also be in the `left_token_ring` state.
We add a sanity check to ensure there is at most one transitioning node at
a time. If there are more, something must have gone wrong.
In the future, we might implement concurrent topology operations.
Then, we will remove this sanity check.
We also extend the comment describing `transition_nodes` so that
it better explains why we use a map and how it should be handled.
before this change, we used a random address when launching the
rest_api_mock server, but there is a chance that the randomly
picked address conflicts with an already-used address on the
host. the subprocess then fails right away with a return code of
1, but we just continue on and check the readiness
of the already-dead server. actually, we've seen test failures
caused by the EADDRINUSE failure, and when we checked the readiness
of rest_api_mock by sending an HTTP request and reading the
response, what we got was not a JSON-encoded response but a webpage,
which was likely the one returned by a minio server.
in this change, we
* set the "launcher" option of the nodetool
test suite to "unshare", so that all its tests are launched
in separate namespaces.
* use a fixed address for the mock server, as the network
namespaces are not shared anymore
* add an option in `nodetool/conftest.py`, so that it can optionally
set up the lo network interface when it is launched in a separate
new network namespace.
Fixes #16542
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, all "tool" test suites use "pytest" to launch their
tests. but some of the tests might need a dedicated namespace so they
do not interfere with each other. fortunately, "unshare(1)" allows us
to run a progame in new namespaces.
in this change, we add a "launcher" option to "tool" test suites. so
that these tests can run with the specified "launcher" instead of using
"launcher". if "launcher" is not specified, its default value of
"pytest" is used.
Refs #16542
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Before this patch we received the internal server error
"Attempted to create key component from empty optional" when null was used in
multi-column relations.
This patch adds a null check for each element of each tuple in the
expression and generates an invalid request error if it finds such an element.
Modified a cassandra test and added a new one that checks the occurrence of null values in tuples.
Also added a test that checks the case where the wrong number of items is given in tuples.
Fixes #13217
Closes scylladb/scylladb#16415
The test TestScyllaSsstableSchemaLoading.test_fail_schema_autodetect was
observed to be flaky, sometimes failing on local setups, but not in CI.
As it turns out, this is because, when run via test.py, the test's
working directory is the root directory of scylla.git. In this case,
scylla-sstable will find and read conf/scylla.yaml. After having done
so, it will try to look in the default data directory
(/var/lib/scylla/data) for the schema tables. If the local machine
happens to have a scylla data-dir set up at the above-mentioned location,
it will read the schema tables and will succeed in finding the tested
table (which is a system table, so it is always present). This will fail
the test, as the test expects the opposite -- the table not being found.
The solution is to change the test's working directory to the random
temporary work dir, so that the local environment doesn't interfere with
it.
Fixes: #16828
Closes scylladb/scylladb#16837
This PR contains improvements related to usage of std::vector and looping over containers in the range-for loop.
It is advised to use `std::vector::reserve()` to avoid unneeded memory allocations when the total size is known beforehand.
When looping over a container that stores non-trivial types, using a const reference is advised to avoid redundant copies.
Closes scylladb/scylladb#16978
* github.com:scylladb/scylladb:
api/api.hh: use const reference when looping over container
api/api.hh: use std::vector::reserve() when the total size is known
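A minimal sketch of both patterns, with stand-in types rather than the actual api.hh code:
```
#include <string>
#include <vector>

std::vector<std::string> qualify(const std::vector<std::string>& names,
                                 const std::string& keyspace) {
    std::vector<std::string> out;
    out.reserve(names.size());          // single allocation instead of repeated regrowth
    for (const auto& name : names) {    // const reference: no per-element string copy
        out.push_back(keyspace + "." + name);
    }
    return out;
}
```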
The gossiper topology change code calls the left/joined notifiers when a
node leaves or joins the cluster. This code is not executed in topology coordinator
mode, so the coordinator needs to call those notifiers by itself. The
series adds the calls.
Fixes scylladb/scylladb#15841
* 'gleb/raft-topo-notifications-v1' of github.com:scylladb/scylla-dev:
storage service: topology coordinator: call notify_joined() when a node joins a cluster
storage service: topology coordinator: call notify_left() when a node leaves a cluster
storage_service: drop redundant check from notify_joined()
instead of casting / comparing the count of the duration unit, let's just
compare the durations, so that boost.test is able to print the duration
in a more informative and user-friendly way (line wrapped):
test/boost/error_injection_test.cc(167): fatal error:
in "test_inject_future_disabled":
critical check wait_time > sleep_msec has failed [23839ns <= 10ms]
Refs #15902
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we always cast the wait duration to milliseconds,
even if it could be using a higher resolution. actually
`std::chrono::steady_clock` uses nanoseconds for its duration,
so if we inject a deadline using `steady_clock`, we could be woken
earlier due to the narrowing of the duration type caused by the
duration_cast.
in this change, we just use the duration as it is. this should allow
the caller to use the resolution provided by Seastar without losing
precision.
Fixes #15902
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
When the topology coordinator is used for topology changes, the gossiper-based
code that calls notify_joined() is not called. The coordinator needs
to call it itself, but it needs to call it only once, when a node becomes
normal. For that, the patch changes the state loading code to remember the
old set of nodes in the normal state, to check whether a node that is normal after
the new state is loaded was not in the normal state before.
The sstable writer held the effective_replication_map_ptr while writing
sstables, which is both a layering violation and slows down tablet load
balancing. It was needed in order to ensure the sharder was stable. But
it turns out that sharding metadata is unnecessary for tablets, so just
skip the whole thing when writing an sstable for tablets.
Closes scylladb/scylladb#16953
* github.com:scylladb/scylladb:
sstables: writer: don't require effective_replication_map for sharding metadata
schema: provide method to get sharder, iff it is static
This mini-series contains two bug fixes that were found as part of testing coverage reporting in CI:
ref: https://github.com/scylladb/scylladb/pull/16895
1. The html-fixup, which is triggered when using `test/pylib/coverage_utils.py lcov-tools genhtml...`, rendered incorrect links when there were multiple links in the same line.
2. For files that contained `,` in their name, the output was simply wrong and resulted in lcov not being able to find such files for the purpose of filtering or generating reports.
The aforementioned draft PR served as a testing bed for finding and fixing those bugs.
Closes scylladb/scylladb#16977
* github.com:scylladb/scylladb:
lcov_utils.py: support sourcefiles that contains commas in their name
coreage_utils.py: make regular expression lazy in html-fixup
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for db::schema_tables::table_kind,
and its operator<<() is still used by the homebrew generic formatter
for std::map<>, so it is preserved.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16972
When a reference is not used in the range-for loop,
each element of the container is copied. Such copying
is not a problem for scalar types. However, in the case
of non-trivial types it may cause unneeded overhead.
This change replaces copying with const references
to avoid copying types like seastar::sstring etc.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
When growing via push_back(), std::vector may need to reallocate
its internal block of memory due to insufficient space. It is advised
to allocate the required space before appending elements if the
size is known beforehand.
This change introduces usage of std::vector::reserve() in api.hh
to ensure that push_back() does not cause reallocations.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
As part of the parsing, every line of an lcov file was modeled as
INFO_TYPE:field[,field]...
However specifically for info type "SF" which represents the source file
there can only be one field.
This caused files that are using ',' in their names to be cut down up to
the first ',' and as a results not handled correctly by lcov_utils.py
especially when rewriting a file.
This patch adds a special handling for the "SF" INFO_TYPE.
ref : `man geninfo`
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
The html-fixup procedure was created because of a bug in genhtml (`man
genhtml` for details about what genhtml is). The bug is that genhtml
doesn't account for file names that contain illegal URL characters (ref:
https://stackoverflow.com/a/1547940/2669716). html-fixup converts those
characters to the %<octet> notation (i.e. the space character becomes %20
etc.). However, the regular expression used to detect links was greedy,
which didn't account for multiple links in the same line. This was
discovered while browsing one of the reports and noticing that the links
that are meant to alternate between the code view and the function view of a
source file got scrambled and unusable after html-fixup.
This change makes the regex that is used to detect links lazy so it can
handle multiple links in the same line in an html file correctly.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
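The fixup script itself is Python, but the greedy-versus-lazy distinction is easy to illustrate with an ECMAScript-style regex in C++ (the HTML line below is a made-up stand-in for genhtml output):
```
#include <iostream>
#include <regex>
#include <string>

int main() {
    // Two links on one line, as in the genhtml output being fixed up.
    std::string line = R"(<a href="a b.html">code</a> <a href="c d.html">functions</a>)";
    std::smatch m;
    // Greedy: .* swallows everything up to the last quote, merging both hrefs.
    std::regex_search(line, m, std::regex(R"re(href="(.*)")re"));
    std::cout << m[1] << '\n';   // a b.html">code</a> <a href="c d.html
    // Lazy: .*? stops at the first closing quote, so each link is matched separately.
    std::regex_search(line, m, std::regex(R"re(href="(.*?)")re"));
    std::cout << m[1] << '\n';   // a b.html
}
```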
Loading schemas of views and indexes was not supported, with either `--schema-file`, or when loading schema from schema sstables.
This PR addresses both:
* When loading schema from CQL (file), `CREATE MATERIALIZED VIEW` and `CREATE INDEX` statements are now also processed correctly.
* When loading schema from schema tables, `system_schema.views` is also processed, when the table has no corresponding entry in `system_schema.tables`.
Tests are also added.
Fixes: #16492
Closes scylladb/scylladb#16517
* github.com:scylladb/scylladb:
test/cql-pytest: test_tools.py: add schema-loading tests for MV/SI
test/cql-pytest: test_tools.py: extract some fixture logic to functions
test/cql-pytest: test_tools.py: extract common schema-loading facilities into base-class
tools/schema_loader: load_schema_from_schema_tables(): add support for MV/SI schemas
tools/schema_loader: load_one_schema_from_file(): add support for view/index schemas
test/boost/schema_loader_test: add test for mvs and indexes
tools/schema_loader: load_schemas(): implement parsing views/indexes from CQL
replica/database: extract existing_index_names and get_available_index_name
tools/schema_loader: make real_db.tables the only source of truth on existing tables
tools/schema_loader: table(): store const keyspace&
tools/schema_loader: make database,keyspace,table non-movable
cql3/statements/create_index_statement: build_index_schema(): include index metadata in returned value
cql3/statements/create_index_statement: make build_index_schema() public
cql3/statements/create_index_statement: relax some method's dependence on qp
cql3/statements/create_view_statement: make prepare_view() public
Native histograms (also known as sparse histograms) are an experimental Prometheus feature.
They use protobuf as the reporting layer.
Native histograms hold the benefits of high resolution at a lower resource cost.
This series allows sending histograms in a native histogram format over protobuf.
By default, protobuf support is disabled. To use protobuf with native histograms, the command line flag prometheus_allow_protobuf should be set to true, and the Prometheus server should send the accept header with protobuf.
Fixes #12931
Closes scylladb/scylladb#16737
* github.com:scylladb/scylladb:
main.cc: Add prometheus_allow_protobuf command line
histogram_metrics_helper: support native histogram
config: Add prometheus_allow_protobuf flag
Add empty line before list of different checksums in
validate-checksums's description. Otherwise the list is not rendered.
Closes scylladb/scylladb#16401
we deduce the paths to other SSTable components from the one
specified on the command line; for instance, if
/a/b/c/me-really-big-Data.db is fed to `scylla sstable`, the tool
would try to read /a/b/c/me-really-big-TOC.txt for the list of
other components. this works fine if the full path is specified
on the command line.
but if a relative path is specified, like "me-really-big-Data.db",
this does not work anymore. before this change, the tool
would be reading "/me-really-big-TOC.txt", which does not exist
under most circumstances, while $PWD/me-really-big-TOC.txt should
exist if the SSTable is sane.
after this change, we always convert the specified path to
its canonical representation, no matter whether it is relative or absolute.
this enables us to get the correct parent path when trying
to read, for instance, the TOC component.
Fixes #16955
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16964
To avoid data resurrection, mutations deleted by cleanup operations
have to be skipped during commitlog replay.
This patch implements this, based on the metadata recorded on cleanup
operations into system.commitlog_cleanups.
Currently, rows in system.commitlog_cleanups are only dropped on node restart,
so the table can accumulate an unbounded number of records.
This probably isn't a problem in practice, because tablet cleanups aren't that
frequent, but this patch adds a countermeasure anyway.
This patch makes the choice to delete the unneeded records right when new records
are added. This isn't ideal -- it would be more natural if the unneeded records
were deleted as soon as they become unneeded -- but it does the job with a
minimal amount of code.
Add a helper function which returns the minimum replay position
across all existing or future commitlog segments.
Only positions greater or equal to it can be replayed on the next reboot.
We will use this helper in a future patch to garbage collect some cleanup
metadata which refers to replay positions.
To avoid data resurrection after cleanup, we have to filter out the
cleaned mutations during commitlog replay.
In this patch, we get tablet cleanup to record the affected set of mutations
to system.commitlog_cleanups. In a later patch, we will use these records
for filtering during commitlog replay.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for rjson::value, and drop its
operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16956
When the topology coordinator is used for topology changes, the gossiper-based
code that calls notify_left() is not called. The coordinator needs
to call it itself.
Currently, we pass an effective_replication_map_ptr to sstable_writer,
so that we can get a stable dht::sharder for writing the sharding metadata.
This is needed because with tablets, the sharder can change dynamically.
However, this is both bad and unnecessary:
- bad: holding on to an effective_replication_map_ptr is a barrier
for topology operations, preventing tablet migrations (etc) while
an sstable is being written
- unnecessary: tablets don't require sharding metadata at all, since
two tablets cannot overlap (unlike two sstables from different shards in
the same node). So the first/last key is sufficient to determine the
shard/tablet ownership.
Given that, just pass the sharder for vnode sstables, and don't generate
sharding metadata for tablet sstables.
The current get_sharder() method only allows getting a static sharder
(since a dynamic sharder needs additional protection). However, it
chooses to abort if someone attempts to get a dynamic sharder.
In one case, it's useful to get a sharder only if it's static, so
provide a method to do that. This is for providing sstable sharding
metadata, which isn't useful with tablets.
The `topology_coordinator` is a large class (>1000 loc) which resides in
an even larger source file (storage_service.cc, ~7800 loc). This PR
moves the topology_coordinator class out of the storage_service.cc file
in order to improve modularity and recompilation times during
development.
As a first step, the `topology_mutation_builder` and
`topology_node_mutation_builder` classes are also moved from
storage_service.cc to their own, new header/source files as they are an
important abstraction used both by the topology coordinator code and
some other code in storage_service.cc that won't be moved.
Then, the `topology_coordinator` is moved out. The
`topology_coordinator` class is completely hidden in the new
topology_coordinator.cc file and can only be started and waited on to
finish via the new `run_topology_coordinator` function.
Fixes: scylladb/scylladb#16605
Closes scylladb/scylladb#16609
* github.com:scylladb/scylladb:
service: move topology coordinator to a separate file
storage_service: introduce run_topology_coordinator function
service: move topology mutation builder out of storage_service
storage_service: detemplate topology_node_mutation_builder::set
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
cql3::statements::statement_type. and its operator<<() is dropped.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16948
The topology coordinator is a large class that sits in an even larger
storage_service.cc file. For the sake of code modularization and
reducing recompilation time, move the topology coordinator outside
storage_service.cc.
The topology_coordinator class is moved to the new
topology_coordinator.cc unchanged. Along with it, the following items
are moved:
- wait_for_ip function - it's used both by storage_service and
topology_coordinator, so in order for the new topology_coordinator.cc
not to depend on storage service, it is moved to the new file,
- raft_topology logger - for the same reason as wait_for_ip,
- run_topology_coordinator - serves as the main interface for the
topology coordinator. The topology coordinator class is not exposed at
all, it's only possible to start the coordinator and wait until it
shuts down itself via that function.
Nobody remembered to keep this function up to date when adding stuff to
`fsm_output`.
Turns out that it's not being used by any Raft logic but only in some
tests. That use case can now be replaced with `fsm::has_output()` which
is also being used by `raft::server` code.
This uses the `trigger_snapshot()` API added in previous commit on a
server running for the given Raft group.
It can be used for example in tests or in the context of disaster
recovery (ref scylladb/scylladb#16683).
This allows the user of `raft::server` to ask it to create a snapshot
and truncate the Raft log. In a later commit we'll add a REST endpoint
to Scylla to trigger group 0 snapshots.
One use case for this API is to create group 0 snapshots in Scylla
deployments which upgraded to Raft in version 5.2 and started with an
empty Raft log with no snapshot at the beginning. This causes problems,
e.g. when a new node bootstraps to the cluster, it will not receive a
snapshot that would contain both schema and group 0 history, which would
then lead to inconsistent schema state and trigger assertion failures as
observed in scylladb/scylladb#16683.
In 5.4 the logic of initial group 0 setup was changed to start the Raft
log with a snapshot at index 1 (ff386e7a44)
but a problem remains with these existing deployments coming from 5.2,
we need a way to trigger a snapshot in them (other than performing 1000
arbitrary schema changes).
Another potential use case in the future would be to trigger snapshots
based on external memory pressure in tablet Raft groups (for strongly
consistent tables).
Extracts a part of the logic of the raft_state_monitor_fiber method into
a separate function. It will be moved to a separate file in the next
commit along with the topology coordinator, and will serve as the only
way of interaction with the topology coordinator while the class itself
will remain hidden.
The topology_coordinator class is now directly constructed on the stack
(or rather in the coroutine frame), the indirection via shared_ptr is no
longer needed.
Before the introduction of PR#15524 the removal had always been invoked
via a finally() continuation. In spite of making flush() noexcept, the
mentioned PR modified the logic: if flush() returns an exceptional future,
then the removal is not performed.
This change restores the old behavior - the removal operation is always called.
From now on, the logic of compaction_group::stop() is as follows (see the
sketch after the list):
- firstly, it waits for completion of flush() via
seastar::coroutine::as_future() to avoid premature exception
- then it executes compaction_manager.remove()
- in the end it inspects the future returned from flush()
to re-throw the exception if the operation failed
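A minimal sketch of this ordering using seastar's coroutine helpers
(hypothetical member names, not the exact compaction_group::stop() code):

```
#include <seastar/core/coroutine.hh>
#include <seastar/coroutine/as_future.hh>

seastar::future<> compaction_group::stop() {
    // 1. wait for the flush, capturing any exception instead of throwing
    seastar::future<> flushed = co_await seastar::coroutine::as_future(flush());
    // 2. always run the removal, as was the case before PR#15524
    co_await get_compaction_manager().remove(*this);
    // 3. inspect the flush result and re-throw if it failed
    co_await std::move(flushed);
}
```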
Fixed: scylladb#16751
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#16940
Alternator incorrectly refuses an empty tag value for TagResource, but DynamoDB does allow this case and it's useful (note that an empty tag key is rightly forbidden). So this short series fixes this case, and adds additional tests for TagResource which covers this case and other cases we forgot to cover in tests.
Fixes #16904.
Closes scylladb/scylladb#16910
* github.com:scylladb/scylladb:
test/alternator: add more tests for TagResource
alternator: allow empty tag value
There are currently two options for how to "request" the number of initial tablets for a table:
1. specify it explicitly when creating a keyspace
2. let scylla calculate it on its own
Both are not very nice. The former doesn't take cluster layout into consideration. The latter does, but starts with one tablet per shard, which can be too low if the amount of data grows rapidly.
Here's a (maybe temporary) proposal to facilitate at least perf tests -- the --tablets-initial-scale-factor option that enhances option number two above by multiplying the calculated number of tablets by the configured number. This is what we currently do to run perf tests by patching scylla; with the option it's going to be more convenient.
Closes scylladb/scylladb#16919
* github.com:scylladb/scylladb:
config: Add --tablets-initial-scale-factor
tablet_allocator: Add initial tablets scale to config
tablet_allocator: Add config
This patch adds support for the prometheus_allow_protobuf command line option.
When set to true, Prometheus will accept protobuf requests and will
reply with protobuf protocol.
This will also enable the experimental Prometheus Native Histograms.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
approx_exponential_histogram uses logic similar to the Prometheus native
histogram. To allow sending its data to Prometheus in the native histogram
format, it needs to report the schema and min id (the id of the first bucket).
This patch updates to_metrics_histogram to set those optional parameters,
leaving it to Prometheus to decide in what format the histogram will
be reported.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Native histograms (also known as sparse histograms) are an experimental
Prometheus feature. They use protobuf as the reporting layer. The
prometheus_allow_protobuf flag allows the user to enable protobuf
protocol. When this flag is set to true, and the Prometheus server sends
in the request that it accepts protobuf, the result will be in protobuf
protocol.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
The topology_mutation_builder, topology_node_mutation_builder and
topology_request_tracking_mutation_builder are currently used by
storage service - mainly, but not exclusively, by the topology
coordinator logic. As we are going to extract the topology coordinator
to a separate file, we need to move the builders to their own file as
well so that they will be accessible both by the topology coordinator
and the storage service.
One of the overloads of `topology_node_mutation_builder::set` is a
template which takes a std::set of things that convert to a sstring.
This was done to support sets of strings of different types (e.g.
sstring, string_view) but it turns out that only sstring is used at the
moment.
De-template the method as it is unnecessary for it to be a template.
Moreover, the `topology_node_mutation_builder` is going to be moved in
the next commit of the PR to a separate file, so not having template
methods makes the task simpler.
Issue #16904 discovered that Alternator refuses to allow an empty tag
value while it's useful (and DynamoDB allows it). This brought to my
attention that our test coverage of the TagResource operation was lacking.
So this patch adds more tests for some corner cases of TagResource which
we missed, including the allowed lengths of tag keys and values.
These tests reproduce #16904 (the case of empty tag value) and also #16908
(allowing and correctly counting unicode letters), and also add
regression testing to cases which we already handled correctly.
As usual, all the new tests also pass on DynamoDB.
Refs #16904
Refs #16908
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The existing code incorrectly forbade setting a tag on a table to an empty
string value, but this is allowed by DynamoDB and is useful, so we fix it
in this patch.
While at it, improve the error-checking code for tag parameters to
cleanly detect more cases (like missing or non-string keys or values).
The following patch is a test that fails before this patch (because
it fails to insert a tag with an empty value) and passes after it.
Fixes #16904.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
test/cql-pytest/run.py was recently modified to add the "tablets"
experimental feature, so test/alternator/run now also runs Scylla by
default with tablets enabled.
This is the correct default going forward, but in the short term it
would be nice to also have an option to easily do a manual test run
*without* tablets.
So this patch adds a "--vnodes" option to the test/alternator/run script.
This option causes "run" to run Scylla without enabling the "tablets"
experimental feature.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Before this patch, Alternator tables did not use tablets even if this
feature was available - tablets had to be manually enabled per table
by using a tag. But recently we changed CQL to enable tablets by default
on all keyspaces (when the experimental "tablets" option is turned on),
so this patch does the same for Alternator tables:
1. When the "tablets" experimental feature is on, new Alternator tables
will use tablets instead of vnodes. They will use the default choice
of initial_tablets.
2. The same tag that in the past could be used to enable tablets on a
specific table, now can be used to disable tablets or change the
default initial_tablets for a specific table at creation time.
Fixes #16355
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
If an Alternator table uses tablets (we'll turn this on in a following
patch), some tests are known to fail because of features not yet
supported with tablets, namely:
Refs #16317 - Support Alternator Streams with tablets (CDC)
Refs #16567 - Support Alternator TTL with tablets
This patch changes all tests failing on tablets due to one of these two
known issues to explicitly ask to disable tablets when creating their
test table. This means that at least we continue to test these two
features (Streams and TTL) even if they don't yet work with tablets.
We'll need to remember to remove this override when tablet support
for CDC and Alternator TTL arrives. I left a comment in the right
places in the code with the relevant issue numbers, to remind us what
to change when we fix those issues.
This patch also adds xfail_tablets and skip_tablets fixtures that can
be used to xfail or skip tests when running with tablets - but we
don't use them yet - and may never use them, but since I already wrote
this code it won't hurt having it, just in case. When running without
tablets, or against an older Scylla or on DynamoDB, the tests with
these marks are run normally.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This PR fixes test_tablet_missing_data_repair and enables the test again.
If a node is not UP yet, repair in the test will be a partial repair. The partial repair will not repair all the data, which causes the check of rows after repair to fail. Check that nodes see each other as UP before repair.
Closes scylladb/scylladb#16930
* github.com:scylladb/scylladb:
test: Enable test_tablet_missing_data_repair again
test: Wait for nodes to be up when repair
test: Check repair status in ScyllaRESTAPIClient
This commit improves the developer-oriented section
of the core documentation:
- Added links to the developer sections in the new
Get Started guide (Develop with ScyllaDB and
Tutorials and Example Projects) for ease of access.
- Replaced the outdated Learn to Use ScyllaDB page with
a link to the up-to-date page in the Get Started guide.
This involves removing the learn.rst file and adding
an appropriate redirection.
- Removed the Apache Copyrights, as this page does not
need it.
- Removed the Features panel box as there was only one
feature listed, which looked weird. Also, we are in
the process of removing the Features section.
Closes scylladb/scylladb#16800
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for enum_option<>. since its
operator<<() is still used by the homebrew generic formatter for
formatting vector<>, operator<<() is preserved.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16917
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
cql3::authorized_prepared_statements_cache_key, and remove its
operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16924
it seems that the tree builds just fine with this warning enabled.
and narrowing is a potentially unsafe numeric conversion. so let's
enable this warning option.
this change also helps to reduce the difference between the rules
generated by configure.py and those generated by CMake.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16929
New tablet replicas are allocated synchronously with node
operations. They are safely rebuilt from all existing replicas.
The list of ignored nodes passed to node operations is respected.
Tablet scheduler is responsible for scheduling tablet transition which
changes the replicas set. The infrastructure for handling decommission
in tablet scheduler is reused for this.
Scheduling is done incrementally, respecting per-shard load
limits. Rebuilding transitions are recognized by load calculation to
affect all tablet replicas.
New kind of tablet transition is introduced called "rebuild" which
adds new tablet replica and rebuilds it from existing replicas. Other
than that, the transition goes through the same stages as regular
migration to ensure safe synchronization with request coordinators.
In this PR we simply stream from all tablet replicas. Later we should
switch to calling repair to avoid sending excessive amounts of data.
Fixes #16690.
If streaming or cleanup RPC fails, we would retry immediately. That
fills the logs with errors. Throttle them by sleeping on error before
the same action is retried.
This patch removes some duplication of logic and implicit assumptions
by creating clear algebra for load impact calculation and its
application to state of the load balancer.
Will make adding new kinds of tablet transitions with different impact
on load much easier.
Will be used by tablet load balancer to compute impact on load of
planned migrations. Currently, the logic is hard coded in the load
balancer and may get out of sync with the logic we have in
get_migration_streaming_info() for already running tablet transitions.
The logic will become more complex for rebuild transition, so use
shared code to compute it.
Previous patch taught tablets allocator to multiply the initial tablets
count by some value. This patch makes this factor configurable
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When allocating tablets for a table for the first time their initial count
is calculated so that each shard in a cluster gets one tablet. It may
happen that more than one initial tablet per shard is better, e.g. perf
tests typically rely on that.
It's possible to specify the initial tablets count when creating a
keyspace, but this number doesn't take the cluster topology into
consideration and may also be not very nice.
As a temporary solution (e.g. for perf tests) we may add a configurable
that scales the initial number of calculated tablets by some factor
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The tablet allocator is a sharded service that starts in main; it's worth
equipping it with a config. Next patches will fill it with some payload.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
this change addresses the possible data resurrection after
"nodetool compact" and "nodetool flush" commands. and prepare for
the fix of a similar data resurrection issue after "nodetool cleanup".
active commitlog segments are recycled in the background once they are
discarded.
and there is a chance that we could have data resurrection even after
"nodetool cleanup", because the mutations in commitlog's active segments
could change the tables which are supposed to be removed by
"nodetool cleanup", so as a solution to address this problem in the
pre-tablets era, we force new active segments of commitlog, and flush the
involved memtables. since the active segments are discarded in the
background, the completion of the "nodetool cleanup" does not guarantee
that these mutations won't be applied to the memtable when the server
restarts, if it is killed right away.
the same applies to "force_flush", "force_compaction" and
"force_keyspace_compaction" API calls which are used by nodetool as
well. quote from Benny's comment
> If major compaction doesn't wait for the commitlog deletion it is
> also exposed to data resurrection since theoretically it could purge
> tombstones based on the assumption that commitlog would not resurrect
> data that they might shadow, BUT on a crash/restart scenario commitlog
> replay would happen since the commitlog segments weren't deleted -
> breaking the contract with compaction.
so to ensure that the active segments are reclaimed upon completion of
"nodetool cleanup", "nodetool compact" and "nodetool flush" commands,
let's wait for pending deletes in `database::flush_all_tables()`, so the
caller waits until the reclamation of deleted active segments completes.
Refs #4734
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16915
This enhancement formats descriptions in config.cc using the standard markup language reStructuredText (RST).
By doing so, it improves the rendering of these descriptions in the documentation, allowing you to use various directives like admonitions, code blocks, ordered lists, and more.
Closes scylladb/scylladb#16311
The name of the keyspace being part of the partition key is not useful:
the table_id already uniquely identifies the table. The keyspace name
being part of the key means that code wanting to interact with this
table often has to resolve the table id just to be able to provide the
keyspace name. This is counterproductive, so make keyspace_name
just a static column instead, just like table_name already is.
Fixes: #16377
Closes scylladb/scylladb#16881
Before the patch we called `gossiper.remove_endpoint` for IP-s of the
left nodes. The problem is that in replace-with-same-ip scenario we
called `gossiper.remove_endpoint` for IP which is used by the new,
replacing node. The `gossiper.remove_endpoint` method puts the IP into
quarantine, which means gossiper will ignore all events about this IP
for `quarantine_delay` (one minute by default). If we immediately
replace just replaced node with the same IP again, the bootstrap will
fail since the gossiper events are blocked for this IP, and we won't be
able to resolve an IP for the new host_id.
Another problem was that we called gossiper.remove_endpoint method,
which doesn't remove an endpoint from `_endpoint_state_map`, only from
live and unreachable lists. This means the IP will keep circulating in
the gossiper message exchange between cluster nodes until full cluster
restart.
This patch fixes both of these problems. First, we rely on the fact that
when topology coordinator moves the `being_replaced` node to the left
state, the IP of the `replacing` node is known to all nodes. This means
before removing an IP from the gossiper we can check if this IP is
currently used by another node in the current raft topology. This is
done by constructing the `used_ips` map based on normal and transition
nodes. This map is cached to avoid quadratic behaviour.
Second, we call `gossiper.force_remove_endpoint`, not
`gossiper.remove_endpoint`. This function removes an IP from
`_endpoint_state_map`, as well as from the live and unreachable lists.
Closes scylladb/scylladb#16820
* github.com:scylladb/scylladb:
get_peer_info_for_update: update only required fields in raft topology mode
get_peer_info_for_update: introduce set_field lambda
storage_service::on_change: fix indent
storage_service::on_change: skip handle_state functions in raft topology mode
test_replace_different_ip: check old IP is removed from gossiper
test_replace: check two replace with same IP one after another
storage_service: sync_raft_topology_nodes: force_remove_endpoint for left nodes only if an IP is not used by other nodes
Skip test_tablet_missing_data_repair, it is failing a lot breaking
promotion and CI. Can't revert because the PR introducing it was already
piled on. So disable while investigated.
Refs: #16859
Closes scylladb/scylladb#16879
Standard containers don't have constructors that take ranges;
instead people use boost::copy_range or C++23 std::ranges::to.
Make the API more uniform by removing this special constructor.
The only caller, in a test, is adjusted.
Closes scylladb/scylladb#16905
Running test/cql-pytest/run now defaults to enabling the "tablets"
experimental feature when running Scylla - and tests detect this and
use this feature as appropriate. This is the correct default going
forward, but in the short term it would be nice to also have an
option to easily do a manual test run *without* tablets.
So this patch adds a "--vnodes" option to the test/cql-pytest/run
script. This option causes "run" to run Scylla without enabling the
"tablets" experimental feature.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16896
This commit adds the information that
ScyllaDB Enterprise 2024.1 is based
on ScyllaDB Open Source 5.4
to the OSS vs. Enterprise matrix.
Closes scylladb/scylladb#16880
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for cql3::prepared_cache_key_type
and cql3::prepared_cache_key_type::cache_key_type, and remove
their operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16901
This PR:
- Removes the redundant information about previous versions from the Create Cluster page.
- Fixes language mistakes on that page, and replaces "Scylla" with "ScyllaDB".
(nobackport)
Closes scylladb/scylladb#16885
* github.com:scylladb/scylladb:
doc: fix the language on the Create Cluster page
doc: remove reduntant info about old versions
When a base table changes (is altered), so do the views that might
refer to the added column (which includes "SELECT *" views and also
views that might need to use this column for row lifetime (virtual
columns)).
However the query processor implementation for views change notification
was an empty function.
Since views are tables, the query processor needs to at least treat them
as such (and maybe in the future, do also some MV specific stuff).
This commit adds a call to `on_update_column_family` from within
`on_update_view`.
The side effect as of today is that prepared statements for views
which changed due to a base table change will be invalidated.
Fixes https://github.com/scylladb/scylladb/issues/16392
This series also adds a test which fails without this fix and passes when the fix is applied.
Closes scylladb/scylladb#16897
* github.com:scylladb/scylladb:
Add test for mv prepared statements invalidation on base alter
query processor: treat view changes at least as table changes
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for cql3::ut_name, and remove
their operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16890
Issue #16392 describes a bug where, when a base table is altered, its
materialized views' prepared statements are not invalidated, which in turn
causes them to return missing data.
This test reproduces this bug and serves as a regression test for this
problem.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
When a base table changes (is altered), so do the views that might
refer to the added column (which includes "SELECT *" views and also
views that might need to use this column for row lifetime (virtual
columns)).
However the query processor implementation for views change notification
was an empty function.
Since views are tables, the query processor needs to at least treat them
as such (and maybe in the future, do also some MV specific stuff).
This commit adds a call to `on_update_column_family` from within
`on_update_view`.
The side effect as of today is that prepared statements for views
which changed due to a base table change will be invalidated.
Fixes #16392
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
This commit removes the 5.1-to-2022.2 upgrade
guide - the upgrade guide for versions we
no longer support.
We should remove it while adding the 5.4-to-2024.1
upgrade guide (the previous commit).
Some fields of the system.peers table are updated
through raft; we don't need to peek them from the gossiper.
The goal of the patch is to declare explicitly
which code is responsible for which fields.
In particular, in raft topology mode we don't
need to update raft-managed fields since
it's done in topology_state_load and
raft_ip_address_updater.
This is a refactoring commit. In the next commit
we'll add a parameter to this unified lambda and
this is easy to do if we have only one lambda and
not three.
We don't need them in raft topology mode since the token_metadata
update happens in topology_state_load function. We lift the
_raft_topology_change_enabled checks from those functions to on_change.
In this commit we modify the existing
test_replace_different_ip. We add the check that the old
IP is not contained in alive or down lists, which
means it's completely wiped from gossiper. This test is failing
without the force_remove_endpoint fix from
a previous commit. We also check that the state of
local system.peers table is correct.
This commit removes the upgrade guides
from ScyllaDB Open Source to Enterprise
for versions we no longer support.
In addition, it removes a link to
one of the removed pages from
the Troubleshooting section (the link is
redundant).
Closes scylladb/scylladb#16249
This mini-set includes code coverage support for ScyllaDB, it provides:
1. Support for building ScyllaDB with coverage support.
2. Utilities for processing coverage profiling data
3. test.py support for generating and processing coverage profiling data into lcov trace files which can later be used to produce HTML or textual coverage reports.
Refs #16323
Closes scylladb/scylladb#16784
* github.com:scylladb/scylladb:
Add code coverage documentation
test.py: support code coverage
code coverage: Add libraries for coverage handling
test.py: support --coverage and --coverage-mode
configure.py support coverage profiles on standrad build modes
Currently if topology coordinator gets stuck in a CI test run it's hard to debug this (e.g. scylladb/scylladb#16708). We can add a lot of logging inside topology coordinator code to aid debugging, without spamming the logs -- these are relatively rare control plane events.
Closes scylladb/scylladb#16749
* github.com:scylladb/scylladb:
test/pylib: scylla_cluster: enable raft_topology=debug level by default
raft topology: increase level of some TRACE messages
raft topology: log when entering transition states
raft topology: don't include null ID in exclude_nodes
raft topology: INFO log when executing global commands and updating topology state
storage_service: separate logger for raft topology
Add `--experimental-features=tablets` to both `test/cql-pytest/suite.yaml` and `test/cql-pytest/run.py`, so tablets are enabled. Detect tablet support in `conftest.py` and add an xfail and skip marker to mark tests that fail/crash with tablets. These are expected to be fixed soon.
Some tests checking things around alter-keyspace, had to force-disable tablets on the created keyspace, because tablets interfere with the test (a keyspace with tablets cannot have simple strategy for example).
Tablets were also interfering with `test_keyspace.py:test_storage_options_local`, because it is expecting `system_schema.scylla_keyspaces` to not have any entries for local storage keyspace, but they have it if tablets are enabled. Adjust the test to account for this.
Closes scylladb/scylladb#16840
* github.com:scylladb/scylladb:
test/cql-pytest: run.py,suite.yaml: enable tablets by default
test/cql-pytest: sprinkle xfail_tablets and skip_with_tablets as needed
test/cql-pytest: disable tablets for some keyspace-altering tests
test/cql-pytest: test_keyspace.py: test_storage_options_local(): fix for tablets
test/cql-pytest: fix test_tablets.py to set initial_tablets correctly
test/cql-pytest: add tablet detection logic and fixtures
test/cql-pytest: extract is_scylla check into util.py
* tools/cqlsh 426fa0ea...b8d86b76 (8):
> Make cqlsh work with unix domain sockets
Fixes scylladb/scylladb#16489
> Bump python-driver version
> dist/debian: add trailer line
> dist/debian: wrap long line
> Draft: explicit build-time packge dependencies
> stop retruning status_code=2 on schema disagreement
> Fix minor typos in the code
> Dockerfile: apt-get update and apt-get upgrade to get latest OS packages
For tests that cover functionality, which doesn't yet work with tablets.
These tests and the respective functionality they test, are expected to
be fixed soon, and then these fixtures will be removed.
When tablets are enabled on a keyspace, they cannot be altered to simple
replication strategy anymore.
These keyspaces are testing exactly that, so disable tablets on the
initial keyspace create statements.
This test expects a keyspace with the local storage option to not have a
row in system_schema.scylla_keyspaces. With tablets enabled by default,
this won't be the case. Adjust the test to check for the specific
storage-related columns instead.
Recently, in commit 49026dc319, the
way to choose the number of tablets in a new keyspace changed.
This broke the test we had for a memory leak when many tablets were
used, which saw the old syntax wasn't recognized and assumed Scylla
is running without tablet support - so the test was skipped.
Let's fix the syntax. After this patch the test passes if the tablets
experimental feature is enabled, and only skipped if it isn't.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Add keyspace_has_tablets() utility function, which, given a keyspace,
returns whether it is using tablets or not.
In addition, 3 new fixtures are added:
* has_tablets - does scylla have tablets by default?
* xfail_tablets - the test is marked xfail, when tablets are enabled by
default.
* skip_with_tablets - the test is skipped when tablets are enabled by
default, because it might crash with tablets.
We expect the latter two to be removed soon(ish), as we make all tests,
and the functionality they test, work with tablets.
This is a test case for the problem, described in the
previous commit. Before that fix the second replace
failed since it couldn't resolve an IP for the new host_id.
Before the patch we called gossiper.remove_endpoint for IP-s
of the left nodes. The problem is that in replace-with-same-ip
scenario we called gossiper.remove_endpoint for IP which is
used by the new, replacing node. The gossiper.remove_endpoint
method puts the IP into quarantine, which means gossiper will
ignore all events about this IP for quarantine_delay (one minute by
default). If we immediately replace just replaced node with
the same IP again, the bootstrap will fail since the gossiper
events are blocked for this IP, and we won't be able to
resolve an IP for the new host_id.
Another problem was that we called gossiper.remove_endpoint
method, which doesn't remove an endpoint from _endpoint_state_map,
only from live and unreachable lists. This means the IP
will keep circulating in the gossiper message exchange between cluster
nodes until full cluster restart.
This patch fixes both of these problems. First, we rely on
the fact that when topology coordinator moves the being_replaced
node to the left state, the IP of the replacing node is known to all nodes.
This means before removing an IP from the gossiper we can check if
this IP is currently used by another node in the current raft topology.
This is done by constructing the used_ips map based on normal and
transition nodes. This map is cached to avoid quadratic behaviour.
Second, we call gossiper.force_remove_endpoint, not
gossiper.remove_endpoint. This function removes an IP from
_endpoint_state_map, as well as from the live and unreachable lists.
The tests for both of these improvements will be added in subsequent
commits.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for db::operation_type, and
remove their operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16832
this change should silence the warning like
```
/home/kefu/dev/scylladb/repair/repair.cc:222:23: error: comparison of integers of different signs: 'int' and 'size_type' (aka 'unsigned long') [-Werror,-Wsign-compare]
222 | for (int i = 0; i < all.size(); i++) {
| ~ ^ ~~~~~~~~~~
```
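One common way such a warning is silenced (a sketch of the general fix, not
necessarily the exact change made in repair.cc) is to use an unsigned index,
or to drop the index entirely:
```
// unsigned index matches the container's size_type
for (size_t i = 0; i < all.size(); i++) {
    // ...
}

// or iterate without an index
for (const auto& item : all) {
    // ...
}
```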
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16867
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we replace operator<< with format_as() for
unimplemented::cause, so that we don't rely on the deprecated behavior,
and neither do we create a full-blown fmt::formatter. as in
fmt v10, format_as() can be used in place of fmt::formatter,
while in fmt v9, format_as() is only allowed to return an integer.
so, to be future-proof, and to be simpler, format_as() is used.
we can even replace `format_as(c)` with `c`, once fmt v10 is
available in future.
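For illustration, such a format_as() can be as small as the sketch below
(hypothetical enumerators, not the actual unimplemented::cause definition);
returning the underlying integer keeps it within the fmt v9 restriction
mentioned above:

```
#include <fmt/format.h>
#include <type_traits>

enum class cause { api, indexes, triggers };  // hypothetical enumerators

// found via ADL; fmt formats whatever this returns
constexpr auto format_as(cause c) {
    return static_cast<std::underlying_type_t<cause>>(c);
}

// fmt::format("{}", cause::indexes) then prints the numeric value
```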
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16866
these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning.
Closes scylladb/scylladb#16868
* github.com:scylladb/scylladb:
auth: do not include unused headers
locator: Handle replication factor of 0 for initial_tablets calculations
table: add_sstable_and_update_cache: trigger compaction only in compaction group
compaction_manager: perform_task_on_all_files: return early when there are no sstables to compact
compaction_manager: perform_cleanup: use compaction_manager::eligible_for_compaction
This short series prevents the creation of compaction tasks when we know in advance that they have nothing to do.
This is possible in the cleanup path by:
- improving the detection of candidates for cleanup by skipping sstables that require cleanup but are already being compacted
- checking that the list of sstables selected for cleanup isn't empty before creating the cleanup task
For upgrade sstables, and generally when rewriting all sstables: launch the task only if the list of candidate sstables isn't empty.
For regular compaction, when triggered via `table::add_sstable_and_update_cache`, we currently trigger compaction (by calling `submit`) on all compaction groups while the sstable is added only to one of them.
Also, it is typically called for maintenance sstables that are awaiting offstrategy compaction, in which case we can skip calling `submit` entirely since the caller triggers offstrategy compaction at a later stage.
Refs scylladb/scylladb#15673
Refs scylladb/scylladb#16694
Fixes scylladb/scylladb#16803
Closes scylladb/scylladb#16808
* github.com:scylladb/scylladb:
table: add_sstable_and_update_cache: trigger compaction only in compaction group
compaction_manager: perform_task_on_all_files: return early when there are no sstables to compact
compaction_manager: perform_cleanup: use compaction_manager::eligible_for_compaction
When calculating per-DC tablets the formula is shards_in_dc / rf_in_dc,
but the denominator in it can be configured to be literally zero and the
division doesn't work.
Fix by assuming zero tablets for dcs with zero rf
fixes: #16844
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#16861
`server` was the only user of this function and it can now be
implemented using `fsm`'s public interface.
In later commits we'll extend the logic of `io_fiber` to also subscribe
to other events, triggered by `server` API calls, not only to outputs
from `fsm`.
In later commits we will use it to wake up `io_fiber` directly from
`raft::server` based on events generated by `raft::server` itself -- not
only from events generated by `raft::fsm`.
`raft::fsm` still obtains a reference to the condition variable so it
can keep signaling it.
This constructor does not provide persisted commit index. It was only
used in tests, so move it there, to the helper `fsm_debug` which
inherits from `fsm`.
Test cases which used `fsm` directly instead of `fsm_debug` were
modified to use `fsm_debug` so they can access the constructor.
`fsm_debug` doesn't change the behavior of `fsm`, only adds some helper
members. This will be useful in following commits too.
In a later commit we'll move `poll_output` out of `fsm` and it won't
have access to internals logged by this message (`_log.stable_idx()`).
Besides, having it in `has_output` gives a more detailed trace. In
particular we can now see values such as `stable_idx` and `last_idx`
from the moment of returning a new fsm output, not only when poll
started waiting for it (a lot of time can pass between these two
events).
This parameter says how many entries at most should be left trailing
before the snapshot index. There are multiple places where this
decision is made:
- in `applier_fiber` when the server locally decides to take a snapshot
due to log size pressure; this applies to the in-memory log
- in `fsm::step` when the server received an `install_snapshot` message
from the leader; this also applies to the in-memory log
- and in `io_fiber` when calling `store_snapshot_descriptor`; this
applies to the on-disk log.
The logic of how many entries should be left trailing is calculated
twice:
- first, in `applier_fiber` or in `fsm::step` when truncating the
in-memory log
- and then again as the snapshot descriptor is being persisted.
The logic is to take `_config.snapshot_trailing` for locally generated
snapshots (coming from `applier_fiber`) and `0` for remote snapshots
(from `fsm::step`).
But there is already an error injection that changes the behavior of
`applier_fiber` to leave `0` trailing entries. However, this doesn't
affect the following `store_snapshot_descriptor` call which still uses
`_config.snapshot_trailing`. So if the server got restarted, the entries
which were truncated in-memory would get "revived" from disk.
Fortunately, this is test-only code.
However in future commits we'd like to change the logic of
`applier_fiber` even further. So instead of having a separate
calculation of trailing entries inside `io_fiber`, it's better for it to
use the number that was already calculated once. This number is passed to
`fsm::apply_snapshot` (by `applier_fiber` or `fsm::step`) and can then
be received by `io_fiber` from `fsm_output` to use it inside
`store_snapshot_descriptor`.
This looks like a minor oversight, in `server_impl::abort` there are
multiple calls to `set_exception` on the different promises, only one of
them would not receive `*_aborted`.
before this change, we always reference the return value of
`make_reader()`, and the return value's type `flat_mutation_reader_v2`
is movable, so we can just pass it by moving away from it.
in this change, instead of using a lambda, let's just have the
return value of it. simpler this way.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16835
Raft rebuild is broken because the session id is not set.
The following was seen when running rebuild
stream_session - [Stream #8cfca940-afc9-11ee-b6f1-30b8f78c1451]
stream_transfer_task: Fail to send to 127.0.70.1:0:
seastar::rpc::remote_verb_error (Session not found:
00000000-0000-0000-0000-000000000000)
with raft topology, e.g.,
scylla --enable-repair-based-node-ops 0 --consistent-cluster-management true --experimental-features consistent-topology-changes
Fix by setting the session id.
Fixes #16741
Closes scylladb/scylladb#16814
Increased them to DEBUG level, and in one case to WARN (inside an
exception handler).
The selected messages are still relatively rare (per-node per-transition
control plane events, plus events such as fibers sleeping and waking up)
although more low level. They are also small messages. Messages that are
large such as those which print all tokens of nodes or large mutations
are left on TRACE level.
The plan is to enable DEBUG level logging in test.py tests for
raft_topology, while not spamming the logs completely such as by
printing large mutations.
Allows selectively enabling higher logging levels for just raft-topology
related things, without doing it for the entire storage_service (which
includes things like gossiper callbacks).
Also gets rid of the redundant "raft topology:" prefix which was also
not included everywhere.
Add `docs/dev/code-coverage.md` with explanations about how to work with
the different tools added for coverage reporting and cli options added
to `configure.py` and `test.py`
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
test.py already supports the routing of coverage data into a
predetermined folder under the `tmpdir` logs folder. This patch extends
on that and leverages the code coverage processing libraries to produce
test coverage lcov files and a coverage summary at the end of the run.
The reason for not generating the full report (which can be achieved
with a one liner through the `coverage_utils.py` cli) is that it is
assumed that unit testing is not necessarily the "last stop" in the
testing process and it might need to be joined with other coverage
information that is created at other testing stages (for example dtest).
The result of this patch is that when running test.py with one of the
coverage options (`--coverage` / `--coverage-mode`) it will perform
another step of processing and aggregating the profiling information
created.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Coverage handling is divided into 3 steps:
1. Generation of profiling data from a run of an instrumented file
(which this patch doesn't cover)
2. Processing of profiling data, which involves indexing the profile and
producing the data in some format that can be manipulated and
unified.
3. Generate some reporting based on this data.
The following patch is aiming to deal with the last two steps by providing a
cli and a library for this end.
This patch adds two libraries:
1. `coverage_utils.py` which is a library for manipulating coverage
data, it also contains a cli for the (assumed) most common operations
that are needed in order to eventually generate coverage reporting.
2. `lcov_utils.py` - which is a library to deal with lcov format data,
which is a textual form containing source-dependent coverage data.
An example of such manipulation can be the `coverage diff` operation,
which produces a set-like difference: cov_a - cov_b = diff,
where diff is an lcov formatted file containing coverage data for code in
cov_a that is not covered at all in cov_b.
The libraries and cli main goal is to provide a unified way to handle
coverage data in a way that can be easily scriptable and extensible.
This will pave the way for automating the coverage reporting and
processing in test.py and in jenkins pipelines (for example to also
process dtest or sct coverage reporting)
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
We aim to support code coverage reporting as part of our development
process, to this end, we will need the ability to "route" the dumped
profiles from scylla and unit test to a predetermined location.
We can consider profile data as logged data that should persist after
tests have been run.
For this we add two supported options to test.py:
--coverage - which means that all suites in all modes will participate in
coverage.
--coverage-mode - which can be used to "turn on" coverage support only
for some of the modes in this run.
The strategy chosen is to save the profile data in
`tmpdir`/mode/coverage/%m.profraw (ref:
https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program)
This means that for every suite the profiling data of each object is
going to be merged into the same file (llvm claims to lock the file so
concurrency is fine).
More resolution than the suite level seems to not give us anything
useful (at least not at the moment). Moreover, it can also be achieved
by running a single test.
Data at the suite level will help us to detect suites that don't generate
coverage data at all and to fix this or to skip generating the profiles
for them.
Also added support for a 'coverage' parameter in the `suite.yaml` file,
which can be used to disable coverage for a specific suite. This
parameter defaults to True, but if a suite is known to not generate
profiles, or the suite profile data is not needed or obfuscates the result,
it can be set to false in order to cancel profile routing and
processing for this suite.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
We already have a dedicated coverage build, however, this build is
dedicated mostly for coverage in boost and standalone unit tests.
This added configuration option will compile every configured
build mode with coverage profiling support (excluding 'coverage' mode).
It also does targeted profiling that is narrowed down only to ScyllaDB
code and doesn't instrument seastar and testing code, this should give
a more accurate coverage reporting and also impact performance less, as
one example, the reactor loop in seastar will not be profiled (along
with everything else).
The targeted profiling is done with the help of the newly added
`coverage_sources.list` file which excludes all seastar sub directories
from the profiling.
Also an extra measure is taken to make sure that the seastar
library will not be linked with the coverage framework
(so it will not dump confusing empty profiles).
Some of the seastar headers are still going to be included in the
profile since they are indirectly included by profiled source files; in
order to remove them from the final report, a processing step on the
resulting profile will need to take place.
A note about expected performance impact:
It is expected to have minimal impact on performance since the
instrumentation adds counter increments without locking.
Ref: https://clang.llvm.org/docs/UsersManual.html#cmdoption-fprofile-update
This means that the numbers themselves are less reliable, but all covered
lines are guaranteed to have at least a non-zero value.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
The test verifies repair brings the missing rows to the owner.
- Shutdown part of the nodes in the cluster
- Insert data
- Start all nodes
- Run repair
- Shutdown part of the nodes
- Check all data is present
Since a given tablet belongs to a single shard on both repair master and repair
followers, row level repair code needs to be changed to work on a single
shard for a given tablet. In order to tell the repair followers which
shard to work on, a dst_cpu_id value is passed over rpc from the repair
master.
A helper to get the dst shard id on the repair follower.
If the repair master specifies the shard id for the follower, use it.
Otherwise, the follower chooses one itself.
If the joining node fails while handling the response from the
topology coordinator, it hangs even though it knows the join
operation has failed. Therefore, we ensure it shuts down in
this patch.
Additionally, we ensure that if the first join request response
was a rejection or the node failed while handling it, the
following acceptances by the (possibly different) coordinator
don't succeed. The node considers the join operation as failed.
We shouldn't add it to the cluster.
Fixes scylladb/scylladb#16333
Closes scylladb/scylladb#16650
* github.com:scylladb/scylladb:
topology_coordinator: clarify warnings
raft topology: join: allow only the first response to be a succesful acceptance
storage_service: join_node_response_handler: fix indentation
raft topology: join: shut down a node on error in response handler
The service level controller updates itself at an interval. However, the interval is hardcoded in main to 10 seconds, which leads to long sleeps in some of the tests.
This patch moves this value to `service_levels_interval_ms` command line option and sets this value to 0.5s in cql-pytest.
Closes scylladb/scylladb#16394
* github.com:scylladb/scylladb:
test:cql-pytest: change service levels intervals in tests
configure service levels interval
There is no need to trigger compaction in all compaction
groups when an sstable is added to only one of them.
And with that level of control, if the caller passes
sstables::offstrategy::yes, we know it will
trigger offstrategy compaction later on so there
is no need to trigger compaction at all
for this sstable at this time.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
3b424e391b introduced a loop
in `perform_cleanup` that waits until all sstables that require
cleanup are cleaned up.
However, with f1bbf705f9,
an sstable that is_eligible_for_compaction (i.e. it
is not in staging, awaiting view update generation),
may already be compacted by e.g. regular compaction.
And so perform_cleanup should interrupt that
by calling try_perform_cleanup, since the latter
reevaluates `update_sstable_cleanup_state` with
compaction disabled - that stops ongoing compactions.
Refs scylladb/scylladb#15673
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Refs #16757
Allows waiting for all previous and pending segment deletes to finish.
Useful if a caller of `discard_completed_segments` (i.e. a memtable
flush target) not only wants to ensure segments are clean and released,
but thoroughly deleted/recycled, and hence no treat to resurrecting
data on crash+restart.
Test included.
Closes scylladb/scylladb#16801
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for auth::role_or_anonymous,
and remove their operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16812
Currently to figure out if a topology request is complete a submitter
checks the topology state and tries to figure out from that the status
of the request. This is not exact. Let's look at rebuild handling for
instance. To figure out if request is completed the code waits for
request object to disappear from the topology, but if another rebuild
starts between the end of the previous one and the code noticing that
it completed the code will continue waiting for the next rebuild.
Another problem is that in case of operation failure there is no way to
pass an error back to the initiator.
This series solves those problems by assigning an id for each request and
tracking the status of each request in a separate table. The initiator
can query the request status from the table and see if the request was
completed successfully or if it failed with an error, which is also
available from the table.
The schema for the table is:
CREATE TABLE system.topology_requests (
id timeuuid PRIMARY KEY,
initiating_host uuid,
start_time timestamp,
done boolean,
error text,
end_time timestamp,
);
and all entries have TTL of one month.
The sstable's replay_position in stats_metadata is
valid only on the originating node and shard.
Therefore, validate the originating host and shard
before using it in compaction or table truncate.
Fixes #10080
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#16550
And call on_internal_error if process_upload_dir
is called for a tablets-enabled keyspace as it isn't
supported at the moment (maybe it could be in the future
if we make sure that the sstables are confined to tablets
boundaries).
Refs #12775
Fixes #16743
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#16788
Now that we have explicit status for each request we may use it to
replace shutdown notification rpc. During a decommission, in
left_token_ring state, we set done to true after a metadata barrier
that waits for all requests to the decommissioning node to complete
and notify the decommissioning node with a regular barrier. At this
point the node will see that the request is complete and exit.
Instead of trying to guess if a request completed by looking into the
topology state (which can sometimes be error prone), look at the
request status in the new topology_requests table. If the request failed,
report the reason for the failure from the table.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for replica::memtable and
replica::memtable_entry, and remove their operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16793
The goal of this PR is to fix Scylla so that the dtest test_mvs_populating_from_existing_data, which starts to fail when enabling tablets, will pass.
The main fix (the second patch) is reverting code which doesn't work with tablets, and I explain why I think this code was not necessary in the first place.
Fixes #16598
Closes scylladb/scylladb#16670
* github.com:scylladb/scylladb:
view: revert cleanup filter that doesn't work with tablets
mv: sleep a bit before view-update-generator restart
Local keyspaces do not need cleanup, and
keyspaces configured with tablets, where the
replication strategy is per-table, do not support
cleanup.
In both cases, just skip their cleanup via the api.
Fixes #16738
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#16785
Provide a unique ID for each topology request and store it in the topology
state machine. It will be used to index the new topology requests table in
order to retrieve request status.
The table has the following schema and will be managed by raft:
CREATE TABLE system.topology_requests (
id timeuuid PRIMARY KEY,
initiating_host uuid,
start_time timestamp,
done boolean,
error text,
end_time timestamp,
);
In case of a request completing with an error, the "error" field will be non-empty when "done" is set to true.
To enable tablets replication one needs to turn on the (experimental) feature and specify the `initial_tablets: N` option when creating a keyspace. We want tablets to become the default in the future and allow users to explicitly opt out if they want to.
This PR solves this by changing the CREATE KEYSPACE syntax wrt tablets options. Now there's a new TABLETS options map and the usage is:
* `CREATE KEYSPACE ...` will turn tablets on or off based on the cluster feature being enabled/disabled
* `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }` will turn tablets off regardless of the cluster feature
* `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }` will try to enable tablets with the default configuration
* `CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }` is now the replacement for the `REPLICATION = { ... 'initial_tablets': <int> }` option
fixes: #16319
Closes scylladb/scylladb#16364
* github.com:scylladb/scylladb:
code: Enable tablets if cluster feature is enabled
test: Turn off tablets feature by default
test: Move test_tablet_drain_failure_during_decommission to another suite
test/tablets: Enable tables for real on test keyspace
test/tablets: Make timestamp local
cql3: Add feature service to as_ks_metadata_update()
cql3: Add feature service to ks_prop_defs::as_ks_metadata()
cql3: Add feature service to get_keyspace_metadata()
cql: Add tablets on/off switch to CREATE KEYSPACE
cql: Move initial_tablets from REPLICATION to TABLETS in DDL
network_topology_strategy: Estimate initial_tablets if 0 is set
keyspace objects are heavyweight and copies are immediately out-of-date,
so copying them is bad.
Fix by deleting the copy constructor and copy assignment operator. One
call site is fixed. This call site is safe since it's only used
for accessing a few attributes (introduced in f70c4127c6).
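The pattern boils down to making the type non-copyable (a sketch, not the
full keyspace class):

```
class keyspace {
public:
    // copies are expensive and immediately stale - forbid them
    keyspace(const keyspace&) = delete;
    keyspace& operator=(const keyspace&) = delete;
    // moves remain allowed
    keyspace(keyspace&&) = default;
    keyspace& operator=(keyspace&&) = default;
};
```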
Closes scylladb/scylladb#16782
When the reader is currently paused, it is resumed, fast-forwarded, then
paused again. The fast forwarding part can throw and this will lead to
destroying the reader without it being closed first.
Add a try-catch surrounding this part in the code. Also mark
`maybe_pause()` and `do_pause()` as noexcept, to make it clear why
that part doesn't need to be in the try-catch.
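One plausible shape of the guarded sequence (hypothetical member names; the
exact handling in the real code may differ):

```
#include <seastar/core/coroutine.hh>

seastar::future<> fast_forward_to(position_range pr) {
    resume();  // noexcept
    std::exception_ptr ex;
    try {
        co_await _reader.fast_forward_to(std::move(pr));
    } catch (...) {
        ex = std::current_exception();
    }
    if (ex) {
        // close the reader instead of destroying it unclosed
        co_await _reader.close();
        std::rethrow_exception(ex);
    }
    maybe_pause();  // noexcept
}
```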
Fixes: #16606
Closes scylladb/scylladb#16630
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for service::cleanup_status,
and remove its operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16778
This commit removes support for CentOS 7
from the docs.
The change applies to version 5.4, so it
must be backported to branch-5.4.
Refs https://github.com/scylladb/scylla-enterprise/issues/3502
In addition, this commit removes the information
about Amazon Linux and Oracle Linux, unnecessarily added
without request, and there's no clarity over which versions
should be documented.
Closes scylladb/scylladb#16279
Tablet keyspaces have per-table range ownership, which cannot currently
be expressed in a DESC CLUSTER statement, which describes range
ownership in the current keyspace (if set). Until we figure out how to
represent range ownership (tablets) of all tables of a keyspace, we
disable range ownership for tablet keyspaces.
Fixes: #16483
Closes scylladb/scylladb#16713
This change is intended to remove the dependency to
operator<<(std::ostream&, const std::unordered_set<seastar::sstring>&)
from test/boost/cql_auth_query_test.cc.
It prepares the test for removal of the templated helpers.
Such removal is one of goals of the referenced issue that is linked below.
Refs: #13245
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#16758
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we
* define a formatter for `auth::resource` and friends,
* update their callers of `operator<<` to use `fmt::print()`.
* drop `operator<<`, as they are not used anymore.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16765
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for data_value, but
its operator<<() is preserved as we are still using the generic
homebrew formatter for formatting std::vector, which in turn uses
operator<< of the element type.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16767
the CMake-generated build.ninja is located under build/,
and it puts the `scylla` executable at build/$CMAKE_BUILD_TYPE/scylla
instead of at build/$scylla_build_mode/scylla, so let's adapt to this
change accordingly.
we will promote this change to a shared place if we have similar
needs in other tests as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16775
If the TABLETS map is missing in the CREATE KEYSPACE statement, tablets
are still enabled if the respective cluster feature is enabled.
To opt a keyspace out, one may use the TABLETS = { 'enabled': false } syntax.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patches will make the per-keyspace initial_tablets option really
optional and turn tablets ON when the feature is ON. This will break all
other tests' assumption that they are testing vnode replication. So
turn the feature off by default; tests that do need tablets will need to
explicitly enable this feature on their own
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In its current location it will be started with 3 pre-created scylla
nodes with default features ON. Next patch will exclude `tablets` from
the default list, so the test needs to create servers on its own
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When started, cql_test_env creates a test keyspace. Some tablets test
cases create a table in this keyspace, but misuse the whole feature. The
thing is that while the tablets feature is ON in those test cases, the
keyspace itself does _not_ have the initial_tablets option and thus
tablets are not really enabled for the keyspace's table. Currently the test cases
work just because this table is only used as a transparent table ID
placeholder. If tablets were turned on for the keyspace, several test cases
would get broken for two reasons.
First, the tables map will no longer be empty on test start.
Second, applying changes to tablet metadata may not be visible, because
the test case uses a "random" timestamp, which can be less than the initial
metadata mutations' timestamp.
This patch fixes all three places:
1. enables tablets for the test keyspace
2. removes the assumption that the initial metadata is empty
3. uses a large enough timestamp for subsequent mutations
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now the user can do
CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }
to turn tablets off. It will be useful in the future to opt a keyspace
out of tablets when they are turned on by default based on cluster
features only.
Also one can do just
CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }
and let Scylla select the initial tablets value on its own
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This patch changes the syntax of enabling tablets from
CREATE KEYSPACE ... WITH REPLICATION = { ..., 'initial_tablets': <int> }
to be
CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }
and updates all tests accordingly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
If the user configured zero initial tablets (spoiler: or this value was set
automagically when enabling tablets behind the scenes) we still need
some value to start with, and this patch calculates one.
The math is based on topology and RF so that all shards are covered:
initial_tablets = max(nr_shards_in(dc) / RF_in(dc) for dc in datacenters)
The estimation is done when a table is created, not when the keyspace is
created. For that, the keyspace is configured with zero initial tablets,
and at table-creation time zero is converted into the auto-estimated value.
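A standalone sketch of that estimate (illustrative signature and names, not the actual network_topology_strategy code):
```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>

// initial_tablets = max over DCs of (shards in DC / RF in DC),
// so that every shard in every DC ends up owning at least one tablet replica.
size_t estimate_initial_tablets(const std::map<std::string, size_t>& shards_per_dc,
                                const std::map<std::string, size_t>& rf_per_dc) {
    size_t tablets = 0;
    for (const auto& [dc, shards] : shards_per_dc) {
        auto it = rf_per_dc.find(dc);
        size_t rf = (it == rf_per_dc.end() || it->second == 0) ? 1 : it->second;
        tablets = std::max(tablets, shards / rf);
    }
    return tablets;
}
```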
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
For correctness sstable cleanup has to run between (some) topology
changes. Sometimes even a failed topology change may require running
the cleanup. The series introduces an automatic sstable cleanup step in the
topology change coordinator. Unlike other operations it is not represented
as a global transition state, but is done by each node independently, which
allows cleanup to run without locking the topology state machine, so
tablet code can run in parallel with the cleanup.
It is done by having a cleanup state flag for each node in the
topology. The flag is a tri-state: "clean" - the node is clean, "needed"
- cleanup is needed (but not running), "running" - cleanup is running. No
topology operation can proceed if there is a node in "running" state, but
some operations can proceed even if there are nodes in "needed" state. If
the coordinator needs to perform a topology operation that cannot run while
there are nodes that need cleanup the coordinator will start one
automatically and continue only after cleanup completes. There is also a
possibility to kick cleanup manually through the new RAFT API call.
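Conceptually, the per-node flag is a small tri-state along these lines (an illustrative declaration, not the exact one used by the topology state machine):
```cpp
// Illustrative sketch of the per-node cleanup flag.
enum class cleanup_status {
    clean,    // the node has no stale data to clean
    needed,   // cleanup is needed but not running yet
    running,  // cleanup is currently running on this node
};
```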
* 'cleanup-needed-v8' of https://github.com/gleb-cloudius/scylla:
test: add test for automatic cleanup procedure
test: add test for topology requests queue management
storage_service: topology coordinator: add error injection point to be able to pause the topology coordinator
storage_service: topology coordinator: add logging to removenode and decommission
storage_service: topology_coordinator: introduce cleanup REST API integrated with the topology coordinator
storage_service: topology coordinator: manage cluster cleanup as part of the topology management
storage_service: topology coordinator: provide a version of get_excluded_nodes that does not need node_to_work_on as a parameter
test: use servers_see_each_other when needed
test: add servers_see_each_other helper
storage_service: topology coordinator: make topology coordinator lifecycle subscriber
system_keyspace: raft topology: load ignore nodes parameter together with removenode topology request
storage_service: topology coordinator: introduce sstable cleanup fiber
storage_proxy: allow to wait for all ongoing writes
storage_service: topology coordinator: mark nodes as needing cleanup when required
storage_service: add mark_nodes_as_cleanup_needed function
vnode_effective_replication_map: add get_all_pending_nodes() function
vnode_effective_replication_map: pre calculate dirty endpoints during topology change
raft topology: add cleanup state to the topology state machine
The test runs two bootstraps and checks that there is no cleanup
in between. Then it runs a decommission and checks that cleanup runs
automatically and then it runs one more decommission and checks that no
cleanup runs again. The second part checks manual cleanup triggering. It
adds a node, triggers cleanup through the REST API, checks that it runs,
decommissions a node and checks that the cleanup did not run again.
This test creates a 5 node cluster with 2 down nodes (A and B). After
that it creates a queue of 3 topology operations: bootstrap, removenode
A and removenode B with ignore_nodes=A. Check that all operations
manage to complete. Then it downs one node and creates a queue with
two requests: bootstrap and decommission. Since none can proceed both
should be canceled.
Introduce new REST API "/storage_service/cleanup_all"
that, when triggered, instructs the topology coordinator to initiate
cluster wide cleanup on all dirty nodes. It is done by introducing new
global command "global_topology_request::cleanup".
Sometimes it is unsafe to start a new topology operation before cleanup
runs on dirty nodes. This patch detects the situation when the topology
operation to be executed cannot be run safely until all dirty nodes do
cleanup and initiates the cleanup automatically. It also waits for
cleanup to complete before proceeding with the topology operation.
There can be a situation where a node that needs cleanup dies and will
never clear the flag. In this case, if a topology operation that wants to
run next does not have this node in its ignore node list, it may get stuck
forever. To fix this the patch also introduces "liveness aware"
request queue management: we do not simply choose _a_ request to run next,
but go over the queue and find requests that can proceed considering
the nodes' liveness situation. If there are multiple requests eligible to
run, the patch introduces an order based on the operation type: replace,
join, remove, leave, rebuild. The order is chosen so as not to trigger cleanup
needlessly.
* seastar 0ffed835...8b9ae36b (4):
> net/posix: Track ap-server ports conflict
Fixes #16720
> include/seastar/core: do not include unused header
> build: expose flag like -std=c++20 via seastar.pc
> src: include used headers for C++ modules build
Closes scylladb/scylladb#16769
In the next patch we want to abort topology operations if there are not
enough live nodes to perform them. This will break tests that do a
topology operation right after restarting a node since a topology
coordinator may still not see the restarted node as alive. Fix all those
tests to wait between restart and a topology operation until UP state
propagates.
We want to change the coordinator to consider nodes liveness when
processing the topology operation queue. If there are not enough live
nodes to process any of the ops we want to cancel them. For that to work
we need to be able to kick the coordinator if the liveness situation
changes.
Introduce a fiber that waits on a topology event and when it sees that
the node it runs on needs to perform sstable cleanup it initiates one
for each non-tablet, non-local table and resets the "cleanup" flag back to
"clean" in the topology.
We want to be able to wait for all writes started through the storage
proxy before a fence is advanced. Add phased_barrier that is entered
on each local write operation before checking the fence to do so. A
write will be either tracked by the phased_barrier or fenced. This will
be needed to wait for all non fenced local writes to complete before
starting a cleanup.
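A simplified sketch of the intended interaction, assuming a phased barrier in the spirit of utils::phased_barrier with start()/advance_and_await(); the names `_write_barrier`, `check_fence_or_throw`, `write_request` and `apply_locally` are illustrative, not the actual storage_proxy code:
```cpp
// Illustrative sketch: every non-fenced local write is tracked by the barrier.
future<> do_local_write(write_request req) {
    auto op = _write_barrier.start();         // enter the barrier first
    check_fence_or_throw(req.fence_version);  // fenced writes never proceed
    co_await apply_locally(std::move(req));
    // `op` is released on scope exit, letting the barrier phase complete
}

future<> wait_for_ongoing_writes() {
    // resolves once every write that entered the barrier earlier has finished
    return _write_barrier.advance_and_await();
}
```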
A cleanup needs to run when a node loses ownership of a range (during
bootstrap) or if a range movement to a normal node failed (removenode,
decommission failure). Mark all dirty nodes as "cleanup needed" in those cases.
The function creates a mutation that sets cleanup to "needed" for each
normal node that, according to the erm, has data it does not own after
successful or unsuccessful topology operation.
Add a function that returns all nodes that have had vnodes moved to them
during a topology change operation. It is needed to know which nodes need to
do cleanup in case of a failed topology change operation.
Some topology change operations cause some nodes to lose ranges. This
information is needed to know which nodes need to do cleanup after the
topology operation completes. Pre-calculate it during erm creation.
The patch adds cleanup state to the persistent and in memory state and
handles the loading. The state can be "clean" which means no cleanup
needed, "needed" which means the node is dirty and needs to run cleanup
at some point, and "running" which means that cleanup is being run by the node
right now; when it completes, the state will be reset to "clean".
This patch reverts commit 10f8f13b90 from
November 2022. That commit added to the "view update generator", the code
which builds view updates for staging sstables, a filter that ignores
ranges that do not belong to this node. However,
1. I believe this filter was never necessary, because the view update
code already silently ignores base updates which do not belong to
this replica (see get_view_natural_endpoint()). After all, the view
update needs to know that this replica is the Nth owner of the base
update to send its update to the Nth view replica, but if no such
N exists, no view update is sent.
2. The code introduced for that filter used a per-keyspace replication
map, which was ok for vnodes but no longer works for tablets, and
causes the operation using it to fail.
3. The filter was used every time the "view update generator" was used,
regardless of whether any cleanup is necessary or not, so every
such operation would fail with tablets. So for example the dtest
test_mvs_populating_from_existing_data fails with tablets:
* This test has view building in parallel with automatic tablet
movement.
* Tablet movement is streaming.
* When streaming happens before view building has finished, the
streamed sstables get "view update generator" run on them.
This causes the problematic code to be called.
Before this patch, the dtest test_mvs_populating_from_existing_data
fails when tablets are enabled. After this patch, it passes.
Fixes #16598
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The "view update generator" is responsible for generating view updates
for staging sstables (such as coming from repair). If the processing
fails, the code retries - immediately. If there is some persistent bug,
such as issue #16598, we will have a tight loop of error messages,
potentially a gigabyte of identical messages every second.
In this patch we simply add a sleep of one second after view update
generation fails before retrying. We can still get many identical
error messages if there is some bug, but not more than one per second.
Refs #16598.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The loop in the `id2ip` lambda causes problems if we are applying an old raft
log that contains long-gone nodes. In this case, we may never receive
the `IP` for a node and get stuck in the loop forever. In this series we
replace the loop with an if - we just don't update the `host_id <-> ip`
mapping in the `token_metadata.topology` if we don't have an `IP` yet.
The PR moves `host_id -> IP` resolution to the data plane, now it
happens each time the IP-based methods of `erm` are called. We need this
because IPs may not be known at the time the erm is built. The overhead
of `raft_address_map` lookup is added to each data plane request, but it
should be negligible. In this PR `erm/resolve_endpoints` continues to
treat missing IP for `host_id` as `internal_error`, but we plan to relax
this in the follow-up (see this PR's first comment).
Closes scylladb/scylladb#16639
* github.com:scylladb/scylladb:
raft ips: rename gossiper_state_change_subscriber_proxy -> raft_ip_address_updater
gossiper_state_change_subscriber_proxy: call sync_raft_topology_nodes
storage_service: topology_state_load: remove IP waiting loop
storage_service: sync_raft_topology_nodes: add target_node parameter
storage_service: sync_raft_topology_nodes: move loops to the end
storage_service: sync_raft_topology_nodes: rename extract process_left_node and process_transition_node
storage_service: sync_raft_topology_nodes: rename add_normal_node -> process_normal_node
storage_service: sync_raft_topology_nodes: move update_topology up
storage_service: topology_state_load: remove clone_async/clear_gently overhead
storage_service: fix indentation
storage_service: extract sync_raft_topology_nodes
storage_service: topology_state_load: move remove_endpoint into mutate_token_metadata
address_map: move gossiper subscription logic into storage_service
topology_coordinator: exec_global_command: small refactor, use contains + reformat
storage_service: wait_for_ip for new nodes
storage_service.idl.hh: fix raft_topology_cmd.command declaration
erm: for_each_natural_endpoint_until: use is_vnode == true
erm: switch the internal data structures to host_id-s
erm: has_pending_ranges: switch to host_id
When a node changes its IP we need to store the mapping in
system.peers and update token_metadata.topology and erm
in-memory data structures.
The test_change_ip was improved to verify this new
behaviour. Before this patch the test didn't check
that IPs used for data requests are updated on
IP change. In this commit we add the read/write check.
It fails on insert with 'node unavailable'
error without the fix.
The loop causes problems if we are applying an old
raft log that contains long-gone nodes. In this case, we may
never receive the IP for a node and get stuck in the loop forever.
The idea of the patch is to replace the loop with an
if - we just don't update the host_id <-> ip mapping
in the token_metadata.topology if we don't have an IP yet.
When we get the mapping later, we'll call
sync_raft_topology_nodes again from
gossiper_state_change_subscriber_proxy.
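Schematically, the change replaces the wait loop with a conditional update along these lines (hypothetical accessor names, not the actual code):
```cpp
// Illustrative sketch: skip the update when the IP is not known yet,
// instead of waiting for it in a loop.
void maybe_update_ip_mapping(raft_address_map& addr_map, topology& topo, host_id id) {
    if (auto ip = addr_map.find(id)) {
        topo.update_host_id_to_ip(id, *ip);   // update the host_id <-> ip mapping now
    }
    // otherwise do nothing; a later gossiper notification will call
    // sync_raft_topology_nodes again once the IP becomes known
}
```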
If it's set, instead of going over all the nodes in raft topology,
the function will update only the specified node. This parameter
will be used in the next commit, in the call to sync_raft_topology_nodes
from gossiper_state_change_subscriber_proxy.
In the following commits we need part of the
topology_state_load logic to be applied
from gossiper_state_change_subscriber_proxy.
In this commit we extract this logic into a
new function sync_raft_topology_nodes.
In the next commit we extract the loops by nodes into
a new function, in this commit we just move them
closer to each other.
Now the remove_endpoint function might be called under
token_metadata_lock (mutate_token_metadata takes it).
It's not a problem since gossiper event handlers in
raft_topology mode don't modify token_metadata, so
we won't get a deadlock.
We are going to remove the IP waiting loop from topology_state_load
in subsequent commits. An IP for a given host_id may change
after this function has been called by raft. This means we need
to subscribe to the gossiper notifications and call it later
with a new id<->ip mapping.
In this preparatory commit we move the existing address_map
update logic into storage_service so that in later commits
we can enhance it with topology_state_load call.
When a new node joins the cluster we need to be sure that its IP
is known to all other nodes. In this patch we do this by waiting
for the IP to appear in raft_address_map.
A new raft_topology_cmd::command::wait_for_ip command is added.
It's run on all nodes of the cluster before we put the topology
into transition state. This applies both to new and replacing nodes.
It's important to run wait_for_ip before moving to
topology::transition_state::join_group0 since in this state
node IPs are already used to populate pending nodes in erm.
So far the service levels interval, responsible for updating SL configuration,
was hardcoded in main.
Now it's extracted to `service_levels_interval_ms` option.
In a longevity test reported in scylladb/scylladb#16668 we observed that
NORMAL state is not being properly handled for a node that replaced
another node. Either handle_state_normal is not being called, or it is
but getting stuck in the middle. Which is the case couldn't be
determined from the logs, and attempts at creating a local reproducer
failed.
Thus the plan is to continue debugging using the longevity test, but we need
more logs. To check whether `handle_state_normal` was called and which branches
were taken, include some INFO level logs there. Also, detect deadlocks inside
`gossiper::lock_endpoint` by reporting an error message if `lock_endpoint`
waits for the lock for too long.
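One simple way to surface such a slow lock acquisition is sketched below (illustrative only; the actual patch may warn while still waiting rather than after acquiring, and `slogger`/the helper name are placeholders):
```cpp
// Illustrative sketch: log an error if taking the per-endpoint lock took too long.
future<> with_endpoint_lock(seastar::semaphore& sem, std::function<future<>()> func) {
    auto start = seastar::lowres_clock::now();
    auto units = co_await seastar::get_units(sem, 1);
    if (seastar::lowres_clock::now() - start > std::chrono::minutes(1)) {
        slogger.error("waited over a minute for the endpoint lock, possible deadlock");
    }
    co_await func();   // run the critical section while holding the lock
}
```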
Ref: scylladb/scylladb#16668
Closes scylladb/scylladb#16733
* github.com:scylladb/scylladb:
gossiper: report error when waiting too long for endpoint lock
gossiper: store source_location instead of string in endpoint_permit
storage_service: more verbose logging in handle_state_normal
Compilation fails with recent boost versions (>=1.79.0) due to an
ambiguity with the align_up function call. Fix that by adding type
inference to the function call.
Fixes #16746
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#16747
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we
* define a formatter for `db::consistency_level`
* drop its `operator<<`, as it is not used anymore
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16755
This change is intended to remove the dependency on
operator<<(std::ostream&, const std::unordered_set<T>&)
from auth_resource_test.cc.
It prepares the test for removal of the templated helpers
from utils/to_string.hh, which is one of the goals of the
referenced issue that is linked below.
Refs: #13245
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#16754
This is an optimisation - for_each_natural_endpoint_until is
called only for vnode tokens, we don't need to run the
binary search for it in tm.first_token.
Also the function is made private since it's only used
in erm itself.
Before this patch the host_id -> IP mapping was done
in calculate_effective_replication_map. This function
is called from mutate_token_metadata, which means we
have to have an IP for each host_id in topology_state_load,
otherwise we get an error. We are going to remove
the IP waiting loop from topology_state_load, so
we need to get rid of IPs resolution from
calculate_effective_replication_map.
In this patch we move the host_id -> IP resolution to
the data plane. When a write or read request is sent
the target endpoints are requested from erm through
get_natural_endpoints_without_node_being_replaced,
get_pending_endpoints and get_endpoints_for_reading
methods and this is where the IP resolution
will now occur.
In the next patches we are going to change erm data structures
(replication_map and ring_mapping) from IP to host_id. Having
locator::host_id instead of IP in has_pending_ranges arguments
makes this transition easier.
This patch adds a reproducer test for the memory leak described in
issue #16493: If a table is repeatedly created and dropped, memory
is leaked by task tracking. Although this "leak" can be temporary
if task_ttl_in_seconds is properly configured, it may still use too
much memory if tables are too frequently created and dropped.
The test here shows that (before #16493 was fixed) as little as
100 tables created and deleted can cause Scylla to run out of
memory.
The problem is severely exacerbated when tablets are used which is
why the test here uses tablets. Before the fix for #16493 (a Seastar
patch, scylladb/seastar#2023), this test of 100 iterations always
failed (with test/cql-pytest/run's default memory allowance).
After the fix, the test doesn't fail in 100 iterations - and even
if increased manually to 10,000 iterations it doesn't fail.
The new test uses the initial_tablets feature, so requires Scylla to be
run with the "tablets" experimental option turned on. This is not
currently the default of test.py or test/cql-pytest/run, so I turned
it on manually to check this test. I also checked that the test is
correctly skipped if tablets are not turned on.
Refs #16493
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16717
The `me` sstable format includes an important feature of storing the `host_id` of the local node when writing sstables.
This is crucial for validating the sstable's `replay_position` in stats metadata as it is valid only on the originating node and shard (#10080), therefore we would like to make the me format mandatory.
in this series, the `sstable_format` option is deprecated, and the default sstable format is bumped up from `mc` to `md`, so that a cluster composed of nodes with this change should always use `me` as the sstable format. if a node with this change joins a 5.x cluster which is still using `md` because they are configured as such, this node will also be using `md`, unless the other node(s) change their `sstable_format` setting to `me`.
Fixes #16551
Closes scylladb/scylladb#16716
* github.com:scylladb/scylladb:
db/config.cc: do not respect sstable_format option
feature_service: abort if sstable_format < md
db, sstable: bump up default sstable format to "md"
In a longevity test reported in scylladb/scylladb#16668 we observed that
NORMAL state is not being properly handled for a node that replaced
another node. Either handle_state_normal is not being called, or it is
but getting stuck in the middle. Which is the case couldn't be
determined from the logs, and attempts at creating a local reproducer
failed.
One hypothesis is that `gossiper` is stuck on `lock_endpoint`. We dealt
with gossiper deadlocks in the past (e.g. scylladb/scylladb#7127).
Modify the code so it reports an error if `lock_endpoint` waits for the
lock for more than a minute. When the issue reproduces again in
longevity, we will see if `lock_endpoint` got stuck.
"me" sstable format includes an important feature of storing the
`host_id` of the local node when writing sstables. This is crucial
for validating the sstable's `replay_position` in stats metadata as
it is valid only on the originating node and shard (#10080), therefore
we would like to make the `me` format mandatory.
before making `me` mandatory, we need to stop handling `sstable_format`
option if it is "md".
in this change
- gms/feature_service: do not disable `ME_SSTABLE_FORMAT` even if
`sstable_format` is configured with "md". and in that case, instead,
a warning is printed in the logging message to note that
this setting is not valid anymore.
- docs/architecture/sstable: note that "me" is used by default now.
after this change, "sstable_format" will only accept "me" if it's
explicitly configured. and when a server with this change joins a
cluster, it uses "md" if any of the nodes in the cluster still has
`sstable_format` configured to "md". practically, this change makes "me" mandatory
in a 6.x cluster, assuming this change will be included in 6.x
releases.
Fixes #16551
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
sstable_format comes from scylla.yaml or from the command line
arguments, and we gate scylla from unallowed sstable formats lower
than `md` when parsing the configuration, and scylla bails out
at seeing the unallowed sstable format like:
```
terminate called after throwing an instance of 'std::invalid_argument'
what(): Invalid value for sstable_format: got ka which is not inside the set of allowed values md, me
Aborted (core dumped)
```
scylla errors out way before `feature_config_from_db_config()`
gets called -- it throws in `bpo::notify(configuration)`,
way before `func` is evaluated in `app_template::run_deprecated()`.
so, in this change, we do not handle these values anymore, and
consider it a bug if we run into any of them.
Refs #16551
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we default to the "mc" sstable format, and
switch to "md" if the cluster agrees on using it, and to
"me" if the cluster agrees on using that. the cluster feature
is used to get the consensus across the members in the cluster,
if any of the existing nodes in the cluster has its `sstable_format`
configured to, for instance, "mc", then the cluster is stuck with
"mc".
but we disabled the "mc" sstable format back in 3d345609; the first LTS
release including that change was scylla v5.2.0. which means a
cluster of the last major Scylla version should be using "md" or
"me". per our document on upgrade, see docs/upgrade/index.rst,
> You should perform the upgrades consecutively - to each
> successive X.Y version, without skipping any major or minor version.
>
> Before you upgrade to the next version, the whole cluster (each
> node) must be upgraded to the previous version.
we can assume that a 6.x node will only join a cluster
with 5.x or 6.x nodes. (joining a 7.x cluster should work, but
this is not relevant to this change). in both cases, since
5.x and up scylla can only be configured with an "md" `sstable_format`,
there is no need to switch from "mc" to "md" anymore. so we can
ditch the code supporting it.
Refs #16551
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
We depend on the crypto++ library (see utils/hashers.hh) but don't list
it in install-dependencies.sh. Currently this works because Seastar's
install-dependencies.sh installs it, but that's going away in [1]. List
crypto++ directly to keep install-dependencies.sh working.
Regenerating the frozen toolchain is unnecessary since we're re-adding
an existing dependency.
[1] 6bdef1e431
Closes scylladb/scylladb#16563
The joining node might receive more than one join response (see
the comment at the beginning of `join_node_response_handler`).
If the first response was a rejection or it was an acceptance but
the joining node failed while handling it, the following
acceptances by the coordinator shouldn't succeed. The joining
node considers the join operation as failed.
Currently, we always immediately return from non-first response
handler calls. However, if the response is an acceptance, and the
first response wasn't a successfully handled acceptance, we need
to throw an exception to ensure the topology coordinator moves
the node to the left state. We do it in this patch. We throw the
exception set while handling the first response. It explains why
we are failing the current acceptance.
We don't want to throw the exception on rejection. The topology
coordinator will move the node to the left state anyway. Also,
failing the rejection with an error message containing "the
topology coordinator rejected request to join the cluster" (from
the previous rejection) would be very confusing.
If the joining node fails while handling the response from the
topology coordinator, it hangs even though it knows the join
operation has failed. Therefore, we ensure it shuts down in
this patch.
We rethrow the caught exception to ensure the topology coordinator
knows the RPC has failed. In case of rejection, it does not matter
because the coordinator behaves the same way in both cases: RPC
success and RPC failure. It transitions the rejected node to the
left state. However, in case of acceptance, this only happens if
the RPC fails. Otherwise, the coordinator continues handling the
request.
On abort, one of two events happens first:
- the new catch statement catches `abort_requested_exception` and
sets it on `_join_node_response_done`,
- `co_await _ss._join_node_response_done.get_shared_future(as);`
in `join_node_rpc_handshaker::post_server_start` resolves with
`abort_requested_exception` after triggering `as`. In both cases,
`join_node_rpc_handshaker::post_server_start` throws
`abort_requested_exception`. Therefore, we don't need a separate
catch statement for `abort_requested_exception` in
`join_node_response_handler`.
Make compaction tasks internal. Drop all internal tasks without parents
immediately after they are done.
Fixes: #16735
Refs: #16694.
Closes scylladb/scylladb#16698
* github.com:scylladb/scylladb:
compaction: make regular compaction tasks internal
tasks: don't keep internal root tasks after they complete
The supervisor::notify() function expects a single string - not a
format and parameters. Calls we have in main.cc like
supervisor::notify("starting {}", what);
end up printing the silly message "starting {}". The second parameter
"what" is converted to a bool, also having an unintended consequence
for telling notify we're "ready".
This patch fixes it to call fmt::format, as intended.
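In other words, the call sites now format first and pass a single string, roughly:
```cpp
// Format the message ourselves; the second notify() argument keeps its
// default "not ready yet" meaning.
supervisor::notify(fmt::format("starting {}", what));
```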
Fixes #16728
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16729
this is to mimic the formatting of `human_readable_value`, and to prepare for consolidating these two formatters, so we don't have two pretty printers in the tree.
Closes scylladb/scylladb#16726
* github.com:scylladb/scylladb:
utils/pretty_printers: add "I" specifier support
utils/pretty_printers: use the formatting of to_hr_size()
before this change, "{:d}" is used for formatting `test_data` y
bptree_stress_test.cc. but the "d" specifier is only used for
formatting integers, not for formatting `test_data` or generic
data types, so this fails when the test is compiled with {fmt} v10,
like:
```
In file included from /home/kefu/dev/scylladb/test/unit/bptree_stress_test.cc:20:
/home/kefu/dev/scylladb/test/unit/bptree_validation.hh:294:35: error: call to consteval function 'fmt::basic_format_string<char, test_data &, test_data &>::basic_format_string<char[31], 0>' is not a constant expression
294 | fmt::print(std::cout, "Iterator broken, {:d} != {:d}\n", val, *_fwd);
| ^
/home/kefu/dev/scylladb/test/unit/bptree_validation.hh:267:20: note: in instantiation of member function 'bplus::iterator_checker<tree_test_key_base, test_data, test_key_compare, 16>::forward_check' requested here
267 | return forward_check();
| ^
/home/kefu/dev/scylladb/test/unit/bptree_stress_test.cc:92:35: note: in instantiation of member function 'bplus::iterator_checker<tree_test_key_base, test_data, test_key_compare, 16>::step' requested here
92 | if (!itc->step()) {
| ^
/usr/include/fmt/core.h:2322:31: note: non-constexpr function 'throw_format_error' cannot be used in a constant expression
2322 | if (!in(arg_type, set)) throw_format_error("invalid format specifier");
| ^
```
in this change, instead of specifying "{:d}", let's just use "{}",
which works for both integer and `test_data`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16727
in the same spirit of 724a6e26, format_as() is defined for
cql3::cql3_type. despite that this is not used yet by fmt v9,
where we still have FMT_DEPRECATED_OSTREAM, this prepares us for
fmt v10.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16232
Store schema_ptr in reader permit instead of storing a const pointer to
schema to ensure that the schema doesn't get changed elsewhere when the
permit is holding on to it. Also update the constructors and all the
relevant callers to pass down schema_ptr instead of a raw pointer.
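Schematically, the change is of this shape (a simplified sketch, not the actual reader_permit definition):
```cpp
// Illustrative sketch: the permit now keeps the schema alive itself.
class permit_impl {
    schema_ptr _schema;                          // was: const schema* _schema;
public:
    explicit permit_impl(schema_ptr s) : _schema(std::move(s)) {}
    const schema* get_schema() const { return _schema.get(); }
};
```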
Fixes #16180
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#16658
this is to mimic the formatting of `human_readable_value`, and
to prepare for consolidating these two formatters, so we don't have
two pretty printers in the tree.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This change introduces a specialization of fmt::formatter
for cql3::expr::oper_t. This enables the usage of this
type with FMTv10, which dropped the default generated formatter.
Usage of cql3::expr::oper_t without the defined formatter
resulted in compilation error when compiled with FMTv10.
Refs: #13245
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#16719
keep the precision of 4 digits, for instance, so that we format
"8191" as "8191" instead of as "8 Ki". this is modeled after
the behavior of `to_hr_size()`. for better user experience.
and also prepares to consolidate these two formatters.
tests are updated to exercise both IEC and SI notations.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change is more about documentation of the RESTful API of
storage_service. as we define the API using Swagger 2.0 format, and
generate the API document from the definitions, it would be great
if the document matched the API.
in this change, since the keyspace is not queried but mutated, the
description is changed to a more accurate one.
from the code perspective, it is but cosmetic. as we don't read the
description fields or verify them in our tests.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16637
The original code extracted only the function_name from the
source_location for logging. We'll use more information from the
source_location in later commits.
In a longevity test reported in scylladb/scylladb#16668 we observed that
NORMAL state is not being properly handled for a node that replaced
another node. Either handle_state_normal is not being called, or it is
but getting stuck in the middle. Which is the case couldn't be
determined from the logs, and attempts at creating a local reproducer
failed.
Improve the INFO level logging in handle_state_normal to aid debugging
in the future.
The amount of logs is still constant per-node. Even though some log
messages report all tokens owned by a node, handle_state_normal calls
are still rare. The most "spammy" situation is when a node starts and
calls handle_state_normal for every other node in the cluster, but it is
a once-per-startup event.
This change introduces a specialization of fmt::formatter
for utils::tagged_integer. This enables the usage of this
type with FMTv10, which dropped the default generated formatter.
Usage of utils::tagged_integer without the defined formatter
resulted in compilation error when compiled with FMTv10.
Refs: #13245
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#16715
* seastar 70349b74...0ffed835 (15):
> http/client: include used header files
> treewide: s/format/fmt::format/ when appropriate
> shared_future: shared_state::run_and_dispose(): release reserve of _peers
Fixes #16493
> metrics_tester - A demo app to test metrics
> build: silence the waring of -Winclude-angled-in-module-purview
> estimated_histogram.hh: Support native histograms
> prometheus.cc: Clean the pick representation code
> prometheus.cc add native histogram
> memory: fix the indentation.
> metrics_types.hh: add optional native histogram information
> memory: include used header
> prometheus.cc: Add filter, aggregate by label and skip_when_empty
> src/proto/metrics2.proto: newer proto buf definition
> print: deprecate format_separated()
> reactor: use fmt::join() when appropriate
Closes scylladb/scylladb#16712
This is a translation of Cassandra's CQL unit test source file
validation/operations/InsertUpdateIfConditionStaticsTest.java into our
cql-pytest framework.
This test file checks various LWT conditional updates which involve
static columns or UDTs (there are separate test file for LWT conditional
updates that do not involve static columns).
This test did not uncover any new bugs, but demonstrates yet again
several places where we intentionally deviated from Cassandra's behavior,
forcing me to add "is_scylla" checks in many of the checks to allow
them to pass on both Scylla and Cassandra. These deviations are known,
intentional and some are documented in docs/kb/lwt-differences.rst but
not all, so it's worth listing here the ones re-discovered by this test:
1. On a successful conditional write, Cassandra returns just True, Scylla
also returns the old contents of the row. This difference is officially
documented in docs/kb/lwt-differences.rst.
2. On a batch request, Scylla always returns a row per statement,
Cassandra doesn't - it often returns just a single failed row,
or just True if the whole batch succeeded. This difference is
officially documented in docs/kb/lwt-differences.rst.
3. In a DELETE statement with a condition, in the returned row
Cassandra lists the deleted column first - while Scylla lists
the static column first (as in any other row). This difference
is probably inconsequential, because columns also have names
so their order in the response usually doesn't matter.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16643
The recently-added test test_fromjson_timestamp_submilli demonstrated a
difference between Scylla's and Cassandra's parsing of timestamps in JSON:
Trying to use too many (more than 3) digits of precision is forbidden
in Scylla, but ignored in Cassandra. So we marked the test "xfail",
suggesting we think it's a Scylla bug that should be fixed in the future.
However, it turns out that we already had a different test,
test_type_timestamp_from_string_overprecise, which showed the same
difference in a different context (without JSON). In that older test,
the decision was to consider this a Cassandra bug, not Scylla bug -
because Cassandra seemingly allows the sub-millisecond timestamp but
in reality drops the extra precision.
So we need to be consistent in the tests - this is either a Scylla bug
or a Cassandra bug, we can't make one choice in one test and another
in a different test :-) So let's accept our older decision, and consider
Scylla's behavior the correct one in this case.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16586
when stopping the ManagerClient, it would be better to close
all connected connectors, otherwise aiohttp complains like:
```
13:57:53.763 ERROR> Unclosed connector
connections: ['[(<aiohttp.client_proto.ResponseHandler object at 0x7f939d2ca5f0>, 96672.211256817)]']
connector: <aiohttp.connector.UnixConnector object at 0x7f939d2da890>
```
this warning message is printed to the console, and it is distracting
when testing manually.
so, in this change, let's close the client connecting to unix domain
socket.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16675
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for gms::endpoint_state, and
update the callers of `operator<<` to use `fmt::print()`.
but we cannot drop `operator<<` yet, as we are still using the
templated operator<< and templated fmt::formatter to print containers
in scylla and in seastar -- they are still using `operator<<`
under the hood.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16705
this target is used by test.py for enumerating unit tests
* test/CMakeLists.txt: append executable's full path to
`scylla_tests`. add `unit_test_list` target printing
`scylla_tests`, please note, `cmake -E echo` does not
support the `-e` option of `echo`, and ninja does not
support command line with newline in it, we have to use
`echo` to print the list of tests.
* test/{boost,raft,unit}/CMakeLists.txt: set scylla_tests
  only if $PWD/suite.yaml exists. we could hardwire this
  logic in these files, as it is known that this file
  exists in these directories, but it is still put this way,
  so that it serves as a comment explaining that the reason
  why we update scylla_tests here, but not somewhere else
  where we also use the `add_scylla_test()` function, is just
  that suite.yaml exists here.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16702
in this series, we adapt to cmake building system by mapping scylla build mode to `CMAKE_BUILD_TYPE` and by using `build/build.ninja` if it exists, as `configure.py` generates `build.ninja` in `build` when using CMake for creating `build.ninja`.
Closes scylladb/scylladb#16703
* github.com:scylladb/scylladb:
test.py: build using build/build.ninja when it exists
test.py: extract ninja()
test.py: extract path_to()
test.py: define all_modes as a dict of mode:CMAKE_BUILD_TYPE
CMake puts `build.ninja` under `build`, so use it if it exists, and
fall back to current directory otherwise.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
use ninja() to build target using `ninja`. since CMake puts
`build.ninja` under "build", while `configure.py` puts it under
the root source directory, this change prepares us for a follow-up
change to build with build/build.ninja.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
use path_to() to find the path to the directory under build directory.
this change helps to find the executables built using CMake as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
because scylla build modes and CMAKE_BUILD_TYPE are not identical,
let's define `all_modes` as a dict so we can look it up.
this change prepares for a follow-up commit which adds a path
resolver which supports both build system generators: the plain
`configure.py` and CMake driven by `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Said method has a check on `_lb` not being null, before accessing it.
However, since 0e5754a, there was an unconditional access, adding an
entry for the local node. Move this inside the if, so it is covered by
the null-check. The only caller is the api (probably nodetool); the
worst that can happen is that they get a completely empty load-map if
they call too early during startup.
Fixes: #16617
Closes scylladb/scylladb#16659
Regular compaction tasks are internal.
Adjust test_compaction_task accordingly: modify test_regular_compaction_task,
delete test_running_compaction_task_abort (relying on regular compaction)
whose checks are already achieved by test_not_created_compaction_task_abort.
Rename the latter.
The default error handler throws an exception, which means scylla-sstable will exit with exception if there is any problem in the configuration. Not even ScyllaDB itself is this harsh -- it will just log a warning for most errors. A tool should be much more lenient. So this patch passes an error handler which just logs all errors with debug level.
If reading an sstable fails, the user is expected to investigate turning debug-level logging on. When they do so, they will see any problems while reading the configuration (if it is relevant, e.g. when using EAR).
Fixes: #16538
Closes scylladb/scylladb#16657
* github.com:scylladb/scylladb:
tools/scylla-sstable: pass error handler to utils::config_file::read_from_file()
tools/scylla-sstable: allow always passing --scylla-yaml-file option
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for gms::heart_beat_state, and
remove its operator<<(). the only caller site of its operator<< is
updated to use `fmt::print()`
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16652
before this change, `std::invalid_argument` is thrown by
`bpo::notify(configuration)` in `app_template::run_deprecated()` when
invalid option is passed in via command line. `utils::named_value`
throws `std::invalid_argument` if the given value is not listed in
`_allowed_values`. but we don't handle `std::invalid_argument` in
`app_template::run_deprecated()`. so the application aborts with
unhandled exception if the specified argument is not allowed.
in this change, we convert the `std::invalid_argument` to a
derived class of `bpo::error` in the customized notify handler,
so that it can be handled in `app_template::run_deprecated()`.
because `name_value::operator()` is also used elsewhere, we
should not throw a bpo::error there. so its exception type
is preserved.
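The conversion looks roughly like this (illustrative sketch; `validate_enum_value` is a hypothetical stand-in for the named_value check):
```cpp
// Illustrative sketch: translate std::invalid_argument into a
// boost::program_options error that run_deprecated() already handles.
auto notifier = [] (const std::string& value) {
    try {
        validate_enum_value(value);              // may throw std::invalid_argument
    } catch (const std::invalid_argument& e) {
        throw bpo::invalid_option_value(e.what());
    }
};
```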
Fixes #16687
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16688
The strategy constructor prints the dc:rf at the end making the sstring
for it by hand. Modern fmt-based logger can format unordered_map-s on
its own. The message would look slightly different though:
Configured datacenter replicas are: foo:1 bar:2
into
Configured datacenter replicas are: {"foo": 1, "bar": 2}
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#16443
We've observed errors during shutdown like the following:
```
ERROR 2023-12-26 17:36:17,413 [shard 0:main] raft - [088f01a3-a18b-4821-b027-9f49e55c1926] applier fiber stopped because of the error: std::_Nested_exception<raft::state_machine_error> (State machine error at raft/server.cc:1230): std::runtime_error (forward_service is shutting down)
INFO 2023-12-26 17:36:17,413 [shard 0:strm] storage_service - raft_state_monitor_fiber aborted with raft::stopped_error (Raft instance is stopped)
ERROR 2023-12-26 17:36:17,413 [shard 0:strm] storage_service - raft topology: failed to fence previous coordinator raft::stopped_error (Raft instance is stopped, reason: "background error, std::_Nested_exception<raft::state_machine_error> (State machine error at raft/server.cc:1230): std::runtime_error (forward_service is shutting down)")
```
some CQL statement execution was trying to use `forward_service` during
shutdown.
It turns out that the statement is in
`system_keyspace::load_topology_state`:
```
auto gen_rows = co_await execute_cql(
format("SELECT count(range_end) as cnt FROM {}.{} WHERE key = '{}' AND id = ?",
NAME, CDC_GENERATIONS_V3, cdc::CDC_GENERATIONS_V3_KEY),
gen_uuid);
```
It's querying a table in the `system` keyspace.
Pushing local table queries through `forward_service` doesn't make sense
as the data is not distributed. Excluding local tables from this logic
also fixes the shutdown error.
Fixes scylladb/scylladb#16570
Closes scylladb/scylladb#16662
The removenode operation is defined to succeed only if the node
being removed is dead. Currently, we reject this operation on the
initiator side (in `storage_service::raft_removenode`) when the
failure detector considers the node being removed alive. However,
it is possible that even if the initiator considers the node dead,
the topology coordinator will consider it alive when handling the
topology request. For example, the topology coordinator can use
a bigger failure detector timeout, or the node being removed can
suddenly resurrect.
This PR makes the topology coordinator reject removenode if the
node being removed is considered alive. It also adds
`test_remove_alive_node` that verifies this change.
Fixes scylladb/scylladb#16109
Closes scylladb/scylladb#16584
* github.com:scylladb/scylladb:
test: add test_remove_alive_node
topology_coordinator: reject removenode if the removed node is alive
test: ManagerClient: remove unused wait_for_host_down
test: remove_node: wait until the node being removed is dead
During a shutdown, we call `storage_service::stop_transport` first.
We may try to apply a Raft command after that, or still be in
the process of applying a command. In such a case, the shutdown
process will hang because Raft retries replicating a command until
it succeeds even in the case of a network error. It will stop when
a corresponding abort source is set. However, if we pass `nullptr`
to a function like `add_entry`, it won't stop. The shutdown
process will hang forever.
We fix all places that incorrectly pass `nullptr`. These shutdown
hangs are not only theoretical. The incorrect `add_entry` call in
`update_topology_state` caused scylladb/scylladb#16435.
Additionally, we remove the default `nullptr` values in all member
functions of `server` and `raft_group0_client` to avoid similar bugs
in the future.
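So a call site now has to name the abort source explicitly, roughly like the sketch below (simplified, illustrative names and parameters, not the exact signatures):
```cpp
// Illustrative sketch: with no default argument, the abort source must be
// spelled out at every call site.
future<> update_topology_state(group0_command cmd, group0_guard guard, abort_source& as) {
    co_await _group0_client.add_entry(std::move(cmd), std::move(guard), &as);
}
```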
Fixes scylladb/scylladb#16435
Closes scylladb/scylladb#16663
* github.com:scylladb/scylladb:
server, raft_group0_client: remove the default nullptr values
storage_service: make all Raft-based operations abortable
The default error handler throws an exception, which means
scylla-sstable will exit with exception if there is any problem in the
configuration. Not even ScyllaDB itself is this harsh -- it will just
log a warning for most errors. A tool should be much more lenient. So
this patch passes an error handler which just logs all errors with debug
level.
If reading an sstable fails, the user is expected to investigate turning
debug-level logging on. When they do so, they will see any problems
while reading the configuration (if it is relevant, e.g. when using EAR).
Fixes: #16538
Currently, if multiple schema sources are provided, the tool complains
about ambiguity, over which to consider. One of these option is
--scylla-yaml-file. However, we want to allow passing this option any
time, otherwise encrypted sstables cannot be read. So relax the multiple
schema source check to also allow this option to be used even when e.g.
--schema-file was used as the schema source.
The previous commit has fixed 5 bugs of the same type - incorrectly
passing the default nullptr to one of the changed functions. At
least some of these bugs wouldn't appear if there was no default
value. It's much harder to make this kind of a bug if you have to
write "nullptr". It's also much easier to detect it in review.
Moreover, these default values are rarely used outside tests.
Keeping them is just not worth the time spent on debugging.
During a shutdown, we call `storage_service::stop_transport` first.
We may try to apply a Raft command after that, or still be in
the process of applying a command. In such a case, the shutdown
process will hang because Raft retries replicating a command until
it succeeds even in the case of a network error. It will stop when
a corresponding abort source is set. However, if we pass `nullptr`
to a function like `add_entry`, it won't stop. The shutdown
process will hang forever.
We fix all places that incorrectly pass `nullptr`. These shutdown
hangs are not only theoretical. The incorrect `add_entry` call in
`update_topology_state` caused scylladb/scylladb#16435.
The test test_filter_expression.py::test_filter_expression_precedence
is flaky - and can fail very rarely (so far we've only actually seen it
fail once). The problem is that the test generates items with random
clustering keys, chosen as an integer between 1 and 1 million, and there
is a chance (roughly 2/10,000) that two of the 20 items happen to have the
same key, so one of the items is "lost" and the comparison we do to the
expected truth fails.
The solution is to just use sequential keys, not random keys.
There is nothing to gain in this test by using random keys.
To make this test bug easy to reproduce, I temporarily changed
random_i()'s range from 1,000,000 to 3, and saw the test failing every
single run before this patch. After this patch - no longer using
random_i() for the keys - the test doesn't fail any more.
Fixes #16647
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16649
Bootstrap cannot proceed if cdc generation propagation to all nodes
fails, so the patch series handles the error by rolling the ongoing
topology operation back.
* 'gleb/raft-cdc-failure' of github.com:scylladb/scylla-dev:
test: add test to check failure handling in cdc generation commit
storage_service: topology coordinator: rollback on failure to commit cdc generation
Currently, `add_saved_endpoint` is called from two paths: One, is when
loading states from system.peers in the join path (join_cluster,
join_token_ring), when `_raft_topology_change_enabled` is false, and the
other is from `storage_service::topology_state_load` when raft topology
changes are enabled.
In the latter path, from `topology_state_load`, `add_saved_endpoint` is
called only if the endpoint_state does not exist yet. However, this is
checked without acquiring the endpoint_lock and so it races with the
gossiper, and once `add_saved_endpoint` acquires the lock, the endpoint
state may already be populated.
Since `add_saved_endpoint` applies local information about the endpoint
state (e.g. tokens, dc, rack), it uses the local heart_beat_version,
with generation=0 to update the endpoint states, and that is
incompatible with changes applied via gossip that will carry the
endpoint's generation and version, determining the state's update order.
This change makes sure that the endpoint state is never updated in
`add_saved_endpoint` if it has non-zero generation. An internal error
exception is thrown if non-zero generation is found, and in the only
call site that might reach that state, in
`storage_service::topology_state_load`, the caller acquires the
endpoint_lock for checking for the existence of the endpoint_state,
calling `add_saved_endpoint` under the lock only if the endpoint_state
does not exist.
Fixes #16429
Closes scylladb/scylladb#16432
* github.com:scylladb/scylladb:
gossiper: add_saved_endpoint: keep heart_beat_state if ep_state is found
storage_service: topology_state_load: lock endpoint for add_saved_endpoint
raft_group_registry: move on_alive error injection to gossiper
a209ae15 addresses that last -Wimplicit-int-float-conversion warning
in the tree, so we now have the luxury of enabling this warning.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16640
Namely, the fixture for preparing an sstable and the fixture for
producing a reference dump (from an sstable). In the next patch we will
add more similar fixtures, this patch enables them to share their core
logic, without repeating code.
In the next patch, we want to add schema-load tests specific to views
and indexes. Best to place these into a separate class, so extract the
to-be-shared parts into a common base-class.
The table information of MVs (either user-created, or those backing a
secondary index) is stored in system_schema.views, not
system_schema.tables. So load this table when system_schema.tables has
no entries for the looked-up table. Base table schema is not loaded.
The underlying infrastructure (`load_schemas()`) already supports
loading views and indexes; extend this to said method.
When loading a view/index, expect `load_schemas()` to return two
schemas. The first is the base schema, the second is the view/index
schema (this is validated). Only the latter is returned.
Add support for processing cql3::statement::create_view_statement and
cql3::statement::create_index_statement statements. The CQL text
(usually a file) has to provide the definition of the base table,
before the definition of the views/indexes.
To standalone functions in index/secondary_index_manager.{hh,cc}. This
way, alternative data dictionary implementations (in
tools/schema_loader.cc), can also re-use this code without having to
instantiate a database or resorting to copy-paste.
The functions are slightly changed: there are some additional params
added to cover for things not internally available in the database
object. const sstring& is converted to std::string_view.
* seastar e0d515b6...70349b74 (33):
> util/log: drop unused function
> util/log, rpc, core: use compile-time formatting with fmtlib >= 8.0
> Fix edge case in memory sampler at OOM
> exp/geo distribution benchmark
> Additional allocation tests
> Remove null pointer check on free hot path
> Optimize final part of allocation hot path
> Optimize zero size checking in allocator
> memory: Optimize free fast path
> memory: Optimize small alloc allocation path
> memory: Limit alloc_sites size
> memory: Add general comment about sampling strategy
> memory: Use probabilistic sampler
> util: Adapt memory sampler to seastar
> util: Import Android Memory Sampler
> memory: Use separate small pool for tracking sampled allocations
> memory: Support enabling memory profiling at runtime
> util/source_location-compat: mark `source_location::current()` consteval
> build: use new behavior defined by CMP0155 when building C++ modules
> circleci: build with C++20 modules enabled
> seastar.cc: replace cryptopp with gnutls when building seastar modules
> alien: include used header
> seastar.cc: include used headers in the global purview
> docker: install clang-tools-17
> net/tcp: generate a random src_port hashed to current shard if smp::count > 1
> net, websocket: replace Crypto++ calls with GnuTLS
> README-DPDK.md: point user to DPDK's quick start guide
> reactor: print fatal error using logger as well
> Avoid ping-pong in spinlock::lock
> memory: Add allocator perf tests
> memory: Add a basic sized deletion test
> Prometheus: Disable Prometheus protobuf with a configuration
> treewide: bring back prometheus protobuf support
* test/manual/sstable_scan_footprint_test: update to adapt to the
breaking change of "memory: Use probabilistic sampler" in seastar
Closes scylladb/scylladb#16610
Currently, we have `real_db.tables` and `schemas`, the former containing
system tables needed to parse statements, and the latter accumulating
user tables parsed from CQL. This will be error-prone to maintain with
view/index support, so ditch `schemas` and instead add a `user` flag to
`table` and accumulate all tables in `real_db.tables`.
At the end, just return the schemas of all user tables.
Scylla's schema tables code determines which index was added, by diffing
index definitions with previous ones. This is clunky to use in
tools/schema_loader.cc, so also return the index metadata for the newly
created index.
The method `validate_while_excuting()` and its only caller,
`build_index_schema()`, only use the query processor to get the db from it.
So replace qp parameter with db one, relaxing requirements w.r.t.
callers.
Boost::dynamic_linking was introduced as a compatibility target
which adds the "BOOST_ALL_DYN_LINK" macro on the Win32 platform, but since
Scylla only runs on Linux, there is no need to link against this
library.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16544
If for some reason an exception is thrown in compaction_manager::remove,
it might leave behind stale table pointers in _compaction_state. Fix
that by setting up a deferred action to perform the cleanup.
Fixes #16635
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#16632
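As a hedged illustration of the deferred-cleanup pattern this fix describes, here is a minimal sketch using `seastar::defer()` from `seastar/util/defer.hh`; the names `table`, `_compaction_state` and `remove_table` are simplified stand-ins, not the actual compaction_manager code:
```
// Sketch only: types and names are illustrative stand-ins.
#include <seastar/util/defer.hh>
#include <unordered_map>

struct table;
struct compaction_state_entry {};

std::unordered_map<table*, compaction_state_entry> _compaction_state;

void remove_table(table* t) {
    // The deferred action runs when the scope is left, including via an
    // exception, so a stale entry can never be left behind.
    auto cleanup = seastar::defer([t]() noexcept {
        _compaction_state.erase(t);
    });
    // ... work that may throw goes here ...
}
```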
Refer to the added comment for details.
This problem was found by a compiler warning, and I'm fixing
it mainly to silence the warning. I didn't give any thought
to its effects in practice.
Fixes #13077
Closes scylladb/scylladb#16625
[avi: changed Refs to Fixes]
we format `std::variant<std::monostate, seastar::timed_out_error,
raft::not_a_leader, raft::dropped_entry, raft::commit_status_unknown,
raft::conf_change_in_progress, raft::stopped_error, raft::not_a_member>`
in this source file. and currently, we format `std::variant<..>` using
the default-generated `fmt::formatter` from `operator<<`, so in order to
format it with {fmt}'s compile-time check enabled, we have to make the
`operator<<` overload for `std::variant<...>` visible from the caller
sites which format `std::variant<...>` using {fmt}.
in this change, the `operator<<` for `std::variant<...>` is moved
from the middle of the source file to the top of it, so that it can
be found when the compiler looks for a matching `fmt::formatter`
for `std::variant<...>`.
please note, we cannot use the `fmt::formatter` provided by `fmt/std.h`,
as its specialization for `std::variant` requires that all the types
of the variant are `is_formattable`. but the default-generated formatter
for type `T` is not considered proof that `T` is formattable.
this should address the FTBFS with the latest seastar like:
```
/usr/include/fmt/core.h:2743:12: error: call to deleted constructor of 'conditional_t<has_formatter<mapped_type, context>::value, formatter<mapped_type, char_type>, fallback_formatter<stripped_type, char_type>>' (aka 'fmt::detail::fallback_formatter<std::variant<std::monostate, seastar::timed_out_error, raft::not_a_leader, raft::dropped_entry, raft::commit_status_unknown, raft::conf_change_in_progress, raft::stopped_error, raft::not_a_member>>')
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16616
we are using `seastar::format()` to format `append_entry` in
`append_reg_model`, so we have to provide a `fmt::formatter` for
these callers which format `append_entry`.
even though, with FMT_DEPRECATED_OSTREAM, the formatter is defined
by fmt v9, we won't have it anymore with fmt v10. so this change prepares us
for fmt v10.
Refs https://github.com/scylladb/scylladb/issues/13245
Closes scylladb/scylladb#16614
* github.com:scylladb/scylladb:
test: randomized_nemesis_test: add formatter for append_entry
test: randomized_nemesis_test: move append_reg_model::entry out
it was a copy-paste error introduced by 2508d339. the copyright
blob was copied from C++ source code, but the CMake language
defines block comments differently from the C++ language.
let's use the line comment of CMake.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16615
In 7d5e22b43b ("replica: memtable: don't forget memtable
memory allocation statistics") we taught memtable_list to remember
learned memory allocation reserves so a new memtable inherits these
statistics from an older memtable. Share it now further across tablets
that belong to the same table as well. This helps the statistics be more
accurate for tablets that are migrated in, as they can share an existing
tablet's memory allocation history.
Closes scylladb/scylladb#16571
* github.com:scylladb/scylladb:
table, memtable: share log-structured allocator statistics across all memtables in a table
memtable: consolidate _read_section, _allocating_section in a struct
Change the mutate_live_and_unreachable_endpoints procedure
so that the called `func` would mutate a cloned
`live_and_unreachable_endpoints` object in place.
Those are replicated to temporary copies on all shards
using `foreign<unique_ptr<>>` so that they would be
automatically freed on exception.
Only after all copies are made, they are applied
on all gossiper shards in a noexcept loop
and finally, an `on_success` function is called
to apply further side effects if everything else
was replicated successfully.
The latter is still susceptible to exceptions,
but we can live with those as long as `_live_endpoints`
and `_unreachable_endpoints` are synchronized on all shards.
With that, the read-only methods:
`get_live_members_synchronized` and
`get_unreachable_members_synchronized`
become trivial and they just return the required data
from shard 0.
Fixes #15089
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#16597
remove the unused #include headers from repair.hh, as they are not
directly used. after this change, task_manager_module.hh fails to
have access to stream_reason, so include it where it is used.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16618
it's observed that the mock server could return something not decodable
as JSON. so let's print out the response in the logging message in this case.
this should help us to understand the test failure better if it surfaces again.
Refs #16542
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16543
we are using `seastar::format()` to format `append_entry` in
`append_reg_model`, so we have to provide a `fmt::formatter` for
these callers which format `append_entry`.
even though, with FMT_DEPRECATED_OSTREAM, the formatter is defined
by fmt v9, we won't have it anymore with fmt v10. so this change prepares us
for fmt v10.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change prepares for adding fmt::formatter for append_entry.
as we are using its formatter in the inline member functions of
`append_reg_model`, but its `fmt::formatter` can only be specialized outside of
this class. and we don't have access to `format_as()` yet in {fmt} 9.1.0,
which is shipped along with fedora 38, which is in turn used for
our base build image.
so, in this change, `append_reg_model::entry` is extracted and renamed
to `append_entry`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
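A minimal sketch of the pattern this commit describes, with simplified, illustrative fields (the real `append_entry` in the test has different members): the struct is moved to namespace scope so a `fmt::formatter` can be specialized for it outside of any class.
```
// Sketch only: fields are illustrative stand-ins.
#include <fmt/format.h>

struct append_entry {
    int client_id;
    int seq;
};

template <>
struct fmt::formatter<append_entry> {
    constexpr auto parse(fmt::format_parse_context& ctx) { return ctx.begin(); }
    auto format(const append_entry& e, fmt::format_context& ctx) const {
        return fmt::format_to(ctx.out(), "append_entry{{client={}, seq={}}}",
                              e.client_id, e.seq);
    }
};

// usage: fmt::format("{}", append_entry{1, 42}) -> "append_entry{client=1, seq=42}"
```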
Tablets metadata is quite expensive to generate (each data_value is
an allocation), so an old driver (without support for tablets) will
generate huge amounts of such notifications. This commit adds a way
to negotiate generation of the notification: a new driver will ask
for them, and an old driver won't get them. It uses the
OPTIONS/SUPPORTED/STARTUP protocol described in native_protocol_v4.spec.
Closes scylladb/scylladb#16611
seastar dropped the dependency on Crypto++, and it also removed
Findcryptopp.cmake from its `cmake` directory. but scylladb still
depends on this library. and it has been using the `Findcryptopp.cmake`
in seastar submodule for finding it.
after the removal of this file, scylladb would not be able to
use it anymore. so, we have to provide our own `Findcryptopp.cmake`.
Findcryptopp.cmake is copied from the Seastar project. So its
date of copyright is preserved, and it was licensed under Apache 2.0.
since we are creating a derivative work from it, let's relicense
it under Apache 2.0 and AGPL 3.0 or later.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16601
seastar::logger is using the compile-time format checking by default if
compiled using {fmt} 8.0 and up. and it requires the format string to be
a consteval string, otherwise we have to use `fmt::runtime()` explicitly.
so, to adapt to the change, let's use consteval strings when formatting
logging messages.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16612
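As a hedged sketch of what this adaptation looks like in practice (the logger name `mylog` and the function are assumptions, not taken from the patch; fmt's standard `fmt::runtime()` escape hatch is assumed to be accepted by the logger's format-string type):
```
#include <seastar/util/log.hh>
#include <fmt/core.h>
#include <string>

seastar::logger mylog("example");

void log_examples(const std::string& runtime_fmt, int value) {
    // a literal format string is checked at compile time
    mylog.info("value is {}", value);
    // a string only known at run time must opt out of the check explicitly
    mylog.info(fmt::runtime(runtime_fmt), value);
}
```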
in `alternator/auth.cc`, none of the symbols in the "query" namespace
provided by the removed headers is used, so there is no
need to include this header file.
the same applies to other removed header files.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16603
In the Raft-based topology, we should never update token metadata
through gossip notifications. `storage_service::on_alive` and
`storage_service::on_remove` do it, so we ignore their parts that
touch token metadata.
Additionally, we improve some logs in other places where we ignore
the function because of using the Raft-based topology.
Fixes scylladb/scylladb#15732
Closes scylladb/scylladb#16528
* github.com:scylladb/scylladb:
storage_service: handle_state_left, handle_state_normal: improve logs
raft topology: do not update token metadata in on_alive and on_remove
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for `atomic_cell_view::printer`
and `atomic_cell::printer` respectively, and remove their operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16602
In #16102, we added a test for concurrent bootstrap in the raft-based
topology. This test was running in CI for some time and
never failed. Now, we can believe that concurrent bootstrap is not
bugged or at least the probability of a failure is very low. Therefore,
we can safely make use of it in all tests using the raft-based topology.
This PR:
- makes all initial servers start concurrently in topology tests,
- replaces all multiple `server_add` calls with a single `servers_add`
call in tests using the raft-based topology,
- removes no longer needed `test_concurrent_bootstrap`.
The changes listed above:
- make running tests a bit faster due to concurrent bootstraps,
- make multiple tests test concurrent bootstrap previously tested by
a single test.
Fixes scylladb/scylladb#15423
Closes scylladb/scylladb#16384
* github.com:scylladb/scylladb:
test: test_different_group0_ids: fix comments
test: remove test_concurrent_bootstrap
test: replace multiple server_add calls with servers_add
test: ScyllaCluster: start all initial servers concurrently
test: ManagerClient: servers_add: specify consistent-topology-changes assumption
Previously, the tablet information was sent to the drivers
in two pieces within the custom_payload. We had information
about the replicas under the `tablet_replicas` key and token range
information under `token_range`. These names were quite generic
and might have caused problems for other custom_payload users.
Additionally, dividing the information into two pieces raised
the question of what to do if one key is present while the other
is missing.
This commit changes the serialization mechanism to pack all information
under one specific name, `tablets-routing-v1`.
From: Sylwia Szunejko <sylwia.szunejko@scylladb.com>
Closes scylladb/scylladb#16148
This test only adds 3 nodes concurrently to the empty cluster.
After making many other tests use ManagerClient.servers_add, it
serves no purpose.
We had added this test before we decided to use
ManagerClient.servers_add in many tests to avoid multiple failures
in CI if it turned out that the concurrent bootstrap is flaky with
high frequency there. This test was running in CI for some time and
never failed. Now, we can believe that concurrent bootstrap is not
bugged or at least the probability of a failure is very low.
ManagerClient.servers_add can be used in every test that uses
consistent topology changes. We replace all multiple server_add
calls in such tests with a single servers_add call to make these
tests faster and simplify their code. Additionally, these
servers_add calls will test concurrent bootstraps for free.
Starting all initial servers concurrently makes tests in suites
with initial_size > 1 run a bit faster. Additionally, these tests
test concurrent bootstraps for free.
add_servers can be called only if the cluster uses consistent
topology changes. We can use this function unconditionally in
install_and_start because every suite uses consistent topology
changes by default. The only way to not use it is by adding all
servers with a config that contains experimental_features without
consistent-topology-changes.
we create a default `scylla.yaml` on the fly in `install.sh`. but
the path to the temporary file holding the default yaml file is
hardwired to `/tmp/scylla.yaml`. this works fine if we only have a
single `install.sh` at a certain time point. but if we have multiple
`install.sh` processes running in parallel, these packaging jobs could
step on each other when they create and remove the `scylla.yaml`.
in this change, because of a limitation of `installconfig`, which always
considers the "dest" parameter as a directory, `mktemp` is used to create a
parent directory for the temporary file.
Fixes #16591
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16592
we use "ue" for the short of "update_expressions", before we change
our minds and use a more readable name, let's add "ue" to the
"ignore_word_list" option of the codespell.
also, use the abslolute path in "skip" option. as the absolute paths
are also used by codespell's own github workflow. and we are still
observing codespell github workflow is showing the misspelling errors
in our "test/" directory even we have it listed in "skip". so this
change should silence them as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16593
The HOST_ID is already written to system.peers since inception pretty much (See https://github.com/scylladb/scylladb/pull/16376#discussion_r1429248185 for details).
However, it is written to the table using an individual CQL query and so it is not set atomically with other columns.
If scylla crashes or even hits an exception before updating the host_id, then system.peers might be left in an inconsistent state, and in particular without a HOST_ID value.
This series makes sure that HOST_ID is written to system.peers and uses it to "seal" the record by upserting it in a single CQL BATCH query when adding the state for new nodes.
On the read side, skip rows that have no HOST_ID state in system.peers, assuming they are incomplete, i.e. scylla got an exception or crashed while writing them, so they can't be trusted.
With that change we can assume that endpoint state loaded from system.peers will always have a valid host_id.
Refs https://github.com/scylladb/scylladb/pull/15903
Closes scylladb/scylladb#16376
* github.com:scylladb/scylladb:
gms: endpoint_state: change application_state_map to std::unordered_map
system_keyspace: update_peer_info: drop single-column overloads
storage_service: drop do_update_system_peers_table
storage_service: on_change: fixup indentation
endpoint_state subscriptions: batch on_change notification
everywhere: drop before_change subscription
system_keyspace: load_tokens/peers/host_ids: enforce presence of host_id
system_keyspace: drop update_tokens(endpoint, tokens) overload
storage_service: seal peer info with host_id
storage_service: update_peer_info: pass peer_info to sys_ks
gms: endpoint_state: define application_state_map
system_keyspace: update_peer_info: use struct peer_info for all optional values
query_processor: execute_internal: support unset values
types: add data_value_list
system_keyspace: get rid of update_cached_values
storage_service: do not update peer info for this node
State changes are processed as a batch and
there is no reason to maintain them as an ordered map.
Instead, use a std::unordered_map that is more efficient.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than calling on_change for each particular
application_state, pass an endpoint_state::map_type
with all changed states, to be processed as a batch.
In particular, this allows storage_service::on_change
to update_peer_info once for all changed states.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
None of the subscribers does anything in before_change.
This is done before changing `on_change` in the following patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When adding a peer via update_peer_info,
insert all columns in a single query
using system_keyspace::peer_info.
This ensures that `host_id` is inserted along with all
other app states, so we can rely on it
when loading the peer info after restart.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Use the newly added system_keyspace::peer_info
to pass a struct of all optional system.peers members
to system_keyspace::update_peer_info.
Add `get_peer_info_for_update` to construct said struct
from the endpoint state.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Have a central definition for the map held
in the endpoint_state (before changing it to
std::unordered_map).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Define struct peer_info holding optional values
for all system.peers columns, allowing the caller to
update any column.
Pass the values as std::vector<std::optional<data_value>>
to query_processor::execute_internal.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Add overloads for execute_internal and friends
accepting a vector of optional<data_value>.
The caller can pass nullopt for any unset value.
The vector of optionals is translated internally to
`cql3::raw_value_vector_with_unset` by `make_internal_options`.
This path will be called by system_keyspace::update_peer_info
for updating a subset of the system.peers columns.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
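The translation described above can be pictured with a small, purely illustrative sketch; the types and the helper are simplified stand-ins, not the actual `make_internal_options()` code:
```
#include <optional>
#include <string>
#include <vector>

using data_value = std::string; // stand-in for the real data_value type

struct raw_values_with_unset {
    std::vector<data_value> values;
    std::vector<bool> unset; // true -> leave this column untouched
};

// nullopt entries become "unset" markers instead of real values.
raw_values_with_unset make_values(const std::vector<std::optional<data_value>>& in) {
    raw_values_with_unset out;
    out.values.reserve(in.size());
    out.unset.reserve(in.size());
    for (const auto& v : in) {
        out.values.push_back(v.value_or(data_value{}));
        out.unset.push_back(!v.has_value());
    }
    return out;
}
```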
data_value_list is a wrapper around std::initializer_list<data_value>.
Use it for passing values to `cql3::query_processor::execute_internal`
and friends.
A following patch will add a std::variant for data_value_or_unset
and extend data_value_list to support unset values.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, when loading peers' endpoint state from system.peers,
add_saved_endpoint is called.
The first instance of the endpoint state is created with the default
heart_beat_state, with both generation and version set to zero.
However, if add_saved_endpoint finds an existing instance of the
endpoint state, it reuses it, but it updates its heart_beat_state
with the local heart_beat_state() rather than keeping the existing
heart_beat_state, as it should.
This is a problem since it may confuse updates over gossip
later on via do_apply_state_locally, which compares the remote
generation vs. the local generation, so they must stem from
the same root, that is, the endpoint itself.
Fixes #16429
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
`topology_state_load` currently calls `add_saved_endpoint`
only if it finds no endpoint_state_ptr for the endpoint.
However, this is done before locking the endpoint
and the endpoint state could be inserted concurrently.
To prevent that, a permit_id parameter was added to
`add_saved_endpoint` allowing the caller to call it
while the endpoint is locked. With that, `topology_state_load`
locks the endpoint and checks the existence of the endpoint state
under the lock, before calling `add_saved_endpoint`.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Move the `raft_group_registry::on_alive` error injection point
to `gossiper::real_mark_alive` so it can delay marking the endpoint as
alive, and calling the `on_alive` callback, but without holding
the endpoint_lock.
Note that the entry for this endpoint in `_pending_mark_alive_endpoints`
still blocks marking it as alive until real_mark_alive completes.
Fixes #16506
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
test.py inherits its env from the user, which is the right thing:
some python modules, e.g. logging, do accept env-based configuration.
However, test.py also starts subprocesses, i.e. tests, which start
scylladb instances. And when the instance is started without an explicit
configuration file, SCYLLA_CONF from the user environment can be used.
If this scylla.conf contains funny parameters, e.g. unsupported
configuration options, the tests may break in an unexpected way.
Avoid this by resetting the respective env keys in test.py.
Fixes gh-16583
Closes scylladb/scylladb#16577
system_keyspace had a hack to skip update_peer_info
for the local node, and then to remove an entry for
the local node in system.peers if `update_tokens(endpoint, ...)`
was called for this node.
This change unhacks system_keyspace by considering
update of system.peers with the local address as
an internal error and fixing the call sites that do that.
Fixes #16425
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We add a test for the Raft-based topology's new feature - rejecting
the removenode operation on the topology coordinator side if the
node being removed is considered alive by the failure detector.
Additionally, the test tests a case when the removenode operation
is rejected on the initiator side.
The removenode operation is defined to succeed only if the node
being removed is dead. Currently, we reject this operation on the
initiator side (in storage_service::raft_removenode) when the
failure detector considers the node being removed alive. However,
it is possible that even if the initiator considers the node dead,
the topology coordinator will consider it alive when handling the
topology request. For example, the topology coordinator can use
a bigger failure detector timeout, or the node being removed can
suddenly resurrect. This patch adds a check on the topology
coordinator side.
Note that the only goal of this change is to improve the user
experience. The topology coordinator does not rely on the gossiper
to ensure correctness.
The previous commit removed the only call to wait_for_host_down.
Moreover, this function is identical to server_not_sees_other_server.
We can safely remove it.
In the following commits, we make the topology coordinator reject
removenode requests if the node being removed is considered alive
by the gossiper. Before making this change, we need to adapt the
testing framework so that we don't have flaky removenode operations
that fail because the node being removed hasn't been marked as dead
yet. We achieve this by waiting until all other running nodes see
the node being removed as dead in all removenode operations.
Some tests are simplified after this change because they don't have
to call server_not_sees_other_server anymore.
We log the information about ignoring the `handle_state_left`
function after logging the general entry information. It is better
to know what exactly is being ignored during debugging.
We also add the `permit_id` info to the log. All other functions
called through gossip notifications log it.
In the Raft-based topology, we should never update token metadata
through gossip notifications. `storage_service::on_alive` and
`storage_service::on_remove` do it, so we ignore their parts that
touch token metadata.
There are other functions in storage_service called through gossip
notifications that are not ignored in the Raft-based topology.
However, we don't have to or cannot ignore them. We cannot ignore
`on_join` and `on_change` because they update the PEERS table used
by drivers. The rest of those functions don't have to be ignored.
These are:
- `before_change` - it does nothing,
- `on_dead` and `on_restart` - they only remove the RPC client and
send notifications,
- `handle_state_bootstrap` and `handle_state_removed` - they are
never called in the Raft-based topology.
Fencing is necessary only for reads and writes to non-local tables.
Moreover, fencing a read or write to a local table can cause an
error on the bootstrapping node. It is explained in the comment
in storage_proxy::get_fence.
A scenario described in the comment has been reported in
scylladb/scylladb#16423. A write to the local RAFT table failed
because of fencing, and it killed server_impl::io_fiber.
Fixes scylladb/scylladb#16423
Closes scylladb/scylladb#16525
Scylla refuses the timestamp format "2014-01-01 12:15:45.0000000Z" that
has 6 digits of precision for the fractional second, and only allows
3 digits of precision. This restriction makes sense - after all CQL
timestamp columns (note - this is NOT "using timestamp"!) only have
millisecond precision. Nevertheless, Cassandra does not have this
restriction and does allow these over-precise timestamps. In this patch
we add a test that demonstrates this difference.
Curiously, in the past Scylla *generated* this forbidden timestamp
format when outputting the timestamp to a string (e.g. toJson()),
which it then couldn't read back! This was issue #16575.
Today Scylla no longer generates this forbidden timestamp format.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16576
apt_install() / apt_uninstall() may fail if a background process is running
an apt operation, such as unattended-upgrades.
To avoid this, we need to add two things:
1. For apt-get install / remove, we need the option "DPkg::Lock::Timeout=-1"
to wait for the dpkg lock.
2. For apt-get update, there is no option to wait for the cache lock.
Therefore, we need to implement a retry loop to wait for apt-get update
to succeed.
Fixes #16537
Closes scylladb/scylladb#16561
In 3da346a86d, we moved
AmbientCapabilities to scylla-server.service, but it causes "Operation
not permitted" in nonroot mode.
This is because the nonroot user does not have enough privilege to set
capabilities, so we need to disable the parameter in nonroot mode.
Closes scylladb/scylladb#16574
This small series improves two things in the multi-node tests for tablet support in materialized views:
1. The test for Alternator LSI, which "sometimes" could reproduce the bug by creating 10-node cluster with a random tablet distribution, is replaced by a reliable 2-node cluster which controls the tablet distribution. The new test also confirms that tablets are actually enabled in Alternator (reviewers of the original test noted it would be easy to pass the test if tablets were accidentally not enabled... :-)).
2. Simplify the tablet lookup code in the test to not go through a "table id", and look up the table's (or view's) name directly (requires a full-table scan of the tablets table, but that's entirely reasonable in a test).
The third patch in this series also fixes a comment typo discovered in a previous review.
Closes scylladb/scylladb#16440
* github.com:scylladb/scylladb:
materialized views: fix typo in comment
test_mv_tablets: simplify lookup of tablets
alternator, tablets: improve Alternator LSI tablets test
When a metadata barrier fails, a guard is released and the node becomes
outdated. The failure handling path needs to re-take the guard and re-create
the node before continuing.
Fixes: #16568
Message-ID: <ZYxEm+SaBeFcRT8E@scylladb.com>
The amount of arguments needed to create a ks metadata object is pretty large, and there are many different ways it is created over the code. This set simplifies it for the most typical patterns.
closes: #16447
closes: #16449
Closes scylladb/scylladb#16565
* github.com:scylladb/scylladb:
schema_tables: Use new_keyspace() sugar
keyspace_metadata: Drop vector-of-schemas argument from new_keyspace()
keyspace_metadata: Add default value for new_keyspace's durable_writes
keyspace_metadata: Pack constructors with default arguments
So that a single centrally managed db::config instance can be shared by
all code requiring it, instead of creating local instances where needed.
This is required to load schema from encrypted schema-tables, and it
also helps memory consumption a bit (db::config consumes a lot of
memory).
Fixes: #16480
Closes scylladb/scylladb#16495
This change is motivated by wanting to have code coverage reporting support.
Currently the only way to get a profile dump in ScyllaDB is stopping it with SIGTERM; however, this doesn't
suit all cases, more specifically:
1. In dtest, when some of the tests intentionally abruptly kill a node
2. In test.py, where we would like to distinguish (at least for now), graceful shutdown of ScyllaDB testing and
teardown procedures (which currently kills the nodes).
This mini series adds two changes:
1. It adds the support for profile dumping in ScyllaDB with rest api ('/system/dump_profile')
2. It adds the support for this API in test.py and also adds a call for it as part of the node stop procedure in a permissive way that will not fail the teardown or test if the call doesn't succeed for whatever reason - after this change, all current
test.py suites except for pylib_test (expected) dump profiles if instrumented and will be able to participate in coverage
reporting.
Refs #16323
Closes scylladb/scylladb#16557
* github.com:scylladb/scylladb:
test.py: Dump coverage profile before killing a node
rest api: Add an api for profile dumping
Up until now the only way to get a coverage profile was to shut down the
ScyllaDB nodes gracefully (using SIGTERM), which means that the coverage
profile was lost for every node that was killed abruptly (SIGKILL).
This in turn would have required us to shut down all nodes
gracefully, which is not something we set out to do.
Here we use the rest API for dumping the coverage profile, which will
cause the most minimal impact possible on the test runs.
If the dumping fails (because the node doesn't support the API or due to
a real error in dumping), we ignore it, as it is not part of the system we
would like to test.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
As part of code coverage support we need to work with dumped profiles
for ScyllaDB executables.
Those profiles are created on two occasions:
1. When an application exits normally (which will trigger
__llvm_dump_profile registered in the exit hooks).
2. For ScyllaDB, commit d7b524cf10 introduced a manual call to
__llvm_dump_profile upon receiving a SIGTERM signal.
This commit adds a third option, a rest API to dump the profile.
In addition the target file is logged and the counters are reset, which
enables incremental dumping of the profile.
Except for logging, if the executable is not instrumented, this API call
becomes a no-op so it bears minimal risk in keeping it in our releases.
Specifically for code coverage, the gain will be that we will not be
required to change the entire test run to shut down clusters gracefully
and this will cause minimal effect to the actual test behavior.
The change was tested by manually triggering the API with and
without instrumentation, as well as re-triggering it with write
permissions for the profile file disabled (to test fault tolerance).
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
The log-structured allocator collects allocation statistics (which it
uses to manage memory reserves) in some objects kept in
memtable_table_shared_data. Right now, this object is local to memtable_list,
which itself is local to a tablet replica. Move it to table scope so
different tablets in the shard share the statistics. This helps a
newly-migrated tablet adjust more quickly.
Those two members are passed from memtable_list to memtable. Since we
wish to pass them from table, it becomes awkward to pass them as two
separate variables as their contents are specific to memtable internals.
Wrap them in a name that indicates their role (being table-wide shared
data for memtables) and pass them as a unit.
Commit 62458b8e4f introduced the enforcement of EXECUTE permissions of functions in cql select. However, according to the reference in #12869, the permissions should be enforced only on UDFs and UDAs.
The code does not distinguish between the two, so the permissions are also unintentionally enforced on native functions. This commit introduces the distinction and only enforces the permissions on non-native functions.
Fixes #16526
Manually verified (before and after the change) with the reproducer supplied in #16526 and also with the `min` and `max` native functions.
Also added a test that checks for regression on native function execution and verified that it fails on authorization before
the fix and passes after the fix.
Closes scylladb/scylladb#16556
* github.com:scylladb/scylladb:
test.py: Add test for native functions permissions
select statement: verify EXECUTE permissions only for non native functions
If the coordinator fails to notify all nodes about a new cdc generation
during bootstrap, it cannot proceed booting since it can cause data
loss with cdc. Rollback the topology operation if a failure happens
during this state.
The create_keyspace_from_schema_partition code creates ks metadata
without schemas and user-types. There's new_keyspace() convenience
helper for such cases.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's only testing code that wants to call new_keyspace with existing
schemas; all the other callers either construct the ks metadata
directly, or use the convenience new_keyspace with explicitly empty schemas.
By and large it's nicer if new_keyspace() doesn't require this
argument.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Almost all callers call new_keyspace with durable writes ON, so it's
worth having a default value for it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a cascade of keyspace_metadata constructors, each adding one
default argument to the previous one. All this can be expressed more concisely
with the help of native default arguments.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
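A toy sketch of the simplification described above; the class name, members and defaults are illustrative, not the real keyspace_metadata:
```
#include <map>
#include <string>
#include <utility>

// Before: ks_metadata(name); ks_metadata(name, strategy);
// ks_metadata(name, strategy, options); ... each delegating to the next.
// After: one constructor with native default arguments.
class ks_metadata {
public:
    explicit ks_metadata(std::string name,
                         std::string strategy_class = "SimpleStrategy",
                         std::map<std::string, std::string> options = {},
                         bool durable_writes = true)
        : _name(std::move(name))
        , _strategy_class(std::move(strategy_class))
        , _options(std::move(options))
        , _durable_writes(durable_writes) {}
private:
    std::string _name;
    std::string _strategy_class;
    std::map<std::string, std::string> _options;
    bool _durable_writes;
};
```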
Commit 62458b8e4f introduced the
enforcement of EXECUTE permissions of functions in cql select. However,
according to the reference in #12869, the permissions should be enforced
only on UDFs and UDAs.
The code does not distinguish between the two, so the permissions are
also unintentionally enforced on native functions.
This commit introduces the distinction and only enforces the permissions
on non-native functions.
Fixes #16526
Manually verified (before and after the change) with the reproducer
supplied in #16526 and also with the `min` and `max` native
functions.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
This short series fixes a regression from Scylla 5.2 to Scylla 5.4 in "SELECT * GROUP BY" - this query was supposed to return just a single row from each partition (the first one in clustering order), but after the expression rewrite it started to wrongly return all rows.
The series also includes a regression test that verifies that this query doesn't work correctly before this series, but works with this patch - and also works as expected in Scylla 5.2 and in Cassandra.
Fixes #16531.
Closes scylladb/scylladb#16559
* github.com:scylladb/scylladb:
test/cql-pytest: check that most aggregators don't take "*"
cql-pytest: add reproducer for GROUP BY regression
cql: fix regression in SELECT * GROUP BY
Since we decided to drop CentOS7 support from the latest version of Scylla, we can now drop CentOS7-specific code from packaging scripts and setup scripts.
Related scylladb/scylla-enterprise#3502
Closes scylladb/scylladb#16365
* github.com:scylladb/scylladb:
scylla-server.service: switch deprecated PermissionsStartsOnly to ExecStartPre=+
dist: drop legacy control group parameters
scylla-server.slice: Drop workaround for MemorySwapMax=0 bug
dist: move AmbientCapabilities to scylla-server.service
Revert "scylla_setup: add warning for CentOS7 default kernel"
[avi: CentOS 7 reached EOL on June 2024]
`--static-boost` is an option provided by `configure.py`. this option is
not used by our CI or building scripts. but in order to be compatible
with the existing behavior of `configure.py`, let's support this option
when building with CMake.
`Boost_USE_STATIC_LIBS` is a cmake variable supported by CMake's
FindBoost and Boost's own `BoostConfig.cmake`. see
https://cmake.org/cmake/help/latest/module/FindBoost.html#other-variables
by default boost is linked via its shared libraries. by setting
this variable, we link boost's static libraries.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16545
scylla uses build modes like "debug" and "release" to differentiate
different build modes. while we intend to use the typical build
configurations / build types used by CMake like "Debug" and
"RelWithDebInfo" for naming CMAKE_CONFIGURATION_TYPES and
CMAKE_BUILD_TYPE. the former is used for naming the build directory and
for the preprocessor macro named "SCYLLA_BUILD_MODE".
`test.py` and scylladb's CI are designed based on the naming of build
directory. in which, `test.py` lists the build modes using the dedicated
build target named `list_modes`, which is added by `configure.py`.
so, in this change, the target is added to CMake as well. the variables
of "scylla_build_mode" defined by the per-mode configuration are
collected and printed by the `list_modes`.
because, by default, CMake generates a target for each build
configuration when a multi-config generator is used. but we only want to
print the build mode a single time when "list_modes" is built. so
a "BYPRODUCTS" is deliberately added for the target, and the path of
this "BYPRODUCTS" is named without "$<CONFIG>" in it.
Closes scylladb/scylladb#16532
* github.com:scylladb/scylladb:
build: cmake: add "mode_list" target
build: cmake: define scylla_build_mode
when compiling with clang-18 in "release" mode, `assert()` is optimized out.
so `i` is not used. and clang complains like:
```
/home/kefu/dev/scylladb/data_dictionary/user_types_metadata.hh:29:14: error: unused variable 'i' [-Werror,-Wunused-variable]
29 | auto i = _user_types.find(type->_name);
| ^
```
in this change, we use `i` as the hint for the insertion, for two
reasons:
- silence the warning.
- avoid looking up the unordered_map twice with the same
key.
`type` is not moved away when being passed to `insert_or_assign()`,
because otherwise, `type->_name` could be referencing a moved-away
shared_ptr, because the order of evaluating a function's parameter
is not determined. since `type` is a shared_ptr, the overhead is
negligible.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16530
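A rough sketch of the resulting pattern, with simplified stand-in types (the real map key and mapped types differ):
```
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

struct user_type {
    std::string _name;
};

std::unordered_map<std::string, std::shared_ptr<user_type>> _user_types;

void add_type(std::shared_ptr<user_type> type) {
    auto i = _user_types.find(type->_name);
    assert(i == _user_types.end()); // compiled out in release builds
    // `i` doubles as the insertion hint, so it is always used and the
    // lookup is not repeated. `type` is deliberately not moved, because
    // type->_name might otherwise be evaluated on a moved-from object
    // (parameter evaluation order is unspecified).
    _user_types.insert_or_assign(i, type->_name, type);
}
```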
Although you can "SELECT COUNT(*)", this has special handling in the CQL
parser (it is converted into a special row-counting request) and you can't
give "*" to other aggregators - e.g., "SELECT SUM(*)". This patch includes
a simple test that confirms this.
I wanted to check this in relation to the previous patch, which did,
sort of, a "SELECT $$first$$(*)" - a syntax which this test shows
wouldn't have actually worked if we tried it.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
test/cql-pytest/test_group_by.py has tests that verify that requests
like
SELECT p,c1,c2,v FROM tbl WHERE p=0 GROUP BY p
work as expected - the "GROUP BY p" means in this case that we should
only return the first row in the p=0 partition.
As a user discovered, it turns out that the almost identical request:
SELECT * FROM tbl WHERE p=0 GROUP BY p
Doesn't work the same - before the fix in the previous patch, it
erroneously returned all rows in p=0, not just the first one.
The test in this patch demonstrates this - it fails on Scylla 5.4,
passes on Scylla 5.2 and on Cassandra - and passes when the fix
from the previous patch is used.
This patch includes another tiny test, to check the interaction of GROUP BY
with filtering. This second test passes on Scylla - but I want it in
anyway because it is yet another interaction that might break (the
user that reported #16531 also had filtering, and I was worried it might
have been related).
Refs #16531
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Recently, the expression-rewrite effort changed the way that GROUP BY is
implemented. Usually GROUP BY involves an aggregation function (e.g., if
you want a separate SUM per partition). But there's also a query like
SELECT p, c1, c2, v FROM tbl GROUP BY p
This query is supposed to return one row - the *first* row in clustering
order - per group (in this case, partition). The expression rewrite
re-implemented this feature by introducing a new internal aggregator,
first(), which returns the first aggregated value. The above query is
rewritten into:
SELECT first(p), first(c1), first(c2), first(v) FROM tbl GROUP BY p
This case works correctly, and we even have a regression test for it.
But unfortunately the rewrite broke the following query:
SELECT * FROM tbl GROUP BY p
Note the "*" instead of the explicit list of columns.
In our implementation, a selection of "*" looks like an empty
selection, so it didn't get the "first()" treatment and remained
a "SELECT *" - and wrongly returned all rows instead of just the first
one in each partition. This was a regression - it worked correctly in
Scylla 5.2 (and also in Cassandra) - see the next patch for a
regression test.
In this patch we fix this regression. When there is a GROUP BY, the "*"
is rewritten to the appropriate list of all visible columns and then
gets the first() treatment, so it will return only the first row as
expected. The next patch will be a test that confirms the bug and its
fix.
Fixes #16531
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Right now initial_tablets is kept as a replication strategy option in the legacy system_schema.keyspaces table. However, r.s. options are all considered to be replication factors, not anything else. Other than being confusing, this also makes it impossible to extend keyspace configuration with non-integer tablets-related values.
This PR moves the initial_tablets into scylla-specific part of the schema. This opens a way to more ~~ugly~~ flexible ways of configuring tablets for keyspace, in particular it should be possible to use boolean on/off switch in CREATE KEYSPACE or some other trick we find appropriate.
Most of what this PR does is extend the arguments passed around keyspace_metadata and abstract_replication_strategy. The essence of the change is in the last patches:
* schema_tables: Relax extract_scylla_specific_ks_info() check
* locator,schema: Move initial tablets from r.s. options to params
refs: #16319
refs: #16364
Closes scylladb/scylladb#16555
* github.com:scylladb/scylladb:
test: Add sanity tests for tablets initialization and altering
locator,schema: Move initial tablets from r.s. options to params
schema_tables: Relax extract_scylla_specific_ks_info() check
locator: Keep optional initial_tablets on r.s. params
ks_prop_defs: Add initial_tablets& arg to prepare_options()
keyspace_metadata: Carry optional<initial_tablets> on board
locator: Pass abstract_replication_strategy& into validate_tablet_options()
locator: Carry r.s. params into process_tablet_options()
locator: Call create_replication_strategy() with r.s. params
locator: Wrap replication_strategy_config_options into replication_strategy_params
locator: Use local members in ..._replication_strategy constructors
Check that the initial_tablets appears in system_schema.scylla_keyspaces
if turned on explicitly
Check that it's possible to change initial_tablets with ALTER KEYSPACE
Check that changing r.s. from simple to network-topology doesn't
activate tablets
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The option is kept in DDL, but is _not_ stored in
system_schema.keyspaces. Instead, it's removed from the provided options
and kept in the scylla_keyspaces table in its own column. All the places
that had optional initial_tablets disengaged now set this value up the
way they find appropriate.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Nowadays reading scylla-specific info from schema happens under
respective schema feature. However (at least in raft case) when a new
node joins the cluster merging schema for the first time may happen
_before_ features are merged and enabled. Thus merging schema can go the
wrong way by errorneously skipping the scylla-specific info.
On the other hand, if system_schema.scylla_keyspaces is there it's
there, there's no reason _not_ to pick this data up in that case.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now all the callers have it at hand (spoiler: not yet initialized, but
still) so the params can also have it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The prepare_options() method is in charge of pre-tuning the replication
strategy CQL parameters so that real keyspace and r.s. creation code
doesn't see some of those. The "initial_tablets" option is going to be
removed from the real options and be placed into scylla-specific part of
the schema. So the prepare_options() will need to modify both -- the
legacy options _and_ the (soon to be separate) initial_tablets thing.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The object in question fully describes the keyspace to be created and,
among other things, contains replication strategy options. Next patches
move the "initial_tablets" option out of those options and keep it
separately, so the ks metadata should also carry this option separately.
This patch is _just_ extending the metadata creation API, in fact the
new field is unused (write-only) so all the places that need to provide
this data keep it disengaged and are explicitly marked with FIXME
comment. Next patches will fix that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The latter method is the one that will need extended params in next
patches. It's called from network_topology_strategy() constructor which
already has params at hand.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Previous patch added params to r.s. classes' constructors, but callers
don't construct those directly, instead they use the create_r.s.()
wrapper. This patch adds params to the wrapper too.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When a replication strategy class is created, the caller passes a const reference
to the config options which is, in turn, a map<string, string>. In the
future r.s. classes will need to get "scylla specific" info along with
legacy options and this patch prepares for that by passing more generic
params argument into constructor. Currently the only inhabitant of the
new params is the legacy options.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The `config_options` arg had been used to initialize `_config_options`
field of the base abstract_replication_strategy class, so it's more
idiomatic to use the latter. Also it makes next patches simpler.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When altering a keyspace several keyspace_metadata objects are created
along the way. The last one, that is then kept on the keyspace_metadata
object, forgets to get its copy of storage options thus transparently
converting to LOCAL type.
The bug surfaces itself when altering replication strategy class for
S3-backed storage -- the 2nd attempt fails, because after the 1st one
the keyspace_metadata gets LOCAL storage options and changing storage
options is not allowed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#16524
b815aa021c added a yield before
the trace point, causing the moved `frozen_mutation_and_schema`
(and `inet_address_vector_topology_change`) to drop out of scope
and be destroyed, as the rvalue-referenced objects aren't moved
onto the coroutine frame.
This change passes them by value rather than by rvalue-reference
so they will be stored in the coroutine frame.
Fixes #16540
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#16541
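A reduced illustration of the lifetime rule behind this fix (illustrative functions, not the actual code): a coroutine copies or moves only by-value parameters into its frame, so a parameter taken by rvalue reference still points at the caller's temporary after the first suspension.
```
#include <seastar/core/coroutine.hh>
#include <seastar/core/sleep.hh>
#include <chrono>
#include <string>

// Buggy: `s` refers to the caller's temporary, which may already be gone
// once the coroutine resumes after co_await.
seastar::future<size_t> trace_bad(std::string&& s) {
    co_await seastar::sleep(std::chrono::milliseconds(1));
    co_return s.size(); // potential use-after-free
}

// Fixed: passing by value stores the string in the coroutine frame itself.
seastar::future<size_t> trace_good(std::string s) {
    co_await seastar::sleep(std::chrono::milliseconds(1));
    co_return s.size(); // safe
}
```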
The reader used to read the sstables was not closed. This could
sometimes trigger an abort(), because the reader was destroyed, without
it being closed first.
Why only sometimes? This is due to two factors:
* read_mutation_from_flat_mutation_reader() - the method used to extract
a mutation from the reader, uses consume(), which does not trigger
`set_close_is_required()` (#16520). Due to this, the top-level
combined reader did not complain when destroyed without close.
* The combined reader closes underlying readers that have no more data
for the current range. If the circumstances are just right, all
underlying readers are closed before the combined reader is
destroyed. Looks like this is what happens most of the time.
This bug was discovered in SCT testing. After fixing #16520, all
invocations of `scylla-sstable` which use this code would trigger the
abort, without this patch. So no further testing is required.
Fixes: #16519
Closes scylladb/scylladb#16521
The tests looked up a table's tablets in an elaborate two-stage search -
first find the table's "id", and then look up this id in the list of
tablets. It is much simpler to just look up the table's name directly
in the list of tablets - although this name is not a key, an ALLOW
FILTERING search is good enough for a test.
As a bonus, with the new technique we don't care if the given name
is the name of a table or a view, further simplifying the test.
This is just a test code cleanup - there is no functional change in
the test.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The test test_tablet_alternator_lsi_consistency, checking that Alternator
LSI allow strongly-consistent reads even with tablets, used a large
cluster (10 nodes), to improve the chance of reaching an "unlucky" tablet
placement - and even then only failed in about half the runs without
the code fixed.
In this patch, we rewrite the test using a much more reliable approach:
We start only two nodes, and force the base's tablet onto one node, and
the view's table onto the second node. This ensures with 100% certainty
that the view update is remote, and the new test fails every single time
before the code fix (I reverted the fix to verify) - and passes after it.
The new test is not only more reliable, it's also significantly faster
because it doesn't need to start a 10-node cluster.
We can also remove the tag that excluded this test from debug build
mode tests because the 10-node boot was too slow.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
scylla uses build modes like "debug" and "release" to differentiate
different build modes. while we intend to use the typical build
configurations / build types used by CMake like "Debug" and
"RelWithDebInfo" for naming CMAKE_CONFIGURATION_TYPES and
CMAKE_BUILD_TYPE. the former is used for naming the build directory and
for the preprocessor macro named "SCYLLA_BUILD_MODE".
`test.py` and scylladb's CI are designed based on the naming of build
directory. in which, `test.py` lists the build modes using the dedicated
build target named `list_modes`, which is added by `configure.py`.
so, in this change, the target is added to CMake as well. the variables
of "scylla_build_mode" defined by the per-mode configuration are
collected and printed by the `list_modes`.
because, by default, CMake generates a target for each build
configuration when a multi-config generator is used. but we only want to
print the build mode a single time when "list_modes" is built. so
a "BYPRODUCTS" is deliberately added for the target, and the path of
this "BYPRODUCTS" is named without "$<CONFIG>" in it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
scylla uses build modes like "debug" and "release" to differentiate
different build modes. while we intend to use the typical build
configurations / build types used by CMake like "Debug" and
"RelWithDebInfo" for naming CMAKE_CONFIGURATION_TYPES and
CMAKE_BUILD_TYPE. the former is used for naming the build directory and
for the preprocessor macro named "SCYLLA_BUILD_MODE".
`test.py` and scylladb's CI are designed based on the naming of build
directory. in which, `test.py` lists the build modes using the dedicated
build target named "list_modes", which is added by `configure.py`.
so, in this change, to prepare for adding the target,
"scylla_build_mode" is defined, so we can reuse it in a following-up
change.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This switch is currently possible, but results in an unsupported keyspace state.
Closes scylladb/scylladb#16513
* github.com:scylladb/scylladb:
test: Add a test that switching between vnodes and tablets is banned
cql3/statements: Don't allow switching between vnode and per-table replication strategies
cql3/statements: Keep local keyspace variable in alter_keyspace_statement::validate
This is a regression after #15903. Before these changes
del_leaving_endpoint took IP as a parameter and did nothing
if it was called with a non-existent IP.
The problem was revealed by the dtest test_remove_garbage_members_from_group0_after_abort_decommission[Announcing_that_I_have_left_the_ring-]. The test was
flaky as in most cases the node died before the
gossiper notification reached all the other nodes. To make
it fail consistently and reproduce the problem one
can move the info log 'Announcing that I have' after
the sleep and add additional sleep after it in
storage_service::leave_ring function.
Fixes #16466
Closes scylladb/scylladb#16508
* seastar ae8449e04f...e0d515b6cf (18):
> reactor: poll less frequently in debug mode
> build: s/exec_program/execute_process/
> Merge 'httpd: support temporary redirect from inside async reply' from Noah Watkins
> Merge 'core: enable seastar to run multiple times in a single process' from Kefu Chai
> rpc/rpc_types: add formatter for rpc::optional<T>
> memory: do not set_reclaim_hook if cpu_mem_ptr is not set
> circleci: do not set disable dpdk explicitly
> fair_queue: Do not pop unplugged class immediately
> build: install Finducontext.cmake and FindSystem-SDT.cmake
> treewide: include used headers
> build: define SEASTAR_COROUTINES_ENABLED for Seastar module
> seastar.cc: include "core/prefault.hh"
> build: enable build C++20 modules with GCC 14
> build: replace seastar_supports_flag() with check_cxx_compiler_flag()
> Merge 'build: cleanups configure.py to be more PEP8 compatible' from Kefu Chai
> circleci: build with dpdk enabled
> build: add "--enable-cxx-modules" option to configure.py
> build: use a different *_CMAKE_API for CMake 3.27
Closes scylladb/scylladb#16500
Before this series, materialized views already work correctly on keyspaces with tablets, but secondary indexes do not. The goal of this series is to make CQL secondary indexes fully supported on tablets:
1. First we need to make CREATE INDEX work with tablets (it didn't before this series). Fixes #16396.
2. Then we need to keep the promise that our documentation makes - that **local** secondary indexes should be synchronously updated - Fixes #16371.
As you can see in the patches below, and as was expected already in the design phase, the code changes needed to make indexes support tablets were minimal. But writing reliable tests for these issues was the biggest effort that went into this series.
Closes scylladb/scylladb#16436
* github.com:scylladb/scylladb:
secondary-index, tablets: ensure that LSI are synchronous
test: add missing "tags" schema extension to cql_test_env
mv, test: fix delay_before_remote_view_update injection point
secondary index: fix view creation when using tablets
The test test_many_partitions is very slow, as it tests a slow scan over
a lot of partitions. This was observed to time out on the slower ARM
machines, making the test flaky. To prevent this, create an
extra-patient cql connection with a 10-minute timeout for the scan
itself.
This is a follow-up to fb9379edf1, which
attempted to fix this, but didn't patch all the places doing slow scans.
This patch fixes the other scan, the one actually observed to time-out
in CI.
Fixes: #16145
Closes scylladb/scylladb#16370
When ALTER-ing a keyspace one could also change its vnode/tablet
flavor, which is not currently supported, so prohibit this change
explicitly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Both virtual tables and schema registry contain thread_local caches that are destroyed
at thread exit. After a Seastar change [1], these destructions can happen after the reactor
is destroyed, triggering a use-after-free.
Fix by scoping the destruction so it takes place earlier.
[1] 101b245ed7
Closes scylladb/scylladb#16510
* github.com:scylladb/scylladb:
schema_registry, database: flush entries when no longer in use
virtual_tables: scope virtual tables registry in system_keyspace
The schema registry disarms internal timers when it is destroyed.
This accesses the Seastar reactor. However, after [1] we don't have ordering
between the reactor destruction and the thread_local registry destruction.
Fix this by flushing all entries when the database is destroyed. The
database object is fundamental so it's unlikely we'll have anything
using the registry after it's gone.
[1] 101b245ed7
Scylla skips exit hooks, so to support code coverage we have to manually
trigger the dump of profiling data to disk from the LLVM profiling
instrumentation runtime.
We use a weak symbol to get the address of the profile dump function. This
is legal: the function is a public interface of the instrumentation runtime.
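A minimal sketch of the weak-symbol pattern described above (the guard and call site are illustrative, not the exact Scylla code; `__llvm_profile_write_file()` is the public entry point of the LLVM profile runtime):
```
// Declared weak: resolves to nullptr when the binary is built without
// -fprofile-instr-generate, so the call can be guarded at runtime.
extern "C" int __llvm_profile_write_file(void) __attribute__((weak));

void dump_coverage_profile_if_enabled() {
    if (__llvm_profile_write_file) {
        // Instrumentation runtime is linked in: flush counters to the .profraw file.
        __llvm_profile_write_file();
    }
}

int main() {
    dump_coverage_profile_if_enabled();
    return 0;
}
```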
Closes scylladb/scylladb#16430
Virtual tables are kept in a thread_local registry for deduplication
purposes. The problem is that thread_local variables are destroyed late,
possibly after the schema registry and the reactor are destroyed.
Currently this isn't a problem, but after a seastar change to
destroy the reactor after termination [1], things break.
Fix by moving the registry to system_keyspace. system_keyspace was chosen
since it was the birthplace of virtual tables.
Pimpl is used to avoid increasing dependencies.
[1] 101b245ed7
In other words, print more user-friendly messages, and avoid crashing.
Specifically:
* Don't crash when attempting to load schema tables from configured data-dir, while configuration does not have any configured data-directories.
* Detect the case where schema mutations have no rows for the current table -- the keyspace exists, but the table doesn't.
* Add negative tests for schema-loading.
Fixes: https://github.com/scylladb/scylladb/issues/16459
Closes scylladb/scylladb#16494
* github.com:scylladb/scylladb:
test/cql-pytest: test_tools.py: add test for failed schema loading
tools/scylla-sstable: use at() instead of operator [] when obtaining data dirs
tools/schema_loader: also check for empty table/column mutations
tools/schema_loader: log more details when loading schema from schema tables
Truncating is an unusual operation, and we write a log message at INFO
level when the truncate op starts. It would be great to have a matching
log message indicating the end of the truncate on the server side. This
would help with investigating TRUNCATE timeouts spotted on the client;
at least we can rule out the problem happening while the server is
performing the truncate.
Refs #15610
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16247
Consider this:
1) file streaming takes storage snapshot = list of sstables
2) concurrent compaction unlinks some of those sstables from the file system
3) file streaming tries to send unlinked sstables, but files other
than data and index cannot be read as only data and index have file
descriptors opened
To fix it, the snapshot now returns a set of files, one per sstable
component, for each sstable.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#16476
CQL Local Secondary Index is a Scylla-only extension to Cassandra's
secondary index API where the index is separate per partition.
Scylla's documentation guarantees that:
"As of Scylla Open Source 4.0, updates for local secondary indexes are
performed synchronously. When updates are synchronous, the client
acknowledges the write operation only after both the base table
modification and the view update are written."
This happened automatically with vnodes, because the base table and the
view have the same partition key, so base and view replicas are co-located,
and the view update is always local and therefore done synchronously.
But with tablets, this does NOT happen automatically - the base and view
tablets may be located on different nodes, and the view update may be
remote, and NOT synchronous.
So in this patch we explicitly mark the view as synchronous_update when
building the view for an LSI.
The bigger part of this patch is to add a test which reliably fails
before this patch, and passes after it. The test creates a two-node
cluster and a table with LSI, and pins the base's tablets to one node
and the view's to the second node, forcing the view updates to be
remote. It also uses an injection point to make the view update slower.
The test then writes to the base and immediately tries to use the index
to read. Before this patch, the read doesn't find the new data (contrary
to the guarantee in the documentation). After this patch, the read
does find the new data - because the write waited for the index to
be updated.
Fixes #16371
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
One of the unfortunate anti-features of cql_test_env (the framework used
in our CQL tests that are written in C++) is that it needs to repeat
various bizarre initialization steps done in main.cc, otherwise various
requests work incorrectly. One of these steps done in main.cc is to initialize
various "schema extensions" which some of the Scylla features need to work
correctly.
We remembered to initialize some schema extensions in cql_test_env, but
forgot others. The one I will need in the following patch is the "tags"
extension, which we need to mark materialized views used by local
secondary indexes as "synchronous_updates" - without this patch the LSI
tests in secondary_index_test.cc will crash.
In addition to adding the missing extension, this patch also replaces
the segmentation-fault crash when it's missing (caused by a dynamic
cast failure) with a clearer on_internal_error() - so if we ever have
this bug again, it will be easier to debug.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The "delay_before_remote_view_update" is a recently-added injection
point which should add a delay before remove view updates, but NOT
force the writer to wait for it (whether the writer waits for it or
not depends on whether the view is configured as synchronous or not).
Unfortunately, the delay was added at the WRONG place, which caused
it to sometimes be done even on asynchronous views, breaking (with
false-negative) the tests that need this delay to reproduce bugs of
missing synchronous updates (Refs #16371).
The fix here is even simpler then the (wrong) old code - we just add
the sleep to the existing function apply_to_remote_endpoints() instead
of making the caller even more complex.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In commit 88a5ddabce, we fixed materialized
view creation to support tablets. We added to the function called to
create materialized views in CQL, prepare_new_view_announcement()
a missing call to the on_before_create_column_family() notifier that
creates tablets for this new view.
Unfortunately, we have the same problem when creating a secondary index,
because it does not use prepare_new_view_announcement(), and instead uses
a generic function to "update" the base table, which in some cases ends
up creating new views when a new index is requested. In this path, the
notifier did not get called, so we must add it here too.
Unfortunately, the notifiers must run in a Seastar thread, which means
that yet another function now needs to run in a Seastar thread.
Before this patch, creating a secondary index in a table using tablets
fails with "Tablet map not found for table <uuid>". With this patch,
it works.
The patch also includes tests for creating a regular and local secondary
index. Both tests fail (with the aforementioned error) before this
patch, and pass with it.
Fixes #16396
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The observer, which references table_for_test, must of course not
outlive table_for_test. The observer can be called later, after the
last input sstable is removed from the sstable manager.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#16428
The interface is fragile because the user may incorrectly use the
wrong "gc before". Given that sstable knows how to properly calculate
"gc before", let's do it in estimate__d__t__r(), leaving no room
for mistakes.
sstable_run's variant was also changed to conform to new interface,
allowing ICS to properly estimate droppable ratio, using GC before
that is calculated using each sstable's range. That's important for
upcoming tablets, as we want to query only the range that belongs
to a particular tablet in the repair history table.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#15931
This preserves the existing behavior of `configure.py` in the CMake-generated
`build.ninja`.
* configure.py: map 'release' to 'RelWithDebInfo'
* cmake: rename cmake/mode.Release.cmake to cmake/mode.RelWithDebInfo.cmake
* CMakeLists.txt: s/Release/RelWithDebInfo/
Closes scylladb/scylladb#16479
* github.com:scylladb/scylladb:
build: cmake: map 'release' to 'RelWithDebInfo'
build: define BuildType for enclosing build_by_default
Currently, `tool_app_template::run_async()` crashes when invoked with empty argv (with just `argv[0]` populated). This can happen if the tool app is invoked without any further args, e.g. just invoking `scylla nodetool`. The crash happens because of unconditional dereferencing of `argv[1]` to get the current operation.
To fix, add an early exit for this case, just printing a usage message and exiting with exit code 2.
Fixes: #16451
Closes scylladb/scylladb#16456
* github.com:scylladb/scylladb:
test: add regression tests for invoking tools with no args
tools/utils: tool_app_template: handle the case of no args
tools/utils: tool_app_template: remove "scylla-" prefix from app name
It enables interaction with the node through the CQL protocol without authentication, giving full-permission access.
The maintenance socket is exposed as a Unix domain socket with file permissions `755`, so it is not accessible from outside the node or from other POSIX groups on the node.
It is created before the node joins the cluster.
To set up the maintenance socket, use the `maintenance-socket` option when starting the node.
* If set to `ignore`, the maintenance socket will not be created.
* If set to `workdir`, the maintenance socket will be created in `<node's workdir>/cql.m`.
* Otherwise, the maintenance socket will be created at the specified path.
The default value is `ignore`.
* With python driver
```python
from cassandra.cluster import Cluster
from cassandra.connection import UnixSocketEndPoint
from cassandra.policies import HostFilterPolicy, RoundRobinPolicy
socket = "<node's workdir>/cql.m"
cluster = Cluster([UnixSocketEndPoint(socket)],
                  # Driver tries to connect to other nodes in the cluster, so we need to filter them out.
                  load_balancing_policy=HostFilterPolicy(RoundRobinPolicy(), lambda h: h.address == socket))
session = cluster.connect()
```
Merge note: apparently cqlsh does not support unix domain sockets; it
will have to be fixed in a follow-up.
Closes scylladb/scylladb#16172
* github.com:scylladb/scylladb:
test.py: add maintenance socket test
test.py: enable maintenance socket in tests by default
docs: add maintenance socket documentation
main: add maintenance socket
main: refactor initialization of cql controller and auth service
auth/service: don't create system_auth keyspace when used by maintenance socket
cql_controller: maintenance socket: fix indentation
cql_controller: add option to start maintenance socket
db/config: add maintenance_socket_enabled bool class
auth: add maintenance_socket_role_manager
db/config: add maintenance_socket variable
system_schema.tables and system_schema.columns must have content for
every existing table. To detect a failed load of a table, before
attempting to invoke `db::schema_tables::create_table_from_mutations()`,
we check that the mutations read from these two tables are not
disengaged. There is another failure scenario however: the mutations are
not null, but do not have any clustering rows. This currently results in
a cryptic error message about failing to look up a row in a result-set.
This happens when the looked-up keyspace exists, but the table doesn't.
Add this to the check, so we get a human-readable error message when
this happens.
Currently, there is no visibility at all into what happens when
attempting to load schema from schema tables. If it fails, we are left
guessing on what went wrong.
Add a logger and add various debug/trace logs to help following the
process and identify what went wrong.
We do not yet support enabling CDC in a keyspace that uses tablets
(Refs #16317). But the problem is that today, if this is attempted,
we get a nasty failure: the CDC code creates the extra CDC log table,
it doesn't get tablets, and Raft gets surprised and croaks with a
message like:
Raft instance is stopped, reason: "background error,
std::_Nested_exception<raft::state_machine_error> (State machine error at
raft/server.cc:1230): std::runtime_error (Tablet map not found for
table 48ca1620-9ea5-11ee-bd7c-22730ed96b85)
After Raft croaks, Scylla never recovers until it is rebooted.
In this patch, we replace this disaster with a graceful error - a CREATE
TABLE or ALTER TABLE operation with CDC enabled will fail in a clear way,
allowing Scylla to continue operating normally after the failed request.
This fix is important for allowing us to run tests on Scylla with
tablets, and although CDC tests will fail as expected, they won't
fail the other tests that follow (Refs #16473).
Fixes #16318
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16474
This pull request adds support for tablets in Alternator, and particularly focuses on getting Alternator's GSI and LSI (i.e., materialized views) to work.
After this series, support for tablets in Alternator _mostly_ works, but not completely:
1. CDC doesn't yet work with tablets, and Alternator needs to provide CDC (known as "DynamoDB Streams").
2. Alternator's TTL feature was not tested with tablets, and probably doesn't work because it assumes the replication map belongs to a keyspace.
For these reasons, Alternator does not yet use tablets by default, and they need to be enabled explicitly by adding an experimental tag to the new table. This will allow us to test Alternator with tablets even before they are ready for the limelight.
Fixes #16203
Fixes #16313
Closes scylladb/scylladb#16353
* github.com:scylladb/scylladb:
mv, tablets, alternator: test for Alternator LSI with tablets
mv: coroutinize wait code for remote view updates
mv, test: add injection point to delay remove view update
alternator: explicitly request synchronous updates for LSI
alternator: fix view creation when using tablets
alternator: add experimental method to create a table with tablets
The observed crash was in the following piece on "cf" access:
if (*table_is_dropped) {
sslog.info("[Stream #{}] Skipped streaming the dropped table {}.{}", si->plan_id, si->cf.schema()->ks_name(), si->cf.schema()->cf_name());
Fixes #16181
Also, add a test case which reproduces the problem by doing table drop during tablet migration. But note that the problem is not tablet-specific.
Closes scylladb/scylladb#16341
* github.com:scylladb/scylladb:
test: tablets: Add test case which tests table drop concurrent with migration
tests: tablets: Do read barrier in get_tablet_replicas()
streaming: Keep table by shared ptr to avoid crash on table drop
This preserves the existing behavior of `configure.py` in the CMake-generated
`build.ninja`.
* configure.py: map 'release' to 'RelWithDebInfo'
* cmake: rename cmake/mode.Release.cmake to cmake/mode.RelWithDebInfo.cmake
* CMakeLists.txt: s/Release/RelWithDebInfo/
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
In the existing `modes` defined in `configure.py`, "release" is mapped to
"RelWithDebInfo". This behavior matches that of seastar's
`configure.py`, where we also map the "release" build mode to
the "RelWithDebInfo" CMAKE_BUILD_TYPE.
But scylladb's existing cmake settings map "release" to
"Release", even though "Release" is listed as one of the typical
CMAKE_BUILD_TYPE values.
So, in this change, to prepare for the mapping, `BuildType` is
introduced to map a build mode to its related settings. The
build settings are still kept in `cmake.${CMAKE_BUILD_TYPE}.cmake`,
but the other settings, such as whether a build type should be enabled and
its mappings, are stored in `BuildType` in `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This patch adds a test (in the topology test framework) for issue #16313 -
the bug where Alternator LSI must use synchronous view updates but didn't.
This test fails with high probability (around 50%) before the previous patch,
which fixed this bug - and passes consistently after the patch (I ran it
100 times and it didn't fail even once).
This is the first test in the topology framework that uses the DynamoDB
API and not CQL. This required a couple of tiny convenience functions,
which are introduced in the only test file that uses them - but if we
want we can later move them out to a library file.
Unfortunately, the standard AWS SDK for Python - boto3 - is *not*
asynchronous, so this test is also not really asynchronous, and will
block the event loop while making requests to Alternator. However,
for now it doesn't matter (we do NOT run multiple tests in the same
event loop), and if it ever matters, I mentioned a couple of options
what we can do in a comment.
Because this test uses a 10-node cluster, it is skipped in debug-mode
runs. In a later patch we will replace it with a more efficient - and
more reliable - 2-node test.
Refs #16313
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Fixes #16312
This test replays a segment before it might be closed or even fully flushed, thus it can (with the new semantics) generate a segment_truncation exception if hitting eof earlier than expected. (Note: the test does not use pre-allocated segments.)
(The first patch coroutinizes the test to make for a nicer, easier fix.)
Closes scylladb/scylladb#16368
* github.com:scylladb/scylladb:
commitlog_test::test_commitlog_reader: handle segment_truncation
commitlog_test: coroutinize test_commitlog_reader
This was recently found to produce a crash. Add a simple regression
test, to make sure future changes don't re-introduce problems with this
rarely used code-path.
Currently, tool_app_template::run_async() crashes when invoked with
empty argv (with just argv[0] populated). This can happen if the tool
app is invoked without any further args, e.g. just invoking `scylla
nodetool`. The crash happens because of unconditional dereferencing of
argv[1] to get the current operation.
To fix, add an early-exit for this case, just printing a usage message
and exiting with exit code 2.
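A minimal sketch of that early exit, assuming a plain `main()` rather than the actual `tool_app_template::run_async()` plumbing:
```
#include <cstdio>

int main(int argc, char** argv) {
    if (argc < 2) {
        // No operation given: print usage and exit with code 2 instead of
        // dereferencing the non-existent argv[1].
        std::fprintf(stderr, "usage: %s <operation> [options...]\n", argv[0]);
        return 2;
    }
    const char* operation = argv[1]; // safe: argc >= 2
    std::printf("running operation: %s\n", operation);
    return 0;
}
```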
In other words, have all tools pass their name without the "scylla-"
prefix to `tool_app_template::config::name`. E.g., replace
"scylla-nodetool" with just "nodetool".
Patch all usages to re-add the prefix if needed.
The app name is just more flexible this way, some users might want the
name without the "scylla-" prefix (in the next patch).
Add initialization of maintenance_auth_service and cql_maintenance_server_ctl.
Create the maintenance socket, which enables interaction with the node through
the CQL protocol without authentication. The maintenance socket is exposed
as a Unix domain socket. It gives full-permission access.
It is created before the node joins the cluster.
Move initialization of cql controller and auth service to functions.
It will make it easier to create a new cql controller with a separate auth service,
for example for the maintenance socket.
Make it possible to initialize new services before joining group0.
The maintenance socket is created before joining the cluster. When maintenance auth service
is started it creates system_auth keyspace if it's missing. It is not synchronized
with other nodes, because this node hasn't joined the group0 yet. Thus a node has
a mismatched schema and is unable to join the cluster.
The maintenance socket doesn't use role management, thus the problem is solved
by not creating system_auth keyspace when maintenance auth service is created.
The logic of the regular CQL port's auth service won't be changed. A new,
separate auth service will be created for the maintenance socket.
Add an option to listen on the maintenance socket. It is set up on a unix domain socket
and the metrics are disabled.
This enables having an independent authentication mechanism for this socket.
To start the maintenance socket, a new cql_controller has to be created
with
`db::maintenance_socket_enabled::yes` argument.
Creating maintenance socket will raise an exception if
* the path is longer than 107 chars (due to linux limits),
* a file or a directory already exists in the path.
The indentation is fixed in the next commit.
After restarting each node, we should wait for other nodes to notice
the node is UP before restarting the next server. Otherwise, the next
node we restart may not send the shutdown notification to the
previously restarted node, if it still sees it as down when we
initiate its shutdown. In this case, the node will learn about the
restart from gossip later, possible when we already started CQL
requests. When a node learns that some node restarted while it
considers it as UP, it will close connections to that node. This will
fail RPC sent to that node, which will cause CQL request to time-out.
Fixes#14746Closesscylladb/scylladb#16010
asyncio.get_event_loop() returns the current event loop; if there
is none, the result of `get_event_loop_policy().get_event_loop()` is
returned. This behavior is deprecated since Python 3.12, so let's
use asyncio.run() as recommended by
https://docs.python.org/3/library/asyncio-eventloop.html.
asyncio.run() was introduced in Python 3.7, so we should be able to
use it.
This change should silence the warning when running this script
as a stand-alone script with Python 3.12.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16385
Add `maintenance_socket_role_manager`, which disables all operations
associated with roles, so as not to depend on the system_auth keyspace, which may
not yet be created when the maintenance socket starts listening.
If set to "ignore", maintenance socket will be disabled.
If set to "workdir", maintenance socket will be opened on <scylla's
workdir>/cql.m.
Otherwise it will be opened on path provided by maintenance_socket
variable.
It is set by default to 'ignore'.
We make `consistent_cluster_management` mandatory in 5.5. This
option will always be unused and assumed to be true.
Additionally, we make `override_decommission` deprecated, as this option
has been supported only with `consistent_cluster_management=false`.
Making `consistent_cluster_management` mandatory also simplifies
the code. Branches that execute only with
`consistent_cluster_management` disabled are removed.
We also update documentation by removing information irrelevant in 5.5.
Fixes scylladb/scylladb#15854
Note about upgrades: this PR does not introduce any more limitations
to the upgrade procedure than there are already. As in
scylladb/scylladb#16254, we can upgrade from the first version of Scylla
that supports the schema commitlog feature, i.e. from 5.1 (or
corresponding Enterprise release) or later. Assuming this PR ends up in
5.5, the documented upgrade support is from 5.4. For corresponding
Enterprise release, it's from 2023.x (based on 5.2), so all requirements
are met.
Closes scylladb/scylladb#16334
* github.com:scylladb/scylladb:
docs: update after making consistent_cluster_management mandatory
system_keyspace, main, cql_test_env: fix indentations
db: config: make consistent_cluster_management mandatory
test: boost: schema_change_test: replace disable_raft_schema_config
db: config: make override_decommission deprecated
db: config: make force_schema_commit_log deprecated
The test case that validates upload-sink works does this by getting several random ranges from the uploaded object and checks that the content is what it should be. The range boundaries are generated like this:
```
uint64_t len = random(1, chunk_size);
uint64_t offset = random(file_size) - len;
```
The 2nd line is not correct: if the random number happens to be less than len, the offset becomes "negative", i.e. a very large 64-bit unsigned value.
Next, this offset:len pair gets into the s3 client's get_object_contiguous() helper, which in turn converts it into the http range header's bytes-specifier format, which is the "first_byte-last_byte" one. The math here is
```
first_byte = offset;
last_byte = offset + len - 1;
```
Here the overflow of the offset results in underflow of the last_byte -- it becomes less than the first_byte. According to the RFC this range-specifier is invalid and (!) can be ignored by the server. This is what minio does -- it ignores the invalid range and returns back the full object.
But that's not all. When returning an object portion the http response status code is PartialContent, but when the range is ignored and the full object is returned, the status is OK. This makes the s3 client's request fail with unexpected_status_error in the middle of the test. Then the object is removed with a deferred action and the actual error is printed into the logs. At the end of the day the logs look as if deletion of an object failed with OK status %)
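A minimal sketch of the corrected range computation (the sizes and helpers are illustrative, not the actual test code): pick the length first, then draw the offset from a range that leaves room for it, so `first_byte <= last_byte < file_size` always holds:
```
#include <cassert>
#include <cstdint>
#include <random>

int main() {
    std::mt19937_64 rng{std::random_device{}()};
    const uint64_t file_size = 10 << 20;   // hypothetical object size
    const uint64_t chunk_size = 128 << 10; // hypothetical chunk size

    uint64_t len = std::uniform_int_distribution<uint64_t>(1, chunk_size)(rng);
    // Offset is drawn from [0, file_size - len], so offset + len never exceeds
    // file_size and the subtraction can never underflow.
    uint64_t offset = std::uniform_int_distribution<uint64_t>(0, file_size - len)(rng);

    uint64_t first_byte = offset;
    uint64_t last_byte = offset + len - 1;
    assert(first_byte <= last_byte && last_byte < file_size);
    return 0;
}
```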
Fixes: #16133
Closes scylladb/scylladb#16324
* github.com:scylladb/scylladb:
test/s3: Avoid object range overflow
s3/client: Handle GET-with-Range overflows correctly
Support for splitting tablet storage is added.
Until now, tablet storage was composed of a single compaction group, i.e. a group of sstables eligible to be compacted together.
For splitting, tablet storage can now be composed of multiple compaction groups, main, left and right.
Main group stores sstables that require splitting, whereas left and right groups store sstables that were already split according to the tablet's token range.
After the tablet storage is put in splitting mode, new writes will only go to either the left or right group, depending on the token.
When all main groups have completed splitting their sstables, the coordinator can proceed with the tablet metadata changes.
The coordination part is not implemented yet, only the storage part. The former will come next and will be wired into the latter.
Missing:
- splitting monitor (verifies whether the coordinator asked for splitting and acts accordingly) (will come next)
Closes scylladb/scylladb#16158
* github.com:scylladb/scylladb:
replica: Introduce storage group splitting
replica: Add storage_group::memtable_count()
replica: Add compaction_group::empty()
replica: Rename compaction_group_manager to storage_group_manager
replica: Introduce concept of storage group
compaction: Add splitting compaction task to manager
compaction: Prepare rewrite_sstables_compaction_task_executor to be reused for splitting
compaction: remove scrub-specific code from rewrite_sstables_compaction_task_executor
replica: Allow uncompacted SSTables to be moved into a new set
compaction: Add splitting compaction
flat_mutation_reader: Allow interposer consumers to be stacked
mutation_writer: Introduce token-group-based mutation segregator
locator: Introduce tablet_map::get_tablet_id_and_range_side(token)
In the previous patch we added a delay injection point (for testing)
in the view update code. Because the code was using continuation style,
this resulted in increased indentation and ugly repetition of captures.
So in this patch we coroutinize the code that waits for remote view
updates, making it simpler, shorter, and less indented.
Note that this function still uses continuations in one place:
The remote view update is still composed of two steps that need
to happen one after another, but we don't necessarily need to wait
for them to happen. This is easiest to do with chaining continuations,
and then either waiting or not waiting for the resulting future.
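A rough sketch of that pattern, under assumed helper names (this is not the actual view-update code): the two steps are chained with continuations, and the caller then either waits for the chained future or lets it run in the background:
```
#include <seastar/core/future.hh>
#include <seastar/core/sleep.hh>

using namespace std::chrono_literals;

// Hypothetical stand-ins for the two remote-update steps.
seastar::future<> send_update_to_view_replica() { return seastar::sleep(1ms); }
seastar::future<> note_update_completed() { return seastar::make_ready_future<>(); }

seastar::future<> update_remote_view(bool synchronous) {
    // The two steps must run one after the other, so chain them with then().
    auto f = send_update_to_view_replica().then([] { return note_update_completed(); });
    if (synchronous) {
        return f;  // synchronous view: the writer waits for the chained future
    }
    // Asynchronous view: let the chain complete in the background (best effort).
    (void)f.handle_exception([] (std::exception_ptr) {});
    return seastar::make_ready_future<>();
}
```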
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
It's difficult to write a test (as we plan to do in the next patch)
that verifies that synchronous view updates are indeed synchronous, i.e.,
that write with CL=QUORUM on the base-table write returns only after
CL=QUORUM was also achieved in the view table. The difficulty is that in a
fast test machine, even if the synchronous-view-update is completely buggy,
it's likely that by the time the test reads from the view, all view updates
will have been completed anyway.
So in this patch we introduce an injection point, for testing, named
"delay_before_remote_view_update", which adds a delay before the base
replica sends its update to the remote view replica (in case the view
replica is indeed remote). As usual, this injection point isn't
configurable - when enabled it adds a fixed (0.5 second) delay, on all
view updates on all tables.
The existing code used continuation-style Seastar programming, and the
addition of the injection point in this patch made it even uglier, so
in the next patch we will coroutine-ize this code.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
DynamoDB's *local* secondary index (LSI) allows strongly-consistent
reads from the materialized view, which must be able to read what was
previously written to the base. To support this, we need the view to
use the "synchronous_updates".
Previously, with vnodes, there was no need for using this option
explicitly, because an LSI has the same partition key as the base table
so the base and view replicas are the same, and the local writes are
done synchronously. But with tablets, this changes - there is no longer
a guarantee that the base and view tablets are located on the same node.
So to restore the strong consistency of LSIs when tablets are enabled,
this patch explicitly adds the "synchronous_updates" option to views
created by Alternator LSIs. We do *not* add this option for GSIs - those
do not support strongly-consistent reads.
This fix was tested by a test that will be introduced in the following
patches. The test showed that before this patch, it was possible that
reading with ConsistentRead=True from an LSI right after the base was
written would miss the new changes, but after this patch, it always
sees the new data in the LSI.
Fixes #16313.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In commit 88a5ddabce, we fixed materialized
view creation to support tablets. We added to the function called to
create materialized views in CQL, prepare_new_view_announcement()
a missing call to the on_before_create_column_family() notifier that
creates tablets for this new view.
We have the same problem in Alternator when creating a view (GSI or LSI).
The Alternator code does not use prepare_new_view_announcement(), and
instead uses the lower-level function add_table_or_view_to_schema_mutation(),
so it didn't get the call to the notifier, and we must add it here too.
Before this patch, creating an Alternator table with tablets (which has
become possible after the previous patch) fails with "Tablet map not found
for table <uuid>". With this patch, it works.
A test for materialized views in Alternator will come in a following
patch, and will test everything together - the CreateTable tag to use
tablets (from the previous patch), the LSI/GSI creation (fixed in this patch)
and the correct consistency of the LSI (fixed in the next patch).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
As explained in issue #16203, we cannot yet enable tablets on Alternator
keyspaces by default, because support for some of the features that
Alternator needs, such as CDC, is not yet available.
Nevertheless, to start testing Alternator integration with tablets,
we want to provide a way to enable tablets in Alternator for tests.
In this patch we add support for a tag, 'experimental:initial_tablets',
which if added on a table during creation, uses tablets for its keyspace.
The value of this tag is a numeric string, and it is exactly analogous
to the 'initial_tablets' property we have in CQL's NetworkTopologyStrategy.
We name this tag with the "experimental:" prefix to emphasize that it
is experimental, and the way to enable or disable tablets will probably
change later.
The new tag only has effect when added while *creating* a table.
Adding, deleting or changing it later on an existing table will have
no effect.
A later patch will have tests that use this tag to test Alternator with
tablets.
Refs #16203.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Before this change, we used the format string
"Can't replace node {} with itself" but failed to include the host id among seastar::format()'s arguments. This fails the compile-time check of fmt, which is not yet merged. So, if we really ran into this problem, {fmt} would throw before the intended runtime_error was raised -- currently, seastar::log formats the logging messages at runtime, which is not intended.
In this change, we pass `existing_node`, so it can be formatted and the
intended error message can be printed in the log.
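A minimal runnable illustration of the fix, using plain {fmt} rather than seastar::format (the function and node values are placeholders, not the actual code):
```
#include <fmt/core.h>
#include <stdexcept>
#include <string>

void check_replace(const std::string& replacing_node, const std::string& existing_node) {
    if (replacing_node == existing_node) {
        // Passing existing_node gives the "{}" placeholder its argument, so the
        // intended message is produced instead of a formatting error.
        throw std::runtime_error(
            fmt::format("Can't replace node {} with itself", existing_node));
    }
}
```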
Refs 11a4908683
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16342
Scylla can be configured to use different IPs for internode communication
and client connections. This test allocates and configures unique IP addresses
for the client connections (`rpc_address`) for a 2-node cluster.
Two scenarios are tested:
1) Change RPC IPs sequentially
2) Change RPC IPs simultaneously
Closes scylladb/scylladb#15965
This introduces the ability to split a storage group.
The main compaction group is split into left and right groups.
set_split() is used to set the storage group to splitting mode, which
will create left and right compaction groups. Incoming writes will
now be placed into memtable of either left or right groups.
split() is used to complete the splitting of a group. It only
returns when all preexisting data is split. That means main
compaction group will be empty and all the data will be stored
in either left or right group.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
A storage group is the storage of a tablet. This new concept is helpful
for tablet splitting, where the storage of a tablet will be split
into multiple compaction groups, each of which can be compacted
independently.
The reason for not going with arena concept is that it added
complexity, and it felt much more elegant to keep compaction
group unchanged which at the end of the day abstracts the concept
of a set of sstables that can be compacted and operated
independently.
When splitting, the storage group for a tablet may therefore own
multiple compaction groups, left, right, and main, where main
keeps the data that needs splitting. When splitting completes,
only left and right compaction groups will be populated.
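An illustrative sketch of the relationship described above (the types are simplified placeholders, not Scylla's actual classes):
```
#include <memory>

// Placeholder: a set of sstables (plus memtables) that compact together.
struct compaction_group {};

// A storage group owns the compaction groups backing one tablet.
struct storage_group {
    std::unique_ptr<compaction_group> main;   // data that still needs splitting
    std::unique_ptr<compaction_group> left;   // data already split, left of the split point
    std::unique_ptr<compaction_group> right;  // data already split, right of the split point

    // While splitting, main/left/right may all hold data; once splitting
    // completes, only left and right remain populated.
    bool splitting() const { return left && right; }
};

int main() { storage_group sg; return sg.splitting() ? 1 : 0; }
```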
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The task for splitting compaction will run until all sstables
in the main set are split. The only exceptions are shutdown
or the user explicitly asking for abort.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
With off-strategy, we allow sstables to be moved into a new sstable
set even if they didn't undergo reshape compaction.
That's done by specifying that an sstable is present in both the
input and the output of the completion descriptor.
We want to do the same with other compaction types.
Think for example of split compaction: the compaction manager may decide
an sstable doesn't need splitting, yet it wants that sstable to be
moved into a new sstable set.
Theoretically, we could introduce new code to do this movement,
but more code means increased maintenance burden and higher chances
of bugs. It makes sense to reuse the compaction completion path,
as we do today with off-strategy.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
reader_consumer_v2 being a noncopyable_function imposes a restriction
when stacking one interposer consumer on top of another.
Think for example of a token-based segregator on top of a timestamp
based one.
To achieve that, the interposer consumer creator must be reentrant,
such that the consumer can be created on each "channel", but today
the creator becomes unusable after first usage.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
A token group is an abstraction that allows us to easily segregate a
mutation stream into buckets. Groups share the same properties as
compaction groups. Groups follow the ring order and they don't
overlap each other. Groups are defined according to a classifier,
which returns an id given a token. It's expected that the classifier
returns ids in monotonically increasing order.
The reasons for this abstraction are:
1) we don't want to make the segregator aware of compaction groups
2) splitting happens before the tablet metadata is changed, so the
segregator will have to classify based on whether the token
belongs to the left (group id 0) or right (group id 1) side of
the range to be split.
The reason for not extending the sstable writer instead is that,
today, the writer consumer can only tell the producer to switch to a
new writer when consuming the end of a partition, but that
would be too late for us, as we have to decide to move to
a new writer at partition start instead.
It will be wired into compaction when it happens in split mode.
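A minimal sketch of such a classifier under the split scenario from point 2 above (the token type and names are illustrative, not the actual interfaces):
```
#include <cstdint>
#include <functional>

using token = int64_t;
// Classifier: maps a token to a group id; ids must be non-decreasing
// along the ring order.
using classifier_fn = std::function<unsigned(token)>;

classifier_fn make_split_classifier(token split_point) {
    return [split_point] (token t) {
        return t < split_point ? 0u : 1u;   // 0 = left side, 1 = right side
    };
}

int main() {
    auto classify = make_split_classifier(0);
    return (classify(-5) == 0 && classify(5) == 1) ? 0 : 1;
}
```
For a real tablet split the comparison would be on the tablet's token-range boundary rather than a raw integer, but the shape of the classifier is the same.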
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
when compiling with Clang-18 + libstdc++-13, the tree fails to build:
```
/home/kefu/dev/scylladb/tasks/task_manager.hh:45:36: error: no template named 'list' in namespace 'std'
45 | using foreign_task_list = std::list<foreign_task_ptr>;
| ~~~~~^
```
so let's include the used header
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16433
The currently used versions of the "time" and "rustix" dependencies
had minor security vulnerabilities.
In this patch:
- the "rustix" crate is updated
- the "chrono" crate that we depend on was not compatible
with the version of the "time" crate that had fixes, so
we updated the "chrono" crate, which actually removed the
dependency on "time" completely.
Both updates were performed using "cargo update" on the
relevant package and the corresponding version.
Fixes #15772
Closes scylladb/scylladb#16378
Before this change, we used the format string
"Can't replace node {} with itself" but failed to include the host id among seastar::format()'s arguments. This fails the compile-time check of fmt, which is not yet merged. So, if we really ran into this problem, {fmt} would throw before the intended runtime_error was raised -- currently, seastar::log formats the logging messages at runtime, which is not intended.
In this change, we pass `existing_node`, so it can be formatted and the
intended error message can be printed in the log.
Refs 11a4908683
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16422
When migrating to the uuid-based identifiers, the mapping from the
integer-based generation to the shard id is preserved. We used to have
"gen % smp_count" for calculating the shard responsible for hosting
a given sstable. Although this is not documented behavior, it is
handy when we try to correlate an sstable with a shard, typically when
looking at a performance issue.
In this change, a new subcommand is added to expose the connection
between an sstable and its "owner" shards.
Fixes #16343
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16345
In effb9fb3cb migration request handler
(called when a node requests schema pull) was extended with a
`system.scylla_local` mutation:
```
cm.emplace_back(co_await self._sys_ks.local().get_group0_schema_version());
```
This mutation is empty if the GROUP0_SCHEMA_VERSIONING feature is
disabled.
Nevertheless, it turned out to cause problems during upgrades.
The following scenario shows the problem:
We upgrade from 5.2 to enterprise version with the aforementioned patch.
In 5.2, `system.scylla_local` does not use schema commitlog.
After the first node upgrades to the enterprise version, it immediately
on boot creates a new enterprise-only table
(`system_replicated_keys.encrypted_keys`) -- the specific table is not
important, only the fact that a schema change is performed.
This happens before the restarting node notices other nodes being UP, so
the schema change is not immediately pushed to the other nodes.
Instead, soon after boot, the other non-upgraded nodes pull the schema
from the upgraded node.
The upgraded node attaches a `system.scylla_local` mutation to the
vector of returned mutations.
The non-upgraded nodes try to apply this vector of mutations. Because
some of these mutations are for tables that already use schema
commitlog, while the `system.scylla_local` table does not use schema
commitlog, this triggers the following error (even though the mutation
is empty):
```
Cannot apply atomically across commitlog domains: system.scylla_local, system_schema.keyspaces
```
Fortunately, the fix is simple -- instead of attaching an empty
mutation, do not attach a mutation at all if the handler of migration
request notices that group0_schema_version is not present.
Note that group0_schema_version is only present if the
GROUP0_SCHEMA_VERSIONING feature is enabled, which happens only after
the whole upgrade finishes.
Refs: scylladb/scylladb#16414
Not using "Fixes" because the issue will only be fixed once this PR is
merged to `master` and the commit is cherry-picked onto next-enterprise.
Closes scylladb/scylladb#16416
Reduce code duplication by defining each metric just once, instead of three times, by having the semaphore register metrics by itself. This also makes the lifecycle of the metrics contained in that of the semaphore. This is important in enterprise, where semaphores are added and removed together with service levels.
We don't want all semaphores to export metrics, so a new parameter is introduced and all call sites decide whether to opt in or not.
Fixes: https://github.com/scylladb/scylladb/issues/16402
Closes scylladb/scylladb#16383
* github.com:scylladb/scylladb:
database, reader_concurrency_semaphore: deduplicate reader_concurrency_semaphore metrics
reader_concurrency_semaphore: add register_metrics constructor parameter
sstables: name sstables_manager
We remove Raft documentation irrelevant in 5.5.
One of the changes is removing a part of the "Enabling Raft" section
in raft.rst. Since Raft is mandatory in 5.5, the only way to enable
it in this version is by performing a rolling upgrade from 5.4. We
only need to have this case well-documented. In particular, we
remove information that also appears in the upgrade guides like
verifying schema synchronization.
Similarly, we remove a sentence from the "Manual Recovery Procedure"
section in handling-node-failures.rst because it mentions enabling
Raft manually, which is impossible in 5.5.
The rest of the changes are just removing information about
checking or setting consistent_cluster_management, which has become
unused.
Code that executed only when consistent_cluster_management=false is
removed. In particular, after this patch:
- raft_group0 and raft_group_registry are always enabled,
- raft_group0::status_for_monitoring::disabled becomes unused,
- topology tests can only run with consistent_cluster_management.
In the following commits, we make consistent cluster management
mandatory. This will make disable_raft_schema_config unusable,
so we need to get rid of it. However, we don't want to remove
tests that use it.
The idea is to use the Raft RECOVERY mode instead of disabling
consistent cluster management directly.
The override_decommission option is supported only when
consistent_cluster_management is disabled. In the following commit,
we make consistent_cluster_management mandatory, which makes
override_decommission unusable.
In scylladb/scylladb#16254, we made force_schema_commit_log unused.
After this change, if someone passes this option as the command line
argument, the boot fails. This behavior is undesired. We only want
this option to be ignored. We can achieve this effect by making it
deprecated.
The goal is to make the available defaults safe for future use, as they
are often taken from existing config files or documentation verbatim.
Referenced issue: #14290
Closes scylladb/scylladb#15947
The std::source_location is broken on some versions of clang. In order
to be able to use its functionality in code, seastar defines
seastar::compat::source_location, which is a typedef over
std::source_location if the latter works, or a custom, dummy
implementation if the std type doesn't work. Therefore, sometimes
seastar::compat::source_location == std::source_location, but not
always.
In service/raft/raft_rpc.cc, both std source location and compat source
location are used and std source location sometimes passed as an
argument to compat source location, breaking builds on older toolchains.
Fix this by switching the code there to only use compat source location.
Fixes: scylladb/scylladb#16336
Closes scylladb/scylladb#16337
we use "\w" to represent a character class in Python. see
https://docs.python.org/3/library/re.html. but "\" should be
escaped as well, CPython accepts "\w" after trying to find
an escaped character of "\." but failed, and leave "\." as it is.
but it complains.
in this change, we use raw string to avoid escaping "\" in
the regular expression.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16405
We have always been putting the cqlsh configuration into `~/.cqlshrc`.
According to a commit from 8 years ago [1], this path is deprecated,
and commit [2] actually removed it from the cqlsh code.
As part of moving to scylla-cqlsh we got [2], and didn't
notice until the first release with it.
This change writes the configuration into `~/.cassandra/cqlshrc`,
as this is the default place cqlsh is looking.
[1]: 13ea8a6669/bin/cqlsh.py (L264)
[2]: 2024ea4796
Fixes: scylladb/scylladb#16329
Closes scylladb/scylladb#16340
In this PR we refactor `token_metadata` to use `locator::host_id` instead of `gms::inet_address` for node identification in its internal data structures. Main motivation for these changes is to make raft state machine deterministic. The use of IPs is a problem since they are distributed through gossiper and can't be used reliably. One specific scenario is outlined [in this comment](https://github.com/scylladb/scylladb/pull/13655#issuecomment-1521389804) - `storage_service::topology_state_load` can't resolve host_id to IP when we are applying old raft log entries, containing host_id-s of the long-gone nodes.
The refactoring is structured as follows:
* Turn `token_metadata` into a template so that it can be used with host_id or inet_address as the node key. The version with inet_address (the current one) provides a `get_new()` method, which can be used to access the new version.
* Go over all places which write to the old version and make the corresponding writes to the new version through `get_new()`. When this stage is finished we can use any version of the `token_metadata` for reading.
* Go over all the places which read `token_metadata` and switch them to the new version.
* Make `host_id`-based `token_metadata` default, drop `inet_address`-based version, change `token_metadata` back to non-template.
This series [depends](1745a1551a) on the RPC sender `host_id` being present in RPC `client_info` for `bootstrap` and `replace` node_ops commands. This feature was added in [this commit](95c726a8df) and released in `5.4`. It is generally recommended not to skip versions when upgrading, so users who upgrade sequentially, first to `5.4` (or the corresponding Enterprise version) and then to the version with these changes (`5.5` or `6.0`), should be fine. If for some reason they upgrade from a version without `host_id` in RPC `client_info` to the version with these changes, and they run bootstrap or replace commands during the upgrade procedure itself, these commands may fail with an error `Coordinator host_id not found` if some nodes are already upgraded and the node which started the node_ops command is not yet upgraded. In this case the user can first finish the upgrade to version 5.4 or later, or start bootstrap/replace from an upgraded node. Note that removenode and decommission do not depend on the coordinator host_id, so they can be started in the middle of an upgrade from any node.
Closes scylladb/scylladb#15903
* github.com:scylladb/scylladb:
topology: remove_endpoint: remove inet_address overload
token_metadata: topology: cleanup add_or_update_endpoint
token_metadata: add_replacing_endpoint: forbid replacing node with itself
topology: drop key_kind, host_id is now the primary key
dc_rack_fn: make it non-template
token_metadata: drop the template
shared_token_metadata: switch to the new token_metadata
gossiper: use new token_metadata
database: get_token_metadata -> new token_metadata
erm: switch to the new token_metadata
storage_service: get_token_metadata -> token_metadata2
storage_service: get_token_to_endpoint_map: use new token_metadata
api/token_metadata: switch to new version
storage_service::on_change: switch to new token_metadata
cdc: switch to token_metadata2
calculate_natural_endpoints: fix indentation
calculate_natural_endpoints: switch to token_metadata2
storage_service: get_changed_ranges_for_leaving: use new token_metadata
decommission_with_repair, removenode_with_repair -> new token_metadata
rebuild_with_repair, replace_with_repair: use new token_metadata
bootstrap: use new token_metadata
tablets: switch to token_metadata2
calculate_effective_replication_map: use new token_metadata
calculate_natural_endpoints: fix formatting
abstract_replication_strategy: calculate_natural_endpoints: make it work with both versions of token_metadata
network_topology_strategy_test: update new token_metadata
storage_service: on_alive: update new token_metadata
storage_service: handle_state_bootstrap: update new token_metadata
storage_service: snitch_reconfigured: update new token_metadata
storage_service: leave_ring: update new token_metadata
storage_service: node_ops_cmd_handler: update new token_metadata
storage_service: node_ops_cmd_handler: add coordinator_host_id
storage_service: bootstrap: update new token_metadata
storage_service: join_token_ring: update new token_metadata
storage_service: excise: update new token_metadata
storage_service: join_cluster: update new token_metadata
storage_service: on_remove: update new token_metadata
storage_service: handle_state_normal: fill new token_metadata
storage_service: topology_state_load: fill new token_metadata
storage_service: adjust update_topology_change_info to update new token_metadata
topology: set self host_id on the new topology
locator::topology: allow being_replaced and replacing nodes to have the same IP
token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known
token_metadata: get_host_id: exception -> on_internal_error
token_metadata: add get_all_ips method
token_metadata: support host_id-based version
token_metadata: make it a template with NodeId=inet_address/host_id
NodeId is used in all internal token_metadata data structures that previously used inet_address. We choose topology::key_kind based on the value of the template parameter.
locator: make dc_rack_fn a template
locator/topology: add key_kind parameter
token_metadata: topology_change_info: change field types to token_metadata_ptr
token_metadata: drop unused method get_endpoint_to_token_map_for_reading
The reader_concurrency_semaphore metrics are triplicated: each metric is registered
for the streaming, user, and system classes.
To fix, just move the metrics registration from database to
reader_concurrency_semaphore, so each reader_concurrency_semaphore
instantiated will register its metrics (if its creator asked for it).
Adjust the names given to reader_concurrency_semaphore so we don't
change the labels.
scylla-gdb is adjusted to support the new names.
The document docs/cql/cql-extensions.md documents Scylla's extension
of *synchronous* view updates, and mentions a few cases where view
updates are synchronous even if synchronous updates are not requested
explicitly. But with tablets, these statements and examples are no
longer correct - with tablets, base and view tablets may find
themselves migrated to entirely different nodes. So in this patch
we correct the statements that are no longer accurate.
Note that after this patch we still have in this document, and in
other documents, similar promises about CQL *local secondary indexes*.
Either the documentation or the implementation needs to change in
that case too, but we'll do it in a separate patch.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16369
To be used in the next patch to control whether the semaphore registers
and exports metrics or not. We want to move metric registration to the
semaphore but we don't want all semaphores to export metrics. The
decision on whether a semaphore should or shouldn't export metrics
should be made on a case-by-case basis so this new parameter has no
default value (except for the for_tests constructor).
Soon, the reader_concurrency_semaphore will require a unique
and meaningful name in order to label its metrics. To prepare
for that, name sstable_manager instances. This will be used
to generate a name for sstable_manager's reader_concurrency_semaphore.
to identify misspellings in the code.
The GitHub actions in this workflow run codespell when a new pull
request is created targeting the master or enterprise branch. Errors
will be annotated in the pull request. A new entry, along with the
existing tests like build, unit test and dtest, will be added to the
"checks" shown in the github PR web UI. One can follow the "Details" to
find the details of the errors.
Unfortunately, this check checks all text files unless they
are explicitly skipped, not just the ones added / changed in the
PR under test. In other words, if there are 42 misspelling
errors in master, and you are adding a new one in your PR,
this workflow shows all of the 43 errors -- both the old
and new ones.
Misspellings in the code hurt the user experience and sometimes
the developer's experience, but the text files under test/cql
can be sensitive to the text: sometimes a tiny edit could
break a test, so that directory is added to the skip list.
So far, since there are lots of errors identified by the tool,
until we address all of them, the identified problems are only
annotated; they are not considered errors, so they don't
fail the check.
In this change `only_warn` is set, so the check does not
fail even if there are misspellings. This prevents distractions
before all problems are addressed. We can remove this setting in
the future, once we either fix all the misspellings or add the ignore
words or skip files. Either way, the check is not considered
a blocker for merging the tested PR, even if it fails --
the check failure is presented for informational purposes only, unless
we make it required in the github settings for the target
branch.
If we want to change this, we can configure it in github's Branch
protection rule on a per-branch basis, to make this check a
must-pass.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16285
If std::vector is resized its iterators and references may
get invalidated. While task_manager::task::impl::_children's
iterators are avoided throughout the code, references to its
elements are being used.
Since the children vector does not need random access to its
elements, change its type to std::list<foreign_task_ptr>, whose
iterators and references aren't invalidated on element insertion.
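A small self-contained illustration of the reference-stability difference that motivates the change (not the task manager code itself):
```
#include <cassert>
#include <list>
#include <vector>

int main() {
    std::vector<int> v{1};
    int* vptr = &v.front();
    for (int i = 0; i < 1024; ++i) {
        v.push_back(i);            // may reallocate: vptr can now dangle
    }
    (void)vptr;                    // using *vptr here would be undefined behavior

    std::list<int> l{1};
    int* lptr = &l.front();
    for (int i = 0; i < 1024; ++i) {
        l.push_back(i);            // never invalidates existing references
    }
    assert(*lptr == 1);            // guaranteed by std::list
    return 0;
}
```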
Fixes: #16380.
Closes scylladb/scylladb#16381
Various cleanups in `scripts/coverage.py`. They do not change the behavior of this script in the happy path.
Closes scylladb/scylladb#16399
* github.com:scylladb/scylladb:
scripts/coverage.py: s/exit/sys.exit/
scripts/coverage.py: do not inherit Value from argparse.Action
scripts/coverage.py: use `is not None`
scripts/coverage.py: correct the formatted string in error message
scripts/coverage.py: do not use f-string when nothing to format
scripts/coverage.py: use raw string to avoid escaping "\"
As Value is not an argparse.Action: it is not passed as the argument
of the "action" parameter, nor does it implement the `__call__`
function. So just derive it from object.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
`is not None` is the more idiomatic Python way to check that an
expression does not evaluate to None, and it is more readable.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
we use "\." to escape "." in a regular expression. but "\" should
be escaped as well, CPython accepts "\." after trying to find
an escaped character of "\." but failed, and leave "\." as it is.
but it complains:
```
/home/kefu/dev/scylladb/scripts/coverage.py:107: SyntaxWarning: invalid escape sequence '\.'
input_file_re_str = f"(.+)\.profraw(\.{__DISTINCT_ID_RE})?"
```
In this change, we use a raw string to avoid escaping "\" in
the regular expression.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
do not include unused header
Closes scylladb/scylladb#16386
* github.com:scylladb/scylladb:
utils: bit_cast: drop unused #includes
sstables: writer: do not include unused header
On top of the capabilities of the java-nodetool command, the following additional functionality is implemented:
* Expose quarantine-mode option of the scrub_keyspace REST API
* Exit with error and print a message, when scrub finishes with abort or validation_errors return code
The command comes with tests and all tests pass with both the new and the current nodetool implementations.
Refs: #15588
Refs: #16208
Closes scylladb/scylladb#16391
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the scrub command
test/nodetool: rest_api_mock.py: add missing "f" to error message f string
api: extract scrub_status into its own header
Make host_id parameter non-optional and
move it to the beginning of the arguments list.
Delete unused overloads of add_or_update_endpoint.
Delete unused overload of token_metadata::update_topology
with inet_address argument.
This used to work before in the replace-with-same-ip scenario, but
with host_id-s it's no longer relevant.
base_token_metadata has been removed from topology_change_info
because the conditions needed for its creation
are no longer met.
database::get_token_metadata() is switched to token_metadata2.
get_all_ips method is added to the host_id-based token_metadata, since
it's convenient and will be used in several places. It returns all current
nodes converted to inet_address by means of the topology
contained within token_metadata.
hint_sender::can_send: if the node has already left the
cluster we may not find its host_id. This case is handled
in the same way as if it's not a normal token owner - we
simply send a hint to all replicas.
In this commit we replace token_metadata with token_metadata2
in the erm interface and field types. To accommodate the change
some of the strategy-related methods are also updated.
All the boost and topology tests pass with this change.
In this commit we change the return type of
storage_service::get_token_metadata_ptr() to
token_metadata2_ptr and fix whatever breaks.
All the boost and topology tests pass with this change.
The token_metadata::get_normal_and_bootstrapping_token_to_endpoint_map
method was used only here. It's inlined in this
commit since it's too specific and incurs the overhead
of creating an intermediate map.
The check *ep == endpoint is needed when a node
changes its IP - on_change can be called by the
gossiper for old IP as part of its removal, after
handle_state_normal has already been called for
the new one. Without the check, the
do_update_system_peers_table call overwrites the IP
back to its old value.
Previously token_metadata used endpoint as the key
and the *ep == endpoint condition was followed from the
is_normal_token_owner check. Now with host_id-s we have
an additional layer of indirection, and we need
*ep == endpoint check to get the same end condition.
This case was revealed by the dtest
update_cluster_layout_tests.py::TestUpdateClusterLayout::test_change_node_ip
Change the token_metadata type to token_metadata2 in
the signatures of CDC-related methods in
storage_service and cdc/generation. Use
get_new_strong to get a pointer to the new host_id-based
token_metadata from the inet_address-based one,
living in the shared_token_metadata.
The starting point of the patch is in
storage_service::handle_global_request. We change the
tmptr type to token_metadata2 and propagate the change
down the call chains. This includes token-related methods
of the boot_strapper class.
locator_topology_test, network_topology_strategy_test and
tablets_test are fully switched to the host_id-based token_metadata,
meaning they no longer populate the old token_metadata.
All the boost and topology tests pass with this change.
In this commit we switch the function
calculate_effective_replication_map to use the new
token_metadata. We do this by employing our new helper
calculate_natural_ips function. We can't use this helper for
current_endpoints/target_endpoints though,
since in that case we won't add the IP to the
pending_endpoints in the replace-with-same-ip scenario.
The token_metadata_test is migrated to host_ids in the same
commit to make it pass. Other tests work because they fill
both versions of the token_metadata, but for this test it was
simpler to just migrate it straight away. The test constructs
the old token_metadata over the new token_metadata, which
means only the get_new() method will work on it. That's
why we also need to switch some other functions
(maybe_remove_node_being_replaced, do_get_natural_endpoints,
get_replication_factor) to the new version in the same commit.
All the boost and topology tests pass with this change.
We've updated all the places where token_metadata
is mutated, and now we can progress to the next stage
of the refactoring - gradually switching the read
code paths.
The calculate_natural_endpoints function
is at the core of all of them. It decides which nodes
the given token should be replicated to for the given
token_metadata. It has a lot of usages in various contexts;
we can't switch them all in one commit, so instead we
allowed the function to behave in both ways. If the
use_host_id parameter is false, the function uses the provided
token_metadata as is and returns an endpoint_set as a result.
If it's true, it uses get_new() on the provided token_metadata
and returns a host_id_set as a result.
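A simplified, standalone sketch of that dual behaviour (stand-in types; the real code uses a runtime parameter and the helper aliases mentioned below, so this compile-time variant is only an illustration of the idea):
```
#include <set>
#include <string>

struct token {};
using inet_address = std::string;                 // stand-ins for the real types
using host_id = std::string;
using endpoint_set = std::set<inet_address>;
using host_id_set = std::set<host_id>;

struct token_metadata2 { /* host_id-based metadata */ };
struct token_metadata {
    token_metadata2 _new;
    const token_metadata2& get_new() const { return _new; }
};

// calculate_natural_endpoints<false>(t, tm) yields an endpoint_set,
// calculate_natural_endpoints<true>(t, tm) consults get_new() and yields a host_id_set.
template <bool UseHostId>
auto calculate_natural_endpoints(const token&, const token_metadata& tm) {
    if constexpr (UseHostId) {
        host_id_set result;
        // ... consult tm.get_new() and fill result with host_ids ...
        (void)tm.get_new();
        return result;
    } else {
        endpoint_set result;
        // ... consult tm directly and fill result with inet_addresses ...
        return result;
    }
}
```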
The scope of the whole refactoring is limited to the erm data
structure, its interface will be kept inet_address based for now.
This means we'll often need to resolve host_ids to inet_address-es
as soon as we get a result from calculate_natural_endpoints.
A new calculate_natural_ips function is added for convenience.
It uses the new token_metadata and immediately resolves
returned host_id-s to inet_address-es.
The auxiliary declarations natural_ep_type, set_type, vector_type,
get_self_id, select_tm are introduced only for the sake of
migration, they will be removed later.
We'll need it in the next commits to refer to
replacing and bootstrapping nodes by id.
We assume this change will be shipped in 6.0 with upgrade
from 5.4, where host_id already exists in client_info.
We don't support upgrade between non-adjacent versions.
On top of the capabilities of the java-nodetool command, the following
additional functionality is implemented:
* Expose quarantine-mode option of the scrub_keyspace REST API
* Exit with error and print a message, when scrub finishes with abort or
validation_errors return code
excise is called from handle_state_left; the endpoint
may have already been removed from tm by then -
test_raft_upgrade_majority_loss fails if we use
unconditional tmptr->get_new()->get_host_id
instead of get_host_id_if_known
In order for the call to see all prior changes to group0. Also, we
should query on the host on which we executed the barrier.
I hope this will reduce flakiness observed in CI runs on
https://github.com/scylladb/scylladb/pull/16341 where the expected
tablet replica didn't match the one returned by get_tablet_replica()
after tablet movement, possibly because the node is still behind
group0 changes.
Currently, if a compaction function enters the table
or compaction_group async_gate, we can't stop it
on the table/compaction_group stop path as they co_await
their respective async_gate.close().
This series introduces a table_ptr smart pointer that guards
the table object by entering its async_gate, and
it also defers awaiting the gate.close() future
until after stopping ongoing compactions, so that
closing the gate prevents starting new compactions
while ongoing compactions can still be stopped; finally,
awaiting the close() future waits for them to
unwind and exit the gate after being stopped.
Fixes #16305
Closes scylladb/scylladb#16351
* github.com:scylladb/scylladb:
compaction: run_on_table: skip compaction also on gate_closed_exception
compaction: run_on_table: hold table
table: add table_holder and hold method
table: stop: allow compactions to be stopped while closing async_gate
For all compaction types which can be started with api, add an asynchronous version of api, which returns task_id of the corresponding task manager task. With the task_id a user can check task status, abort, or wait for it, using task manager api.
Closes scylladb/scylladb#15092
* github.com:scylladb/scylladb:
test: use async api in test_not_created_compaction_task_abort
test: test compaction task started asynchronously
api: tasks: api for starting async compaction
api: compaction: pass pointer to top level compaction tasks
If an option is not supported, reject the request instead of silently
ignoring the unsupported options.
This prevents the user from thinking an option is supported when it is
actually ignored by the Scylla core.
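A generic sketch of the approach (not the actual Scylla option-parsing code): iterate over the provided options and reject anything outside the supported set instead of ignoring it.
```
#include <map>
#include <set>
#include <stdexcept>
#include <string>

void validate_options(const std::map<std::string, std::string>& options,
                      const std::set<std::string>& supported) {
    for (const auto& opt : options) {
        if (supported.count(opt.first) == 0) {
            // reject instead of silently ignoring the unknown option
            throw std::invalid_argument("unsupported option: " + opt.first);
        }
    }
}
```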
Fixes #16299
Closes scylladb/scylladb#16300
Similar to the no_such_column_family error,
gate_closed_exception indicates that the table
is stopped and we should skip compaction on it
gracefully.
Fixes #16305
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
To make sure a table object is kept valid throughout the lifetime
of compaction, a following patch will enter the table's
_async_gate when the compaction task starts.
This change defers awaiting the gate.close() future
until after stopping ongoing compactions, so that
closing the gate prevents starting new compactions
while ongoing compactions can still be stopped; finally,
awaiting the close() future waits for them to
unwind and exit the gate after being stopped.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This commit adds the upgrade guide from version
5.4 to 5.5.
Also, it removes all previous OSS guides not related
to version 5.5.
The guide includes the required Raft-related
information.
NOTE: The content of the guide must be further
verified closer to the release. I'm making
these updates now to avoid errors and warnings
related to outdated upgrade guides in other PRs,
and to include the Raft information.
Closes scylladb/scylladb#16350
* ./tools/java 26f5f71c...29fe44da (3):
> tools: catch and print UnsupportedOperationException
> tools/SSTableMetadataViewer: continue if sstable does not exist
> throw more informative error when fail to parse sstable generation
Fixes: scylladb/scylla-tools-java#360
The observed crash was in the following piece on "cf" access:
if (*table_is_dropped) {
sslog.info("[Stream #{}] Skipped streaming the dropped table {}.{}", si->plan_id, si->cf.schema()->ks_name(), si->cf.schema()->cf_name());
Fixes#16181
Fixes#16312
This test replays a segment before it might be closed or even fully flushed,
thus it can (with the new semantics) generate a segment_truncation exception
if hitting eof earlier than expected. (Note: test does not use pre-allocated
segments).
In dffadabb94 we mistakenly added
"if args.overwrite_unit_file", but the option comes from an unmerged
patch.
So we need to drop it to fix the script error.
Fixes #16331
Closes scylladb/scylladb#16358
When performing a schema change through group 0, extend the schema mutations with a version that's persisted and then used by the nodes in the cluster in place of the old schema digest, which becomes horribly slow as we perform more and more schema changes (#7620).
If the change is a table create or alter, also extend the mutations with a version for this table to be used for `schema::version()`s instead of having each node calculate a hash which is susceptible to bugs (#13957).
When performing a schema change in Raft RECOVERY mode we also extend schema mutations which forces nodes to revert to the old way of calculating schema versions when necessary.
We can only introduce these extensions if all of the cluster understands them, so protect this code by a new cluster/schema feature, `GROUP0_SCHEMA_VERSIONING`.
Fixes: #7620
Fixes: #13957
---
This is a reincarnation of PR scylladb/scylladb#15331. The previous PR was reverted due to a bug it unmasked; the bug has now been fixed (scylladb/scylladb#16139). Some refactors from the previous PR were already merged separately, so this one is a bit smaller.
I have checked with @Lorak-mmk's reproducer (https://github.com/Lorak-mmk/udt_schema_change_reproducer -- many thanks for it!) that the originally exposed bug is no longer reproducing on this PR, and that it can still be reproduced if I revert the aforementioned fix on top of this PR.
Closes scylladb/scylladb#16242
* github.com:scylladb/scylladb:
docs: describe group 0 schema versioning in raft docs
test: add test for group 0 schema versioning
feature_service: enable `GROUP0_SCHEMA_VERSIONING` in Raft mode
schema_tables: don't delete `version` cell from `scylla_tables` mutations from group 0
migration_manager: add `committed_by_group0` flag to `system.scylla_tables` mutations
schema_tables: use schema version from group 0 if present
migration_manager: store `group0_schema_version` in `scylla_local` during schema changes
system_keyspace: make `get/set_scylla_local_param` public
feature_service: add `GROUP0_SCHEMA_VERSIONING` feature
As part of code coverage we need some additional packages in order to
be able to process the code coverage data and
provide some meaningful information in logs.
Here we add the following packages:
fedora packages:
----------------
lcov - A package of utilities to manipulate lcov traces and generate
coverage html reports
fedora python3 packages:
------------------------
The following packages are added into fedora_packages and not the
python3_packages since we don't need them to be packaged into
scylla-python3 package but we only require them for the build
environment.
python3-unidiff - A python library for working with patch files, this is
required in order to generate "patch coverage" reports.
python3-humanfriendly - A python library to format some quantities into
human-readable strings (time spans, sizes, etc...).
We use it to print meaningful logs that track
the volume and time it takes to process coverage
data so we can better debug and optimize it in the
future.
python3-jinja3 - This is a template based generator that will eventually
allow us to consolidate and rearrange several reports into one so we
can publish a single report "site" for all of the coverage information.
For example, include both, coverage report as well as
patch report in a tab based site.
pip packages:
-------------
treelib - A tree data structure that also supports pretty printing of
the tree data. We use it to log the coverage processing steps in
order to have debugging capabilities in the future.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Closes scylladb/scylladb#16330
[avi: regenerate toolchain]
Closes scylladb/scylladb#16357
For all compaction types which can be started with api, add an asynchronous
version of api, which returns task_id of the corresponding task manager
task. With the task_id a user can check task status, abort, or wait for it,
using task manager api.
Since we dropped support for CentOS 7, we can now always use AmbientCapabilities
without a systemd version check, so we can move it from capabilities.conf
to scylla-server.service.
However, we still cannot hardcode CAP_PERFMON since it is too new;
only newer kernels support it, so keep it in scylla_post_install.sh.
As a preparation for asynchronous compaction api, from which we
cannot take values by reference, top level compaction tasks get
pointers which need to be set to nullptr when they are not needed
(like in async api).
When a table is truncated or dropped it can be auto-snapshotted if the respective config option is set (by default it is). Non-local storages don't implement snapshotting yet and emit on_internal_error() in that case, aborting the whole process. It's better to skip the snapshot with a warning instead.
Closes scylladb/scylladb#16220
* github.com:scylladb/scylladb:
database: Do not auto snapshot non-local storages' tables
database: Simplify snapshot booleans in truncate_table_on_all_shards()
For each inet_address-based modification of token_metadata we
make a corresponding host_id-based change in token_metadata->get_new().
The _gossiper.add_saved_endpoint logic is switched to the new token_metadata.
Both versions of the token_metadata need to be updated. For
the new version we provide a dc_rack_fn function which looks
for dc_rack by host_id in topology_state_machine if raft
topology is on. Otherwise, it looks for IP for the given
host_id and falls back to the gossiper-based function
get_dc_rack_for.
With this commit, we begin the next stage of the
refactoring - updating the new version of the token_metadata
in all places where the old version is currently being updated.
In this commit we assign host_id of this node, both in main.cc
and in boost tests.
When we're replacing a node with the same IP address, we want
the following behavior:
* host_id -> IP mapping should work and return the same IP address for two
different host_ids - old and new.
* the IP -> host_id mapping should return the host_id of the old (replaced)
host.
This variant is most convenient for preserving the current behavior
of the code, especially the functions maybe_remove_node_being_replaced,
erm::get_natural_endpoints_without_node_being_replaced,
erm::get_pending_endpoints. The 'being_replaced' node will be properly removed in
maybe_remove_node_being_replaced and 'replacing' node will be added to
the pending_endpoints.
This commit fixes an inconsistency in method names:
get_host_id and get_host_id_if_known both exist
(the former raises an internal error, the latter returns null), but there was only
one method for the opposite conversion - get_endpoint_for_host_id,
and it returns null. In this commit we change it to on_internal_error
if it can't find the argument and add another method,
get_endpoint_for_host_id_if_known, which returns null in this case.
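A minimal sketch of the resulting naming convention (simplified stand-in types, not the real token_metadata interface):
```
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>

using host_id = std::string;       // stand-ins for the real types
using inet_address = std::string;

struct token_metadata_sketch {
    std::unordered_map<host_id, inet_address> _hosts;

    // "_if_known": a missing mapping is an expected condition, so return empty.
    std::optional<inet_address> get_endpoint_for_host_id_if_known(const host_id& id) const {
        auto it = _hosts.find(id);
        return it == _hosts.end() ? std::nullopt : std::optional<inet_address>(it->second);
    }

    // Plain getter: a missing mapping is a bug, so fail loudly (the real code
    // calls on_internal_error(), which also provides a backtrace).
    inet_address get_endpoint_for_host_id(const host_id& id) const {
        auto ep = get_endpoint_for_host_id_if_known(id);
        if (!ep) {
            throw std::runtime_error("host_id not found: " + id);
        }
        return *ep;
    }
};
```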
We can't use get_endpoint_for_host_id/get_host_id
in host_id_or_endpoint::resolve since it's called
from storage_service::parse_node_list
-> token_metadata::parse_host_id_and_endpoint,
and exceptions are caught and handled in
`storage_service::parse_node_list`.
It's a bug to use get_host_id on a non-existent endpoint,
so on_internal_error is more appropriate. Also, it's
easier to debug since it provides a backtrace.
If a missing inet_address is expected, get_host_id_if_known
should be used instead. We update one such case in
storage_service::force_remove_completion. Other
usages of get_host_id are correct.
In this commit we enhance token_metadata with a pointer to the
new host_id-based generic_token_metadata specialisation (token_metadata2).
The idea is that in the following commits we'll go over all token_metadata
modifications and make the corresponding modifications to its new
host_id-based alternative.
The pointer to token_metadata2 is stored in the
generic_token_metadata::_new_value field. The pointer can be
mutable, immutable, or absent altogether (std::monostate).
It's mutable if this generic_token_metadata owns it, meaning
it was created using the generic_token_metadata(config cfg)
constructor. It's immutable if the
generic_token_metadata(lw_shared_ptr<const token_metadata2> new_value);
constructor was used. This means this old token_metadata is a wrapper for
new token_metadata and we can only use the get_new() method on it. The field
_new_value is empty for the new host_id-based token_metadata version.
The generic_token_metadata(std::unique_ptr<token_metadata_impl<NodeId>> impl, token_metadata2 new_value);
constructor is used for clone methods. We clone both versions,
and we need to pass a cloned token_metadata2 into constructor.
There are two overloads of get_new, for mutable and immutable
generic_token_metadata. Both of them throw an exception if
they can't get the appropriate pointer. There is also a
get_new_strong method, which returns an immutable owning
pointer. This is convenient since a lot of API's want an
owning pointer. We can't make the get_new/get_new_strong API
simpler and use get_new_strong everywhere, since it mutates the
original generic_token_metadata by incrementing the reference
counter, and this causes races when it's passed between
shards in replicate_to_all_cores.
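A simplified sketch of the _new_value states and the get_new()/get_new_strong() accessors described above (std::shared_ptr and std::runtime_error stand in for the real lw_shared_ptr and on_internal_error; names and shapes are assumptions):
```
#include <memory>
#include <stdexcept>
#include <variant>

struct token_metadata2 { /* host_id-based implementation */ };

class generic_token_metadata_sketch {
    // absent, owned-and-mutable, or immutable wrapper around the new version
    std::variant<std::monostate,
                 std::shared_ptr<token_metadata2>,
                 std::shared_ptr<const token_metadata2>> _new_value;
public:
    token_metadata2& get_new() {                       // mutable overload
        if (auto p = std::get_if<std::shared_ptr<token_metadata2>>(&_new_value)) {
            return **p;
        }
        throw std::runtime_error("no mutable new token_metadata");
    }
    const token_metadata2& get_new() const {           // immutable overload
        if (auto p = std::get_if<std::shared_ptr<const token_metadata2>>(&_new_value)) {
            return **p;
        }
        if (auto p = std::get_if<std::shared_ptr<token_metadata2>>(&_new_value)) {
            return **p;
        }
        throw std::runtime_error("no new token_metadata");
    }
    // Returns an owning, immutable pointer; copying it bumps the reference
    // counter, which the commit message notes is problematic when the object
    // is passed between shards in replicate_to_all_cores.
    std::shared_ptr<const token_metadata2> get_new_strong() const {
        if (auto p = std::get_if<std::shared_ptr<const token_metadata2>>(&_new_value)) {
            return *p;
        }
        if (auto p = std::get_if<std::shared_ptr<token_metadata2>>(&_new_value)) {
            return *p;
        }
        throw std::runtime_error("no new token_metadata");
    }
};
```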
NodeId is used in all internal token_metadata data structures that
previously used inet_address. We choose topology::key_kind based
on the value of the template parameter.
generic_token_metadata::update_topology overload with host_id
parameter is added to make update_topology_change_info work;
it now uses NodeId as a parameter type.
topology::remove_endpoint(host_id) is added to make
generic_token_metadata::remove_endpoint(NodeId) work.
pending_endpoints_for and endpoints_for_reading are just removed - they
are not used and not implemented. The declarations were left by mistake
from a refactoring in which these methods were moved to erm.
generic_token_metadata_base is extracted to contain declarations, common
to both token_metadata versions.
Templates are explicitly instantiated inside token_metadata.cc, since
the implementation part is also a template and it's not exposed in the header.
There are no other behavioral changes in this commit, just syntax
fixes to make token_metadata a template.
In the next commits token_metadata will be
made a template with NodeId=inet_address|host_id
parameter. This parameter will be passed to dc_rack_fn
function, so it also should be made a template.
For the host_id-based token_metadata we want host_id
to be the main node key, meaning it should be used
in add_or_update_endpoint to find the node to update.
For the inet_address-based token_metadata version
we want to retain the old behaviour during transition period.
In this commit we introduce key_kind parameter and use
key_kind::inet_address in all current topology usages.
Later we'll use key_kind::host_id for the new token_metadata.
In the last commits of the series, when the new token_metadata
version is used everywhere, we will remove key_kind enum.
In subsequent commits we'll need the following api for token_metadata:
token_metadata(token_metadata2_ptr);
get_new() -> token_metadata2*
where token_metadata2 is the new version of token_metadata,
based on host_id.
In other words:
* token_metadata knows the new version of itself and returns a pointer
to it through get_new()
* token_metadata can be constructed based solely on the new version,
without its own implementation. In this case the only method we can
use on it is get_new.
This allows passing token_metadata2 to APIs with token_metadata in the method
signature, if these APIs are known to only use the get_new method on the
passed token_metadata.
And back to topology_change_info - if we got it from the new token_metadata
we want to be able to construct token_metadata from token_metadata2 contained
in it, and this requires it to be a ptr, not a value.
Reject an ALTER KEYSPACE request for NetworkTopologyStrategy when
replication options are missing.
Also reject CREATE KEYSPACE with no replication factor options.
Cassandra has a default_keyspace_rf configuration that may allow such
CREATE KEYSPACE commands, but Scylla doesn't have this option (refs #16028).
Fixes #10036
Closes scylladb/scylladb#16221
This source file was added in d3d83869, so let's update cmake
as well.
sessions_tests was added in the same commit, so add it as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16344
This PR implements the following new nodetool commands:
* decommission
* rebuild
* removenode
* getlogginglevels
* setlogginglevel
* move
* refresh
All commands come with tests and all tests pass with both the new and the current nodetool implementations.
Refs: https://github.com/scylladb/scylladb/issues/15588
Closes scylladb/scylladb#16348
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the refresh command
tools/scylla-nodetool: implement the move command
tools/scylla-nodetool: implement setlogginglevel command
tools/sclla-sstable: implement the getlogginglevels command
tools/scylla-nodetool: implement the removenode command
tools/scylla-nodetool: implement the rebuild command
tools/scylla-nodetool: implement the decommission command
When checking replication strategy options the code assumes (and it's
stated in the preceding code comment) that all options are replication
factors. Nowadays this is no longer so: the initial_tablets option is not
a replication factor and should be skipped.
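A minimal sketch of the idea (assumed option names, not the actual validation code): skip initial_tablets and treat every remaining option as a per-DC replication factor.
```
#include <map>
#include <stdexcept>
#include <string>

void check_replication_options(const std::map<std::string, std::string>& options) {
    for (const auto& opt : options) {
        if (opt.first == "initial_tablets") {
            continue;  // not a per-DC replication factor, skip it
        }
        // every remaining option is expected to be a numeric replication factor
        if (opt.second.empty() ||
            opt.second.find_first_not_of("0123456789") != std::string::npos) {
            throw std::invalid_argument(opt.first + " must be a numeric replication factor");
        }
    }
}
```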
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#16335
Perform schema changes while mixing nodes in RECOVERY mode with nodes in
group 0 mode:
- schema changes originating from RECOVERY node use
digest-based schema versioning.
- schema changes originating from group 0
nodes use persisted versions committed through group 0.
Verify that schema versions are in sync after each schema change, and
that each schema change results in a different version.
Also add a simple upgrade test, performing a schema change before we
enable Raft (which also enables the new versioning feature) in the
entire cluster, then once upgrade is finished.
One important upgrade test is missing, which we should add to dtest:
create a cluster in Raft mode but in a Scylla version that doesn't
understand GROUP0_SCHEMA_VERSIONING. Then start upgrading to a version
that has this patchset. Perform schema changes while the cluster is
mixed, both on non-upgraded and on upgraded nodes. Such a test is
especially important because we're adding a new column to the
`system.scylla_local` table (which we then redact from the schema
definition when we see that the feature is disabled).
As promised in earlier commits:
Fixes: #7620
Fixes: #13957
Also modify two test cases in `schema_change_test` which depend on
the digest calculation method in their checks. Details are explained in
the comments.
As explained in the previous commit, we use the new
`committed_by_group0` flag attached to each row of a `scylla_tables`
mutation to decide whether the `version` cell needs to be deleted or
not.
The rest of #13957 is solved by pre-existing code -- if the `version`
column is present in the mutation, we don't calculate a hash for
`schema::version()`, but take the value from the column:
```
table_schema_version schema_mutations::digest(db::schema_features sf)
const {
if (_scylla_tables) {
auto rs = query::result_set(*_scylla_tables);
if (!rs.empty()) {
auto&& row = rs.row(0);
auto val = row.get<utils::UUID>("version");
if (val) {
return table_schema_version(*val);
}
}
}
...
```
The issue will therefore be fixed once we enable
`GROUP0_SCHEMA_VERSIONING`.
As described in #13957, when creating or altering a table in group 0
mode, we don't want each node to calculate `schema::version()`s
independently using a hash algorithm. Instead, we want all nodes to
use a single version for that table, committed by the group 0 command.
There's even a column ready for this in `system.scylla_tables` --
`version`. This column is currently being set for system tables, but
it's not being used for user tables.
Similarly to what we did with global schema version in earlier commits,
the obvious thing to do would be to include a live cell for the `version`
column in the `system.scylla_tables` mutation when we perform the schema
change in Raft mode, and to include a tombstone when performing it
outside of Raft mode, for the RECOVERY case.
But it's not that simple because as it turns out, we're *already*
sending a `version` live cell (and also a tombstone, with timestamp
decremented by 1) in all `system.scylla_tables` mutations. But then we
delete that cell when doing schema merge (which begs the question
why were we sending it in the first place? but I digress):
```
// We must force recalculation of schema version after the merge, since the resulting
// schema may be a mix of the old and new schemas.
delete_schema_version(mutation);
```
the above function removes the `version` cell from the mutation.
So we need another way of distinguishing the cases of schema change
originating from group 0 vs outside group 0 (e.g. RECOVERY).
The method I chose is to extend `system.scylla_tables` with a boolean
column, `committed_by_group0`, and extend schema mutations to set
this column.
In the next commit we'll decide whether or not the `version` cell should
be deleted based on the value of this new column.
As promised in the previous commit, if we persisted a schema version
through a group 0 command, use it after a schema merge instead of
calculating a digest.
Ref: #7620
The above issue will be fixed once we enable the
`GROUP0_SCHEMA_VERSIONING` feature.
We extend schema mutations with an additional mutation to the
`system.scylla_local` table which:
- in Raft mode, stores a UUID under the `group0_schema_version` key.
- outside Raft mode, stores a tombstone under that key.
As we will see in later commits, nodes will use this after applying
schema mutations. If the key is absent or has a tombstone, they'll
calculate the global schema digest on their own -- using the old way. If
the key is present, they'll take the schema version from there.
The Raft-mode schema version is equal to the group 0 state ID of this
schema command.
The tombstone is necessary for the case of performing a schema change in
RECOVERY mode. It will force a revert to the old digest-based way.
Note that extending schema mutations with a `system.scylla_local`
mutation is possible thanks to earlier commits which moved
`system.scylla_local` to schema commitlog, so all mutations in the
schema mutations vector still go to the same commitlog domain.
Also, since we introduce a replicated tombstone to
`system.scylla_local`, we need to set GC grace to nonzero. We set it to
`schema_gc_grace`, which makes sense given the use case.
In the java nodetool, this command ends up calling an API endpoint which
just throws an exception saying moving tokens is not supported. So in
the native implementation we just throw an exception to the same effect
in scylla-nodetool itself.
Before this change, we relied on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
In this change, in order to enable the code in the header to
access the formatter without being moved down after the full specialization's
definition, we
* move the enum definition out of the class and before the
class,
* rename the enum's name from state to index_consume_entry_context_state
* define a formatter for index_consume_entry_context_state
* remove its operator<<().
As fmt v10 is able to use `format_as()` as a fallback, the formatter
full specialization is guarded with `#if FMT_VERSION < 10'00'00`. We
will remove it after we start building with fmt v10.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16204
Tablet streaming involves asynchronous RPCs to other replicas which transfer writes. We want side-effects from streaming only within the migration stage in which the streaming was started. This is currently not guaranteed on failure. When streaming master fails (e.g. due to RPC failing), it can be that some streaming work is still alive somewhere (e.g. RPC on wire) and will have side-effects at some point later.
This PR implements tracking of all operations involved in streaming which may have side-effects, which allows the topology change coordinator to fence them and wait for them to complete if they were already admitted.
The tracking and fencing is implemented by using global "sessions", created for streaming of a single tablet. Session is globally identified by UUID. The identifier is assigned by the topology change coordinator, and stored in system.tablets. Sessions are created and closed based on group0 state (tablet metadata) by the barrier command sent to each replica, which we already do on transitions between stages. Also, each barrier waits for sessions which have been closed to be drained.
The barrier is blocked only if there is some session with work that was left behind by unsuccessful streaming. In that case it should not be blocked for long, because the streaming process frequently checks whether the guard was left behind and stops if it was.
This mechanism of tracking is fault-tolerant: session id is stored in group0, so coordinator can make progress on failover. The barriers guarantee that session exists on all replicas, and that it will be closed on all replicas.
Closes scylladb/scylladb#15847
* github.com:scylladb/scylladb:
test: tablets: Add test for failed streaming being fenced away
error_injection: Introduce poll_for_message()
error_injection: Make is_enabled() public
api: Add API to kill connection to a particular host
range_streamer: Do not block topology change barriers around streaming
range_streamer, tablets: Do not keep token metadata around streaming
tablets: Fail gracefully when migrating tablet has no pending replica
storage_service, api: Add API to disable tablet balancing
storage_service, api: Add API to migrate a tablet
storage_service, raft topology: Run streaming under session topology guard
storage_service, tablets: Use session to guard tablet streaming
tablets: Add per-tablet session id field to tablet metadata
service: range_streamer: Propagate topology_guard to receivers
streaming: Always close the rpc::sink
storage_service: Introduce concept of a topology_guard
storage_service: Introduce session concept
tablets: Fix topology_metadata_guard holding on to the old erm
docs: Document the topology_guard mechanism
This series adds preparation patches for file stream tablet implementation in enterprise branch. It minimizes the differences between those two branches.
Closes scylladb/scylladb#16297
* github.com:scylladb/scylladb:
messaging_service: Introduce STREAM_BLOB and TABLET_STREAM_FILES verb
compaction_group_for_token: Handle minimum_token and maximum_token token
serializer: Add temporary_buffer support
cql_test_env: Allow messaging_service to start listen
Snapshotting is not yet supported for those (see #13025) and
auto-snapshot would hit an internal error. Skip it and print a warning
into the logs.
Fixes #16078
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Fixes #16298
The adjusted buffer position calculation in buffer_position(), introduced in https://github.com/scylladb/scylladb/pull/15494
was in fact broken. It calculated (like previously) a "position" based on diff between
underlying buffer size and ostream size() (i.e. avail), then adjusted this according to
sector overhead rules.
However, the underlying buffer size is in unadjusted terms, and the ostream is adjusted.
The two cannot be compared as such, which means the "positions" we get here are borked.
Luckily for us (sarcasm), the position calculation in replayer made a similar error,
in that it adjusts up the current position by one sector overhead too much, leading to us
more or less getting the same, erroneous results at both ends.
However, when/if one needs to adjust the segment file format further, one might very
quickly realize that this does not work well if, say, one needs to be able to safely
read some extra bytes before first chunk in a segment. Conversely, trying to adjust
this also exposes a latent potential error in the skip mechanism, manifesting here.
Issue fixed by keeping track of the initial ostream capacity for segment buffer, and
use this for position calculation, and in the case of replayer, move file pos adjustment
from read_data() to a subroutine (shared with skipping) that better accounts for data stream
position vs. file position adjustment. In implementation terms, we first inc the
"data stream" pos (i.e. pos in data without overhead), then adjust for overhead.
Also fix replayer::skip, so that we handle the buffer/pos relation correctly now.
Added test for initial entry position, as well as data replay consistency for single
entry_writer paths.
Fixes #16301
The calculation on whether data may be added is based on position vs. size of incoming data.
However, it did not take sector overhead into account, which led us to write past the allowed
segment end, which in turn also leads to metrics overflows.
Closes scylladb/scylladb#16302
* github.com:scylladb/scylladb:
commitlog: Fix allocation size check to take sector overhead into account.
commitlog: Fix commitlog_segment::buffer_position() calculation and replay counterpart
There are three of them in this function -- the with_snapshot argument,
the auto_snapshot local copy of the db::config option, and the should_snapshot
local variable that is the && of the above two. The code can go with just one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test test_many_partitions is very slow, as it tests a slow scan over
a lot of partitions. This was observed to time out on the slower ARM
machines, making the test flaky. To prevent this, create an
extra-patient cql connection with a 10-minute timeout for the scan
itself.
Fixes: #16145
Closes scylladb/scylladb#16303
This commit updates the configuration for
ScyllaDB documentation so that:
- 5.4 is the latest version.
- 5.4 is removed from the list of unstable versions.
It must be merged when ScyllaDB 5.4 is released.
No backport is required.
Closes scylladb/scylladb#16308
There's a test case that validates the upload sink by getting random
portions of the uploaded object. The portions are generated as
len = random % chunk_size
off = random % file_size - len
The latter may apparently render a negative value which will translate
into a huge 64-bit range offset which, in turn, results in an invalid
http range specifier, and getting the object part fails with status OK.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
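A standalone sketch of the underflow described above (assumed sizes, not the actual test code): with unsigned arithmetic, subtracting len from a smaller value wraps around to a huge offset; bounding the offset by the remaining space avoids it.
```
#include <cstdint>
#include <random>

int main() {
    std::mt19937_64 rnd(42);
    const uint64_t file_size = 1 << 20;      // assumed 1 MiB uploaded object
    const uint64_t chunk_size = 128 * 1024;  // assumed upload chunk size

    uint64_t len = 1 + rnd() % chunk_size;   // requested length, at least 1 byte
    // buggy: if (rnd() % file_size) < len, the unsigned subtraction wraps
    // around and the offset becomes a huge 64-bit value
    uint64_t off_buggy = rnd() % file_size - len;
    // fixed: always leave room for len bytes before the end of the object
    uint64_t off_fixed = rnd() % (file_size - len + 1);

    // the Range header is then first_byte=off, last_byte=off+len-1, which
    // stays within the object only for the fixed variant
    (void)off_buggy;
    (void)off_fixed;
    return 0;
}
```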
The get_object_contiguous() accepts an optional range argument in the form of
offset:length and then converts it into a first_byte:last_byte pair to
satisfy http's Range header range-specifier.
If the last_byte, which is offset + length - 1, overflows 64 bits, the
range specifier becomes invalid. According to RFC9110 servers may ignore
invalid ranges if they want to, and this is what minio does.
The result is pretty interesting. Since the range is specified, the client
expects a PartialContent response, but since the range is ignored by the server
the result is OK, as if the full object was requested. So instead of
some sane "overflow" error, get_object_contiguous() fails with
status "success".
The fix is to pre-check the provided ranges and fail early.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Fixes #16301
The calculation on whether data may be added is based on position vs. size of incoming data.
However, it did not take sector overhead into account, which led us to write past the allowed
segment end, which in turn also leads to metrics overflows.
Fixes #16298
The adjusted buffer position calculation in buffer_position(), introduced in #15494
was in fact broken. It calculated (like previously) a "position" based on diff between
underlying buffer size and ostream size() (i.e. avail), then adjusted this according to
sector overhead rules.
However, the underlying buffer size is in unadjusted terms, and the ostream is adjusted.
The two cannot be compared as such, which means the "positions" we get here are borked.
Luckily for us (sarcasm), the position calculation in replayer made a similar error,
in that it adjusts up the current position by one sector overhead too much, leading to us
more or less getting the same, erroneous results at both ends.
However, when/if one needs to adjust the segment file format further, one might very
quickly realize that this does not work well if, say, one needs to be able to safely
read some extra bytes before first chunk in a segment. Conversely, trying to adjust
this also exposes a latent potential error in the skip mechanism, manifesting here.
Issue fixed by keeping track of the initial ostream capacity for segment buffer, and
use this for position calculation, and in the case of replayer, move file pos adjustment
from read_data() to a subroutine (shared with skipping) that better accounts for data stream
position vs. file position adjustment. In implementation terms, we first inc the
"data stream" pos (i.e. pos in data without overhead), then adjust for overhead.
Also fix replayer::skip, so that we handle the buffer/pos relation correctly now.
Added test for initial entry position, as well as data replay consistency for single
entry_writer paths.
The following error was seen:
[shard 0] table - compaction_group_for_token: compaction_group idx=0 range=(minimum
token,-6917529027641081857] does not contain token=minimum token
Since minimum_token or maximum_token will not be inside a token range, skip
the in-token-range check.
This is needed for rpc calls to work in the tests. With this patch, by
default, messaging_service does not listen as it was before.
This is useful for file stream for tablet test.
This patch fixes the error check and speeds up swap allocation.
The following patches are included:
- scylla_swap_setup: run error check before allocating swap
  avoid creating the swapfile before running the error check
- scylla_swap_setup: use fallocate on ext4
  this increases swap allocation speed on ext4
Closes scylladb/scylladb#12668
* github.com:scylladb/scylladb:
scylla_swap_setup: use fallocate on ext4
scylla_swap_setup: run error check before allocating swap
The current implementation starts in sstables_manager that gets the deletion function from storage which, in turn, should atomically do sst.unlink() over a list of sstables (s3 driver is still not atomic though #13567).
This PR generalizes the atomic deletion inside sstables_manager method and removes the atomic deletor function that nobody liked when it was introduced (#13562)
Closes scylladb/scylladb#16290
* github.com:scylladb/scylladb:
sstables/storage: Drop atomic deleter
sstables/storage: Reimplement atomic deletion in sstables_manager
sstables/storage: Add prepare/complete skaffold for atomic deletion
Streaming was keeping effective_replication_map_ptr around the whole
process, which blocks topology change barriers.
This will inhibit progress of tablet load balancer or concurrent
migrations, resulting in worse performance.
Fix by switching to the most recent erm on sharder
calls. multishard_writer calls shard_of() for each new partition.
A better way would be to switch immediately when topology version
changes, but this is left for later.
Load balancing needs to be disabled before making a series of manual
migrations so that we don't fight with the load balancer.
Also will be used in tests to ensure tablets stick to expected locations.
Prevents stale streaming operations from running beyond the topology
operation they were started in. After the session field is cleared, or
changed to something else, the old topology_guard used by streaming is
interrupted and fenced and the next barrier will join with any
remaining work.
rpc::sink::~sink aborts if not closed. There is a try/catch clause
which ensures that close() is called, but there was code after sink is
created which is not covered by it. Move sink construction past that
code.
A write to a base table can generate one or more writes to a materialized
view. The write to RF base replicas need to cause writes to RF view
replicas. Our MV implementation, based on Cassandra's implementation,
does this via "pairing": Each one of the base replicas involved in this
write sends each view update to exactly one view replica. The function
get_view_natural_endpoint() tells a base replica which of the view
replicas it should send the update to.
The standard pairing is based on the ring order: The first owner of the
base token sends to the first owner of the view token, the second to the
second, and so on. However, the existing code also uses an optimization
we call self-pairing: If a single node is both a base replica and a view
replica, the pairing is modified so this node sends the update to itself.
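A standalone sketch of the ring-order pairing described above (stand-in types, not the real get_view_natural_endpoint()): the i-th base replica is paired with the i-th view replica; self-pairing would additionally remap a node present in both lists to itself.
```
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

using replica = std::string;  // stand-in for a replica address / host id

std::optional<replica> paired_view_replica(const std::vector<replica>& base_replicas,
                                           const std::vector<replica>& view_replicas,
                                           const replica& me) {
    auto it = std::find(base_replicas.begin(), base_replicas.end(), me);
    if (it == base_replicas.end()) {
        return std::nullopt;              // we are not a base replica for this token
    }
    auto idx = static_cast<std::size_t>(it - base_replicas.begin());
    if (idx >= view_replicas.size()) {
        return std::nullopt;              // fewer view replicas than base replicas
    }
    return view_replicas[idx];            // ring-order pairing: i-th base -> i-th view
}
```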
This patch *disables* the self-pairing optimization in keyspaces that
use tablets:
The self-pairing optimization can cause the pairing to change after
token ranges are moved between nodes, so it can break base-view consistency
in some edge cases, leading to "ghost rows". With tablets, these range
movements become even more frequent - they can happen even if the
cluster doesn't grow. This is why we want to solve this problem for tablets.
For backward compatibility and to avoid sudden inconsistencies emerging
during upgrades, we decided to continue using the self-pairing optimization
for keyspaces that are *not* using tablets (i.e., using vnodes).
Currently, we don't introduce a "CREATE MATERIALIZED VIEW" option to
override these defaults - i.e., we don't provide a way to disable
self-pairing with vnodes or to enable them with tablets. We could introduce
such a schema flag later, if we ever want to (and I'm not sure we want to).
It's important to note that, in some cases, this change has implications
on when view updates become synchronous, in the tablets case.
For example:
* If we have 3 nodes and RF=3, with the self-pairing optimization each
node is paired with itself, the view update is local, and is
implicitly synchronous (without requiring a "synchronous_updates"
flag).
* In the same setup with tablets, without the self-pairing optimization
(due to this patch), this is not guaranteed. Some view updates may not
be synchronous, i.e., the base write will not wait for the view
write. If the user really wants synchronous updates, they should
be requested explicitly, with the "synchronous_updates" view option.
Fixes #16260.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16272
run_on_existing_tables() is not used at all, and we have two of them.
In this change, let's drop them.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16304
Add CAP_PERFMON to AmbientCapabilities in capabilities.conf, to enable
perf_event based stall detector in Seastar.
However, on Debian/Ubuntu CAP_PERFMON with a non-root user does not work
because it sets kernel.perf_event_paranoid=4, which disallows all non-root
user access.
(On Debian it is kernel.perf_event_paranoid=3.)
So we need to configure kernel.perf_event_paranoid=2 on these distros.
see: https://askubuntu.com/questions/1400874/what-does-perf-paranoia-level-four-do
Also, CAP_PERFMON is only available on linux-5.8+, older kernel does not
have this capability.
To enable older kernel environment such as CentOS7, we need to configure
kernel.perf_event_paranoid=1 to allow non-root user access even without
the capability.
Fixes #15743
Closes scylladb/scylladb#16070
* seastar 55a821524d...ae8449e04f (22):
> Revert "Merge 'reactor: merge pollfn on I/O paths into a single one' from Kefu Chai"
> http/exception: Make unexpected status message more informative
> docker: bump up to clang {16,17} and gcc {12,13}
> doc: replace space (0xA0) in unicode with ASCII space (0x20)
> file: Remove reactor class friendship
> dpdk: adjust for poller in internal namespace
> http: make_requests accept optional expected
> Merge 'future: future_state_base: assert owner shard in debug mode' from Benny Halevy
> Merge 'Keep pollers in internal/poll.hh' from Pavel Emelyanov
> sharded: access instance promise only on instance shard
> test: network_interface_test: add tests for format and parse
> Merge 'reactor: merge pollfn on I/O paths into a single one' from Kefu Chai
> reactor/scheduling_group: Handle at_destroy queue special in init_new_scheduling_group_key etc (v2)
> reactor: set local_engine after it is fully initialized
> build: do not error when running into GCC BZ-1017852
> Merge 'shared_future: make available() immediate after set_value()' from Piotr Dulikowski
> tls: add format_as(subject_alt_name_type) overload
> tls: linearize small packets on send
> shared_future: remove unused #include
> shared_ptr: add fmt::formatter for shared_ptr types
> lazy: add fmt::formatter for lazy_eval types
> Merge 'file: use unbuffered generator in experimental_list_directory()' from Kefu Chai
Closes scylladb/scylladb#16274
This PR removes the incorrect information that the ScyllaDB Rust Driver is not GA.
In addition, it replaces "Scylla" with "ScyllaDB".
Fixes https://github.com/scylladb/scylladb/issues/16178
(nobackport)
Closes scylladb/scylladb#16199
* github.com:scylladb/scylladb:
doc: remove the "preview" label from Rust driver
doc: fix Rust Driver release information
Fixes some more typos, as found by a codespell run on the code. The errors fixed in this commit are more user-visible.
Refs: https://github.com/scylladb/scylladb/issues/16255
Closes scylladb/scylladb#16289
* github.com:scylladb/scylladb:
Update unified/build_unified.sh
Update main.cc
Update dist/common/scripts/scylla-housekeeping
Typos: fix typos in code
utils::fb_utilities is a global in-memory registry for storing and retrieving broadcast_address and broadcast_rpc_address.
As part of the effort to get rid of all global state, this series gets rid of fb_utilities.
This will eventually allow e.g. cql_test_env to instantiate multiple scylla server nodes, each serving on its own address.
Closes scylladb/scylladb#16250
* github.com:scylladb/scylladb:
treewide: get rid of now unused fb_utilities
tracing: use locator::topology rather than fb_utilities
streaming: use locator::topology rather than fb_utilities
raft: use locator::topology/messaging rather than fb_utilities
storage_service: use locator::topology rather than fb_utilities
storage_proxy: use locator::topology rather than fb_utilities
service_level_controller: use locator::topology rather than fb_utilities
misc_services: use locator::topology rather than fb_utilities
migration_manager: use messaging rather than fb_utilities
forward_service: use messaging rather than fb_utilities
messaging_service: accept broadcast_addr in config rather than via fb_utilities
messaging_service: move listen_address and port getters inline
test: manual: modernize message test
table: use gossiper rather than fb_utilities
repair: use locator::topology rather than fb_utilities
dht/range_streamer: use locator::topology rather than fb_utilities
db/view: use locator::topology rather than fb_utilities
database: use locator::topology rather than fb_utilities
db/system_keyspace: use topology via db rather than fb_utilities
db/system_keyspace: save_local_info: get broadcast addresses from caller
db/hints/manager: use locator::topology rather than fb_utilities
db/consistency_level: use locator::topology rather than fb_utilities
api: use locator::topology rather than fb_utilities
alternator: ttl: use locator::topology rather than fb_utilities
gossiper: use locator::topology rather than fb_utilities
gossiper: add get_this_endpoint_state_ptr
test: lib: cql_test_env: pass broadcast_address in cql_test_config
init: get_seeds_from_db_config: accept broadcast_address
locator: replication strategies: use locator::topology rather than fb_utilities
locator: topology: add helpers to retrieve this host_id and address
snitch: pass broadcast_address in snitch_config
snitch: add optional get_broadcast_address method
locator: ec2_multi_region_snitch: keep local public address as member
ec2_multi_region_snitch: reindent load_config
ec2_multi_region_snitch: coroutinize load_config
ec2_snitch: reindent load_config
ec2_snitch: coroutinize load_config
thrift: thrift_validation: use std::numeric_limits rather than fb_utilities
install-dependencies.sh includes a list of pip packages that the build
environment requires.
This functionality was added in
729d0feef0, however, the actual use of the
list is missing and instead the `pip install` commands are hard coded
into the logic.
This change completes the transition to the pip-packages list.
It also includes modifying the `pip_packages` array to include a
constraint (if needed) for every package.
Fixes #16269
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Closes scylladb/scylladb#16282
Get my_address via query_processor->proxy and pass it
to all static make_ methods, instead of getting it from
utils::fb_utilities.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This commit adds a short paragraph to the Raft
page to explain how to enable consistent
topology updates with Raft - an experimental
feature in version 5.4.
The paragraph should satisfy the requirements
for version 5.4. The Raft page will be
rewritten in the next release when consistent
topology changes with Raft will be GA.
Fixes https://github.com/scylladb/scylladb/issues/15080
Requires backport to branch-5.4.
Closes scylladb/scylladb#16273
Right now the atomic deletion is called on manager, but it gets the
actual deletion function from storage and off-loads the deletion to it.
This patch makes the manager fully responsible for the deletion by
implementing the sequence of
auto ctx = storage.prepare()
for sst in sstables:
sst.unlink()
storage.complete(ctx)
Storage implementations provide the prepare/complete methods. The
filesystem storage does it via deletion log and the s3 storage is still
not atomic :(
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The atomic deletion is going to look like
auto ctx = storage.prepare()
for sst in sstables:
sst.unlink()
storage.complete(ctx)
and this patch prepares the class storage for that by extending it with
prepare and complete methods. The opaque ctx object is also here
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
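A rough C++ rendition of that shape (names and the synchronous signatures are assumptions; the real code is future-based):
```
#include <memory>
#include <utility>
#include <vector>

struct sstable { void unlink() { /* remove the component files */ } };
using shared_sstable = std::shared_ptr<sstable>;

// Opaque per-batch state returned by prepare(); the filesystem storage
// implements it via the deletion log.
struct atomic_delete_context { /* backend-specific state */ };

struct storage {
    virtual ~storage() = default;
    virtual atomic_delete_context atomic_delete_prepare(const std::vector<shared_sstable>&) = 0;
    virtual void atomic_delete_complete(atomic_delete_context ctx) = 0;
};

// sstables_manager-side driver implementing the pseudocode above.
void delete_atomically(storage& st, const std::vector<shared_sstable>& ssts) {
    auto ctx = st.atomic_delete_prepare(ssts);
    for (const auto& sst : ssts) {
        sst->unlink();
    }
    st.atomic_delete_complete(std::move(ctx));
}
```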
topology_guard is used to track distributed operations started by the
topology change coordinator, e.g. streaming, to make sure that those
operations have no side effects after topology change coordinator
moved to the next migration stage, of a given tablet or of the whole
ring.
topology_guard can be sent over the wire in the form of
frozen_topology_guard. It can be materialized again on the other
side. While in transit, it doesn't block the coordinator barriers. But
if the coordinator moved on, materialization of the guard will
fail. So tracking safety is preserved.
In this patch, the guard implementation is based on tracking work
under global sessions, but the concept is flexible and other
mechanisms can be used without changing user code.
Since abort callbacks are fired synchronously, we must change the
table's erm before we do that so that the callbacks obtain the new
erm.
Otherwise, we will block barriers.
Currently, scylla.yaml is read conditionally, if either the user
provided the `--scylla-yaml-file` command line parameter, or if deducing the
data dir location from the sstable path failed.
We want the scylla.yaml file to be always read, so that when working
with encrypted file (enterprise), scylla-sstable can pick up the
configuration for the encryption.
This patch makes scylla-sstable always attempt to read the scylla-yaml
file, whether the user provided a location for it or not. When not, the
default location is used (also considering the `SCYLLA_CONF` and
`SCYLLA_HOME` environment variables).
Failing to find the scylla.yaml file is not considered an error. The
rationale is that the user will discover this if they attempt to do an
operation that requires this anyway.
There is a debug-level log about whether it was successfully read or
not.
Fixes: #16132
Closes scylladb/scylladb#16174
despite that the "value_status_count" is not rendered/used yet,
it'd be better to keep it in sync with the code.
since 5fd30578d7 added
"Deprecated" to `value_status` enum, let's update the sphinx
extension accordingly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16236
The table::discard_sstables() removes sstables attached to a table. For that it tries to atomically delete _each_ suitable sstable, which is a bit heavyweight -- each atomic deletion operation results in a deletion log file written. This PR deletes all of a table's sstables in one atomic batch. While at it, the body of discard_sstables is simplified not to allocate the "pruner" object. The latter is possible after the method became a coroutine.
Closesscylladb/scylladb#16202
* github.com:scylladb/scylladb:
discard_sstables: Atomically delete all sstables
discard_sstables: Indentation and formatting fix after previous patch
discard_sstable: Open-code local prune() lambda
discard_sstables: Do not allocate pruner
This feature, when enabled, will modify how schema versions
are calculated and stored.
- In group 0 mode, schema versions are persisted by the group 0 command
that performs the schema change, then reused by each node instead of
being calculated as a digest (hash) by each node independently.
- In RECOVERY mode or before Raft upgrade procedure finishes, when we
perform a schema change, we revert to the old digest-based way, taking
into account the possibility of having performed group0-mode schema
changes (that used persistent versions). As we will see in future
commits, this will be done by storing additional flags and tombstones
in system tables.
By "schema versions" we mean both the UUIDs returned from
`schema::version()` and the "global" schema version (the one we gossip
as `application_state::SCHEMA`).
For now, in this commit, the feature is always disabled. Once all the
necessary code is set up in the following commits, we will enable it
together with Raft.
Storage service uses group0 internally, but group0 is created long after
the storage service is initialized, and is passed to it using the
ss::set_group0() function. This means that during shutdown group0 is
destroyed before ss::stop() is called, and thus the storage service is
left with a dangling reference. Fix it by introducing a function that
cancels all group0 operations and waits for background fibers to
complete. For that we need a separate abort source for group0
operations, which the patch series also introduces.
* 'gleb/group0-ss-shutdown' of github.com:scylladb/scylla-dev:
storage_service: topology coordinator: ignore abort_requested_exception in background fibers
storage_service: fix de-initialization order between storage service and group0_service
less overhead this way. the caller of lookup() always passes
an rvalue reference, and seastar::dns::get_host_by_name() actually
moves away from the parameter, so let's pass with std::move() for
slightly better performance, and to match the expectation of
the underlying seastar API.
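A generic illustration of the idiom (not the Seastar code itself): when the callee takes a sink parameter and the caller no longer needs its copy, moving avoids one allocation.

```cpp
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> g_names;

void register_name(std::string name) {      // sink parameter, taken by value
    g_names.push_back(std::move(name));     // callee moves from its parameter
}

void caller(std::string&& name) {
    register_name(std::move(name));         // forward the rvalue; no copy is made
}
```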
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16280
Expose cql3::query_processor in auth::service
to get to the topology via storage_proxy.replica::database
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When collected sstables are deleted, each is passed into
sstables_manager.delete_atomically(). For on-disk sstables this creates
a deletion log for each removed sstable, which is quite an overkill. The
atomic deletion callback already accepts a vector of shared sstables, so
it's simpler (and a bit faster) to remove them all in one batch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
By "formatting" fix I mean -- remove the temporary on-stack references
that were left for the ease of patching
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The lambda in question used to be a method of the pruner struct and was
left there for ease of patching. Now that this lambda is only called once,
inside the function it is declared in, it can be open-coded into the
place where it's called.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This allocation remained from the pre-coroutine times of the method. Now
the contents of the pruner -- the reference to the table, the vector and
the replay_position -- can reside on the coroutine frame.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Returns this node's endpoint_state_ptr.
With this entry point, the caller doesn't need to
get_broadcast_address.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
For getting rid of fb_utilities.
In the future, that could be used to instantiate
multiple scylla node instances.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Pass the broadcast_address from main to get_seeds_from_db_config
rather than getting it from fb_utilities.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
and set broadcast_address / broadcast_rpc_address in main
to remove this dependency of snitch on fb_utilities.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Now that ec2_snitch::load_config is a coroutine
there's no need for a seastar thread here either.
Refs #16241
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
UUID v1 uses an epoch derived from the Gregorian calendar, but
base36-uuid.py interprets the timestamp against the UNIX epoch.
that's why it prints a UUID like
```console
$ ./scripts/base36-uuid.py -d 3gbi_0mhs_4sjf42oac6rxqdsnyx
date = 2411-02-16 16:05:52
decimicro_seconds = 0x7ad550
lsb = 0xafe141a195fe0d59
```
even though this UUID was generated on nov 30, 2023. so in this change,
we shift the time by the offset of the UNIX epoch from
the Gregorian calendar's day 0. so, after this change, we have:
```console
$ ./scripts/base36-uuid.py -d 3gbi_0mhs_4sjf42oac6rxqdsnyx
date = 2023-11-30 16:05:52
decimicro_seconds = 0x7ad550
lsb = 0xafe141a195fe0d59
```
see https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.4
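For reference, a small self-contained sketch of the shift; the offset constant is the standard number of 100-ns intervals between 1582-10-15 and 1970-01-01 (the script itself is Python, this is just an illustration):

```cpp
#include <cstdint>
#include <cstdio>
#include <ctime>

// 100-ns intervals between the Gregorian epoch (1582-10-15) used by UUID v1
// timestamps and the Unix epoch (1970-01-01).
constexpr std::uint64_t gregorian_unix_offset_100ns = 0x01B21DD213814000ULL;

// Convert a v1 UUID timestamp (100-ns units since 1582-10-15) to Unix seconds.
std::time_t uuid_v1_to_unix(std::uint64_t ts_100ns) {
    return static_cast<std::time_t>((ts_100ns - gregorian_unix_offset_100ns) / 10'000'000ULL);
}

int main() {
    // Round-trip the current time through the v1 representation as a sanity check.
    std::time_t now = std::time(nullptr);
    std::uint64_t ts = static_cast<std::uint64_t>(now) * 10'000'000ULL + gregorian_unix_offset_100ns;
    std::printf("%lld -> %lld\n", static_cast<long long>(now),
                static_cast<long long>(uuid_v1_to_unix(ts)));
}
```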
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16235
neither <iomanip> nor "utils/to_string.hh" is used in
`gms/inet_address.cc`, so let's remove their "#include"s.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16281
This commit fixes the following:
1. The error message will be specific about what type of keys
exceeds the limit (e.g. clustering keys or partition keys).
2. The error message will be more general about what causes it, a cartesian
product or a simple list.
3. The error message will advise using the --max-partition-key-restrictions-per-query
or --max-clustering-key-restrictions-per-query configuration options to
override the current (100) limit.
Fixes #15627
Closes scylladb/scylladb#16226
in 7a1fbb38, a new test was added to an existing test for
comparing UUIDs with different timestamps, but we should tighten
the test a little bit to reflect its intention:
the timestamp of "2023-11-24 23:41:56" should be less than
"2023-11-24 23:41:57".
in this change, we replace LE with LT to correct it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16245
This commit fixes the rollback procedure in
the 4.6-to-5.0 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
is fixed.
- The "Gracefully shutdown ScyllaDB" command
is fixed.
In addition, there are the following updates
to be in sync with the tests:
- The "Backup the configuration file" step is
extended to include a command to backup
the packages.
- The Rollback procedure is extended to restore
the backup packages.
- The Reinstallation section is fixed for RHEL.
Refs https://github.com/scylladb/scylladb/issues/11907
This commit must be backported to branch-5.4, branch-5.2, and branch-5.1
Closesscylladb/scylladb#16155
This commit fixes the rollback procedure in
the 5.1-to-5.2 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
is fixed.
- The "Gracefully shutdown ScyllaDB" command
is fixed.
In addition, there are the following updates
to be in sync with the tests:
- The "Backup the configuration file" step is
extended to include a command to backup
the packages.
- The Rollback procedure is extended to restore
the backup packages.
- The Reinstallation section is fixed for RHEL.
Also, I've removed the rollback section for images,
as it's not correct or relevant.
Refs https://github.com/scylladb/scylladb/issues/11907
This commit must be backported to branch-5.4 and branch-5.2.
Closesscylladb/scylladb#16152
This commit fixes the rollback procedure in
the 5.0-to-5.1 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
is fixed.
- The "Gracefully shutdown ScyllaDB" command
is fixed.
In addition, there are the following updates
to be in sync with the tests:
- The "Backup the configuration file" step is
extended to include a command to backup
the packages.
- The Rollback procedure is extended to restore
the backup packages.
- The Reinstallation section is fixed for RHEL.
Also, I've removed the rollback section for images,
as it's not correct or relevant.
Refs https://github.com/scylladb/scylladb/issues/11907
This commit must be backported to branch-5.4, branch-5.2, and branch-5.1
Closesscylladb/scylladb#16154
Using consistent cluster management without schema commitlog
results in a bad-configuration error thrown during bootstrap. Soon, we
will make consistent cluster management mandatory. This forces us
to also make schema commitlog mandatory, which we do in this patch.
A booting node decides to use schema commitlog if at least one of
the two statements below is true:
- the node has `force_schema_commitlog=true` config,
- the node knows that the cluster supports the `SCHEMA_COMMITLOG`
cluster feature.
The `SCHEMA_COMMITLOG` cluster feature has been added in version
5.1. This patch is supposed to be a part of version 6.0. We don't
support a direct upgrade from 5.1 to 6.0 because it skips two
versions - 5.2 and 5.4. So, in a supported upgrade we can assume
that the version which we upgrade from has schema commitlog. This
means that we don't need to check the `SCHEMA_COMMITLOG` feature
during an upgrade.
The reasoning above also applies to Scylla Enterprise. Version
2024.2 will be based on 6.0. Probably, we will only support
an upgrade to 2024.2 from 2024.1, which is based on 5.4. But even
if we support an upgrade from 2023.x, this patch won't break
anything because 2023.1 is based on 5.2, which has schema
commitlog. Upgrades from 2022.x definitely won't be supported.
When we populate a new cluster, we can use the
`force_schema_commitlog=true` config to use schema commitlog
unconditionally. Then, the cluster feature check is irrelevant.
This check could fail because we initiate schema commitlog before
we learn about the features. The `force_schema_commitlog=true`
config is especially useful when we want to use consistent cluster
management. Failing feature checks would lead to crashes during
initial bootstraps. Moreover, there is no point in creating a new
cluster with `consistent_cluster_management=true` and
`force_schema_commitlog=false`. It would just cause some initial
bootstraps to fail, and after successful restarts, the result would
be the same as if we used `force_schema_commitlog=true` from the
start.
In conclusion, we can unconditionally use schema commitlog without
any checks in 6.0 because we can always safely upgrade a cluster
and start a new cluster.
Apart from making schema commitlog mandatory, this patch adds two
changes that are its consequences:
- making the unneeded `force_schema_commitlog` config unused,
- deprecating the `SCHEMA_COMMITLOG` feature, which is always
assumed to be true.
Closesscylladb/scylladb#16254
Fixes#16277
When the PR for 'tagged pages' was submitted for RFC, it was assumed that PR #12849
(compression) would be committed first. The latter introduced the v3 format, and the
format version in the tagged-pages PR was assumed to have to be bumped to 4.
This ended up not being the case, and I missed that the code went in with the file format
tag's numeric value being '4' (and the constant named v3).
While not detrimental, it is confusing, and should be changed asap (before anything
depends on files with the tag applied).
Closesscylladb/scylladb#16278
Refs #15269
Unit test to check that trying to skip past EOF in a borked segment
will not crash the process. file_data_input_impl asserts iff caller
tries this.
On /usr/lib/sysctl.d/99-scylla-sched.conf, we have some sysctl settings to
tune the scheduler for lower latency.
This is mostly to prevent softirq threads processing tcp and reactor threads
from injecting latency into each other.
However, these parameters moved to debugfs in linux-5.13+, so we lost
scheduler tuning on recent kernels.
To support tuning recent kernels, let's add a new service which can
configure both sysctl and debugfs.
The service is named scylla-tune-sched.service.
The service is unconditionally enabled when installed; on older kernels
it will tune via sysctl, on recent kernels it will tune via debugfs.
Fixes #16077
Closes scylladb/scylladb#16122
in 4ea6e06c, we specialized fmt::formatter<gms::inet_address> using
the formatter of bytes if the underlying address is an IPv6 address.
this breaks the tests with JMX which expected the shortened form of
the text representation of the IPv6 address.
in this change, instead of reinventing the wheel, let's reuse the
existing formatter of net::inet_address, which is able to handle
both IPv4 and IPv6 addresses; it also follows
https://datatracker.ietf.org/doc/html/rfc5952 by compressing
consecutive zeros.
since this new formatter is a thin wrapper around seastar::net::inet_address,
the corresponding unit test will be added to Seastar.
Refs #16039
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16267
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for
row_level_diff_detect_algorithm, but its operator<<() is preserved,
as we are still using our homebrew generic formatter for
std::vector, and that formatter still uses operator<< for formatting
the elements of the vector.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16248
as part of the efforts to migrate to the CMake-based build system,
this change enables `configure.py` to optionally create
`build.ninja` with CMake.
in this change, we add a new option named `--use-cmake` to
`configure.py` so we can create `build.ninja`. please note,
instead of using the "Ninja" generator used by Seastar's
`configure.py` script, we use the "Ninja Multi-Config" generator
along with the `CMAKE_CROSS_CONFIGS` setting in this project,
so that we can generate a `build.ninja` which is capable of
building the same artifacts with multiple configurations.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15916
* github.com:scylladb/scylladb:
build: cmake: add compatibility target of dev-headers
build: add an option to use CMake as the build system
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we
* define a formatter for logical_clock::time_point, as fmt does not
provide a formatter for this time_point, since it is not a part of the
standard library
* remove operator<<() for logical_clock::time_point, as its sole
purpose is to generate the corresponding fmt::formatter when
FMT_DEPRECATED_OSTREAM is defined.
* remove operator<<() for logical_clock::duration, as fmt already provides
a default implementation for formatting
std::chrono::nanoseconds, which uses `int64_t` as its rep
template parameter as well.
* include "fmt/chrono.h" so that the source files including this
header can access the formatter without including it by
themselves; this preserves the existing behavior we had
before the removal of "operator<<()".
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16263
In the view update code, the function get_view_natural_endpoint()
determines which view replica this base replica should send an update
to. It currently gets the *view* table's replication map (i.e., the map
from view tokens to lists of replicas holding the token), but assumes
that this is also the *base* table's replication map.
This assumption was true with vnodes, but is no longer true with
tablets - the base table's replication map can be completely different
from the view table's. By looking at the wrong mapping,
get_view_natural_endpoint() can believe that this node isn't really
a base-replica and drop the view update. Alternatively, it can think
it is a base replica - but use the wrong base-view pairing and create
base-view inconsistencies.
This patch solves this bug - get_view_natural_endpoint() now gets two
separate replication maps - the base's and the view's. The callers
need to remember what the base table was (in some cases they didn't
care at the point of the call), and pass it to the function call.
This patch also includes a simple test that reproduces the bug, and
confirms it is fixed: The test has a 6-node cluster using tablets
and a base table with RF=1, and writes one row to it. Before this
patch, the code usually gets confused, thinking the base replica
isn't a replica and loses the view update. With this patch, the
view update works.
Fixes#16227.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#16228
Prototype implementation of format suggested/requested by @avikivity:
Divides segments into disk-write-alignment sized pages, each tagged with segment ID + CRC of data content.
When read, we verify sector integrity (CRC) to detect corruption, and also match the ID read against the expected one.
If the latter mismatches we have a prematurely terminated segment (read truncation), which, depending on whether the CL is
written in batch or periodic mode, as well as on explicit sync, can mean data loss.
Note: all-zero pages are treated as kosher, both to align with newly allocated segments, as well as fully terminated (zero-page) ones.
Note: This is a preview/RFC - the rest of the file format is not modified. At least parts of entry CRC could probably be removed, but I have not done so yet (needs some thinking).
Note: Some slight abstraction breaks in impl. and probably less than maximal efficiency.
v2:
* Removed entry CRC:s in file format.
* Added docs on format v3
* Added one more test for recycling-truncation
v3:
* Fixed typos in size calc and docs
* Changed sect metadata order
* Explicit iter type
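As a rough, hypothetical sketch of the per-sector check described above (the header layout and CRC choice here are illustrative; the authoritative v3 layout is in the commitlog code and docs):

```cpp
#include <cstddef>
#include <cstdint>

struct sector_header {
    std::uint64_t segment_id;   // detects pages left over from a recycled segment
    std::uint32_t crc;          // detects corruption of the payload
};

// Plain bitwise CRC-32 (reflected, polynomial 0xEDB88320), just for the sketch.
std::uint32_t crc32(const std::uint8_t* p, std::size_t n) {
    std::uint32_t c = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < n; ++i) {
        c ^= p[i];
        for (int k = 0; k < 8; ++k) {
            c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
        }
    }
    return ~c;
}

enum class sector_state { ok, corrupt, truncated };

// CRC mismatch => corruption; segment-id mismatch => the segment was
// terminated early (read truncation). The real format also treats all-zero
// pages as valid, which is omitted here.
sector_state check_sector(const sector_header& h, const std::uint8_t* payload,
                          std::size_t len, std::uint64_t expected_segment_id) {
    if (crc32(payload, len) != h.crc) {
        return sector_state::corrupt;
    }
    if (h.segment_id != expected_segment_id) {
        return sector_state::truncated;
    }
    return sector_state::ok;
}
```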
Closesscylladb/scylladb#15494
* github.com:scylladb/scylladb:
commitlog_test: Add test for replaying large-ish mutation
commitlog_test: Add additional test for segment truncation
docs: Add docs on commitlog format 3
commitlog: Remove entry CRC from file format
commitlog: Implement new format using CRC:ed sectors
commitlog: Add iterator adaptor for doing buffer splitting into sub-page ranges
fragmented_temporary_buffer: Add const iterator access to underlying buffers
commitlog_replayer: differentiate between truncated file and corrupt entries
The helper in question complicates the logic of sstable_directory::process() by handling garbage collection differently for sstables deleted "atomically" and those deleted "one-by-one". Also, the code that deletes sstables one-by-one and uses remove_by_toc_name() results in excessive TOC file reading, because there's an sstable object at hand with all_components() ready for use.
Surprisingly, there was no test for the deletion-log functionality. This PR adds one. The test passes before the g.c. and regular unlink fix, and (of course) continues passing after it.
Closesscylladb/scylladb#16240
* github.com:scylladb/scylladb:
sstables: Drop remove_by_name()
sstables/fs_storage: Wipe by recognized+unrecognized components
sstable_directory: Enlight deletion log replay
sstables: Split remove_by_toc_name()
test: Add test case to validate deletion log work
sstable_directory: Close dir on exception
sstable_directory: Fix indentation after previous patch
sstable_directory: Coroutinize delete_with_pending_deletion_log()
test: Sstable on_delete() is not necessarily in a thread
sstable_directory: Split delete_with_pending_deletion_log()
Currently, the max size of commitlog is obtained either from the
config parameter commitlog_total_space_in_mb or, when the parameter
is -1, from the total memory allocated for Scylla.
To facilitate testing of the behavior of commitlog hard limit,
expose the value of commitlog max_disk_size in a dedicated API.
Closesscylladb/scylladb#16020
rpmlint complains about "mixed-use-of-spaces-and-tabs", and it
does not look good in the editor. so let's replace tabs with spaces.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16246
Fixes some typos as found by codespell run on the code. In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc. Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Closes scylladb/scylladb#16257
* github.com:scylladb/scylladb:
Update service/topology_state_machine.hh
Update raft/tracker.hh
Update db/view/view.cc
Typos: fix typos in comments
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
This PR is a necessary step to fix #15854 -- making consistent
cluster management mandatory on master.
Before making consistent cluster management mandatory, we have
to get rid of all tests that depend on the
`consistent_cluster_management=false` config. These are the tests
in the `topology_raft_disabled` suite.
There's the internal Raft upgrade procedure, which is the bulk of the
upgrade logic. Then, there are two thin "layers" around it that
invoke it underneath: recovery procedure and
enable-raft-in-the-cluster procedure. We're getting rid of the
second one by making Raft always enabled, so we naturally have to
get rid of tests that depend on it. The idea is to replace every
necessary enable-raft-in-the-cluster procedure in these tests with
the recovery procedure. Then, we will still be testing the internal
Raft upgrade procedure in the in-tree tests. The
enable-raft-in-the-cluster procedure is already tested by QA tests,
so we don't need to worry about these changes.
Unfortunately, we cannot adapt `test_raft_upgrade_no_schema`.
After making consistent cluster management mandatory on master,
schema commitlog will also become mandatory because
`consistent_cluster_management: True`,
`force_schema_commit_log: False`
is considered a bad configuration. These changes will make
`test_raft_upgrade_no_schema` unimplementable in the Scylla repo.
Therefore, we remove this test. If we want to keep it, we must
rewrite it as an upgrade dtest.
After making all tests in `topology_raft_disabled` use consistent
cluster management, there is no point in keeping this suite.
Therefore, we delete it and move all the tests to `topology_custom`.
Closesscylladb/scylladb#16192
* github.com:scylladb/scylladb:
test: delete topology_raft_disabled suite
test: topology_raft_disabled: move tests to topology_custom suite
test: topology_raft_disabled: move utils to topology suite
test: topology_raft_disabled: use consistent cluster management
test: topology_raft_disabled: add new util functions
test: topology_raft_disabled: delete test_raft_upgrade_no_schema
Currently wiping an fs-backed sstable happens via reading and parsing its
TOC file back. Then the three-step process goes:
- move TOC -> TOC.tmp
- remove components (obtained from TOC.tmp)
- remove TOC.tmp
However, wiping an sstable happens in one of two cases -- the sstable was
loaded from the TOC file _or_ the sstable had evaluated the needed
components and generated the TOC file. With that, the 2nd step can be done
without reading the TOC file, just by looking at all the components sitting
on the sstable.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Garbage collection of sstables is scattered between two stages -- g.c.
per se and the regular processing.
The former stage collects deletion logs and for each log found goes
ahead and deletes the full sstable with the standard sequence:
- move TOC -> TOC.tmp
- remove components
- remove TOC.tmp
The latter stage picks up partially unlinked sstables that didn't go via
atomic deletion with the log. This comes as
- collect all components
- keep TOC's and TOC.tmp's in separate lists
- attach other components to TOC/TOC.tmp by generation value
- for all TOC.tmp's get all attached components and remove them
- continue loading TOC's with attached components
That said, replaying the deletion log can be as light as just the first step
out of the above sequence -- just move TOC to TOC.tmp. After that the
regular processing would pick up the remaining components and clean them.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The helper consists of three phases:
- move TOC -> TOC.tmp
- remove components listed in TOC
- remove TOC.tmp
The first step is needed separately by the next patch
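For illustration, a minimal synchronous sketch of those three phases with std::filesystem (the real helper is asynchronous, error-checked and operates on sstable components, so treat the names here as hypothetical):

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Assumes the TOC file lists one component file name per line.
void remove_by_toc(const fs::path& toc) {
    fs::path tmp = toc;
    tmp += ".tmp";
    fs::rename(toc, tmp);                          // 1: commit the intent to delete
    std::ifstream in(tmp);
    for (std::string comp; std::getline(in, comp); ) {
        fs::remove(toc.parent_path() / comp);      // 2: remove the listed components
    }
    in.close();
    fs::remove(tmp);                               // 3: drop TOC.tmp last
}
```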
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test sequence is
- create several sstables
- create deletion log for a sub-set of them
- partially unlink smaller sub-sub-set
- make sstable directory do the processing with g.c.
- check that the sstables loaded do NOT include the deleted ones
The .throw_on_missing_toc bit set additionally validates that the
directory doesn't contain garbage not attached to any other TOCs
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When committing the deletion log creation, its containing directory is
sync-ed via an opened file. This place is not exception-safe and the
directory can be left unclosed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
One of the test cases injects an observer into the sstable->unlink() method
via its _on_delete() callback. The test's callback assumes that it runs
in an async context, but that's a happy coincidence, because deletion via
the deletion log happens to run in one. The next patch changes that and the
test case would no longer work. But since it's a test case, it can just
directly call a libc function for its needs.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The helper consists of three parts -- prepare the deletion log, unlink
sstables and drop the deletion log. For testing the first part is needed
as a separate step, so here's this split.
It results in two nested async contexts, but that will change soon.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The ".github/CODEOWNERS" is used by github to recommend reviewers for
pull requests depending on the directories touched in the pull request.
Github ignores entries in that file that are not **maintainers**. Since
Jan is no longer a Scylla maintainer, I remove his entries from the list.
Additionally, I am removing *myself* from *some* of the directories.
For many years, it was an (unwritten) policy that experienced Scylla
developers are expected to help in reviewing pieces of the code they
are familiar with - even if they no longer work on that code today.
But as ScyllaDB the company grew, this is no longer true; the policy
is now that experienced developers are requested to review only code in
their own or their team's area of responsibility - experienced developers
should help review *designs* of other parts, but not the actual code.
For this reason I'm removing my name from various directories.
I can still help review such code if asked specifically - but I will no
longer be the "default" reviewer for such code.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#16239
Add a `LIST EFFECTIVE SERVICE LEVEL` statement to be able to display which service level each service level option comes from.
Example:
There are 2 roles: role1 and role2. Role1 is assigned with sl1 (timeout = 2s, workload_type = interactive) and role2 is assigned with sl2 (timeout = 10s, workload_type = batch).
Then, if we grant role1 to role2, the user with role2 will have 2s timeout (from sl1) and batch workload type (from sl2).
```
> LIST EFFECTIVE SERVICE LEVEL OF role2;
service_level_option | effective_service_level | value
----------------------+-------------------------+-------------
workload_type | sl2 | batch
timeout | sl1 | 2s
```
Fixes: https://github.com/scylladb/scylladb/issues/15604
Closes scylladb/scylladb#14431
* github.com:scylladb/scylladb:
cql-pytest: add `LIST EFFECTIVE SERVICE LEVEL OF` test
docs: add `LIST EFFECTIVE SERVICE LEVEL` statement docs
cql3:statements: add `LIST EFFECTIVE SERVICE LEVEL` statement
service:qos: add option to include effective names to SLO
Storage service uses group0 internally, but group0 is created long after
the storage service is initialized, and is passed to it using the
ss::set_group0() function. This means that during shutdown group0 is
destroyed before ss::stop() is called, and thus the storage service is
left with a dangling reference. Fix it by introducing a function that
cancels all group0 operations and waits for background fibers to
complete. For that we need a separate abort source for group0
operations, which the patch also introduces.
We move the remaining tests in topology_raft_disabled to
topology_custom. We choose topology_custom because these tests
cannot use consistent topology changes.
We need to modify these tests a bit because we cannot pass
RandomTables to a test case function if the initial cluster size
equals 0. RandomTables.__init__ requires manager.cql to be present.
We move all used util functions from topology_raft_disabled to
topology before we remove topology_raft_disabled. After this
change, util.py in topology will be the single util file for all
topology tests.
Some util functions in topology_raft_disabled aren't used anymore.
We don't move such functions and remove them instead.
Soon, we will make consistent cluster management mandatory on
master. Before this, we have to change all tests in the
topology_raft_disabled suite so that they do not depend on the
consistent_cluster_management=false config.
Adapting test_raft_upgrade_majority_loss is simple. We only have
to get rid of the initial upgrade. This initial upgrade didn't
test anything. Every test in topology_raft_disabled had to do it
at the beginning because of consistent_cluster_management=false.
Adapting test_raft_upgrade_basic and test_raft_upgrade_stuck is
more difficult. It requires changing the initial upgrade to
clearing Raft data in RECOVERY mode on all servers and restarting
them. Then, the servers will run the same upgrade procedure as
before.
After changing the tests, we also update their names appropriately.
test_raft_upgrade_stuck becomes a bit slower, so we remove the
comment about running time. Also, one TODO was fixed in the process
of rewriting the test. This fix forced us to skip the test in the
release mode since we cannot update the list of error injections
through manager.server_update_config in this mode.
After making consistent cluster management mandatory on master,
schema commitlog will also become mandatory because
consistent_cluster_management: True,
force_schema_commit_log: False
is considered a bad configuration. These changes will make
test_raft_upgrade_no_schema unimplementable in the Scylla repo, so
we remove it.
If we want to keep this test, we must rewrite it as an upgrade
dtest.
under most circumstances, we don't care about the ordering of the sstable
identifiers, as they are just identifiers. so, as long as they can be
compared, we are good. but we have tests which expect that the sstables
can be ordered by the time they are created. for instance,
sstable_run_based_compaction_test has this expectation.
before this change, we compared two UUID-based generations by their
(MSB, LSB) lexicographically. but UUID v1 puts the lower bits of
the timestamp at the higher bits of the MSB, so the ordering of the
"time" in a timeuuid is not preserved when comparing the UUID-based
generations. this breaks sstable_run_based_compaction_test,
which feeds the sstables to be compacted in a set, and the set is
ordered by the generation of the sstables.
after this change, we treat the UUID-based generation as
a timeuuid when comparing.
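For reference, a small sketch of what ordering "as a timeuuid" boils down to: reassembling the 60-bit v1 timestamp from the UUID's most significant 64 bits (standard RFC 4122 layout) and comparing by that. This is an illustration, not the actual Scylla comparator:

```cpp
#include <cstdint>

// time_low occupies the top 32 bits of the MSB, time_mid the next 16,
// and time_hi the low 12 bits (after the version nibble).
std::uint64_t uuid_v1_timestamp(std::uint64_t msb) {
    std::uint64_t time_low = msb >> 32;
    std::uint64_t time_mid = (msb >> 16) & 0xFFFF;
    std::uint64_t time_hi  = msb & 0x0FFF;
    return (time_hi << 48) | (time_mid << 32) | time_low;
}

// Two v1-based generations ordered by creation time rather than by raw (MSB, LSB).
bool generation_less(std::uint64_t msb_a, std::uint64_t msb_b) {
    return uuid_v1_timestamp(msb_a) < uuid_v1_timestamp(msb_b);
}
```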
Fixes#16215
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16238
Allow including `slo_effective_names` in `service_level_options`
to be able to determine which service level a specific option value comes from.
When scanning our latest docker image using `trivy` (command: `trivy
image docker.io/scylladb/scylla-nightly:latest`), it shows we have OS
packages which are out of date.
Also removing `openssh-server` and `openssh-client` since we don't use
them in our docker images.
Fixes: https://github.com/scylladb/scylladb/issues/16222
Closes scylladb/scylladb#16224
Said commands print errors as they validate the sstables. Currently this
intermingles with the regular JSON output of these commands, resulting
in ugly and confusing output.
This is not a problem for scripted use, as logs go to stderr while the
JSON go to stdout, but it is a problem for human users.
Solve this by outputting the JSON into a std::stringstream and printing
it in one go at the very end. This means JSON is accumulated in a memory
buffer, but these commands don't output a lot of JSON, so this shouldn't
be a problem.
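A trivial sketch of the buffering approach (nothing here reflects the tool's actual JSON writer):

```cpp
#include <iostream>
#include <sstream>

int main() {
    std::stringstream json;
    json << "{ \"sstables\": [";
    std::cerr << "validating sstable 1...\n";  // diagnostics keep going to stderr
    json << " {\"generation\": 1, \"valid\": true} ";
    json << "] }\n";
    std::cout << json.str();                   // emit the whole document in one go
}
```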
Closesscylladb/scylladb#16216
The current mechanism to deprecate config options is implemented in a hacky
way in `main.cpp` and doesn't account for the existing
`db::config/boost::po` API controlling the lifetime of config options, hence
it's being replaced in this PR by adding yet another `value_status`
enumerator: `Deprecated`, so that deprecation of config options is
controlled in one place in `config.cc`, i.e. when specifying config options.
Motivation: https://docs.google.com/document/d/18urPG7qeb7z7WPpMYI2V_lCOkM5YGKsEU78SDJmt8bM/edit?usp=sharing
With this change, if a `Deprecated` config option is specified as
1. a command line parameter, scylla will run and log:
```
WARN 2023-11-25 23:37:22,623 [shard 0:main] init - background-writer-scheduling-quota option ignored (deprecated)
```
(Previously it was only a message printed to standard output, not a
scylla log of warn level).
2. an option in `scylla.yaml`, scylla will run and log:
```
WARN 2023-11-27 23:55:13,534 [shard 0:main] init - Option is deprecated : background_writer_scheduling_quota
```
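A rough sketch of the idea, with hypothetical names loosely mirroring the description above (the real enumerators and option handling live in db::config):

```cpp
#include <cstdio>
#include <string>
#include <vector>

enum class value_status { Used, Unused, Invalid, Deprecated };

struct config_option {
    std::string name;
    value_status status;
};

// Deprecation is now just metadata on the option itself, so the startup
// warning can be driven by a single loop instead of a hard-coded list.
void warn_deprecated(const std::vector<config_option>& specified) {
    for (const auto& opt : specified) {
        if (opt.status == value_status::Deprecated) {
            std::printf("WARN: Option is deprecated : %s\n", opt.name.c_str());
        }
    }
}
```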
Fixes#15887
Incorporates dropped https://github.com/scylladb/scylladb/pull/15928
Closes scylladb/scylladb#16184
This miniset completes the prerequisites for enabling the commitlog hard limit by default.
Namely, start flushing and evacuating segments halfway to the limit in order to never hit it under normal circumstances.
It is worth mentioning that hitting the limit is an exceptional condition whose root cause needs to be resolved; however,
once we do hit the limit, the performance impact inflicted as a result of this enforcement is irrelevant.
Tests: unit tests.
LWT write test (#9331)
Whitebox testing was performed by @wmitros; the test aimed at putting as much pressure as possible on the commitlog segments by using a write pattern that rewrites the partitions in the memtable, keeping it at ~85% occupancy so the dirty memory manager will not kick in. The test compared 3 configurations:
1. The default configuration
2. Hard limit on (without changing the flush threshold)
3. The changes in this PR applied.
The last exhibited the "best" behavior in terms of metrics; its graphs were the flattest and least jagged of the three.
Closesscylladb/scylladb#10974
* github.com:scylladb/scylladb:
commitlog: enforce commitlog size hard limit by default
commitlog: set flush threshold to half of the limit size
commitlog: unfold flush threshold assignment
This series adds handling for more failures during a topology operation
(we already handle a failure during streaming). Here we add handling of
tablet draining errors by aborting the operation and handling of errors
after streaming where an operation cannot be aborted any longer. If the
error happens when rollback is no longer possible we wait for ring delay
and proceed to the next step. Each individual patch that adds the sleep
has an explanation of what the consequences of the patch are.
* 'gleb/topology-coordinator-failures' of github.com:scylladb/scylla-dev:
test: add test to check error handling during tablet draining
test: fix test_topology_streaming_failure test to not grep the whole file
storage_service: add error injection into the tablet migration code
storage_service: topology coordinator: rollback on handle_tablet_migration failure during tablet_draining stage
storage_service: topology coordinator: do not retry the metadata barrier forever in write_both_read_new state
storage_service: topology coordinator: do not retry the metadata barrier forever in left_token_ring state
storage_service: topology coordinator: return a node that is being removed from get_excluded_nodes
storage_service: topology_coordinator: use new rollback_to_normal state in the rollback procedure
storage_service: topology coordinator: add rollback_to_normal node state
storage_service: topology coordinator: put fence version into the raft state
storage_service: topology coordinator: do fencing even if draining failed
The TOC file is read and parsed in several places in the code. All do it differently, and it's worth generalizing this.
To make it happen, also fix the S3 readable_file so that it can be used inside a file_input_stream.
Closesscylladb/scylladb#16175
* github.com:scylladb/scylladb:
sstable: Generalize toc file read and parse
s3/client: Don't GET object contents on out-of-bound reads
s3/client: Cache stats on readable_file
The situation before this patch is that when tablets are enabled for
a keyspace, we can create a materialized view but later any write to
the base table fails with an on_internal_error(), saying that:
"Tried to obtain per-keyspace effective replication map of test
but it's per-table."
Indeed, with tablets, the replication is different for each table - it's
not the same for the entire keyspace.
So this patch changes the view update code to take the replication
map from the specific base table, not the keyspace.
This is good enough to get materialized-views reads and writes working
in a simple single-node case, as the included test demonstrates (the
test fails with on_internal_error() before this patch, and passes
afterwards).
But this fix is not perfect - the base-view pairing code really needs
to consider not only the base table's replication map, but also the
view table's replication map - as those can be different. We'll fix
this remaining problem as a followup in a separate patch - it will
require a substantially more elaborate test to reproduce the need
for the different mapping and to verify that fix.
Fixes#16209.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#16211
* seastar 830ce8673...55a821524 (34):
> Revert "reactor/scheduling_group: Handle at_destroy queue special in init_new_scheduling_group_key etc"
> epoll: Avoid spinning on aborted connections
Fixes #12774 Fixes #7753 Fixes #13337
> Merge 'Sanitize test-only reactor facilities' from Pavel Emelyanov
> test/unit: fix fmt version check
> reactor/scheduling_group: Handle at_destroy queue special in init_new_scheduling_group_key etc
> build: add spaces before () and after commands
> reactor: use zero-initialization to initialize io_uring_params
> Merge 'build: do not return a non-false condition if the option is off ' from Kefu Chai
> memory: do not use variable length array
> build: use tri_state_option() to link against Sanitizers
> build: do not define SEASTAR_TYPE_ERASE_MORE on all builds
> Revert "shared_future: make available() immediate after set_value()"
> test_runner: do not throw when seastar.app fails to start
> Merge 'Address issue where Seastar faults in toeplitz hash when reassembling fragment' from John Hester
> defer, closeable: do not use [[nodiscard(str)]]
> Merge 'build: generate config-specific rules using generator expressions' from Kefu Chai
> treewide: use *_v and *_t for better readability
> build: use different names for .pc files for each build mode
> perftune.py: skip discovering IRQs for iSCSI disks
> io-tester: explicit use uint64_t for boost::irange(...)
> gate: correct the typo in doxygen comment
> shared_future: make available() immediate after set_value()
> smp: drop unused templates
> include fmt/ostream.h to make headers self-sufficient
> Support ccache in ./configure.py
> rpc_tester: Disable -Wuninitialized when including boost.accumulators
> file: construct directory_entry with aggregated ctor
> file: s/ino64_t/ino_t/, s/off64_t/off_t/
> sstring_test: include fmt/std.h only if fmtlib >= 10.0.0
> file: do not include coroutine headers if coroutine is disabled
> fair_queue::unregister_priority_class:fix assertion
> Merge 'Generalize `net::udp_channel` into `net::datagram_channel`' from Michał Sala
> Merge 'Add file::list_directory() that co_yields entries' from Pavel Emelyanov
> http/file_handler: remove unnecessary cast
Closesscylladb/scylladb#16201
There are several places where TOC file is parsed into a vector of
components -- sstable::read_toc(), remove_by_toc_name() and
remove_by_registry_entry(). All three deserve some generalization.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
If the S3 readable file is used inside a file input stream, the latter may
call its read methods with a position that is above the file size. In that
case the server replies with a generic http error and the fact that the range
was invalid is encoded in the reply body's xml.
It's not great to catch this via a wrong-reply-status exception and xml
parsing, all the more so since we can know that the read is out-of-bounds in
advance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
S3-based sstable components are immutable, so every time stat is called
there's no need to ping the server again.
But the main intention of this patch is to provide stats for read calls
in the next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Fixes#16207
commitlog::delete_segments deletes (or recycles) replayed segments.
The actual file size here is added to the footprint so the actual delete
can then determine whether things should be recycled or removed.
However, we build a pending delete list of named_files, and the files
we added did not have their size set. Bad. Actual deletion then treated the
files as zero-byte sized, i.e. footprint calculations were borked.
The simple fix is just filling in the size of the objects when adding them.
Added a unit test for the problem.
Closesscylladb/scylladb#16210
Major compaction already flushes each table to make
sure it considers any mutations that are present in the
memtable for the purpose of tombstone purging.
See 64ec1c6ec6
However, tombstone purging may be inhibited by data
in commitlog segments based on `gc_time_min` in the
`tombstone_gc_state` (See f42eb4d1ce).
Flushing all tables in the database releases
all references to commitlog segments and thereby
maximizes the potential for tombstone purging,
which is typically the reason for running major compaction.
However, flushing all tables too frequently might
result in tiny sstables. Since, when major-compacting all
keyspaces using `nodetool compact`, the `force_keyspace_compaction`
api is invoked for each keyspace successively, we need a mechanism
to prevent too-frequent flushes by major compaction.
Hence a `compaction_flush_all_tables_before_major_seconds` interval
configuration option is added (defaults to 24 hours).
In the case that not all tables are flushed prior
to major compaction, we revert to the old behavior of
flushing each table in the keyspace before major-compacting it.
Fixes scylladb/scylladb#15777
Closes scylladb/scylladb#15820
* github.com:scylladb/scylladb:
docs: nodetool: flush: enrich examples
docs: nodetool: compact: fix example
api: add /storage_service/compact
api: add /storage_service/flush
compaction_manager: flush_all_tables before major compaction
database: add flush_all_tables
api: compaction: add flush_memtables option
test/nodetool: jmx: fix path to scripts/scylla-jmx
scylla-nodetool, docs: improve optional params documentation
our CI builds "dev-headers" as a gating check. but the target names
generated by CMake's Ninja Multi-Config generator do not follow
this naming convention. we could have headers:Dev, but still, it's
different from what we are using. before completely switching to
CMake, let's keep this backward compatibility by adding a target
with the same name.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
as part of the efforts to migrate to the CMake-based build system,
this change enables `configure.py` to optionally create
`build.ninja` with CMake.
in this change, we add a new option named `--use-cmake` to
`configure.py` so we can create `build.ninja`. please note,
instead of using the "Ninja" generator used by Seastar's
`configure.py` script, we use the "Ninja Multi-Config" generator
along with the `CMAKE_CROSS_CONFIGS` setting in this project,
so that we can generate a `build.ninja` which is capable of
building the same artifacts with multiple configurations.
Fixes#15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Before this patch, trying to create a materialized view when tablets
are enabled for a keyspace results in a failure: "Tablet map not found
for table <uuid>", with uuid referring to the new view.
When a table schema is created, the handler on_before_create_column_family()
is called - and this function creates the tablet map for the new table.
The bug was that we forgot to do the same when creating a materialized
view - which is also a bona-fide table.
In this patch we call on_before_create_column_family() also when
creating the materialized view. I decided *not* to create a new
callback (e.g., on_before_create_view()) and rather call the existing
on_before_create_column_family() callback - after all, a view is
a column family too.
This patch also includes a test for this issue, which fails to create
the view before this patch, and passes with the patch. The test is
in the test/topology_experimental_raft suite, which runs Scylla with
the tablets experimental feature, and will also allow me to create
tests that need multiple nodes. However, the first test added here
only needs a single node to reproduce the bug and validate its fix.
Fixes#16194.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#16205
When a view schema is changed, the schema change command also includes
mutations for the corresponding base table; these mutations don't modify
the base schema but are included in case the receiver of the view
mutations somehow didn't receive the base mutations yet (this may in theory
happen outside Raft mode).
There are situations where the schema change command contains both
mutations that describe the current state of the base table -- included
by a view update, as explained above -- and mutations that want to
modify the base table. Such situation arises, for example, when we
update a user-defined type which is referenced by both a view and its
corresponding base table. This triggers a schema change of the view,
which generates mutations to modify the view and includes mutations of
the current base schema, and at the same time it triggers a schema
change of the base, which generates mutations to modify the base.
These two sets of mutations conflict with each other. One set
wants to preserve the current state of the base table while the other
wants to modify it. And the two sets of mutations are generated using
the same timestamp, which means that conflict resolution between them is
made on a per-mutation-cell basis, comparing the values in each cell and
taking the "larger" one (meaning of "larger" depends on the type of each
cell).
Fortunately, this conflict is currently benign -- or at least there is
no known situation where it causes problems.
Unfortunately, it started causing problems when I attempted to implement
group 0 schema versioning (PR scylladb/scylladb#15331), where instead of
calculating table versions as hashes of schema mutations, we would send
versions as part of schema change command. These versions would be
stored inside the `system_schema.scylla_tables` table, `version` column,
and sent as part of schema change mutations.
And then the conflict showed up. One set of mutations wanted to preserve
the old value of `version` column while the other wanted to update it.
It turned out that sometimes the old `version` prevailed, because the
`version` column in `system_schema.scylla_tables` uses UUID-based
comparison (not timeuuid-based comparison). This manifested as issue
scylladb/scylladb#15530.
To prevent this, the idea in this commit is simple: when generating
mutations for the base table as part of corresponding view update, do
not use the provided timestamp directly -- instead, decrement it by one.
This way, if the schema change command contains mutations that want to
modify the base table, these modifying mutations will win all conflicts
based on the timestamp alone (they are using the same provided
timestamp, but not decremented).
One could argue that the choice of this timestamp is anyway arbitrary.
The original purpose of including base mutations during view update was
to ensure that a node which somehow missed the base mutations, gets them
when applying the view. But in that case, the "most correct" solution
should have been to use the *original* base mutations -- i.e. the ones
that we have on disk -- instead of generating new mutations for the base
with a refreshed timestamp. The base mutations that we have on disk have
smaller timestamps already (since these mutations are from the past,
when the base was last modified or created), so the conflict would also
not happen in this case.
But that solution would require doing a disk read, and we can avoid the
read while still fixing the conflict by using an intermediate solution:
regenerating the mutations but with `timestamp - 1`.
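A toy illustration of why the decrement removes the ambiguity, using simplified (timestamp, value) cells; this is not the actual reconciliation code:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

using cell = std::pair<std::int64_t, std::string>;   // (timestamp, value)

cell resolve(const cell& a, const cell& b) {
    if (a.first != b.first) {
        return a.first > b.first ? a : b;        // higher timestamp wins
    }
    return a.second > b.second ? a : b;          // tie broken by value comparison
}

int main() {
    cell fresh{100, "new-version"};
    cell stale_same_ts{100, "old-version-that-compares-larger"};
    cell stale_decremented{99, "old-version-that-compares-larger"};
    assert(resolve(fresh, stale_same_ts) == stale_same_ts);   // equal ts: the stale copy can win
    assert(resolve(fresh, stale_decremented) == fresh);       // ts - 1: the fresh write always wins
}
```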
Ref: scylladb/scylladb#15530
Closes scylladb/scylladb#16139
It looks like `nodetool compact standard1` is meant
to show how to compact a specified table, not a keyspace.
Note that the previous example line is for a keyspace.
So fix the table compaction example to:
`nodetool compact keyspace1 standard1`
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
For major compacting all tables in the database.
The advantage of this api is that `commitlog->force_new_active_segment`
happens only once in `database::flush_all_tables` rather than
once per keyspace (when `nodetool compact` translates to
a sequence of `/storage_service/keyspace_compaction` calls).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
For flushing all tables in the database.
The advantage of this api is that `commitlog->force_new_active_segment`
happens only once in `database::flush_all_tables` rather than
once per keyspace (when `nodetool flush` translates to
a sequence of `/storage_service/keyspace_flush` calls).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Major compaction already flushes each table to make
sure it considers any mutations that are present in the
memtable for the purpose of tombstone purging.
See 64ec1c6ec6
However, tombstone purging may be inhibited by data
in commitlog segments based on `gc_time_min` in the
`tombstone_gc_state` (See f42eb4d1ce).
Flushing all tables in the database releases
all references to commitlog segments and thereby
maximizes the potential for tombstone purging,
which is typically the reason for running major compaction.
However, flushing all tables too frequently might
result in tiny sstables. Since, when major-compacting all
keyspaces using `nodetool compact`, the `force_keyspace_compaction`
api is invoked for each keyspace successively, we need a mechanism
to prevent too-frequent flushes by major compaction.
Hence a `compaction_flush_all_tables_before_major_seconds` interval
configuration option is added (defaults to 24 hours).
In the case that not all tables are flushed prior
to major compaction, we revert to the old behavior of
flushing each table in the keyspace before major-compacting it.
Fixes scylladb/scylladb#15777
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Flushes all tables after forcing force_new_active_segment
of the commitlog to make sure all commitlog segments can
get recycled.
Otherwise, due to "false sharing", rarely-written tables
might inhibit recycling of the commitlog segments they reference.
After f42eb4d1ce,
that would prevent compaction from purging some tombstones based on
the min_gc_time.
To be used in the next patch by major compaction.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When flushing is done externally, e.g. by running
`nodetool flush` prior to `nodetool compact`,
flush_memtables=false can be passed to skip flushing
of tables right before they are major-compacted.
This is useful to prevent creation of small sstables
due to excessive memtable flushing.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The current implementation makes no sense.
Like `nodetool_path`, base the default `jmx_path`
on the assumption that the test is run using, e.g.
```
(cd test/nodetool; pytest --nodetool=cassandra test_compact.py)
```
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Document the behavior if no keyspace is specified
or no table(s) are specified for a given keyspace.
Fixesscylladb/scylladb#16032
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This commit adds information on how to enable
object storage for a keyspace.
The "Keyspace storage options" section already
existed in the doc, but it was not valid as
the support was only added in version 5.4
The scope of this commit:
- Update the "Keyspace storage options" section.
- Add the information about object storage support
to the Data Definition> CREATE KEYSPACE section
* Marked as "Experimental".
* Excluded from the Enterprise docs with the
.. only:: opensource directive.
This commit must be backported to branch-5.4,
as support for object storage was added
in version 5.4.
Closesscylladb/scylladb#16081
Nowadays, if a memtable gets flushed into misconfigured S3 storage, the flush fails and aborts the whole scylla process. That's not very elegant. First, because upon restart garbage-collecting non-sealed sstables would fail again. Second, because re-configuring an endpoint can be done at runtime; scylla re-reads this config upon a HUP signal.
Memtable flushing already restarts when seeing ENOSPC/EDQUOT errors from on-disk sstables. This PR extends this to handle misconfigured S3 endpoints as well.
fixes: #13745
Closes scylladb/scylladb#15635
* github.com:scylladb/scylladb:
test: Add object_store test to validate config reloading works
test: Add config update facility to test cluster
test: Make S3_Server export config file as pathlib.Path
config: Make object storage config updateable_value_source
memtable: Extend list of checking codes
sstables/storage/s3: Fix missing TOC status check
s3/client: Map http exceptions into storage_io_error
exceptions: Extend storage_io_error construction options
Compaction tasks which do not have a parent are abortable
through task manager. Their children are aborted recursively.
Compaction tasks of the lowest level are aborted using the existing
compaction task executors' stopping mechanism.
Closesscylladb/scylladb#16177
* github.com:scylladb/scylladb:
test: test abort of compaction task that isn't started yet
test: test running compaction task abort
tasks: fail if a task was aborted
compaction: abort task manager compaction tasks
When sealing an sstable on local storage the storage driver performs
several flushes on a file that is directly opened via checked-file.
Flush calls are wrapped with sstable_write_io_check, but that's
excessive; the checked file will wrap flushes with io-checks on its own.
Closesscylladb/scylladb#16173
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
to define a formatter which can be used by the raw class and its derived
classes, we have to put the full template specialization before the
call sites. also, please note, a forward declaration is not enough,
as the compile-time formatter check of fmt requires the definition of
the formatter. fmt v10 also enables us to use `format_as()` to format
a certain type via the return value of `format_as()`, and this
fulfills our needs.
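A tiny standalone example of the `format_as()` customization point in fmt v10 (generic types, not the Scylla classes):

```cpp
#include <fmt/format.h>

namespace app {

enum class mode { read, write };

// fmt v10 finds format_as() via ADL and formats `mode` as the returned value.
auto format_as(mode m) {
    return m == mode::read ? "read" : "write";
}

} // namespace app

int main() {
    fmt::print("mode = {}\n", app::mode::write);  // prints "mode = write"
}
```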
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16125
fmt v10 does not cast unaligned<T> to T when formatting it;
instead it insists on finding a matching fmt::formatter<> specialization for it.
that's why we had FTBFS when printing
these packed<T> variables with fmtlib v10.
in this change, we just cast them to the underlying types before
formatting them. because seastar::unaligned<T> neither provides
a method for accessing the raw value nor a type alias for the
underlying raw value's type, we have
to cast to the type without deducing it from the printed value.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16167
This bug manifested as delays in DDL statement execution, which had to
wait until streaming finished so that the topology change
coordinator released the guard.
The reason is that the topology change coordinator didn't release the
group0 guard if there was no work to do with active migrations, and
awaited the condition variable without leaving the scope.
Fixes #16182
Closes scylladb/scylladb#16183
During remove or decommission, as a first step, tables are drained from
the leaving node. Theoretically this step may fail. Roll back the
topology operation if it happens. Since some tables may stay in migration
state, the topology needs to go to the tablet_migration state. Let's do it
always, since it should be safe to do even if there are no ongoing
tablet migrations.
The implementation of "SELECT TOJSON(t)" or "SELECT JSON t" for a column
of type "time" forgot to put the time string in quotes. The result was
invalid JSON. This patch is a one-liner fixing this bug.
This patch also removes the "xfail" marker from one xfailing test
for this issue which now starts to pass. We also add a second test for
this issue - the existing test was for "SELECT TOJSON(t)", and the second
test shows that "SELECT JSON t" had exactly the same bug - and both are
fixed by the same patch.
We also had a test translated from Cassandra which exposed this bug,
but that test continues to fail because of other bugs, so we just
need to update the xfail string.
The patch also fixes one C++ test, test/boost/json_cql_query_test.cc,
which enshrined the *wrong* behavior - JSON output that isn't even
valid JSON - and had to be fixed. Unlike the Python tests, the C++ test
can't be run against Cassandra, and doesn't even run a JSON parser
on the output, which explains how it came to enshrine wrong output
instead of helping to discover the bug.
Fixes #7988
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16121
This commit replaces the link to the OSS-only page
(the 5.2-to-5.4 upgrade guide not present in
the Enterprise docs) on the Raft page.
While providing the link to the specific upgrade
guide is more user-friendly, it causes build failures
of the Enterprise documentation. I've replaced
it with the link to the general Upgrade section.
The ".. only:: opensource" directive used to wrap
the OSS-only content correctly excludes the content
from the Enterprise docs - but it doesn't prevent
build warnings.
This commit must be backported to branch-5.4 to
prevent errors in all versions.
Closes scylladb/scylladb#16176
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for
reconcilable_result::printer, and remove its operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16186
* ./tools/jmx 05bb7b68...80ce5996 (4):
> StorageService: Normalize endpoint inetaddress strings to java form
Fixes #16039
> ColumnFamilyStore: only quote table names if necessary
> APIBuilder: allow quoted scope names
> ColumnFamilyStore: don't fail if there is a table with ":" in its name
Fixes #16153
* ./tools/java 10480342...26f5f71c (1):
> NodeProbe: allow addressing table name with colon in it
Also needed for #16153
Closes scylladb/scylladb#16146
in Python, a raw string is created using the 'r' or 'R' prefix. when
creating a regex from a Python string, sometimes we have to use
"\" to escape a parenthesis so that tools like "sed" can treat
the parenthesis as a capture group. but "\" is also used to escape
strings in Python, and in order to put "\" in as it is, we use "\"
instead of escaping "\" with "\\", which is obscure. when generating rules,
we use multi-line strings and do not want to have an empty line
at the beginning of the string, so a "\" continuation mark was added.
but we failed to escape some of the "\" in the string, and just put
"\(". Python accepts it after failing to find a matching
escape char for it, and interprets it as "\\(", but this should still
be considered an oversight. with Python's warnings enabled,
one is able to see its complaints.
in this change, we escape the "\" properly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16179
this change addresses a regression introduced by
f4626f6b8e, which stopped notifying
systemd with the status that scylla is READY. without the
notification, systemd would wait in vain for the readiness of
scylla.
Refs f4626f6b8e
Fixes #16159
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16166
- remove some code that is obsolete in newer Scylla versions,
- fix some minor bugs. These bugs appear to be benign, there are no known issues caused by them, but fixing them is a good idea nevertheless,
- refactor some code for better maintainability.
Parts of this PR were extracted from https://github.com/scylladb/scylladb/pull/15331 (which was merged but later reverted), parts of it are new.
Closes scylladb/scylladb#16162
* github.com:scylladb/scylladb:
test/pylib: log_browsing: fix type hint
migration_manager: take `abort_source&` in get_schema_for_read/write
migration_manager: inline merge_schema_in_background
migration_manager: remove unused merge_schema_from overload
migration_manager: assume `canonical_mutation` support
migration_manager: add `std::move` to avoid a copy
schema_tables: refactor `scylla_tables(schema_features)`
schema_tables: pass `reload` flag when calling `merge_schema` cross-shard
system_keyspace: fix outdated comment
`run_first` lists in `suite.yaml` files provide a simple way to
shorten the tests' average running time by running the slowest
tests first.
We update these lists, since they got outdated over time:
- `test_topology_ip` was renamed to `test_replace`
and changed suite,
- `test_tablets` changed suite,
- new slow tests were added:
- `test_cluster_features`,
- `test_raft_cluster_features`,
- `test_raft_ignore_nodes`,
- `test_read_repair`.
Closes scylladb/scylladb#16104
The run() method of task_manager::task::impl does not have to throw when
a task is aborted via the task manager API. Thus, a user may see that
the task finished successfully, which is inconsistent.
Finish a task with a failure if it was aborted via the task manager API.
Set top-level compaction tasks as abortable.
Compaction tasks which have no children, i.e. compaction task
executors, have the abort method overridden to stop compaction data.
before we support incremental repair, there is no point having the
code path setting / getting it. and even worse, it incurs confusion.
so, in this change, we
* just set the field to 0,
* drop the corresponding field in metadata_collector, as we never
update it,
* add a comment to explain why this variable is initialized to 0.
Fixes #16098
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16169
repair: Introduce small table optimization
*) Problem:
We have seen in the field it takes longer than expected to repair system tables
like system_auth which has a tiny amount of data but is replicated to all nodes
in the cluster. The cluster has multiple DCs. Each DC has multiple nodes. The
main reason for the slowness is that even if the amount of data is small,
repair has to walk through all the token ranges, that is num_tokens *
number_of_nodes_in_the_cluster. The overhead of the repair protocol for each
token range dominates due to the small amount of data per token range. Another
reason is that the high network latency between DCs makes the RPC calls used to
repair consume more time.
*) Solution:
To solve this problem, a small table optimization for repair is introduced in
this patch. A new repair option is added to turn on this optimization.
- No token range to repair is needed by the user. It will repair all token
ranges automatically.
- Users only need to send the repair rest api to one of the nodes in the
cluster. It can be any of the nodes in the cluster.
- It does not require the RF to be configured to replicate to all nodes in the
cluster. This means it can work with any tables as long as the amount of data
is low, e.g., less than 100MiB per node.
*) Performance:
1)
3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2}
Before:
```
repair - repair[744cd573-2621-45e4-9b27-00634963d0bd]: stats:
repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes,
role_members}, ranges_nr=1537, round_nr=4612,
round_nr_fast_path_already_synced=4611,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1,
rpc_call_nr=115289, tx_hashes_nr=0, rx_hashes_nr=5, duration=1.5648403 seconds,
tx_row_nr=2, rx_row_nr=0, tx_row_bytes=356, rx_row_bytes=0,
row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 0},
{127.0.14.4, 0}, {127.0.14.5, 178}, {127.0.14.6, 178}},
row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 0},
{127.0.14.4, 0}, {127.0.14.5, 1}, {127.0.14.6, 1}},
row_from_disk_bytes_per_sec={{127.0.14.1, 0.00010848}, {127.0.14.2,
0.00010848}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.00010848},
{127.0.14.6, 0.00010848}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1,
0.639043}, {127.0.14.2, 0.639043}, {127.0.14.3, 0}, {127.0.14.4, 0},
{127.0.14.5, 0.639043}, {127.0.14.6, 0.639043}} Rows/s,
tx_row_nr_peer={{127.0.14.3, 1}, {127.0.14.4, 1}}, rx_row_nr_peer={}
```
After:
```
repair - repair[d6e544ba-cb68-4465-ab91-6980bcbb46a9]: stats:
repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes,
role_members}, ranges_nr=1, round_nr=4, round_nr_fast_path_already_synced=4,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0,
rpc_call_nr=80, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.001459798 seconds,
tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0,
row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 178},
{127.0.14.4, 178}, {127.0.14.5, 178}, {127.0.14.6, 178}},
row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 1},
{127.0.14.4, 1}, {127.0.14.5, 1}, {127.0.14.6, 1}},
row_from_disk_bytes_per_sec={{127.0.14.1, 0.116286}, {127.0.14.2, 0.116286},
{127.0.14.3, 0.116286}, {127.0.14.4, 0.116286}, {127.0.14.5, 0.116286},
{127.0.14.6, 0.116286}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1,
685.026}, {127.0.14.2, 685.026}, {127.0.14.3, 685.026}, {127.0.14.4, 685.026},
{127.0.14.5, 685.026}, {127.0.14.6, 685.026}} Rows/s, tx_row_nr_peer={},
rx_row_nr_peer={}
```
The time to finish repair difference = 1.5648403 seconds / 0.001459798 seconds = 1072X
2)
3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2}
Same test as above except 5ms delay is added to simulate multiple dc
network latency:
The time to repair is reduced from 333s to 0.2s.
333.26758 s / 0.22625381s = 1472.98
3)
3 DCs, each DC has 3 nodes, 9 nodes in the cluster. RF = {dc1: 3, dc2: 3, dc3: 3}
, 10 ms network latency
Before:
```
repair - repair[86124a4a-fd26-42ea-a078-437ca9e372df]: stats:
repair_reason=repair, keyspace=system_auth, tables={role_attributes,
role_members, roles}, ranges_nr=2305, round_nr=6916,
round_nr_fast_path_already_synced=6915,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1,
rpc_call_nr=276630, tx_hashes_nr=0, rx_hashes_nr=8, duration=986.34015
seconds, tx_row_nr=7, rx_row_nr=0, tx_row_bytes=1246, rx_row_bytes=0,
row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3,
0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0},
{127.0.57.8, 0}, {127.0.57.9, 0}}, row_from_disk_nr={{127.0.57.1, 1},
{127.0.57.2, 1}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0},
{127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}},
row_from_disk_bytes_per_sec={{127.0.57.1, 1.72105e-07}, {127.0.57.2,
1.72105e-07}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0},
{127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}}
MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 0.00101385},
{127.0.57.2, 0.00101385}, {127.0.57.3, 0}, {127.0.57.4, 0},
{127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0},
{127.0.57.9, 0}} Rows/s, tx_row_nr_peer={{127.0.57.3, 1},
{127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1},
{127.0.57.8, 1}, {127.0.57.9, 1}}, rx_row_nr_peer={}
```
After:
```
repair - repair[07ebd571-63cb-4ef6-9465-6e5f1e98f04f]: stats:
repair_reason=repair, keyspace=system_auth, tables={role_attributes,
role_members, roles}, ranges_nr=1, round_nr=4,
round_nr_fast_path_already_synced=4,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0,
rpc_call_nr=128, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.6052915
seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0,
row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3,
178}, {127.0.57.4, 178}, {127.0.57.5, 178}, {127.0.57.6, 178},
{127.0.57.7, 178}, {127.0.57.8, 178}, {127.0.57.9, 178}},
row_from_disk_nr={{127.0.57.1, 1}, {127.0.57.2, 1}, {127.0.57.3, 1},
{127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1},
{127.0.57.8, 1}, {127.0.57.9, 1}},
row_from_disk_bytes_per_sec={{127.0.57.1, 0.00037793}, {127.0.57.2,
0.00037793}, {127.0.57.3, 0.00037793}, {127.0.57.4, 0.00037793},
{127.0.57.5, 0.00037793}, {127.0.57.6, 0.00037793}, {127.0.57.7,
0.00037793}, {127.0.57.8, 0.00037793}, {127.0.57.9, 0.00037793}}
MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 2.22634},
{127.0.57.2, 2.22634}, {127.0.57.3, 2.22634}, {127.0.57.4,
2.22634}, {127.0.57.5, 2.22634}, {127.0.57.6, 2.22634},
{127.0.57.7, 2.22634}, {127.0.57.8, 2.22634}, {127.0.57.9,
2.22634}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
```
The time to repair is reduced from 986s (16 minutes) to 1.6s
*) Summary
So, a more than 1000X difference is observed for this common usage of
system table repair procedure.
Fixes #16011
Refs #15159
Closes scylladb/scylladb#15974
* github.com:scylladb/scylladb:
repair: Introduce small table optimization
repair: Convert put_row_diff_with_rpc_stream to use coroutine
We add a test for concurrent bootstrap in the raft-based topology.
Additionally, we extend the testing framework with a new function -
`ManagerClient.servers_add`. It allows adding multiple servers
concurrently to a cluster.
This PR is the first step to fix #15423. After merging it, if the new test
doesn't fail for some time in CI, we can:
- use `ManagerClient.servers_add` in other tests wherever possible,
- start initial servers concurrently in all suites with
`initial_size > 0`.
Closes scylladb/scylladb#16102
* github.com:scylladb/scylladb:
test: topology: add test_concurrent_bootstrap
test: ManagerClient: introduce servers_add
test: ManagerClient: introduce _create_server_add_data
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for
map<timestamp_type, vector<shared_sstable>>. since the operator<<
for this type is only used in the .cc file, and its only use case
is to provide the formatter for fmt, the operator<<-based
formatter is removed in this change.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16163
In topology on raft mode, the events "new node starts its group0 server"
and "new node is added to group0 configuration" are not synchronized
with each other. Therefore it might happen that the cluster starts
sending commands to the new node before the node starts its server. This
might lead to harmless, but ugly messages like:
INFO 2023-09-27 15:42:42,611 [shard 0:stat] rpc - client
127.0.0.1:56352 msg_id 2: exception "Raft group
b8542540-5d3b-11ee-99b8-1052801f2975 not found" in no_wait handler
ignored
In order to solve this, the failure detector verb is extended to report
information about whether group0 is alive. The raft rpc layer will drop
messages to nodes whose group0 is not seen as alive.
Tested by adding a delay before group0 is started on the joining node,
running all topology tests and grepping for the aforementioned log
messages.
Fixes: scylladb/scylladb#15853
Fixes: scylladb/scylladb#15167
Closes scylladb/scylladb#16071
* github.com:scylladb/scylladb:
raft: rpc: introduce destination_not_alive_error
raft: rpc: drop RPCs if the destination is not alive
raft: pass raft::failure_detector to raft_rpc
raft: transfer information about group0 liveness in direct_fd_ping
raft: add server::is_alive
We add a test for concurrent bootstrap support in the raft-based
topology.
The plan is to make this test temporary. In the future, we will:
- use ManagerClient.servers_add in other tests wherever possible,
- start initial servers concurrently in all suites with
initial_size > 0.
So, this test will not test anything unique.
We could make the changes proposed above now instead of adding
this small test. However, if we did that and it turned out that
concurrent bootstrap is flaky in CI, we would make almost every CI
run fail with many failures. We want to avoid such a situation.
Running only this test for some time in CI will reduce the risk
and make investigating any potential failures easier.
We add a new function - servers_add - that allows adding multiple
servers concurrently to a cluster. It makes use of a concurrent
bootstrap now supported in the raft-based topology.
servers_add doesn't have the replace_cfg parameter. The reason is
that we don't support concurrent replace operations, at least for
now.
There is an implementation detail in ScyllaCluster.add_servers. We
cannot simply do multiple calls to add_server concurrently. If we
did that in an empty cluster, every node would take itself as the
only seed and start a new cluster. To solve this, we introduce a
new field - initial_seed. It is used to choose one of the servers
as a seed for all servers added concurrently to an empty cluster.
Note that the add_server calls in asyncio.gather in add_servers
cannot race with each other when setting initial_seed because
there is only one thread.
In the future, we will also start all initial servers concurrently
in ScyllaCluster.install_and_start. The changes in this commit were
designed in a way that will make changing install_and_start easy.
before this change, in sstable_run_based_compaction_test, we check
every 4 sstables, to verify that we close the sstable to be replaced
in a batch of 4.
since the integer-based generation identifier is monotonically
increasing, we can assume that the identifiers of sstables are like
0, 1, 2, 3, .... so if the compaction consumes sstables in a
batch of 4, the identifier of the first one in the batch should
always be the multiple of 4. unfortunately, this test does not work
if we use uuid-based identifier.
but if we take a closer look at how we create the dataset, we can
note the following facts:
1. the `compaction_descriptor` returned by
`sstable_run_based_compaction_strategy_for_tests` never
set `owned_ranges` in the returned descriptor
2. in `compaction::setup_sstable_reader`, `mutation_reader::forward::no`
is used, if `_owned_ranges_checker` is empty
3. `mutation_reader_merger` respects the `fwd_mr` passed to its
ctor, so it closes current sstable immediately when the underlying
mutation reader reaches the end of stream.
in other words, we close every sstable once it is fully consumed in
sstable_compaction_test. and the reason why the existing test passes
is that we just sample the sstables whose generation id is a multiple
of 4. what happens when we perform compaction in this test is:
1. replace 5 with 33, closing 5
2. replace 6 with 34, closing 6
3. replace 7 with 35, closing 7
4. replace 8 with 36, closing 8 << let's check here.. good, go on!
5. replace 13 with 37, closing 13
...
8. replace 16 with 40, closing 16 << let's check here.. also, good, go on!
so, in this change, we just check all old sstables, to verify that
we close each of them once it is fully consumed.
Fixes https://github.com/scylladb/scylladb/issues/16073
Closes scylladb/scylladb#16074
* github.com:scylladb/scylladb:
test/sstable_compaction_test: check every sstable replaced sstable
test/sstable_compaction_test: s/old_sstables.front()/old_sstable/
Support for `canonical_mutation`s was added way back in Scylla 3.2. A
lot of code in `migration_manager` is still checking whether the old
`frozen_mutations` are received or need to be sent.
We no longer need this code, since we don't support version skips during
upgrade (and certainly not upgrades like 3.2->5.4).
Leave the sanity checks in place, but otherwise delete the
`frozen_mutation` branches.
The `scylla_tables` function gives a different schema definition
for the `system_schema.scylla_tables` table, depending on whether
certain schema features are enabled or not.
The way it was implemented, we had to write `θ(2^n)` amount
of code and comments to handle `n` features.
Refactor it so that the amount of code we have to write to handle `n`
features is `θ(n)`.
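A hedged sketch of the shape of this refactoring; the feature and column names below are invented for illustration, and the real function builds a schema object rather than a list of column names:
```
#include <string>
#include <vector>

// Hypothetical feature set; the real schema_features and columns differ.
struct schema_features {
    bool feature_a = false;
    bool feature_b = false;
    bool feature_c = false;
};

std::vector<std::string> scylla_tables_columns(const schema_features& f) {
    // Base definition shared by every variant.
    std::vector<std::string> columns = {"keyspace_name", "table_name", "version"};
    // One `if` per feature (θ(n)) instead of one hand-written definition
    // per feature subset (θ(2^n)).
    if (f.feature_a) {
        columns.push_back("column_added_by_feature_a");
    }
    if (f.feature_b) {
        columns.push_back("column_added_by_feature_b");
    }
    if (f.feature_c) {
        columns.push_back("column_added_by_feature_c");
    }
    return columns;
}
```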
Handle the barrier failure by sleeping for a "ring delay" and
continuing. The purpose of the barrier is to wait for all reads to
the old replica set to complete and fence the remaining requests. If the
barrier fails we give the fence some time to propagate and continue with
the topology change. If the fence did not propagate we may have stale reads,
but this is no worse than what we have with gossiper.
Handle the barrier failure by sleeping for a "ring delay" and
continuing. The purpose of the barrier is to wait for unfinished writes
to the decommissioned node to complete. If the barrier fails we give them some time
to complete and then proceed with the node decommission. The worst thing
that may happen is that some write fails because the node is
shut down.
Go through the rollback_to_normal state when the node needs to move to
normal during the rollback, and update the fence in this state before moving
the node to normal. This guarantees that the fence update will not
be missed. Note that when a node moves to the left state it already passes
through left_token_ring, which guarantees the same.
When a topology coordinator rolls back from an unsuccessful topology operation it
advances the fence (which is now in the raft state) after moving to the normal
state. We do not want this to fail (only a majority of nodes is needed for
it not to), but currently it may fail in case the coordinator moves
to another node after changing the rollback node's state to normal, but
before updating the fence. To solve that, the rollback operation needs to
go through a new rollback_to_normal state that will do the fencing
before moving to normal. This patch introduces that state, but does not use
it yet.
In 0c86abab4d `merge_schema` obtained a new flag, `reload`.
Unfortunately, the flag was assigned a default value, which I think is
almost always a bad idea, and indeed it was in this case. When
`merge_schema` is called on a shard other than 0, it recursively calls
itself on shard 0. That recursive call forgot to pass the `reload` flag.
Fix this.
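A hedged sketch of the bug pattern (hypothetical, simplified signatures; the real code forwards the call cross-shard rather than recursing directly):
```
#include <iostream>

void merge_schema_on_shard0(bool reload) {
    std::cout << "merging on shard 0, reload=" << reload << "\n";
}

void merge_schema(unsigned this_shard, bool reload = false) {
    if (this_shard != 0) {
        // Before the fix the flag was silently dropped here, falling back to
        // its default:  merge_schema(0);
        merge_schema(0, reload);   // fix: forward `reload` explicitly
        return;
    }
    merge_schema_on_shard0(reload);
}

int main() {
    merge_schema(/* shard */ 1, /* reload */ true);  // prints reload=1 after the fix
}
```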
Add a new destination_not_alive_error, thrown from two-way RPCs in case
the RPC is not issued because the destination is not reported as
alive by the failure detector.
In snapshot transfer code, lower the verbosity of the message printed in
case it fails on the new error. This is done to prevent flakiness in the
CI - in case of slow runs, nodes might get spuriously marked as dead if
they are busy, and a message with the "error" verbosity can cause some
tests to fail.
The replace operation is defined to succeed only if the node being
replaced is dead. We should reject this operation when the failure
detector considers the node being replaced alive.
Apart from adding this change, this PR adds a test case -
`test_replacing_alive_node_fails` - that verifies it. A few testing
framework adjustments were necessary to implement this test and
to avoid flakiness in other tests that use the replace operation after
the change. From now, we need to ensure that all nodes see the
node being replaced as dead before starting the replace. Otherwise,
the check added in this PR could reject the replace.
Additionally, this PR changes the replace procedure in a way that
if the replacing node reuses the IP of the node being replaced, other
nodes can see it as alive only after the topology coordinator accepts
its join request. The replacing node may become alive before the
topology coordinator checks if the node being replaced is dead. If
that happens and the replacing node reuses the IP of the node being
replaced, the topology coordinator cannot know which of these two
nodes is alive and whether it should reject the join request.
Fixes #15863
Closes scylladb/scylladb#15926
* github.com:scylladb/scylladb:
test: add test_replacing_alive_node_fails
raft topology: reject replace if the node being replaced is not dead
raft topology: add the gossiper ref to topology_coordinator
test: test_cluster_features: stop gracefully before replace
test: decrease failure_detector_timeout_in_ms in replace tests
test: move test_replace to topology_custom
test: server_add: wait until the node being replaced is dead
test: server_add: add support for expected errors
raft topology: join: delay advertising replacing node if it reuses IP
raft topology: join: fix a condition in validate_joining_node
the operator<<() based formatter is only used in its test, so
let's move it to where it is used.
we can always bring it back later if it is required in other places.
but then it would be better to implement it as a fmt::formatter<>.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16142
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for
`cql_transport::messages::result_message::rows`
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16143
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define a formatter for db::seed_provider_type.
please note, we are still formatting vector<db::seed_provider_type>
with the helper provided by seastar/core/sstring.hh, which uses
operator<<() to print the elements of the vector.
so we have to keep the operator<< formatter before disabling
the generic formatter for vector<T>.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16138
generation_number's type is `generation_type`, which in turn is a
`utils::tagged_integer<struct generation_type_tag, int32_t>`,
which is formatted via fmtlib's ostream_formatter backed by
operator<<. but `ostream_formatter` does not provide format specifier
support, so {:d} does not apply to this type; when compiling with fmtlib
v10, it rejects the format specifier (the error is attached at the end
of the commit message).
so in this change, we just drop the format specifier. as fmtlib prints
`int32_t` as a decimal integer, even if {:d} applied, it would not
change the behavior.
```
/home/kefu/dev/scylladb/gms/gossiper.cc:1798:35: error: call to consteval function 'fmt::basic_format_string<char, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int> &, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int> &>::basic_format_string<char[48], 0>' is not a constant expression
1798 | auto err = format("Remote generation {:d} != local generation {:d}", remote_gen, local_gen);
| ^
/usr/include/fmt/core.h:2322:31: note: non-constexpr function 'throw_format_error' cannot be used in a constant expression
2322 | if (!in(arg_type, set)) throw_format_error("invalid format specifier");
| ^
/usr/include/fmt/core.h:2395:14: note: in call to 'parse_presentation_type.operator()(1, 510)'
2395 | return parse_presentation_type(pres::dec, integral_set);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2706:9: note: in call to 'parse_format_specs<char>(&"Remote generation {:d} != local generation {:d}"[20], &"Remote generation {:d} != local generation {:d}"[47], formatter<mapped_type, char_type>().formatter::specs_, checker(s).context_, 13)'
2706 | detail::parse_format_specs(ctx.begin(), ctx.end(), specs_, ctx, type);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2561:10: note: in call to 'formatter<mapped_type, char_type>().parse<fmt::detail::compile_parse_context<char>>(checker(s).context_)'
2561 | return formatter<mapped_type, char_type>().parse(ctx);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2647:39: note: in call to 'parse_format_specs<utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>, fmt::detail::compile_parse_context<char>>(checker(s).context_)'
2647 | return id >= 0 && id < num_args ? parse_funcs_[id](context_) : begin;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2485:15: note: in call to 'handler.on_format_specs(0, &"Remote generation {:d} != local generation {:d}"[20], &"Remote generation {:d} != local generation {:d}"[47])'
2485 | begin = handler.on_format_specs(adapter.arg_id, begin + 1, end);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2541:13: note: in call to 'parse_replacement_field<char, fmt::detail::format_string_checker<char, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>> &>(&"Remote generation {:d} != local generation {:d}"[19], &"Remote generation {:d} != local generation {:d}"[47], checker(s))'
2541 | begin = parse_replacement_field(p, end, handler);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2769:7: note: in call to 'parse_format_string<true, char, fmt::detail::format_string_checker<char, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>>>({&"Remote generation {:d} != local generation {:d}"[0], 47}, checker(s))'
2769 | detail::parse_format_string<true>(str_, checker(s));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/kefu/dev/scylladb/gms/gossiper.cc:1798:35: note: in call to 'basic_format_string<char[48], 0>("Remote generation {:d} != local generation {:d}")'
1798 | auto err = format("Remote generation {:d} != local generation {:d}", remote_gen, local_gen);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16126
Clang-18 starts to complain when a constexpr value is cast to an
enum and the value is out of the range of the enum values. in this
case, boost intentionally casts the out-of-range values to the
target type, so silence this warning for the time being.
since `lexical_cast.hpp` is included in multiple places in the
source tree, this warning is disabled globally.
the warning looks like:
```
In file included from /home/kefu/dev/scylladb/types/types.cc:9:
In file included from /usr/include/boost/lexical_cast.hpp:32:
In file included from /usr/include/boost/lexical_cast/try_lexical_convert.hpp:43:
In file included from /usr/include/boost/lexical_cast/detail/converter_numeric.hpp:36:
In file included from /usr/include/boost/numeric/conversion/cast.hpp:33:
In file included from /usr/include/boost/numeric/conversion/converter.hpp:13:
In file included from /usr/include/boost/numeric/conversion/conversion_traits.hpp:13:
In file included from /usr/include/boost/numeric/conversion/detail/conversion_traits.hpp:18:
In file included from /usr/include/boost/numeric/conversion/detail/int_float_mixture.hpp:19:
In file included from /usr/include/boost/mpl/integral_c.hpp:32:
/usr/include/boost/mpl/aux_/integral_wrapper.hpp:73:31: error: integer value -1 is outside the valid range of values [0, 3] for the enumeration type 'udt_buil
tin_mixture_enum' [-Wenum-constexpr-conversion]
73 | typedef AUX_WRAPPER_INST( BOOST_MPL_AUX_STATIC_CAST(AUX_WRAPPER_VALUE_TYPE, (value - 1)) ) prior;
| ^
/usr/include/boost/mpl/aux_/static_cast.hpp:24:47: note: expanded from macro 'BOOST_MPL_AUX_STATIC_CAST'
24 | # define BOOST_MPL_AUX_STATIC_CAST(T, expr) static_cast<T>(expr)
| ^
In file included from /home/kefu/dev/scylladb/types/types.cc:9:
In file included from /usr/include/boost/lexical_cast.hpp:32:
In file included from /usr/include/boost/lexical_cast/try_lexical_convert.hpp:43:
In file included from /usr/include/boost/lexical_cast/detail/converter_numeric.hpp:36:
In file included from /usr/include/boost/numeric/conversion/cast.hpp:33:
In file included from /usr/include/boost/numeric/conversion/converter.hpp:13:
In file included from /usr/include/boost/numeric/conversion/conversion_traits.hpp:13:
In file included from /usr/include/boost/numeric/conversion/detail/conversion_traits.hpp:18:
In file included from /usr/include/boost/numeric/conversion/detail/int_float_mixture.hpp:19:
In file included from /usr/include/boost/mpl/integral_c.hpp:32:
/usr/include/boost/mpl/aux_/integral_wrapper.hpp:73:31: error: integer value -1 is outside the valid range of values [0, 3] for the enumeration type 'int_float_mixture_enum' [-Wenum-constexpr-conversion]
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16082
this was a typo introduced by 781b7de5, which intended to add
-Wignored-qualifiers to the compiling options, but it ended up
adding -Wignore-qualifiers.
in this change, the typo is corrected.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16124
The write_mutations_to_database() function decides if it needs to flush the
database by checking if the mutations came to the system.topology table and
performing some more checks if they did. Overall this looks like
auto topo_schema = db.find_schema(system.topology)
if (target_schema != topo_schema)
return false;
// extra checks go here
However, the system.topology table exists only if the feature named
CONSISTENT_TOPOLOGY_CHANGES is enabled via the command line. If it's not, the
call to db.find_schema(system.topology) throws, and the whole attempt to
write mutations throws too, stopping the raft state machine.
Since the intention is to check if the target schema is the topology
table, the presence check for this table should come first.
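A hedged, self-contained sketch of that reordering (hypothetical types and helpers; the real code works on schema pointers and the replica::database API):
```
#include <stdexcept>
#include <string>

struct schema { std::string ks, cf; };

struct database {
    bool topology_table_exists; // only true when CONSISTENT_TOPOLOGY_CHANGES is enabled

    bool has_schema(const std::string& ks, const std::string& cf) const {
        return ks == "system" && cf == "topology" && topology_table_exists;
    }
    schema find_schema(const std::string& ks, const std::string& cf) const {
        if (!has_schema(ks, cf)) {
            throw std::runtime_error(ks + "." + cf + " does not exist");
        }
        return schema{ks, cf};
    }
};

bool needs_flush(const database& db, const schema& target) {
    // Presence check first: if system.topology does not exist, the target
    // cannot be it, and we never reach the throwing find_schema().
    if (!db.has_schema("system", "topology")) {
        return false;
    }
    auto topo = db.find_schema("system", "topology");
    if (target.ks != topo.ks || target.cf != topo.cf) {
        return false;
    }
    return true; // extra checks go here
}

int main() {
    database db{false};              // feature disabled, table absent
    needs_flush(db, {"ks", "tbl"});  // no longer throws
}
```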
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#16089
Currently, the NIC prompt in scylla_setup shows virtual devices such as
VLAN devices and bridge devices, but perftune.py does not support them.
To prevent errors while running scylla_setup, we should stop listing
these devices in the NIC prompt.
closes #6757
Closes scylladb/scylladb#15958
If the failure detector sees the destination as dead, there is no use in
sending the RPC, so drop it silently.
This only affects two-way RPCs and "request" one-way RPCs. The one-way
RPCs used as responses to other one-way RPCs are not affected.
Add a new variant of the reply to the direct_fd_ping which specifies
whether the local group0 is alive or not, and start actively using it.
There is no need to introduce a cluster feature. Due to how our
serialization framework works, nodes which do not recognize the new
variant will treat it as the existing std::monostate. The std::monostate
means "the node and group0 is alive"; nodes before the changes in this
commit would send a std::monostate anyway, so this is completely
transparent for the old nodes.
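A hedged sketch of that compatibility argument; the type names are illustrative, and the real reply is defined in the IDL and handled by Scylla's serialization framework, not by a plain std::variant:
```
#include <cassert>
#include <variant>

struct group_liveness_info {
    bool group0_alive;
};

// The reply used to carry only std::monostate; the new alternative is appended.
using direct_fd_ping_reply = std::variant<std::monostate, group_liveness_info>;

bool sender_group0_alive(const direct_fd_ping_reply& reply) {
    // std::monostate is what old nodes send; it means "the node and group0 are alive".
    if (std::holds_alternative<std::monostate>(reply)) {
        return true;
    }
    return std::get<group_liveness_info>(reply).group0_alive;
}

int main() {
    assert(sender_group0_alive(direct_fd_ping_reply{}));                            // old node
    assert(!sender_group0_alive(direct_fd_ping_reply{group_liveness_info{false}})); // new node, group0 down
}
```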
Add a method which reports whether a given raft server is running.
In following commits, the information about whether the local raft
group 0 is running or not will be included in the response to the
failure detector ping, and the is_alive method will be used there.
add fmt formatter for `assignment_testable`.
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `assignment_testable` without the help of `operator<<`.
since we are still printing the shared_ptr<assignment_testable> using
operator<<(.., const assignment_testable&), we cannot drop this operator
yet.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16127
There is a need for sending tablet info to the drivers so they can be tablet aware. For the best performance we want to get this info lazily, only when it is needed.
The info is sent when the driver asks for information that a specific tablet contains and the request is directed to the wrong node/shard, so that the driver can use that information for every subsequent query. If the query is sent to the wrong node/shard, we want to send the RESULT message with additional information about the tablet (replicas and token range) in custom_payload.
A mechanism for sending custom_payload was added.
Sending custom_payload was tested using a three-node cluster and cqlsh queries. I used RF=1 so choosing the wrong node was testable.
I also manually tested it with the python-driver and confirmed that the tablet info can be deserialized properly.
Automatic tests were added.
Closes scylladb/scylladb#15410
* github.com:scylladb/scylladb:
docs: add documentation about sending tablet info to protocol extensions
Add tests for sending tablet info
cql3: send tablet if wrong node/shard is used during modification statement
cql3: send tablet if wrong node/shard is used during select statement
locator: add function to check locality
locator: add function to check if host is local
transport: add function to add tablet info to the result_message
transport: add support for setting custom payload
This reverts commit 11cafd2fc8, reversing
changes made to 2bae14f743.
Reverting because this series causes frequent CI failures, and the
proposed quickfix causes other failures of its own.
Fixes: #16113
instead of printing the result of the "validate" subcommand in
free-style plain text, let's print it using JSON, for two reasons:
1. it is simpler to consume the output with other tools and tests.
2. more consistent with other commands.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16105
Instead of unconditionally reloading schema when enabling any schema
feature, only create a listener if the feature was disabled in the
first place, so that we don't trigger a schema reload for each
schema feature on node restarts. In that case, the node starts with
all these features enabled already.
This prevents unnecessary work on restarts.
Fixes: #16112
Closes scylladb/scylladb#16118
instead of printing the result of the "validate-checksum" subcommand
with a logging message, let's print it using JSON, for four reasons:
1. it is simpler to consume the output with other tools and tests.
2. more consistent with other commands.
3. the logging system is used for auditing behavior and for debugging
purposes, not for building a user-facing command line interface.
4. the behavior should match the corresponding document, and
in docs/operating-scylla/admin-tools/scylla-sstable.sst, we claim
that `validate-checksums` subcommand prints a dict of
```
$ROOT := { "$sstable_path": Bool, ... }
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16106
instead of relying on the operator<<() of an opaque type, use fmtlib
to print a timepoint for better support of new fmtlib which dropped
the default-generated formatter for types with operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16116
This PR contains two patches which get rid of unnecessary sleeps on cql_test_env teardown, greatly reducing the run time of tests.
It reduces the run time of `build/dev/test/boost/schema_change_test` from 90s to 6s.
Closes scylladb/scylladb#16111
* github.com:scylladb/scylladb:
test: cql_test_env: Interrupt all components on cql_test_env teardown
tests: cql_test_env: Skip gossip shutdown sleep
This commit fixes the rollback procedure in
the 5.2-to-5.4 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
is fixed.
- The "Gracefully shutdown ScyllaDB" command
is fixed.
In addition, there are the following updates
to be in sync with the tests:
- The "Backup the configuration file" step is
extended to include a command to backup
the packages.
- The Rollback procedure is extended to restore
the backup packages.
- The Reinstallation section is fixed for RHEL.
Also, I've removed the optional step to enable
consistent schema management from the list of
steps - the appropriate section has already
been removed, but it remained in the procedure
description, which was misleading.
Refs https://github.com/scylladb/scylladb/issues/11907
This commit must be backported to branch-5.4
Closes scylladb/scylladb#16114
On non-interactive mode setup, the RHEL/CentOS7 old kernel check causes
"Setup aborted"; this is not what we want.
We should keep the warning but proceed with setup, so the default value of the kernel
check should be True, since it will be automatically applied in
non-interactive mode.
Fixes #16045
Closes scylladb/scylladb#16100
Currently storage service starts too early and its initialization is split into several steps. This PR makes storage service start "late enough" and makes its initialization (minimally required before joining the cluster) happen in one place.
refs: #2795
refs: #2737
Closes scylladb/scylladb#16103
* github.com:scylladb/scylladb:
storage_service: Drop (un)init_messaging_service_part() pair
storage_service: Init/Deinit RPC handlers in constructor/stop
storage_service: Dont capture container() on RPC handler
storage_service: Use storage_service::_sys_dist_ks in some places
storage_service: Add explicit dependency on system dist. keyspace
storage_service: Turn query processor pointer into reference
storage_service: Add explicit query_processor dependency
main: Start storage service later
Since the commitlog size hard limit is a failsafe mechanism,
we don't expect to ever hit it. If we do hit the limit, it means
that we have an exceptional condition in the system. Hence, the
impact of enforcing the commitlog hard limit is irrelevant.
Here we enforce the limit by default.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Once we enable commitlog hard limit by default, we would like
to have some room in case flushing memtables takes some time
to catch up. This threshold is half the limit.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
This commit is only a cosmetic change. It is meant to
make the flush threshold assignment more readable and
comprehensible so future changes are easier to review.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
A custom payload can now be added to response_message.
If it is set, it will be sent to the client and the custom_payload
flag will be set.
A write_string_bytes_map method is added to the response class
and the missing custom_payload flag is added to
cql_frame_flags.
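A hedged sketch of serializing a [string -> bytes] map in the CQL binary protocol layout ([short n] followed by n pairs of [string][bytes]); the helper names and buffer type are illustrative, not the actual transport/response classes:
```
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using bytes = std::vector<uint8_t>;

static void write_be16(bytes& out, uint16_t v) {
    out.push_back(uint8_t(v >> 8));
    out.push_back(uint8_t(v));
}

static void write_be32(bytes& out, uint32_t v) {
    out.push_back(uint8_t(v >> 24));
    out.push_back(uint8_t(v >> 16));
    out.push_back(uint8_t(v >> 8));
    out.push_back(uint8_t(v));
}

void write_string_bytes_map(bytes& out, const std::map<std::string, bytes>& m) {
    write_be16(out, uint16_t(m.size()));                   // [short] number of entries
    for (const auto& [key, value] : m) {
        write_be16(out, uint16_t(key.size()));             // [string]: short length + UTF-8 bytes
        out.insert(out.end(), key.begin(), key.end());
        write_be32(out, uint32_t(value.size()));           // [bytes]: int length + raw bytes
        out.insert(out.end(), value.begin(), value.end());
    }
}
```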
The test case is
- start scylla with broken object storage endpoint config
- create and populate s3-backed keyspace
- try flushing it (API call would hang, so do it in the background)
- wait for a few seconds, then fix the config
- wait for the flush to finish and stop scylla
- start scylla again and check that the keyspace is properly populated
A nice side effect of this test is that once the flush fails (due to broken
config) it tries to remove the not-yet-sealed sstables and (!) fails
again, for the same reason. So during the restart there happen to be
several sstables in "creating" state with no stored objects, so this
additionally tests one more g.c. corner case.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The Cluster wrapper used by object_store test already has the ability to
access cluster via CQL and via API. Add the sugar to make the cluster
re-read its scylla.yaml and other configs
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The pylib minio server does that already. A test case added by the next
patch would need to have both cases as paths, not as strings.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now it's a plain updateable_value, but without the ..._source object the
updateable_value is just a no-op value holder. In order for the
observers to operate there must be a value source; updating it
updates the attached updateable values _and_ notifies the observers.
In order for the config to be the u.v._source, config entries must be
comparable to each other, hence the <=> operator for it.
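A minimal, self-contained sketch of that value-source idea; it illustrates the pattern only and is not Seastar's or Scylla's actual utils::updateable_value API:
```
#include <functional>
#include <iostream>
#include <string>
#include <vector>

template <typename T>
class value_source {
    T _value;
    std::vector<std::function<void(const T&)>> _observers;
public:
    explicit value_source(T v) : _value(std::move(v)) {}
    const T& get() const { return _value; }
    void observe(std::function<void(const T&)> f) { _observers.push_back(std::move(f)); }
    void update(T v) {
        if (v == _value) {          // requires comparable entries, cf. the <=> operator
            return;
        }
        _value = std::move(v);
        for (auto& f : _observers) {
            f(_value);              // attached values see the new value and observers fire
        }
    }
};

int main() {
    value_source<std::string> endpoint_cfg("http://old-endpoint");
    endpoint_cfg.observe([](const std::string& v) {
        std::cout << "object storage endpoint reconfigured to " << v << "\n";
    });
    endpoint_cfg.update("http://fixed-endpoint"); // e.g. after re-reading scylla.yaml on SIGHUP
}
```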
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When flushing an sstable there can be errors that are not fatal and
shouldn't cause the whole scylla process to die. Currently only ENOSPC and
EDQUOT are considered as such, but there's one more possibility --
access denied errors.
Those can happen, for example, if the datadir is chmod/chown-ed by mistake
or intentionally while scylla is running (doing it before start won't
trigger the issue as the distributed loader checks permissions of the datadir on
boot). Another way to step on an "access denied" error is to flush a
memtable on S3 storage with a broken configuration.
Anyway, seeing the access denied error is also a good reason not to
crash, but to print a warning in the logs and retry in the hope that the node
administrator has fixed things.
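A hedged sketch of that retry policy (simplified synchronous code with illustrative names; the real flush path is asynchronous and lives in the memtable/sstable layers):
```
#include <cerrno>
#include <chrono>
#include <iostream>
#include <system_error>
#include <thread>

bool is_retryable_flush_error(const std::system_error& e) {
    switch (e.code().value()) {
    case ENOSPC:
    case EDQUOT:
    case EACCES:
        return true;   // transient: out of space, over quota, or access denied
    default:
        return false;
    }
}

template <typename Flush>
void flush_with_retry(Flush do_flush) {
    for (;;) {
        try {
            do_flush();
            return;
        } catch (const std::system_error& e) {
            if (!is_retryable_flush_error(e)) {
                throw; // genuinely fatal
            }
            std::cerr << "flush failed (" << e.what() << "), retrying\n";
            std::this_thread::sleep_for(std::chrono::seconds(10));
        }
    }
}
```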
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When the TOC file is missing while garbage collecting, the S3 server would
nowadays resolve with storage_io_error(ENOENT).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When an http request resolves with an exception, it makes sense to translate
the network exception into a storage exception, to make upper layers think
that it was some sort of IO error, not, suddenly, an http one.
The translation is, for now, pretty simple:
- 404 and 3xx -> ENOENT
- 403 (forbidden) and 401 (unauthorized) -> EACCES
- anything else -> EIO
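A hedged sketch of that mapping as a plain helper (illustrative code, not the actual s3::client implementation):
```
#include <cerrno>

int http_status_to_errno(int status) {
    if (status == 404 || (status >= 300 && status < 400)) {
        return ENOENT;   // missing object, or a redirect we do not follow
    }
    if (status == 403 || status == 401) {
        return EACCES;   // forbidden / unauthorized
    }
    return EIO;          // anything else becomes a generic I/O error
}
```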
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We add a test for the Raft-based topology's new feature - rejecting
the replace operation if the node being replaced is considered
alive by the failure detector.
This test is not so fast, and it does not test any critical paths
so we run it only in dev mode.
The replace operation is defined to succeed only if the node being
replaced is dead. We should reject this operation when the failure
detector considers the node being replaced alive.
In one of the previous commits, we have made
ManagerClient.server_add wait until all running nodes see the node
being replaced as dead. Unfortunately, the waiting time is around
20 s if we stop the node being replaced ungracefully. We change the
stop procedure to a graceful one so as not to slow down the test.
In one of the previous commits, we have made
ManagerClient.server_add wait until all running nodes see the node
being replaced as dead. Unfortunately, the waiting time can be
around 20 s if we stop the node being replaced ungracefully. 20 s
is the default value of the failure detector timeout.
We don't want to slow down the replace operations this much for no
good reason. We could use server_stop_gracefully instead of
server_stop everywhere, but we should have at least a few replace
tests with server_stop. For now, test_replace and
test_raft_ignore_nodes will be these tests. To keep them reasonably
fast, we decrease the failure_detector_timeout_in_ms value on all
initial servers.
We also skip test_replace in debug mode to avoid flakiness due to
low failure_detector_timeout_in_ms (test_raft_ignore_nodes is
already skipped).
In the following commit, we make all servers in test_replace use
failure-detector-timeout-in-ms = 2000. Therefore, we need
test_replace to be in a suite with initial_size equal to 0.
In the following commits, we make the topology coordinator reject
join requests if the node being replaced is considered alive by the
gossiper. Before making this change, we need to adapt the testing
framework so that we don't have flaky replace operations that fail
because the node being replaced hasn't been marked as dead yet. We
achieve this by waiting until all other running nodes see the node
being replaced as dead in all replace operations.
After this change, if we try to add a server and it fails with an
expected error, the add_server function will not throw. Also, the
server will be correctly installed and stopped.
Two issues are motivating this feature.
The first one is that if we want to add a server while expecting
an error, we have to do it in two steps:
- call server_add with the start parameter set to False,
- call server_start with the expected_error parameter.
It is quite inconvenient.
The second one is that we want to be able to test the replace
operation when it is considered incorrect, for example when we try
to replace an alive node. To do this, we would have to remove
some assertions from ScyllaCluster.add_server. However, we should
not remove them because they give us clear information when we
write an incorrect test. After adding the expected_error parameter,
we can ignore these assertions only when we expect an error. In
this way, we enable testing failing replace operations without
sacrificing the testing framework's protection.
After this change, other nodes can see the replacing node as alive
only after the topology coordinator accepts its join request.
In the following commits, we make the topology coordinator reject
join requests if the node being replaced is considered alive by the
gossiper. However, the replacing node may become alive before the
topology coordinator does the validation. If the replacing node
reuses the IP of the node being replaced, the topology coordinator
cannot know which of these two nodes is alive and whether it should
reject the join request.
The gossiper-based topology also delays the replacing node from
advertising itself if it reuses the IP. To achieve the same effect
in raft-based topology, we only need to move the definition of
replacing_a_node_with_same_ip. However, there is code that puts
bootstrap tokens of the node being replaced into the gossiper
state, and it depends on replacing_a_node_with_same_ip and
replacing_a_node_with_diff_ip being always false in the raft-based
topology mode. We prevent it from breaking by changing the
condition.
This should interrupt all sleeps in component teardown.
Before this patch, there was a 1s sleep on gossiper shutdown, the origin of
which I don't know. After the patch there is no such
sleep.
Since CRC is already handled by disk blocks, we can remove some of the
entry CRC:ing, both simplifying code and making at least that part of
both write and read faster.
Breaks the file into individually tagged + crc:ed pages.
Each page (sized as the disk write alignment) gets a trailing
12-byte metadata block, including a CRC of the first page-12 bytes,
and the ID of the segment being written.
When reading, each page read is CRC:ed and checked to be part
of the expected segment by comparing IDs. If the crc is broken,
we have broken data. If the crc is ok, but the ID does not match, we
have a prematurely terminated segment (truncated), which, depending
on whether we use batch mode or not, implies data loss.
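A hedged sketch of the per-page check. The split of the 12-byte trailer into a 4-byte CRC plus an 8-byte segment ID, the page size, and the CRC flavor are assumptions for illustration; only the total trailer size and the corrupt-vs-truncated distinction follow the description above.
```
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr size_t page_size = 4096;      // "disk write alignment"; example value
constexpr size_t trailer_size = 12;     // per-page metadata at the end of each page

enum class page_status { ok, corrupt, truncated };

// Plain bitwise CRC-32, for illustration only.
static uint32_t crc32_of(const uint8_t* data, size_t len) {
    uint32_t crc = 0xffffffffu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int b = 0; b < 8; ++b) {
            crc = (crc >> 1) ^ (0xedb88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

page_status check_page(const uint8_t* page, uint64_t expected_segment_id) {
    uint32_t crc;
    uint64_t segment_id;
    // Assumed trailer layout: [crc:4][segment_id:8] in the last 12 bytes.
    std::memcpy(&crc, page + page_size - trailer_size, sizeof(crc));
    std::memcpy(&segment_id, page + page_size - trailer_size + sizeof(crc), sizeof(segment_id));
    if (crc != crc32_of(page, page_size - trailer_size)) {
        return page_status::corrupt;     // broken data
    }
    if (segment_id != expected_segment_id) {
        return page_status::truncated;   // page belongs to another segment: premature end
    }
    return page_status::ok;
}
```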
Refs #11845
When replaying, differentiate between the two failure cases we have:
- A broken actual entry - i.e. entry header/data does not hold up to
crc scrutiny
- Truncated file - i.e. a chunk header is broken or unreadable. This can
be due to either "corruption" (i.e. borked write, post-corruption, hw
whatever), or simply an unterminated segment.
The difference is that the former is recoverable, the latter is not.
We now signal and report the two separately. The end result for a user
is not much different, in either case they imply data loss and the
need for repair. But there is some value in differentiating which
of the two we encountered.
Modifies and adds test cases.
reconcilable_result_builder passes range tombstone changes to _rt_assembler
using table schema, not query schema.
This means that a tombstone with bounds (a; b), where a < b in query schema
but a > b in table schema, will not be emitted from mutation_query.
This is a very serious bug, because it means that such tombstones in reverse
queries are not reconciled with data from other replicas.
If *any* queried replica has a row, but not the range tombstone which deleted
the row, the reconciled result will contain the deleted row.
In particular, range deletes performed while a replica is down will not
later be visible to reverse queries which select this replica, regardless of the
consistency level.
As far as I can see, this doesn't result in any persistent data loss.
Only in that some data might appear resurrected to reverse queries,
until the relevant range tombstone is fully repaired.
This series fixes the bug and adds a minimal reproducer test.
Fixes #10598
Closes scylladb/scylladb#16003
* github.com:scylladb/scylladb:
mutation_query_test: test that range tombstones are sent in reverse queries
mutation_query: properly send range tombstones in reverse queries
vla (variable length array) is an extension in GCC and Clang, and
it is not part of the C++ standard.
so let's avoid using it if possible, for better standards compliance.
it's also more consistent with other places where we calculate the size
of an array of T in the same source file.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16084
Currently CREATE KEYSPACE ... WITH STORAGE = { 'type' = 'S3' ... } will create the keyspace even if the backend configuration is "invalid" in the sense that the requested endpoint is not known to scylla via the object_storage.yaml config file. The first time this misconfiguration reveals itself is when flushing a memtable (see #15635), but it's good to know the endpoint is not configured earlier than that.
fixes: #15074
Closes scylladb/scylladb#16038
* github.com:scylladb/scylladb:
test: Add validation of misconfigured storage creation
sstables: Throw early if endpoint for keyspace is not configured
replica: Move storage options validation to sstables manager
test/cql-pytest/test_keyspaces: Move DESCRIBE case to object store
sstables: Add has_endpoint_client() helper to manager
in this change,
* all `Seastar_OptimizationLevel_*` are dropped.
* mode.Sanitize.cmake:
s/CMAKE_CXX_FLAGS_COVERAGE/CMAKE_CXX_FLAGS_SANITIZE/
* mode.Dev.cmake:
s/CMAKE_CXX_FLAGS_RELEASE/CMAKE_CXX_FLAGS_DEV/
Seastar_OptimizationLevel_* variables have nothing to do with
Seastar, and they introduce unnecessary indirection. the function
of `update_cxx_flags()` already requires an option name for this
parameter, so there is no need to have a name for it.
the cached entry of `Seastar_OptimizationLevel_DEBUG` is also
dropped. if we really need to have knobs which can be configured
by the user, we should define them in a more formal way; at this
moment, this is not necessary, so drop it along with this
variable.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16059
There are three of them, one is used by core, another by tests and the third one passes arguments between those two. And the ..._for_tests() helper in test utils. This PR leaves only one for tests out of three.
Closes scylladb/scylladb#16068
* github.com:scylladb/scylladb:
tests: Shorten the write_memtable_to_sstable_for_test()
replica: Squash two write_memtable_to_sstable()
replica: Coroutinize one of write_memtable_to_sstable() overloads
In an attempt to create a non-local keyspace with an unknown endpoint,
a configuration exception should pop up.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When a keyspace is created it initializes the storage for it, and the
initialization of S3 storage is a good place to check if the endpoint
for the storage is configured at all.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently the cql statement .validate() callback is responsible for
checking if the non-local storage options are allowed with the
respective feature. Next patch will need to extend this check to also
validate the details of the provided storage options, but doing it at
cql level doesn't seem correct -- it's "too far" from query processor
down to sstables manager.
Good news is that there's a lower-level validation of the new keyspace,
namely the database::validate_new_keyspace() call. Move the storage
options validation into sstables manager, while at it, reimplement it
as a visitor to facilitate further extensions and plug the new
validation to the aforementioned database::validate_new_keyspace().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
allow_mutation_read_page_without_live_row is a new option in the
partition_slice::option option set. In a mixed cluster, old nodes
possibly don't know this new option, so its usage must be protected by a
cluster feature. This patch does just that.
Fixes: #15795
Closes scylladb/scylladb#15890
We're going to ban creation of a keyspace with the S3 type in case the
requested endpoint is not configured. The problem is that this test case
of cql-pytest needs such a keyspace to be created, and in order to provide
the object storage configuration we'd need to touch the generic scylla
cluster management, which is overkill for the generic cql-pytest case.
Simpler solution is to make object_store test suite perform all the
S3-related checks, including the way DESCRIBE for S3-backed ks works.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's the get_endpoint_client() peer that only checks for the client's presence. To be used by the next patches.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The wrapper just calls the test-only core write_memtable_to_sstable() overload; tests can do that on their own.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All the services that need to register RPC handlers do it in the service constructor or the .start() method; unregistration happens in .stop(). Storage service explicitly (de)initializes its RPC handlers in dedicated calls, but there's no point in that. The handlers' accessibility is determined by messaging service start_listen/shutdown; the handlers themselves can be registered any time before it and unregistered any time after it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The handlers are about to be initialized from inside the storage_service constructor. At that time container() is not yet available, and it's invalid to capture it in a handler's lambda. Fortunately, there's only one handler that does so; the other handlers capture 'this' and call container() explicitly. This patch fixes the remaining one to do the same.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The main goal here is to drop the sys.dist.ks argument from the init_messaging_service call to make future patching simpler. While doing that, it turned out that the argument needed to be passed all the way down to mark_existing_views_as_built(), so this patch also drops the argument from that whole call chain.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This effectively reverts bc051387c5 (storage_service: Remove sys_dist_ks from storage_service dependencies), since now the storage service needs sys.dist.ks not only at cluster join time. The next patch will make more use of it as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's currently set via a dedicated call that happens after the query processor is started. Now the query processor is started before the storage service, so the latter can get the q.p. local reference via its constructor.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The storage service is a top-level service which depends on many other services. Recently (see d42685d0cb storage_service: Load tablet metadata on boot and from group0 changes) it also got an implicit dependency on the query processor, but it still starts too early to hold an explicit reference to the q.p.
This patch moves the storage service start to a later point. This is possible because the storage service is not explicitly needed by any other component's start/init between its old and new start places. Also, cql_test_env starts the storage service "that late" too.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This miniset addresses two potential conversions to `global_schema_ptr` of incomplete materialized view schemas.
One of them was completely unnecessary and also a "chicken and egg" problem: in the schema sync procedure itself, a view schema was converted to `global_schema_ptr` solely for the purposes of logging. This can create a hiccup in materialized view updates if they are coming from a node with a different mv schema.
The reason a synced schema can sometimes have no base info is the deactivation and reactivation of the schema inside the `schema_registry`, which doesn't restore the base information due to lack of context.
Once a schema is synced the problem becomes easy, since we can just use the latest base information from the database.
Fixes #14011
Closes scylladb/scylladb#14861
* github.com:scylladb/scylladb:
migration manager: fix incomplete mv schemas returned from get_schema_for_write
migration_manager: do not globalize potentially incomplete schema
Patch 967ebacaa4 (view_update_generator: Move abort kicking to do_abort()) moved unplugging the v.u.g. from the database from .stop() to .do_abort(). The latter call happens very early on stop -- once scylla receives SIGINT. However, the database may still need the v.u.g. plugged in to flush views.
This patch moves the unplugging to a later point, namely to the .stop() method of the v.u.g., which happens after the database is drained and should no longer continue view updates.
fixes: #16001
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#16091
this is a cleanup in `scylla.spec`.
Closes scylladb/scylladb#16097
* github.com:scylladb/scylladb:
dist/redhat: group sub-package preambles together
dist/redhat: drop unused `defines` variable
dist/redhat: remove tags for subpackage which are same as main preamble
*) Problem:
We have seen in the field that it takes longer than expected to repair system tables like system_auth, which hold a tiny amount of data but are replicated to all nodes in the cluster. The cluster has multiple DCs; each DC has multiple nodes. The main reason for the slowness is that even if the amount of data is small, repair has to walk through all the token ranges, that is, num_tokens * number_of_nodes_in_the_cluster. The overhead of the repair protocol for each token range dominates due to the small amount of data per token range. Another reason is that the high network latency between DCs makes the RPC calls used during repair take more time.
*) Solution:
To solve this problem, a small table optimization for repair is introduced in
this patch. A new repair option is added to turn on this optimization.
- No token range to repair is needed by the user. It will repair all token
ranges automatically.
- Users only need to send the repair REST API request to one of the nodes in the cluster. It can be any node in the cluster.
- It does not require the RF to be configured to replicate to all nodes in the
cluster. This means it can work with any tables as long as the amount of data
is low, e.g., less than 100MiB per node.
*) Performance:
1)
3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2}
Before:
```
repair - repair[744cd573-2621-45e4-9b27-00634963d0bd]: stats:
repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes,
role_members}, ranges_nr=1537, round_nr=4612,
round_nr_fast_path_already_synced=4611,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1,
rpc_call_nr=115289, tx_hashes_nr=0, rx_hashes_nr=5, duration=1.5648403 seconds,
tx_row_nr=2, rx_row_nr=0, tx_row_bytes=356, rx_row_bytes=0,
row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 0},
{127.0.14.4, 0}, {127.0.14.5, 178}, {127.0.14.6, 178}},
row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 0},
{127.0.14.4, 0}, {127.0.14.5, 1}, {127.0.14.6, 1}},
row_from_disk_bytes_per_sec={{127.0.14.1, 0.00010848}, {127.0.14.2,
0.00010848}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.00010848},
{127.0.14.6, 0.00010848}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1,
0.639043}, {127.0.14.2, 0.639043}, {127.0.14.3, 0}, {127.0.14.4, 0},
{127.0.14.5, 0.639043}, {127.0.14.6, 0.639043}} Rows/s,
tx_row_nr_peer={{127.0.14.3, 1}, {127.0.14.4, 1}}, rx_row_nr_peer={}
```
After:
```
repair - repair[d6e544ba-cb68-4465-ab91-6980bcbb46a9]: stats:
repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes,
role_members}, ranges_nr=1, round_nr=4, round_nr_fast_path_already_synced=4,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0,
rpc_call_nr=80, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.001459798 seconds,
tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0,
row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 178},
{127.0.14.4, 178}, {127.0.14.5, 178}, {127.0.14.6, 178}},
row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 1},
{127.0.14.4, 1}, {127.0.14.5, 1}, {127.0.14.6, 1}},
row_from_disk_bytes_per_sec={{127.0.14.1, 0.116286}, {127.0.14.2, 0.116286},
{127.0.14.3, 0.116286}, {127.0.14.4, 0.116286}, {127.0.14.5, 0.116286},
{127.0.14.6, 0.116286}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1,
685.026}, {127.0.14.2, 685.026}, {127.0.14.3, 685.026}, {127.0.14.4, 685.026},
{127.0.14.5, 685.026}, {127.0.14.6, 685.026}} Rows/s, tx_row_nr_peer={},
rx_row_nr_peer={}
```
The time to finish repair difference = 1.5648403 seconds / 0.001459798 seconds = 1072X
2)
3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2}
Same test as above except 5ms delay is added to simulate multiple dc
network latency:
The time to repair is reduced from 333s to 0.2s.
333.26758 s / 0.22625381s = 1472.98
3)
3 DCs, each DC has 3 nodes, 9 nodes in the cluster. RF = {dc1: 3, dc2: 3, dc3: 3}
, 10 ms network latency
Before:
```
repair - repair[86124a4a-fd26-42ea-a078-437ca9e372df]: stats:
repair_reason=repair, keyspace=system_auth, tables={role_attributes,
role_members, roles}, ranges_nr=2305, round_nr=6916,
round_nr_fast_path_already_synced=6915,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1,
rpc_call_nr=276630, tx_hashes_nr=0, rx_hashes_nr=8, duration=986.34015
seconds, tx_row_nr=7, rx_row_nr=0, tx_row_bytes=1246, rx_row_bytes=0,
row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3,
0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0},
{127.0.57.8, 0}, {127.0.57.9, 0}}, row_from_disk_nr={{127.0.57.1, 1},
{127.0.57.2, 1}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0},
{127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}},
row_from_disk_bytes_per_sec={{127.0.57.1, 1.72105e-07}, {127.0.57.2,
1.72105e-07}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0},
{127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}}
MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 0.00101385},
{127.0.57.2, 0.00101385}, {127.0.57.3, 0}, {127.0.57.4, 0},
{127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0},
{127.0.57.9, 0}} Rows/s, tx_row_nr_peer={{127.0.57.3, 1},
{127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1},
{127.0.57.8, 1}, {127.0.57.9, 1}}, rx_row_nr_peer={}
```
After:
```
repair - repair[07ebd571-63cb-4ef6-9465-6e5f1e98f04f]: stats:
repair_reason=repair, keyspace=system_auth, tables={role_attributes,
role_members, roles}, ranges_nr=1, round_nr=4,
round_nr_fast_path_already_synced=4,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0,
rpc_call_nr=128, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.6052915
seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0,
row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3,
178}, {127.0.57.4, 178}, {127.0.57.5, 178}, {127.0.57.6, 178},
{127.0.57.7, 178}, {127.0.57.8, 178}, {127.0.57.9, 178}},
row_from_disk_nr={{127.0.57.1, 1}, {127.0.57.2, 1}, {127.0.57.3, 1},
{127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1},
{127.0.57.8, 1}, {127.0.57.9, 1}},
row_from_disk_bytes_per_sec={{127.0.57.1, 0.00037793}, {127.0.57.2,
0.00037793}, {127.0.57.3, 0.00037793}, {127.0.57.4, 0.00037793},
{127.0.57.5, 0.00037793}, {127.0.57.6, 0.00037793}, {127.0.57.7,
0.00037793}, {127.0.57.8, 0.00037793}, {127.0.57.9, 0.00037793}}
MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 2.22634},
{127.0.57.2, 2.22634}, {127.0.57.3, 2.22634}, {127.0.57.4,
2.22634}, {127.0.57.5, 2.22634}, {127.0.57.6, 2.22634},
{127.0.57.7, 2.22634}, {127.0.57.8, 2.22634}, {127.0.57.9,
2.22634}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
```
The time to repair is reduced from 986s (16 minutes) to 1.6s
*) Summary
So, a more than 1000X difference is observed for this common usage of the system table repair procedure.
Fixes #16011
Refs #15159
this variable was introduced in 6d7d0231. back then, we were still building the binaries in the .spec file, but we've switched to the relocatable package now, so there is no need to keep these compilation-related flags anymore.
in this change, the `defines` variable is dropped.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this is a cleanup.
if a subpackage is licensed under a different license from the one
specified in the main preamble, we need to use a distinct License
tag on a per-subpackage basis. but if it is licensed with the
identical license, it is not necessary. since all three
subpackages of "*-{server, conf, kernel-conf}" are licensed under
AGPLv3, there is no need to repeat the "License:" tag in their
own preamble section.
the same applies to the "URL" tag.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
get_schema_for_write
Sometimes a view schema entry can get deactivated inside the schema registry; this happens due to deactivating and reactivating the registry entry, which doesn't rebuild the base table information in the view. This error is later caught when trying to convert the schema into a `global_schema_ptr`; however, the real bug here is that not all schemas returned from `get_schema_for_write` are suitable for write, because the mv schemas can be incomplete.
This commit changes the aforementioned function in order to fix the bug.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Update node_exporter to 1.7.0.
The previous version (1.6.1) was flagged by security scanners (such as
Trivy) with HIGH-severity CVE-2023-39325. 1.7.0 release fixed that
problem.
[Botond: regenerate frozen toolchain]
Fixes #16085
Closes scylladb/scylladb#16086
Closes scylladb/scylladb#16090
Fixes #15269
If the segment being replayed is corrupted/truncated, we can attempt to skip completely bogus byte amounts, which can cause an assert (i.e. crash) in file_data_source_impl. This is not a crash-level error, so ensure we range-check the distance in the reader.
v2: Add to corrupt_size when trying to skip more than is available. The amount added is "wrong", but at least it will ensure we log the fact that things are broken.
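A minimal sketch of that range check (illustrative names only, not the actual commitlog code): clamp the requested skip distance to what is actually left in the segment, and account the excess in corrupt_size so the corruption is at least logged instead of tripping an assert downstream.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>

struct replay_state {
    uint64_t remaining_bytes;   // bytes left in the segment being replayed
    uint64_t corrupt_size = 0;  // running total of data we could not interpret
};

// Returns the distance that is safe to skip. If the segment claims more than
// is actually available, the difference is added to corrupt_size instead of
// being passed to the data source (which would otherwise assert).
uint64_t checked_skip(replay_state& st, uint64_t requested) {
    uint64_t safe = std::min(requested, st.remaining_bytes);
    if (safe < requested) {
        st.corrupt_size += requested - safe;  // a "wrong" amount, but keeps the corruption visible
        std::cerr << "commitlog replay: truncated skip of " << requested
                  << " bytes clamped to " << safe << "\n";
    }
    st.remaining_bytes -= safe;
    return safe;
}
```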
Closes scylladb/scylladb#15270
Currently, when the coordinator decides to move the fence, it issues an RPC to each node, and each node locally advances its fence version. This is fine if there are no failures, or if failures are handled by retrying the fencing, but if we want to allow topology changes to progress even in the presence of barrier failures, it is easier to store the fence version in the raft state. Nodes that missed the fence RPC can easily catch up to the latest fence version by simply executing a raft barrier.
There was a case where the maybe-sync function of a materialized view could fail to sync if the view version was old. This is because adding the base information to the view is only relevant until the record is synced. This triggers an internal error in the `global_schema_ptr` constructor.
The conversion to the global pointer in that case was solely for logging purposes, so instead we pass the pieces of information needed for the logging itself.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
this change is a cleanup that adds `-Wignored-qualifiers` when building the tree.
Marking a by-value return type `const` has no effect, so these `const` specifiers are useless; let's drop them.
And if we compile the tree with `-Wignored-qualifiers`, the compiler warns like:
```
/home/kefu/dev/scylladb/schema/schema.hh:245:5: error: 'const' type qualifier on return type has no effect [-Werror,-Wignored-qualifiers]
245 | const index_metadata_kind kind() const;
| ^~~~~
```
so this change also silences the above warnings.
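For reference, a small self-contained example of the warning and its fix (compile with `-Wignored-qualifiers`; the `index_metadata` wrapper below is illustrative, only the offending declaration mirrors the one quoted in the error message above):

```cpp
enum class index_metadata_kind { keys, custom };

struct index_metadata {
    index_metadata_kind _kind = index_metadata_kind::keys;

    // The top-level const on a by-value return type has no effect and
    // triggers -Wignored-qualifiers:
    //   const index_metadata_kind kind() const;
    // The fixed declaration simply drops the qualifier:
    index_metadata_kind kind() const { return _kind; }
};
```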
Closes scylladb/scylladb#16083
* github.com:scylladb/scylladb:
build: enable -Wignore-qualifiers
treewide: do not mark return value const if this has no effect
`-Wignored-qualifiers` is included in -Wextra, but we are not there yet. With this change, we can keep changes that would introduce `-Wignored-qualifiers` warnings out of the repo, before applying `-Wextra`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change is a cleanup.
Marking a by-value return type `const` has no effect, so these `const` specifiers are useless; let's drop them.
And if we compile the tree with `-Wignored-qualifiers`, the compiler warns like:
```
/home/kefu/dev/scylladb/schema/schema.hh:245:5: error: 'const' type qualifier on return type has no effect [-Werror,-Wignored-qualifiers]
245 | const index_metadata_kind kind() const;
| ^~~~~
```
so this change also silences the above warnings.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This is a loose collection of fixes to rare row cache bugs flushed out by running test_concurrent_reads_and_eviction several million times. See individual commits for details.
Fixes #15483
Closes scylladb/scylladb#15945
* github.com:scylladb/scylladb:
partition_version: fix violation of "older versions are evicted first" during schema upgrades
cache_flat_mutation_reader: fix a broken iterator validity guarantee in ensure_population_lower_bound()
cache_flat_mutation_reader: fix a continuity loss in maybe_update_continuity()
cache_flat_mutation_reader: fix continuity losses during cache population races with reverse reads
partition_snapshot_row_cursor: fix a continuity loss in ensure_entry_in_latest() with reverse reads
cache_flat_mutation_reader: fix some cache mispopulations with reverse reads
cache_flat_mutation_reader: fix a logic bug in ensure_population_lower_bound() with reverse reads
cache_flat_mutation_reader: never make an unlinked last dummy continuous
A schema upgrade appends a MVCC version B after an existing version A.
The last dummy in B is added to the front of LRU,
so it will be evicted after the entries in A.
This alone doesn't quite violate the "older versions are evicted first" rule,
because the new last dummy carries no information. But apply_monotonically
generally assumes that entries on the same position have the obvious
eviction order, even if they carry no information. Thus, after the merge,
the rule can become broken.
The proposed fix is as follows:
- In the case where A is merged into B, the merged last dummy
inherits the link of A.
- The merging of B into anything is prevented until its merge with A is finished.
This is relatively hacky, because it still involves a state that
goes against some natural expectations granted by the "older versions..."
rule. A less hacky fix would be to ensure that the new dummy is inserted
into a proper place in the eviction order to begin with.
Or, better yet, we could eliminate the rule altogether.
Aside from being very hard to maintain, it also prevents the introduction
of any eviction algorithm other than LRU.
ensure_population_lower_bound() guarantees that _last_row is valid or null.
However, it fails to provide this guarantee in a rare special case: when
`_population_range_starts_before_all_rows == true` and _last_row is non-null.
(This can happen in practice if there is a dummy at before_all_clustering_rows
and eviction makes the `(before_all_clustering_rows, ...)` interval
discontinuous. When the interval is read in this state, _last_row will point to
the dummy, while _population_range_starts_before_all_rows will still be true.)
In this special case, `ensure_population_lower_bound()` does not refresh
`_last_row`, so it can be non-null but invalid after the call.
If it is accessed in this state, undefined behaviour occurs.
This was observed to happen in a test,
in the `read_from_underlying() -- maybe_drop_last_entry()` codepath.
The proposed fix is to make the meaning of _population_range_starts_before_all_rows
closer to its real intention. Namely: it's supposed to handle the special case of a
left-open interval, not the case of an interval starting at -inf.
To reflect the final range tombstone change in the populated range,
maybe_update_continuity() might insert a dummy at `before_key(_next_row.table_position())`.
But the relevant logic breaks down in the special case when that position is
equal to `_last_row.position()`. The code treats the dummy as a part of
the (_last_row, _next_row) range, but this is wrong in the special case.
This can lead to inconsistent state. For example, `_last_row` can be wrongly made
continuous, or its range tombstone can be wrongly nulled.
The proposed fix is to only modify the dummy if it was actually inserted.
If it had been inserted beforehand (which is true in the special case, because
of the `ensure_population_lower_bound()` call earlier), then it's already in a
valid state and doesn't need changes.
Cache population routines insert new row entries.
In non-reverse reads, the new entries (except for the lower bound of the query
range) are filled with the correct continuity and range tombstones immediately
after insertion, because that information has already arrived from the underlying
reader by the time the entries are inserted.
But in reverse reads, it's the interval *after* the newly-inserted entry
that's made continuous. The continuity information in the new entries isn't
filled. When two population routines race, the one which comes later can
punch holes in the continuity left by the first routine, which can break
the "older versions are evicted first" rule and revert the affected
interval to an older version.
To fix this, we must make sure that inserting new row entries doesn't
change the total continuity of the version.
The FIXME comment claims that setting continuity isn't very important in this
place, but in fact this is just wrong.
If two calls to read_from_underlying() get into a race, the one which finishes
later can call ensure_entry_in_latest() on a position which lies inside a
continuous interval in the newest version. If we don't take care to preserve
the total continuity of the version, this can punch a hole in the continuity of the
newest version, potentially reverting the affected interval to an older version.
Fix that.
`_last_row` is in table schema, but it is sometimes compared with positions in
query schema. This leads to unexpected behaviour when reverse reads
are used.
The previous patch fixed one such case, which was affecting correctness.
As far as I can tell, the three cases affected by this patch aren't
a correctness problem, but can cause some intervals to fail to be made continuous.
(And they won't be cached even if the same read is repeated many times).
`_last_row` is in table schema, while `cur.position()` is in query schema
(which is either equal to table schema, or its reverse).
Thus, the comparison affected by this patch doesn't work as intended.
In reverse reads, the check will pass even if `_last_row` has the same key,
but opposite bound weight to `cur`, which will lead to the dummy being inserted
at the wrong position, which can e.g. wrongly extend a range tombstone.
Fix that.
It is illegal for an unlinked last dummy to be continuous
(this is how last dummies respect the "older versions are evicted first" rule),
but it is technically possible for an unlinked last dummy to be
made continuous by read_from_underlying. This commit fixes that.
Found by row_cache_test.
The bug is very unlikely to happen in practice because the relevant
rows_entry is bumped in LRU before read_from_underlying starts.
For the bug to manifest, the entry has to fall down to the end of the
LRU list and be evicted before read_from_underlying() ends.
Usually it takes several minutes for an entry to fall out of LRU,
and read_from_underlying takes maybe a few hundred milliseconds.
And even if the above happened, there still needs to appear a new
version, which needs to have its continuous last dummy evicted
before it's merged.
This commit adds the .. only:: opensource directive
to the Raft page to exclude the link to the 5.2-to-5.4
upgrade guide from the Enterprise documentation.
The Raft page belongs to both OSS and Enterprise
documentation sets, while the upgrade guide
is OSS-only. This causes documentation build
issues in the Enterprise repository, for example,
https://github.com/scylladb/scylla-enterprise/pull/3242.
As a rule, all OSS-only links should be provided
by using the .. only:: opensource directive.
This commit must be backported to branch-5.4
to prevent errors in the documentation for
ScyllaDB Enterprise 2024.1
(backport)
Closes scylladb/scylladb#16064
before this change, in sstable_run_based_compaction_test, we check
every 4 sstables, to verify that we close the sstables being replaced
in batches of 4.
since the integer-based generation identifier is monotonically
increasing, we can assume that the identifiers of sstables are like
0, 1, 2, 3, .... so if the compaction consumes sstables in batches
of 4, the identifier of the first one in each batch should always be
a multiple of 4. unfortunately, this test does not work if we use
uuid-based identifiers.
but if we take a closer look at how we create the dataset, we can
establish the following facts:
1. the `compaction_descriptor` returned by
`sstable_run_based_compaction_strategy_for_tests` never
sets `owned_ranges` in the returned descriptor
2. in `compaction::setup_sstable_reader`, `mutation_reader::forward::no`
is used if `_owned_ranges_checker` is empty
3. `mutation_reader_merger` respects the `fwd_mr` passed to its
ctor, so it closes the current sstable immediately when the underlying
mutation reader reaches the end of stream.
in other words, we close every sstable once it is fully consumed in
sstable_compaction_test. and the reason why the existing test passes
is that we just sample the sstables whose generation id is a multiple
of 4. what happens when we perform compaction in this test is:
1. replace 5 with 33, closing 5
2. replace 6 with 34, closing 6
3. replace 7 with 35, closing 7
4. replace 8 with 36, closing 8 << let's check here.. good, go on!
5. replace 13 with 37, closing 13
...
8. replace 16 with 40, closing 16 << let's check here.. also good, go on!
so, in this change, we just check all old sstables, to verify that
we close each of them once it is fully consumed.
Fixes #16073
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
instead of linking against cryptopp, we should link against
cryptopp::cryptopp. the latter is the target exposed by
Findcryptopp.cmake, while the former is just a library name which
is not even exposed by any find_package() call.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16060
The jumbo sink is there to upload files that can potentially be larger than 50Gb (10000*5Mb). For that, the sink uploads a set of so-called "pieces" -- files up to 50Gb each -- then uses the copy-upload API call to squash the pieces together. After copying, each piece is removed. In case of a crash while uploading, pieces remain in the bucket forever, which is not great.
This patch tags pieces with a 'kind=piece' tag in order to tell pieces apart from regular objects. This can be used, for example, by setting up a tag-based lifecycle policy to eventually collect dangling pieces.
fixes: #13670
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
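As a rough illustration of the tagging itself (a hedged sketch only, not the sink's actual request-building code): S3 accepts object tags as a URL-encoded key=value list, e.g. via the `x-amz-tagging` request header, so marking a piece mostly boils down to composing a "kind=piece" string and attaching it to the upload request.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Percent-encode a single tag component (letters/digits kept, the rest escaped).
std::string url_encode(const std::string& s) {
    static const char* hex = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : s) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += c;
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0xf];
        }
    }
    return out;
}

// Build a tagging string such as "kind=piece" from key/value pairs.
std::string make_tagging(const std::vector<std::pair<std::string, std::string>>& tags) {
    std::string out;
    for (const auto& [k, v] : tags) {
        if (!out.empty()) {
            out += '&';
        }
        out += url_encode(k) + "=" + url_encode(v);
    }
    return out;
}

// Usage: make_tagging({{"kind", "piece"}}) -> "kind=piece"
```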
Closes scylladb/scylladb#16023
a developer might notice that building 'check-headers' builds the whole tree, so let's explain this behavior.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16062
As a general rule, tests in test/cql-pytest shouldn't just pass on Scylla - they also should not fail on Cassandra; A test that fails on Cassandra may indicate that the test is wrong, or that Scylla's behavior is wrong and the test just enshrines that wrong behavior. Each time we see a test fail on Cassandra we need to check if this is not the case. We also have special markers scylla_only and cassandra_bug to put on tests that we know _should_ fail on Cassandra because it is missing some Scylla-only feature or there is a bug in Cassandra, respectively. Such tests will be xfailed/skipped when running on Cassandra, and not report failures.
Unfortunately, over time several tests that did not pass on Cassandra got into our suite. In this series I went over all of them and fixed each to pass - or be skipped - on Cassandra, in a way that each patch explains.
Fixes #16027
Closes scylladb/scylladb#16033
* github.com:scylladb/scylladb:
test/cql-pytest: fix test_describe.py to not fail on Cassandra
test/cql-pytest: fix select_single_column_relation_test.py to not fail on Cassandra
test/cql-pytest: fix compact_storage_test.py to not fail on Cassandra
test/cql-pytest: fix test_secondary_index.py to not fail on Cassandra
test/cql-pytest: fix test_materialized_view.py to not fail on Cassandra
test/cql-pytest: fix test_keyspace.py to not fail on Cassandra
test/cql-pytest: test_guardrail_replication_strategy.py is Scylla-only
test/cql-pytest: partial fix for test_compaction_strategy_validation.py on Cassandra
test/cql-pytest: fix test_filtering.py to not fail on Cassandra
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).
Some of the tests checked on Cassandra things that don't exist there
(namely local secondary indexes) and could skip that part. Other tests
need to be skipped completely ("scylla_only") because they rely on a
Scylla-only feature. We have a bit too many of those in this file, but
I don't want to fix this now.
Yet another test found a real bug in Cassandra 4.1.1 (CASSANDRA-17918)
but passes in Cassandra 4.1.2 and up, so there's nothing to fix except
a comment about the situation.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In commit 52bbc1065c, we started to allow "IN NULL" - it started to
match nothing instead of being an error as it is in Cassandra. The
commit *incorrectly* "fixed" the existing translated Cassandra unit test
to match the new behavior - but after this "fix" the test started to
fail on Cassandra.
The appropriate fix is just to comment out this part of the test and
not do it. It's a small point where we deliberately decided to deviate
from Cassandra's behavior, so the test it had for this behavior is
irrelevant.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Some error-message checks in this test file (which was translated in the past from Cassandra) try operations which actually have two errors, and expect to see one error message - but recent Cassandra prints the other one. This caused several tests to fail when running on Cassandra 4.1. Both messages are fine, so let's accept both.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Fixed two tests which failed when running on Cassandra:
One test waited for a secondary index to appear, but in Cassandra, the
index can be broken (cause a read failure) for a short while and we
need to wait through this failure as well and not fail the entire test.
Another test was for local secondary index, which is a Scylla-only
feature, but we forgot the "scylla_only" tag.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The test function test_mv_synchronous_updates checks the
synchronous_updates feature, which is a ScyllaDB extension and
doesn't exist in Cassandra. So it should be marked with "scylla_only"
so that it doesn't fail when running the tests on Cassandra.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).
When testing some invalid cases of ALTER TABLE, the test required
that you cannot choose SimpleStrategy without specifying a
replication_factor. As explained in Refs #16028, this isn't true
in Cassandra 4.1 and up - it now has a default value for
replication_factor and it's no longer required.
So in this patch we split that part of the test to a separate test
function and mark it scylla_only.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The tests in test/cql-pytest/test_guardrail_replication_strategy.py
are for a Scylla-only feature that doesn't exist in Cassandra, so
obviously they all fail on Cassandra. Let's mark them all as
scylla_only.
We use an autouse fixture to automatically mark all tests in this file
as scylla-only, instead of marking each one separately.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).
This patch is only a partial fix - it fixes trivial differences in error
messages, but some potentially-real differences remain so three of the
tests still fail:
1. Trying to set tombstone_threshold to 5.5 is an error in ScyllaDB
("must be between 0.0 and 1.0") but allowed in Cassandra.
2. Trying to set bucket_low to 0.0 is an error in ScyllaDB, giving the
wrong-looking error message "must be between 0.0 and 1.0" (so 0.0 should
have been fine?!) but allowed in Cassandra.
3. Trying to set timestamp_resolution to SECONDS is an error in ScyllaDB
("invalid timestamp resolution SECONDS") but allowed in Cassandra.
I don't think anybody wants to actually use "SECONDS", but it seems
legal in Cassandra, so do we need to support it?
The patch also simplifies the test to use cql-pytest's util.py, instead
of cassandra_tests/porting.py. The latter was meant to make porting
existing Cassandra tests easier - not for writing new ones - and made
using a regular expression for testing error messages harder so I
switched to using pytest.raises() whose "match=" accepts a regular
expression.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).
It turns out that when the token() function is used with incorrect
parameters (it needs to be passed all partition-key columns), the
error message is different in ScyllaDB and Cassandra. Both are
reasonable error messages, so if we insist on checking the error
message - we should allow both.
Also the same test called its second partition-key column "ck". This
is confusing, because we usually use the name "ck" to refer to a clustering
key. So just for clarity, we change this name to "pk2". This is not a
functional change in the test.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
TWCS tables require a partition estimation adjustment, as incoming streaming data can be segregated into time windows.
It turns out we had two problems in this area that lead to suboptimal bloom filters.
1) With off-strategy enabled, data segregation is postponed, but partition estimation was adjusted as if segregation wasn't postponed. Solved by not adjusting the estimation if segregation is postponed.
2) With off-strategy disabled, data segregation is not postponed, but streaming didn't feed any metadata into the partition estimation procedure, meaning it had to assume the maximum number of windows input data can be segregated into (100). Solved by using the schema's default TTL for a precise estimation of the window count; see the sketch below.
For the future, we want to dynamically size filters (see https://github.com/scylladb/scylladb/issues/2024), especially for TWCS, which might have SSTables that are left uncompacted until they're fully expired, meaning that the system won't heal itself in a timely manner through compaction of an SSTable whose partition estimation was really wrong.
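A rough sketch of the window-count estimation idea (illustrative names and signature, not the actual streaming code): with a default TTL, incoming data can span at most ceil(TTL / window_size) time windows, capped by the previously assumed maximum of 100.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>

// Illustrative estimation of how many TWCS time windows streamed data can be
// segregated into, given the table's default TTL and the compaction window size.
uint64_t estimate_window_count(std::chrono::seconds default_ttl,
                               std::chrono::seconds window_size,
                               uint64_t max_windows = 100) {
    if (default_ttl.count() <= 0 || window_size.count() <= 0) {
        return max_windows;  // no usable TTL: fall back to the pessimistic cap
    }
    // Round up: data written now with this TTL cannot outlive ceil(TTL / window) windows.
    uint64_t windows = (default_ttl.count() + window_size.count() - 1) / window_size.count();
    return std::clamp<uint64_t>(windows, 1, max_windows);
}
```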
Fixes https://github.com/scylladb/scylladb/issues/15704.
Closes scylladb/scylladb#15938
* github.com:scylladb/scylladb:
streaming: Improve partition estimation with TWCS
streaming: Don't adjust partition estimate if segregation is postponed
before this change, `load_sstables()` fills the output sstables vector
by indexing it with the sstable's path. but if there are duplicated
items in the given sstable_names, the returned vector would contain
uninitialized shared_sstable instance(s). if we feed such sstables to the
operation funcs, they would segfault when dereferencing the empty
lw_shared_ptr.
in this change, we error out if duplicated sstable names are specified
on the command line.
an alternative is to tolerate this usage by initializing the sstables
vector with a back_inserter, as we always return a dictionary with the
sstable's name as the key, but it might be desirable from the user's
perspective to preserve the order, like OrderedDict in Python. so
let's preserve the ordering of the sstables from the command line;
see the sketch below.
this should address the problem of the segfault when duplicated
sstable paths are passed to this tool.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
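A minimal sketch of the duplicate check under these assumptions (hypothetical helper name, not the tool's actual code): reject duplicates early while preserving the command-line order of the unique paths.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <vector>

// Reject duplicated sstable paths on the command line while preserving the
// order in which the (unique) paths were given.
std::vector<std::string> check_unique_sstables(const std::vector<std::string>& names) {
    std::unordered_set<std::string> seen;
    std::vector<std::string> result;
    result.reserve(names.size());
    for (const auto& name : names) {
        if (!seen.insert(name).second) {
            throw std::invalid_argument("duplicate sstable specified: " + name);
        }
        result.push_back(name);
    }
    return result;
}
```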
Closes scylladb/scylladb#16048
Compaction tasks which do not have a parent are abortable through the task manager. Their children are aborted recursively. Compaction tasks at the lowest level are aborted using the existing compaction task executors' stopping mechanism.
Closes scylladb/scylladb#16050
* github.com:scylladb/scylladb:
test: test abort of compaction task that isn't started yet
test: test running compaction task abort
tasks: fail if a task was aborted
compaction: abort task manager compaction tasks
without adding `WantedBy=scylla-server.service` to var-lib-systemd-coredump,
starting `scylla-server.service` does not necessarily start
var-lib-systemd-coredump, even if the latter is installed.
with `WantedBy=scylla-server.service` in var-lib-systemd-coredump,
starting `scylla-server.service` will also start var-lib-systemd-coredump.
and `Before=scylla-server.service` ensures that var-lib-systemd-coredump
is already ready before `scylla-server.service` is started.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15984
Make `generic_server::gentle_iterator` a mutable iterator to allow
`for_each_gently` to make changes to the connections.
Fixes: #16035
Closes scylladb/scylladb#16036
Both existing test cases, and the upcoming third one, create the very same ks.cf pair with the very same sequence of steps. Generalize them.
For the basic test case, also tune up the way "expected" rows are calculated -- now they are SELECT-ed right after insertion and the size is checked to be non-zero. Not _exactly_ the same check, but it's good enough for basic testing purposes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#15986
$ID_LIKE = "rhel" works only on RHEL-compatible OSes, not on RHEL itself.
To detect RHEL correctly, we also need to check $ID = "rhel".
Fixes #16040
Closes scylladb/scylladb#16041
Boost.Test prints the LHS and RHS when the predicate passed to the BOOST_REQUIRE_EQUAL() macro evaluates to false, so the error message printed by Boost is more developer-friendly when the test fails.
In this test, we replace some BOOST_REQUIRE() calls with BOOST_REQUIRE_EQUAL() where appropriate; see the example below.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
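For example, a standalone Boost.Test snippet (not taken from the test in question) showing why the _EQUAL variant is more helpful on failure:

```cpp
#define BOOST_TEST_MODULE equal_vs_plain_require
#include <boost/test/included/unit_test.hpp>

BOOST_AUTO_TEST_CASE(compare_counts) {
    int expected = 4;
    int actual = 5;
    // On failure this only reports that "actual == expected" was false:
    //   BOOST_REQUIRE(actual == expected);
    // On failure this also prints the two values ("5 != 4"), which is far
    // easier to debug:
    BOOST_REQUIRE_EQUAL(actual, expected);
}
```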
Closes scylladb/scylladb#16047
This short series fixes test/cql-pytest/test_permissions.py to stop failing on Cassandra.
The second patch fixes these failures (and explains why). The first patch is a new test for UDFs, which helped me prove that one of the test_permissions.py failures in Cassandra is a Cassandra bug - some esoteric error path that prints the right message when no permissions are involved, becomes wrong when permissions are added.
Fixes #15969
Closes scylladb/scylladb#15979
* github.com:scylladb/scylladb:
test/cql-pytest: fix test_permissions.py to not fail on Cassandra
test/cql-pytest: add test for DROP FUNCTION
A token metadata barrier consists of two steps. First, old requests are drained, and then requests that were not drained are fenced. But currently, if draining fails, fencing is not done. This is fine if the barrier's failure is handled by retrying, but we want to start handling errors differently. In fact, during topology operation rollback we already do not retry a failed barrier.
The patch fixes the metadata barrier to do fencing even if draining failed; a sketch of the resulting control flow follows.
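A minimal sketch of that control flow (synchronous and illustrative only; the real code is future-based and the function names here are placeholders): remember any drain error, run the fencing regardless, and only then rethrow.

```cpp
#include <exception>

// Illustrative control flow: fencing must run even when draining old requests
// fails, so any drain error is remembered and rethrown only after the fence
// version has been advanced.
void token_metadata_barrier(void (*drain_old_requests)(), void (*fence_stale_requests)()) {
    std::exception_ptr drain_error;
    try {
        drain_old_requests();
    } catch (...) {
        drain_error = std::current_exception();
    }
    fence_stale_requests();  // previously skipped when draining threw
    if (drain_error) {
        std::rethrow_exception(drain_error);
    }
}
```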
This is a continuation of a34c8dc4 (Drop compaction_manager_for_testing).
There's one more wrapper over compaction_manager that accesses its private fields. All such access was recently moved to sstables::test_env's compaction manager, so now it's time to drop the remaining legacy wrapper class.
Closes scylladb/scylladb#16017
* github.com:scylladb/scylladb:
test/utils: Drop compaction_manager_test
test/utils: Get compaction manager from test_env
test/sstables: Introduce test_env_compaction_manager::perform_compaction()
test/env: Add sstables::test_env& to compaction_manager_test::run()
test/utils: Add sstables::test_env& to compact_sstables()
test/utils: Simplify and unify compaction_manager_test::run()
test/utils: Squash two compact_sstables() helpers
test/compaction: Use shorter compact_sstables() helper
test/utils: Keep test task compaction gate on task itself
test/utils: Move compaction_manager_test::propagate_replacement()
We're observing nodes getting stuck during bootstrap inside
`storage_service::wait_for_ring_to_settle()`, which periodically checks
`migration_manager::have_schema_agreement()` until it becomes `true`:
scylladb/scylladb#15393.
There is no obvious reason why that happens -- according to the nodes'
logs, their latest in-memory schema version is the same.
So either the gossiped schema version is for some reason different
(perhaps there is a race in publishing `application_state::SCHEMA`) or
missing entirely.
Alternatively, `wait_for_ring_to_settle` is leaving the
`have_schema_agreement` loop and getting stuck in
`update_topology_change_info` trying to acquire a lock.
Modify logging inside `have_schema_agreement` so details about missing
schema or version mismatch are logged on INFO level, and an INFO level
message is printed before we return `true`. To prevent logs from getting
spammed, rate-limit the periodic messages to once every 5 seconds. This
will still show the reason in our tests which allow the node to hang for
many minutes before timing out. Also these schema agreement checks are
done on relatively rare occasions such as bootstrap, so the additional
logs should not be harmful.
Furthermore, when publishing schema version to gossip, log it on INFO
level. This is happening at most once per schema change so it's a rare
message. If there's a race in publishing schema versions, this should
allow us to observe it.
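The rate limiting mentioned above can be pictured with a generic once-per-interval helper (ScyllaDB has its own logging utilities for this; the sketch below only shows the idea and uses made-up names):

```cpp
#include <chrono>
#include <iostream>
#include <string>

// Log a message at most once per `interval`, so a tight polling loop (such as
// the schema-agreement check) cannot spam the log.
class rate_limited_logger {
    std::chrono::steady_clock::duration _interval;
    std::chrono::steady_clock::time_point _last{};
public:
    explicit rate_limited_logger(std::chrono::seconds interval) : _interval(interval) {}

    void log(const std::string& msg) {
        auto now = std::chrono::steady_clock::now();
        if (_last.time_since_epoch().count() == 0 || now - _last >= _interval) {
            _last = now;
            std::cout << msg << "\n";
        }
    }
};

// Usage sketch:
//   rate_limited_logger schema_log(std::chrono::seconds(5));
//   schema_log.log("schema agreement still pending: version mismatch with peer");
```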
Ref: scylladb/scylladb#15393
Closes scylladb/scylladb#16021
Having values of the duration type is not allowed for clustering columns, because durations can't be ordered. This is correctly validated when creating a table, but not validated when we alter the type.
Fixes #12913
Closes scylladb/scylladb#16022
Propagate the `exceptions::unavailable_exception` error message to clients such as cqlsh.
Fixes #2339
Closes scylladb/scylladb#15922
* github.com:scylladb/scylladb:
test: add the auth_cluster test suite
auth: fix error message when consistency level is not met
before this change, we define CMAKE_CXX_FLAGS_${CONFIG} directly,
and some of the configurations are supposed to generate debugging info with
the "-g -gz" options, but they failed to include these options in the cxx
flags.
in this change:
* a macro named `update_cxx_flags` is introduced to set this option.
* this macro also sets the -O option.
instead of using a function, this facility is implemented as a macro so
that we can update CMAKE_CXX_FLAGS_${CONFIG} without setting
the variable with awkward syntax like
```cmake
set(${flags} "${${flags}}" PARENT_SCOPE)
```
this mirrors the behavior of configure.py, in the sense that the latter
sets the option on a per-mode basis and translates it into a compile
option.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16043
this macro definition was dropped in 2b961d8e3f by accident.
in this change, let's bring it back. this macro is always necessary,
as it is checked in the scylla sources.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16044
They cause connection drops, which is a significantly disruptive event. We should log them so that we know they are the cause of the problems that may follow, like requests timing out. A connection drop will cause coordinator-side requests to time out in the absence of speculation.
Refs #14746
Closes scylladb/scylladb#16018
the "task" fixture is supposed to return a task for test, if it
fails to do so, it would be an issue not directly related to
the test. so let's fail it early.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16042
Currently, when said feature is enabled, we recalculate the schema digest. But this feature also influences how table versions are calculated, so it has to trigger a recalculation of all table versions, so that we can guarantee correct versions.
Before, this used to happen by happy accident. Another feature -- table_digest_insensitive_to_expiry -- used to take care of this by triggering a table version recalculation. However, that feature only takes effect if digest_insensitive_to_expiry is also enabled. This used to be the case incidentally: by the time the reload triggered by table_digest_insensitive_to_expiry ran, digest_insensitive_to_expiry was already enabled. But this was not guaranteed whatsoever and, as we've recently seen, any change to the feature list which changes the order in which features are enabled can cause this intricate balance to break.
This patch makes digest_insensitive_to_expiry also kick off a schema reload, to eliminate our dependence on the (unguaranteed) feature order and to guarantee that table schemas have a correct version after all features are enabled. In fact, all schema feature notification handlers now kick off a full schema reload, to ensure bugs like this don't creep in in the future.
Fixes: #16004
Closes scylladb/scylladb#16013
The run() method of task_manager::task::impl does not have to throw when a task is aborted via the task manager API. Thus, a user may see the task as having finished successfully, which makes it inconsistent.
Finish a task with a failure if it was aborted via the task manager API.
Set top-level compaction tasks as abortable.
Compaction tasks which have no children, i.e. compaction task executors, have the abort method overridden to stop compacting data.
When a node tries to join the cluster, it asks the topology
coordinator to add them and then waits for the response. The
response is not guaranteed to come back. If the topology
coordinator cannot contact the joining node, it moves the node to
the left state and moves on.
Currently, to handle the case when the response does not come back,
the joining node gives up waiting for it after 3 minutes. However,
it might take more time for the topology coordinator to start
handling the request to join, as it might be working on other tasks
like adding other nodes, performing tablet migrations, etc. In
general, any timeout duration would be unreliable.
Therefore, we get rid of the timeout. From now on, the operator
will be responsible for shutting down the node if the topology
coordinator fails to deliver the rejection.
Additionally, after removing the timeout, we adjust the topology
coordinator. We make it try sending the response (both acceptance
and rejection) only once since we do not care if it fails anymore. We
only need to ensure that the joining node is moved to the left state
if sending fails.
Fixes #15865
Closes scylladb/scylladb#15944
* github.com:scylladb/scylladb:
raft topology: fix indentation
raft topology: join: try sending the response only once
raft topology: join: do not time out waiting for the node to be joined
group 0: group0_handshaker: add the abort_source parameter to post_server_start
This commit adds the auth_cluster test suite to test a custom scenario
involving password authentication:
- create a cluster of 2 nodes with password authentication
- down one node
- the other node should refuse login stating that it couldn't reach
QUORUM
References ScyllaDB OSS #2339
Since the CentOS 7 default kernel is too old, has performance issues, and also has some bugs, using the kernel-ml kernel has been recommended instead.
Let's check the kernel version in scylla_setup and print a warning if the kernel is the CentOS 7 default one.
Related: #7365
Closes scylladb/scylladb#15705
Remove `fall_back_to_syn_msg` which is not necessary in newer Scylla versions.
Fix the calculation of `nodes_down` which could count a single node multiple times.
Make shadow round mandatory during bootstrap and replace -- these operations are unsafe to do without checking features first, which are obtained during the shadow round (outside raft-topology mode).
Finally, during node restart, allow the shadow round to be skipped when getting `timeout_error`s from contact points, not only when getting `closed_error`s (during restart it's best-effort anyway, and in general it's impossible to distinguish between a dead node and a partitioned node).
More details in commit messages.
Ref: https://github.com/scylladb/scylladb/issues/15675
Closes scylladb/scylladb#15941
* github.com:scylladb/scylladb:
gossiper: do_shadow_round: increment `nodes_down` in case of timeout
gossiper: do_shadow_round: fix `nodes_down` calculation
storage_service: make shadow round mandatory during bootstrap/replace
gossiper: do_shadow_round: remove default value for nodes param
gossiper: do_shadow_round: remove `fall_back_to_syn_msg`
Currently, it is started/stopped in the streaming/maintenance sg, which
is what the API itself runs in.
Starting the native transport in the streaming sg will lead to severely degraded performance, as the streaming sg has significantly fewer CPU/disk shares and reader concurrency semaphore resources.
Furthermore, it will lead to multi-paged reads possibly switching
between scheduling groups mid-way, triggering an internal error.
To fix, use `with_scheduling_group()` for both starting and stopping
native transport. Technically, it is only strictly necessary for
starting, but I added it for stop as well for consistency.
Also apply the same treatment to RPC (Thrift). Although no one uses it,
best to fix it, just to be on the safe side.
I think we need a more systematic approach for solving this once and for all, like passing the scheduling group to the protocol server and having it switch to it internally (see the sketch below). This allows the server to always run in the correct scheduling group, without depending on the caller to remember to use it. However, I think this is best done in a follow-up, to keep this critical patch small and easily backportable.
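A hedged sketch of that follow-up idea (plain C++ stand-ins only; in Seastar this would be seastar::scheduling_group and with_scheduling_group()): the server owns its scheduling group and switches to it internally, so callers cannot start or stop it in the wrong group.

```cpp
#include <iostream>
#include <string>
#include <utility>

// Stand-in for a scheduling group handle.
struct scheduling_group { std::string name; };

// Stand-in for "run this work inside the given scheduling group".
template <typename Func>
auto run_in_group(const scheduling_group& sg, Func&& f) {
    std::cout << "running in scheduling group: " << sg.name << "\n";
    return std::forward<Func>(f)();
}

// A protocol server that owns its scheduling group: start()/stop() always run
// in the right group, regardless of which group the caller happens to be in.
class protocol_server {
    scheduling_group _sg;
public:
    explicit protocol_server(scheduling_group sg) : _sg(std::move(sg)) {}
    void start() { run_in_group(_sg, [] { /* bind sockets, start serving */ }); }
    void stop()  { run_in_group(_sg, [] { /* drain connections, unbind */ }); }
};
```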
Fixes: #15485
Closes scylladb/scylladb#16019
This commit updates the Repair-Based Node
Operations page. In particular:
- Information about RBNO enabled for all
node operations is added (before 5.4, RBNO
was enabled for the replace operation, while
it was experimental for others).
- The content is rewritten to remove redundant
information about previous versions.
The improvement is part of the 5.4 release.
This commit must be backported to branch-5.4
Closes scylladb/scylladb#16015
A recent seastar update included RPC metrics (scylladb/seastar#1753). The reported metrics group sockets together based on their "metrics_domain" configuration option. This patch makes use of this domain to make scylla metrics sane.
The domain, as this patch defines it, includes two strings:
First, the datacenter the server lives in. This is because grouping metrics for connections to different datacenters makes little sense, for several reasons. For example -- packet delays _will_ differ for local-DC vs cross-DC traffic, and mixing those latencies together is pointless. Another example -- the amount of traffic may also differ for local- vs cross-DC connections, e.g. because of different usage of encryption and/or compression.
Second, each verb-idx gets its own domain. That's to be able to analyze e.g. query-related traffic separately from gossiper traffic. For that, the existing isolation cookie is taken as is.
Note that the metrics are _not_ per server node. So e.g. two gossiper connections to two different nodes (in one DC) will belong to the same domain and thus their stats will be summed when reported; see the sketch below.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
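A trivial illustration of how such a domain label could be composed (illustrative only, not the actual messaging-service wiring): the peer's datacenter plus the verb-idx/isolation cookie, so connections to different nodes in the same DC with the same verb index share one domain.

```cpp
#include <string>

// Compose the per-connection metrics domain: datacenter + verb-idx (isolation
// cookie). Local-DC and cross-DC traffic, and e.g. query vs. gossip verbs, end
// up in different domains; connections to different peers in one DC using the
// same verb index share a domain, so their stats are summed when reported.
std::string rpc_metrics_domain(const std::string& datacenter,
                               const std::string& isolation_cookie) {
    return datacenter + ":" + isolation_cookie;
}

// Usage sketch: rpc_metrics_domain("dc1", "statement") -> "dc1:statement"
```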
Closes scylladb/scylladb#15785
This class only provides a .run() method which allocates a task and
calls sstables::test_env::perform_compaction(). This can be done in a
helper method, no need for the whole class for it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Take it from compaction_manager_test::run(), which is a simplified rewrite of compaction_manager::perform_compaction().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method is a simplified rewrite of the compaction_manager's perform_compaction(), but it does task registration and unregistration the hard way. Keep it shorter and simpler, resembling the compaction_manager's prototype.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now the one sitting in utils is only called from its peer in the compaction test. Things get simpler if they are merged.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are several of them spread between the test and utils. One of the
test cases can use its local shorter overload for brevity. Also this
makes one of the next patches shorter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
They both have the same scope, but keeping it on the task frees the
caller from the need to mess with its private fields. For now it's not a
problem, but it will be critical in one of the next patches.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The purpose of this method is to make public the private compaction_manager method of the same name. The caller of this method has sstable_test_env at hand with its test_env_compaction_manager, so the de-private-isation call can be moved there.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
this is to have feature parity with `configure.py`. we won't need this once we migrate to C++20 modules, but before that day comes, we need to stick with C++ headers.
we generate a rule for each .hh file to create a corresponding .cc and then compile it, in order to verify that the header is self-contained. the number of rules is therefore quite large, so to avoid unnecessary overhead the check-headers target is enabled only if the `Scylla_CHECK_HEADERS` option is enabled.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15913
We shouldn't have cql-pytest tests that report failure when run on
Cassandra (with test/cql-pytest/run-cassandra): A test that passes
on Scylla but fails on Cassandra indicates a *difference* between
Scylla's behavior and Cassandra's, and this difference should always
be investigated:
1. It can be a Scylla bug, which should be fixed immediately
or reported as a bug and the test changed to fail on Scylla ("xfail").
2. It can be a minor difference in Scylla's and Cassandra's
behavior where both can be accepted. In this case the test should
be modified to accept both behaviors, and a comment added to
explain why we decided to do that.
3. It can be a Cassandra bug which causes a correct test to fail.
This case should not be taken lightly, and a serious effort
is needed to be convinced that this is really a Cassandra bug
and not our misunderstanding of what Cassandra does. In
this case the test should be marked "cassandra_bug" and a
detailed comment should explain why.
4. Or it can be an outright bug in the test that caused it to fail
on Cassandra.
This test had most of these cases :-) There was a test bug in one place
(in a Cassandra-specific Java UDF), a minor and (arguably) acceptable
difference between the error codes returned by Scylla and Cassandra
in one case, and two minor Cassandra bugs (in the error path). All
of these are fixed here, and after this patch test/cql-pytest/run-cassandra
no longer fails on this file.
Fixes #15969
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
We already have in test/cql-pytest various tests for UDF in the bigger
context of UDA (test_uda.py), WASM (test_wasm.py) and permissions, but
somehow we never had a file for simple tests only for UDF, so we
add one here, test/cql-pytest/test_udf.py
We add a test for checking something which was already assumed in
test_permissions.py - that it is possible to create two different
UDFs with the same name and different parameters, and then you must
specify the parameters when you want to DROP one of them. The test
confirms that ScyllaDB's and Cassandra's behavior is identical in
this, as hoped.
To allow the test to run on both ScyllaDB and Cassandra, it needs to
support both Lua (for ScyllaDB) or Java (for Cassandra), and we introduce
a fixture to make it easier to support both. This fixture can later
be used in more tests added to this file.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
There are two tests, test_read_all and test_read_with_partition_row_limits, which assert, on every page as well as at the end, that there are no misses whatsoever. This is incorrect, because it is possible that on a given page not all shards participate, and thus there won't be a saved reader on every shard. On the subsequent page, a shard without a reader may produce a miss. This is fine. Refine the asserts to check that we have only as many misses as there are shards without readers.
Fixes: https://github.com/scylladb/scylladb/issues/14087
Closes scylladb/scylladb#15806
* github.com:scylladb/scylladb:
test/boost/multishard_mutation_query_test: fix querier cache misses expectations
test/lib/test_utils: add require_* variants for all comparators
The polling loop was intended to ignore
`condition_variable_timed_out` and check for progress
using a longer `max_idle_duration` timeout in the loop.
Fixes #15669
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#15671
When a node tries to join the cluster, it asks the topology
coordinator to add them and then waits for the response.
In the previous commit, we have made the operator responsible for
shutting down the joining node if the topology coordinator fails
to deliver a response by removing the timeout. In this commit, we
adjust the topology coordinator. We make it try sending the
response (both acceptance and rejection) only once since we do not
care if it fails anymore. We only need to ensure that the joining
node is moved to the left state if sending fails.
When a node tries to join the cluster, it asks the topology
coordinator to add it and then waits for the response. The
response is not guaranteed to come back. If the topology
coordinator cannot contact the joining node, it moves the node to
the left state and moves on.
Currently, to handle the case when the response does not come back,
the joining node gives up waiting for it after 3 minutes. However,
it might take more time for the topology coordinator to start
handling the request to join, as it might be working on other tasks
like adding other nodes, performing tablet migrations, etc. In
general, any timeout duration would be unreliable.
Therefore, we get rid of the timeout. From now on, the operator
will be responsible for shutting down the node if the topology
coordinator fails to deliver the rejection.
This change additionally fixes the TODO in
raft_group0::join_group0.
This commit updates the cqlsh compatibility
with Python to Python 3.
In addition it:
- Replaces "Cassandra" with "ScyllaDB" in
the description of cqlsh.
The previous description was outdated, as
we no longer can talk about using cqlsh
released with Cassandra.
- Replaces occurrences of "Scylla" with "ScyllaDB".
- Adds additional locations of cqlsh (Docker Hub
and PyPI), as well as the link to the scylla-cqlsh
repository.
Closes scylladb/scylladb#16016
After 146e49d0dd (Rewrap keyspace population loop) the datadir layout is no longer needed by the sstables boot-time loader, so directories can finally be omitted for S3-backed keyspaces. Tables of such a keyspace don't touch/remove their datadirs either (snapshots still don't work for S3).
fixes: #13020
Closes scylladb/scylladb#16007
* github.com:scylladb/scylladb:
test/object_store: Check that keyspace directory doesn't appear
sstables/storage: Do storage init/destroy based on storage options
replica/{ks|cf}: Move storage init/destroy to sstables manager
database: Add get_sstables_manager(bool_class is_system) method
This reverts commit 7c7baf71d5.
If `stop_gracefully` times out during test teardown phase, it crashes
the test framework reporting multiple errors, for example:
```
12:35:52 /jenkins/workspace/scylla-master/next/scylla/test/pylib/artifact_registry.py:41: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited
12:35:52 self.exit_artifacts = {}
12:35:52 RuntimeWarning: Enable tracemalloc to get the object allocation traceback
12:35:52 Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s
12:35:52 Traceback (most recent call last):
12:35:52 File "/usr/lib64/python3.11/asyncio/tasks.py", line 500, in wait_for
12:35:52 return fut.result()
12:35:52 ^^^^^^^^^^^^
12:35:52 File "/usr/lib64/python3.11/asyncio/subprocess.py", line 137, in wait
12:35:52 return await self._transport._wait()
12:35:52 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12:35:52 File "/usr/lib64/python3.11/asyncio/base_subprocess.py", line 230, in _wait
12:35:52 return await waiter
12:35:52 ^^^^^^^^^^^^
12:35:52 asyncio.exceptions.CancelledError
12:35:52
12:35:52 The above exception was the direct cause of the following exception:
12:35:52
12:35:52 Traceback (most recent call last):
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 521, in stop_gracefully
12:35:52 await asyncio.wait_for(wait_task, timeout=STOP_TIMEOUT_SECONDS)
12:35:52 File "/usr/lib64/python3.11/asyncio/tasks.py", line 502, in wait_for
12:35:52 raise exceptions.TimeoutError() from exc
12:35:52 TimeoutError
12:35:52
12:35:52 During handling of the above exception, another exception occurred:
12:35:52
12:35:52 Traceback (most recent call last):
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1615, in workaround_python26789
12:35:52 code = await main()
12:35:52 ^^^^^^^^^^^^
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1582, in main
12:35:52 await run_all_tests(signaled, options)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1359, in run_all_tests
12:35:52 await reap(done, pending, signaled)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1342, in reap
12:35:52 result = coro.result()
12:35:52 ^^^^^^^^^^^^^
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 201, in run
12:35:52 await test.run(options)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 957, in run
12:35:52 async with get_cluster_manager(self.mode + '/' + self.uname, self.suite.clusters, test_path) as manager:
12:35:52 File "/usr/lib64/python3.11/contextlib.py", line 211, in __aexit__
12:35:52 await anext(self.gen)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1330, in get_cluster_manager
12:35:52 await manager.stop()
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1024, in stop
12:35:52 await self.clusters.put(self.cluster, is_dirty=True)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/pool.py", line 104, in put
12:35:52 await self.destroy(obj)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 368, in recycle_cluster
12:35:52 await cluster.stop_gracefully()
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 689, in stop_gracefully
12:35:52 await asyncio.gather(*(server.stop_gracefully() for server in self.running.values()))
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 527, in stop_gracefully
12:35:52 raise RuntimeError(
12:35:52 RuntimeError: Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s
12:35:58 sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.uninstall' was never awaited
12:35:58 sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited
```
The test for the rollback relies on the log message being there after the operation
fails, but if the node's state is changed before the message is logged, the operation may
fail before the log is printed.
Fixes scylladb/scylladb#15980
Message-ID: <ZUuwoq65SJcS+yTH@scylladb.com>
This PR is a follow-up to https://github.com/scylladb/scylladb/pull/15742#issuecomment-1766888218.
It adds CQL Reference for Materialized Views to the Materialized Views page.
In addition, it removes the irrelevant information about when the feature was added and replaces "Scylla" with "ScyllaDB".
(nobackport)
Closes scylladb/scylladb#15855
* github.com:scylladb/scylladb:
doc: remove versions from Materialized Views
doc: add CQL Reference for Materialized Views
This reverts commit 2860d43309, reversing
changes made to a3621dbd3e.
Reverting because rest_api.test_compaction_task started failing after
this was merged.
Fixes: #16005
The USE statement execution code can throw if the keyspace
doesn't exist. This can be a problem for code that calls
execute in a fiber, since the exception will break the fiber even
if `then_wrapped` is used.
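A minimal sketch of the general Seastar pattern here, not the actual fix (`lookup_keyspace` is a hypothetical stand-in for the throwing code): wrapping the call with `futurize_invoke` turns a synchronous throw into a failed future, so `then_wrapped` can always observe it.
```cpp
#include <seastar/core/future.hh>
#include <stdexcept>
#include <string>
#include <iostream>

using namespace seastar;

// Hypothetical stand-in for code that throws when the keyspace is missing.
future<> lookup_keyspace(const std::string& name) {
    if (name != "ks") {
        throw std::runtime_error("keyspace does not exist");
    }
    return make_ready_future<>();
}

future<> use_keyspace(const std::string& name) {
    // futurize_invoke converts the synchronous throw into an exceptional
    // future, so then_wrapped() always gets a future it can inspect
    // instead of the exception unwinding through the fiber.
    return futurize_invoke(lookup_keyspace, name).then_wrapped([] (future<> f) {
        if (f.failed()) {
            try {
                std::rethrow_exception(f.get_exception());
            } catch (const std::exception& e) {
                std::cerr << "USE failed: " << e.what() << "\n";
            }
        }
    });
}
```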
Fixes #14449
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Closes scylladb/scylladb#14394
When creating a S3-backed keyspace its storage dir shouldn't be made.
Also it shouldn't be "resurrected" by boot-time loader of existing
keyspaces.
For extra confidence check that the system keyspace's directory does
exist where the test expects keyspaces' directories to appear.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Only the local storage type needs its directories touched/removed; S3
storage initialization is for now a no-op, though maybe some day soon it will
appear.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's the manager that knows about storages, so it should init/destroy
them. Also the "upload" and "staging" paths are about to be hidden in
sstables/ code, this code move also facilitates that.
The indentation in storage.cc is deliberately broken to make next patch
look nicer (spoiler: it won't have to shift those lines right).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's one place that does this selection, soon there will appear
another, so it's worth having a convenience helper getter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
reconcilable_result_builder passes range tombstone changes to _rt_assembler
using table schema, not query schema.
This means that a tombstone with bounds (a; b), where a < b in query schema
but a > b in table schema, is not emitted from mutation_query.
This is a very serious bug, because it means that such tombstones in reverse
queries are not reconciled with data from other replicas.
If *any* queried replica has a row, but not the range tombstone which deleted
the row, the reconciled result will contain the deleted row.
In particular, range deletes performed while a replica is down, will not
later be visible to reverse queries which select this replica, regardless of the
consistency level.
As far as I can see, this doesn't result in any persistent data loss.
Only in that some data might appear resurrected to reverse queries,
until the relevant range tombstone is fully repaired.
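As a toy illustration (plain C++, not Scylla's schema types): the same pair of bounds forms a valid, non-empty range under the reversed (query) ordering but looks inverted under the table ordering, so checking it with the wrong comparator silently drops the tombstone.
```cpp
#include <cassert>
#include <functional>

int main() {
    // Two clustering positions forming the tombstone bounds (a; b).
    int a = 5, b = 2;

    // Reversed query schema: clustering order is descending, so "less"
    // is really greater-than. Here a precedes b and the range is non-empty.
    std::greater<int> query_order;
    assert(query_order(a, b));

    // Table schema: ascending order. Under this comparator a does NOT
    // precede b, so code using the table schema concludes the range is
    // empty and never emits the tombstone -- the bug described above.
    std::less<int> table_order;
    assert(!table_order(a, b));
}
```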
Add a space after each colon and comma (if they don't have any after) in values of table option which are json objects (`caching`, `tombstone_gc` and `cdc`).
This improves readability and matches client-side describe format.
Fixes: #14895
Closes scylladb/scylladb#15900
* github.com:scylladb/scylladb:
cql-pytest:test_describe: add test for whitespaces in json objects
schema: add whitespace to description of table options
This commit fixes the information about
Raft-based consistent cluster management
in the 5.2-to-5.4 upgrade guide.
This is a follow-up to https://github.com/scylladb/scylladb/pull/15880 and must be backported to branch-5.4.
In addition, it adds information about removing
DateTieredCompactionStrategy to the 5.2-to-5.4
upgrade guide, including the guideline to
migrate to TimeWindowCompactionStrategy.
Closes scylladb/scylladb#15988
When off-strategy is disabled, data segregation is not postponed,
meaning that getting partition estimate right is important to
decrease filter's false positives. With streaming, we don't
have min and max timestamps at the destination; we could have
extended the RPC verb to send them, but it turns out we can easily
deduce the number of windows using the default TTL. Given the partitioner's
random nature, it's not absurd to assume that a given range being
streamed may overlap with all windows, meaning that each range
will yield one sstable for each window when segregating incoming
data. Today, we assume the worst of 100 windows (which is the
max amount of sstables the input data can be segregated into)
due to the lack of metadata for estimating the window count.
But given that users are recommended to target a max of ~20
windows, it means partition estimate is being downsized 5x more
than needed. Let's improve it by using the default TTL when
estimating the window count, so even in the absence of timestamp
metadata, the partition estimate won't be way off.
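A rough sketch of the arithmetic described above, with hypothetical names (the real estimation lives in the compaction strategy code): derive the window count from the default TTL and the window size, capped by the old worst-case assumption.
```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>

// Old worst-case assumption: streamed data may be segregated into up to
// this many sstables (one per window).
constexpr uint64_t max_assumed_windows = 100;

// Hypothetical helper: estimate how many TWCS windows the streamed range
// can span when no timestamp metadata is available, using the default TTL.
uint64_t estimate_window_count(std::chrono::seconds default_ttl,
                               std::chrono::seconds window_size) {
    if (default_ttl.count() <= 0 || window_size.count() <= 0) {
        return max_assumed_windows;   // no TTL info: keep the old worst case
    }
    auto windows = (default_ttl.count() + window_size.count() - 1) / window_size.count();
    return std::clamp<uint64_t>(windows, 1, max_assumed_windows);
}

// The per-sstable partition estimate is then total_estimate / windows,
// instead of always dividing by the fixed 100.
```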
Fixes #15704.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
`system.raft` was using the "user memory pool", i.e. the
`dirty_memory_manager` for this table was set to
`database::_dirty_memory_manager` (instead of
`database::_system_dirty_memory_manager`).
This meant that if a write workload caused memory pressure on the user
memory pool, internal `system.raft` writes would have to wait for
memtables of user tables to get flushed before the write would proceed.
This was observed in SCT longevity tests which ran a heavy workload on
the cluster and concurrently, schema changes (which underneath use the
`system.raft` table). Raft would often get stuck waiting many seconds
for user memtables to get flushed. More details in issue #15622.
Experiments showed that moving Raft to system memory fixed this
particular issue, bringing the waits to reasonable levels.
Currently `system.raft` stores only one group, group 0, which is
internally used for cluster metadata operations (schema and topology
changes) -- so it makes sense to keep using system memory.
In the future we'd like to have other groups, for strongly consistent
tables. These groups should use the user memory pool. It means we won't
be able to use `system.raft` for them -- we'll just have to use a
separate table.
Fixes: scylladb/scylladb#15622
Closes scylladb/scylladb#15972
This PR implements the following new nodetool commands:
* snapshot
* drain
* flush
* disableautocompaction
* enableautocompaction
All commands come with tests and all tests pass with both the new and the current nodetool implementations.
Refs: https://github.com/scylladb/scylladb/issues/15588
Closes scylladb/scylladb#15939
* github.com:scylladb/scylladb:
test/nodetool: add README.md
tools/scylla-nodetool: implement enableautocompaction command
tools/scylla-nodetool: implement disableautocompaction command
tools/scylla-nodetool: implement the flush command
tools/scylla-nodetool: extract keyspace/table parsing
tools/scylla-nodetool: implement the drain command
tools/scylla-nodetool: implement the snapshot command
test/nodetool: add support for matching aproximate query parameters
utils/http: make dns_connection_factory::initialize() static
This is a translation of Cassandra's CQL unit test source file
validation/operations/CreateTest.java into our cql-pytest framework.
The 15 tests did not reproduce any previously-unknown bug, but did provide
additional reproducers for several known issues:
Refs #6442: Always print all schema parameters (including default values)
Refs #8001: Documented unit "µs" not supported for assigning a "duration"
type.
Refs #8892: Add an option for default RF for new keyspaces.
Refs #8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression"
for compression settings by default
Unfortunately, I also had to comment out - and not translate - several
tests which weren't real "CQL tests" (tests that use only the CQL driver),
and instead relied on Cassandra's Java implementation details:
1. Tests for CREATE TRIGGER were commented out because testing them
in Cassandra requires adding a Java class for the test. We're also
not likely to ever add this feature to Scylla (Refs #2205).
2. Similarly, tests for CEP-11 (Pluggable memtable implementations)
used internal Java APIs instead of CQL, and it is also unlikely
we'll ever implement it in a way compatible with Cassandra because
of its Java reliance.
3. One test for data center names used internal Cassandra Java APIs, not
CQL to create mock data centers and snitches.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#15791
Compaction tasks which do not have a parent are abortable
through task manager. Their children are aborted recursively.
Compaction tasks of the lowest level are aborted using existing
compaction task executors stopping mechanism.
Closes scylladb/scylladb#15083
* github.com:scylladb/scylladb:
test: test abort of compaction task that isn't started yet
test: test running compaction task abort
tasks: fail if a task was aborted
compaction: abort task manager compaction tasks
Unlike yum, "apt-get install" may fail because the package cache is outdated.
Let's check package cache mtime and run "apt-get update" if it's too old.
Fixes #4059
Closes scylladb/scylladb#15960
When running on a particularly slow setup, for example on
an ARM machine in debug mode, the execution time of even
a small Lua UDF that we're using in tests may exceed our
default limits.
To avoid timeout errors, the limit in tests is now increased
to a value that won't be exceeded in any reasonable scenario
(for the current set of tested UDFs), while not making the
test take an excessive amount of time in case of an error in
the UDF execution.
Fixes #15977
Closes scylladb/scylladb#15983
When topology coordinator tries to fence the previous coordinator it
performs a group0 operation. The current topology coordinator might be
aborted in the meantime, which will result in a `raft::request_aborted`
exception being thrown. After the fix to scylladb/scylladb#15728 was
merged, the exception is caught, but then `sleep_abortable` is called
which immediately throws `abort_requested_exception` as it uses the same
abort source as the group0 operation. The `fence_previous_coordinator`
function which does all those things is not supposed to throw
exceptions; if it does, it causes `raft_state_monitor_fiber` to exit,
completely disabling the topology coordinator functionality on that
node.
Modify the code in the following way:
- Catch `abort_requested_exception` thrown from `sleep_abortable` and
exit the function if it happens. In addition to the described issue,
it will also handle the case when abort is requested while
`sleep_abortable` happens,
- Catch `raft::request_aborted` thrown from group0 operation, log the
exception with lower verbosity and exit the function explicitly.
Finally, wrap both `fence_previous_coordinator` and `run` functions in a
`try` block with `on_fatal_internal_error` in the catch handler in order
to implement the behavior that adding `noexcept` was originally supposed
to introduce.
Fixes: scylladb/scylladb#15747
Closes scylladb/scylladb#15948
* github.com:scylladb/scylladb:
raft topology: catch and abort on exceptions from topology_coordinator::run
Revert "storage_service: raft topology: mark topology_coordinator::run function as noexcept"
raft topology: don't print an error when fencing previous coordinator is aborted
raft topology: handle abort exceptions from sleeping in fence_previous_coordinator
this series tries to
1. render options with role. so the options can be cross referenced and defined.
2. move the formatting out of the content. so the representation can be defined in a more flexible way.
Closes scylladb/scylladb#15860
* github.com:scylladb/scylladb:
docs: add divider using CSS
docs: extract _clean_description as a filter
docs: render option with role
docs: parse source files right into rst
Having to extract 1 keyspace and N tables from the command-line is
proving to be a common pattern among commands. Extract this into a
method, so the boiler-plate can be shared. Add a forward-looking
overload as well, which will be used in the next patch.
Remove gossiper states that are no longer used and are not needed even for
compatibility.
* 'remove_unused_states' of github.com:scylladb/scylla-dev:
gossip: remove unused HIBERNATE gossiper status
gossip: remove unused STATUS_MOVING state
Match parameters within some delta of the expected value. Useful when
nodetool generates a timestamp whose exact value cannot be predicted.
Said method can out-live the factory instance. This was not a problem
because the method takes care to keep everything it needs from `this` alive by
copying it to the coroutine stack. However, the fact that this method
can out-live the instance is not obvious, and an unsuspecting developer
(me) added a new member (_logger) which was not kept alive.
This can cause a use-after-free in the factory. Fix by making
initialize() static, forcing the instance to pass all parameters
explicitly and add a comment explaining that this method can out-live
the instance.
These APIs may return stale or simply incorrect data on shards
other than 0. Newer versions of Scylla are better at maintaining
cross-shard consistency, but we need a simple fix that can be easily and
safely backported to older versions; this is the fix.
Add a simple test to check that the `failure_detector/endpoints`
API returns nonzero generation.
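The shape of the fix, as a hedged sketch (the handler body and `collect_endpoint_states` here are hypothetical, not the actual API code): run the read on shard 0, where the gossiper/failure-detector state is authoritative.
```cpp
#include <seastar/core/future.hh>
#include <seastar/core/smp.hh>
#include <string>

using namespace seastar;

// Hypothetical stand-in for reading the failure detector state; only
// shard 0 is guaranteed to have fresh, consistent data.
std::string collect_endpoint_states();

future<std::string> get_failure_detector_endpoints() {
    // Instead of reading a possibly-stale per-shard copy, forward the
    // request to shard 0 and return its view.
    return smp::submit_to(0, [] {
        return collect_endpoint_states();
    });
}
```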
Fixes: scylladb/scylladb#15816
Closes scylladb/scylladb#15970
* github.com:scylladb/scylladb:
test: rest_api: test that generation is nonzero in `failure_detector/endpoints`
api: failure_detector: fix indentation
api: failure_detector: invoke on shard 0
The sstable can currently move between the normal, staging and quarantine states at runtime. For S3-backed sstables the state change means maintaining the state itself in the ownership table and updating it accordingly.
There's also the upload facility that's implemented as state change too, but this PR doesn't support this part.
fixes: #13017
Closes scylladb/scylladb#15829
* github.com:scylladb/scylladb:
test: Make test_sstables_excluding_staging_correctness run over s3 too
sstables,s3: Support state change (without generation change)
system_keyspace: Add state field to system.sstables
sstable_directory: Tune up sstables entries processing comment
system_keyspace: Tune up status change trace message
sstables: Add state string to state enum class convert
This PR adds the 5.2-5.4 upgrade guide.
In addition, it removes the redundant upgrade guide from 5.2 to 5.3 (as 5.3 was skipped), as well as some mentions of version 5.3.
This PR must be backported to branch-5.4.
Closes scylladb/scylladb#15880
* github.com:scylladb/scylladb:
doc: add the upgrade guide from 5.2 to 5.4
doc: remove version "5.3" from the docs
doc: remove the 5.2-to-5.3 upgrade guide
Currently, "yum install scylla" causes conflict when ABRT is installed.
To avoid this behavior and keep using systemd-coredump for scylla
coredump, let's drop "Conflicts: abrt" from rpm and
add "Conflicts=abrt-ccpp.service" to systemd unit.
Fixes #892
Closes scylladb/scylladb#15691
in this series, instead of assuming that we always have only one single `CMAKE_BUILD_TYPE`, we configure all available configurations, to be better prepared for the multi-config support.
Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15933
* github.com:scylladb/scylladb:
build: cmake: set compile options with generator expression
build: cmake: configure all available config types
build: cmake: set per-mode stack usage threshold
build: cmake: drop build_mode
build: cmake: check for config type if multi-config is used
The helper makes an sstable, writes mutations into it and loads it back. Internally it uses the make_memtable() helper that prepares a memtable out of a vector of mutations. There are many test cases that don't use these facilities, generating some code duplication.
The make_sstable() wrapper around make_sstable_easy() is removed along the way.
Closes scylladb/scylladb#15930
* github.com:scylladb/scylladb:
tests: Use make_sstable_easy() where appropriate
sstable_conforms_to_mutation_source_test: Open-code the make_sstable() helper
sstable_mutation_test: Use make_sstable_easy() instead of make_sstable()
tests: Make use of make_memtable() helper
tests: Drop as_mutation_source helper
test/sstable_utils: Hide assertion-related manipulations into branch
instead of using a single compile option for all modes, use per-mode
compile options. this change keeps us away from using `CMAKE_BUILD_TYPE`
directly, and prepares us for the multi-config generator support.
because we only apply these settings in the configurations where
sanitizers are used, there is no need to check if these options can be
accepted by the compiler. if this turns out to be a problem, we can
always add the check back on a per-mode basis.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
if `CMAKE_CONFIGURATION_TYPES` is set, it implies that the
multi-config generator is used, in this case, we include all
available build types instead of only the one specified by
`CMAKE_BUILD_TYPE`, which is typically used by non-multi-config
generators.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
instead of setting a single stack usage threshold, set per-mode
stack usage threshold. this prepares for the support of
multi-config generator.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
there is no benefit having this variable. and it introduces
another layer of indirection. so drop it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
we should not set_property() on a non-existent property. if a multi-config
generator is used, `CMAKE_BUILD_TYPE` is not added as a cached entry at all.
Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This is a test for #14277. We do want to match Cassandra's behavior,
which means that a user who is granted ALTER ALL is able to change
the password of a superuser.
Closes scylladb/scylladb#15961
systemd man page says:
systemd-fstab-generator(3) automatically adds dependencies of type Before= to
all mount units that refer to local mount points for this target unit.
So "Before=local-fs.taget" is the correct dependency for local mount
points, but we currently specify "After=local-fs.target", it should be
fixed.
Also replaced "WantedBy=multi-user.target" with "WantedBy=local-fs.target",
since .mount units are not related to multi-user but depend on local
filesystems.
Fixes #8761
Closes scylladb/scylladb#15647
before this change, the tempdir is always nuked no matter if the
test succeeds. but sometimes, it would be important to check
scylla's sstables after the test finishes.
so, in this change, an option named `--keep-tmp` is added so
we can optionally preserve the temp directory. this option is off
by default.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15949
This commit adds OS support information
in version 5.4 (removing the non-released
version 5.3).
In particular, it adds support for Oracle Linux
and Amazon Linux.
Also, it removes support for outdated versions.
Closes scylladb/scylladb#15923
This commit updates the package installation
instructions in version 5.4.
- It updates the variables to include "5.4"
as the version name.
- It adds the information for the newly supported
Rocky/RHEL 9 - a new EPEL download link is required.
Closes scylladb/scylladb#15963
this series applies fixes to make the test more PEP8 compliant. the goal is to improve the readability and maintainability.
Closes scylladb/scylladb#15946
* github.com:scylladb/scylladb:
test/object_store: wrap line which is too long
test/object_store: use pattern matching to capture variable in loop
test/object_store: remove space after and before '{' and '}'
test/object_store: add an empty line before nested function definition
test/object_store: use two empty lines in-between global functions
in order to use compile-time format check, we would need to use
compile-time constexpr for the format string. despite that we
might be able to find a way to tell if an expression is compile-time
constexpr in C++20, it'd be much simpler to always use a
known-to-be-constexpr format string. this would help us to eventually
migrate to the compile-time format check in seastar's logging subsystem.
so, in this change, instead of feeding `seastar::logger::info()` and
friends with a non-constexpr format string, let's just use "{}" for
printing it, or mark the format string with `constexpr` instead of
`const`. as the former tells the compiler it is a variable that
can be evaluated at compile-time, while the latter just informs the
compiler that the variable is not mutable after it is initialized.
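for illustration, the two shapes this change converges on (a sketch, not a specific call site):
```cpp
#include <seastar/util/log.hh>
#include <string>

seastar::logger slogger("example");

void report(const std::string& str) {
    // before: the format string is a runtime value, so a compile-time
    // format check cannot accept it (and a stray '{' would be misparsed):
    //   slogger.info(str.c_str());

    // after: a trivial, known-at-compile-time format string...
    slogger.info("{}", str);

    // ...or keep a named format string, but make it constexpr rather than
    // const, so it can be evaluated at compile time.
    static constexpr const char* fmt = "value is {}";
    slogger.info(fmt, str);
}
```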
This change also helps to address the compilation failure with the
not-yet-merged compile-time format check patch in Seastar:
```
/usr/bin/clang++ -DBOOST_NO_CXX98_FUNCTION_BASE -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/cmake/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/cmake/seastar/gen/include -Og -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-mismatched-tags -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -Wno-missing-field-initializers -Wno-deprecated-copy -Wno-ignored-qualifiers -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -U_FORTIFY_SOURCE -Werror=unused-result "-Wno-error=#warnings" -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT service/CMakeFiles/service.dir/storage_service.cc.o -MF service/CMakeFiles/service.dir/storage_service.cc.o.d -o service/CMakeFiles/service.dir/storage_service.cc.o -c /home/kefu/dev/scylladb/service/storage_service.cc
/home/kefu/dev/scylladb/service/storage_service.cc:2460:18: error: call to consteval function 'seastar::logger::format_info<>::format_info<const char *, 0>' is not a constant expression
slogger.info(str.c_str());
^
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15959
These APIs may return stale or simply incorrect data on shards
other than 0. Newer versions of Scylla are better at maintaining
cross-shard consistency, but we need a simple fix that can be easily and
safely backported to older versions; this is the fix.
Fixes: scylladb/scylladb#15816
The `topology_coordinator` function is supposed to handle all of the
exceptions internally. Assert, at runtime, that this is the case by
wrapping the `run` invocation with a try..catch; in case of an
exception, step down as a leader first and then abort.
This reverts commit dcaaa74cd4. The
`noexcept` specifier that it added is only relevant to the function and
not the coroutine returned from that function. This was not the
intention and it looks confusing now, so remove it.
An attempt to fence the previous coordinator may fail because the
current coordinator is aborted. It's not a critical error and it can
happen during normal operations, so lower the verbosity used to print a
message about this error to 'debug'.
Return from the function immediately in that case - the sleep_abortable
that happens as the next step would fail on abort_requested_exception
anyway, so make it more explicit.
The fence_previous_coordinator function has a retry loop: if it fails to
perform a group0 operation, it will try again after a 1 second delay.
However, if the topology coordinator is aborted while it waits, an
exception will be thrown and will be propagated out of the function. The
function is supposed to handle all exceptions internally, so this is not
desired.
Fix this by catching the abort_requested_exception and returning from
the function if the exception is caught.
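A minimal sketch of the resulting retry loop, assuming Seastar's `sleep_abortable` and `abort_source` (`try_fence` is a hypothetical stand-in for the group0 operation):
```cpp
#include <seastar/core/abort_source.hh>
#include <seastar/core/coroutine.hh>
#include <seastar/core/sleep.hh>
#include <chrono>

using namespace seastar;
using namespace std::chrono_literals;

// Hypothetical stand-in for the group0 fencing operation; returns true on success.
future<bool> try_fence();

future<> fence_previous_coordinator_sketch(abort_source& as) {
    while (!as.abort_requested()) {
        if (co_await try_fence()) {
            co_return;                        // fenced successfully
        }
        try {
            co_await sleep_abortable(1s, as); // retry after a 1 second delay
        } catch (const abort_requested_exception&) {
            co_return;                        // coordinator aborted: exit quietly
        }
    }
}
```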
Previously we would only increment `nodes_down` when getting
`rpc::closed_error`. Distinguishing between that and timeout is
unreliable. Consider:
1. if a node is dead but we can reach the IP, we'd get `closed_error`
2. if we cannot reach the IP (there's a network partition), the RPC
would hang so we'd get `timeout_error`
3. if the node is both dead and the IP is unreachable, we'd get
`timeout_error`
And there are probably other more complex scenarios as well. In general,
it is impossible to distinguish a dead node from a partitioned node in
asynchronous networks, and whether we end up with `closed_error` or
`timeout_error` is an implementation detail of the underlying protocol
that we use.
The fact that `nodes_down` was not incremented for timeouts would
prevent a node from starting if it cannot reach isolated IPs (whether or
not there were dead or alive nodes behind those IPs). This was observed
in a Jepsen test: https://github.com/scylladb/scylladb/issues/15675.
Note that `nodes_down` is only used to skip shadow round outside
bootstrap/replace, i.e. during restarts, where the shadow round was
"best effort" anyway (not mandatory). During bootstrap/replace it is now
mandatory.
Also fix grammar in the error message.
During shadow round we would calculate the number of nodes from which we
got `rpc::closed_error` using `nodes_counter`, and if the counter
reached the size of all contact points passed to shadow round, we would
skip the shadow round (and after the previous commit, we do it only in
the case of restart, not during bootstrap/replace which is unsafe).
However, shadow round might have multiple loops, and `nodes_down` was
initialized to `0` before the loop, then reused. So the same node might
be counted multiple times in `nodes_down`, and we might incorrectly
enter the skipping branch. Or we might go over `nodes.size()` and never
finish the loop.
Fix this by initializing `nodes_down = 0` inside the loop.
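Schematically (hypothetical names, not the actual gossip code), the counter has to be scoped to a single pass over the contact points:
```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for contacting a node and for the exit condition.
bool try_contact(const std::string& node);
bool learned_enough();

void shadow_round_sketch(const std::vector<std::string>& nodes) {
    bool done = false;
    while (!done) {                    // the shadow round may loop several times
        std::size_t nodes_down = 0;    // reset on every pass, not once before the loop
        for (const auto& n : nodes) {
            if (!try_contact(n)) {
                ++nodes_down;
            }
        }
        if (nodes_down == nodes.size()) {
            return;    // every contact point is down: skip (allowed on restart only)
        }
        done = learned_enough();
    }
}
```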
It is unsafe to bootstrap or perform replace without performing the
shadow round, which is used to obtain features from the existing cluster
and verify that we support all enabled features.
Before this patch, I could easily produce the following scenario:
1. bootstrap first node in the cluster
2. shut it down
3. start bootstrapping second node, pointing to the first as seed
4. the second node skips shadow round because it gets
`rpc::closed_error` when trying to connect to first node.
5. the node then passes the feature check (!) and proceeds to the next
step, where it waits for nodes to show up in gossiper
6. we now restart the first node, and the second node finishes bootstrap
The shadow round must be mandatory during bootstrap/replace, which is
what this patch does.
On restart it can remain optional as it was until now. In fact it should
be completely unnecessary during restart, but since we did it until now
(as best-effort), we can keep doing it.
If during shadow round we learned that a contact node does not
understand the GET_ENDPOINT_STATES verb, we'd fall back to old shadow
round method (using gossiper SYN messages).
The verb was added a long time ago and it ended up in Scylla 4.3 and
2021.1. So in newer versions we can make it mandatory, as we don't
support skipping major versions during upgrades. Even if someone
attempted to, they would just get an error and they can retry bootstrap
after finishing the upgrade.
This series refactors the `dht/i_partitioner.hh` header file
and cleans up its usage so as to reduce the dependencies on it,
since it carries a lot of baggage that is rarely required in other header files.
Closes scylladb/scylladb#15954
* github.com:scylladb/scylladb:
everywhere: reduce dependencies on i_partitioner.hh
locator: resolve the dependency of token_metadata.hh on token_range_splitter.hh
cdc: cdc_partitioner: remove extraneous partition_key_view fwd declaration
dht: reduce dependency on i_partitioner.hh
dht: fold compatible_ring_position in ring_position.hh
dht: refactor i_partitioner.hh
dht: move token_comperator to token.{cc,hh}
dht/i_partitioner: include i_partitioner_fwd.hh
instead of setting for a single CMAKE_BUILD_TYPE, set the compilation
definitions for each build configuration.
this prepares for the multi-config generator.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15943
use generator expressions instead, so that the value can be evaluated
when generating the build system. this prepares for the multi-config
support.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15942
The purpose of this mini series is to move all tests (that use the infrastructure to create a Scylla cluster) to shut down gracefully
on teardown.
One benefit is that the shutdown sequence for the cluster will be tested better; however, that is not the main purpose of this change. The main purpose is to pave the way for coverage reporting on all tests, and not only the ones that
have standalone executables.
Full test runs are only slightly impacted by this change (~2.4% increase in runtime):
Without graceful shutdown
```
time ./test.py --mode dev
Found 2966 tests.
================================================================================
[N/TOTAL] SUITE MODE RESULT TEST
------------------------------------------------------------------------------
[2966/2966] topology_experimental_raft dev [ PASS ] topology_experimental_raft.test_raft_cluster_features.1
------------------------------------------------------------------------------
CPU utilization: 13.1%
real 4m50.587s
user 13m58.358s
sys 6m55.975s
```
With graceful shutdown
```
time ./test.py --mode dev
Found 2966 tests.
================================================================================
[N/TOTAL] SUITE MODE RESULT TEST
------------------------------------------------------------------------------
[2966/2966] topology_experimental_raft dev [ PASS ] topology_experimental_raft.test_raft_cluster_features.1
------------------------------------------------------------------------------
CPU utilization: 12.6%
real 4m57.637s
user 13m56.864s
sys 6m46.657s
```
Closes scylladb/scylladb#15851
* github.com:scylladb/scylladb:
test.py: move to a gracefull temination of nodes on teardown
test.py: Use stop lock also in the graceful version
define token_metadata_ptr in token_metadata_fwd.hh
So that the declaration of `make_splitter` can be moved
to token_range_splitter.hh, where it belongs,
and so token_metadata.hh won't have to include it.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Extract decorated_key.hh and ring_position.hh
out of i_partitioner.hh so they can be included
selectively, since i_partitioner.hh contains too much
baggage that is not always needed in full.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Move the `token_comparator` definition and
implementation to token.{hh,cc}, respectively
since they are independent of i_partitioner.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
instead of referencing the elements in tuple with their indexes, use
pattern matching to capture them. for better readability.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Values of `caching`, `tombstone_gc` and `cdc` are json objects but they
were printed without any whitespace. This commit adds spaces after
colons (:) and commas (,), so the values are more readable and match the
format of the old client-side describe.
would be better to split the parser from the formatter. in future,
we can apply more filters on top of the existing one.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
so we can cross-reference them with the syntax like
:confval:`alternator_timeout_in_ms`.
or even render an option like:
.. confval:: alternator_timeout_in_ms
in order to make the headerlink of the option visible,
a new CSS rule is added.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
When off-strategy is enabled, data segregation is postponed to when
off-strategy runs. Turns out we're adjusting partition estimate even
when segregation is postponed, meaning that sstables in maintenance
set will have smaller filters than they should otherwise have.
This condition is transient as the system eventually heals this
through compactions. But note that with TWCS, the problem of inefficient
filters may persist for a long time as sstables written into older
windows may stay around for a significant amount of time.
In the future, we're planning to make this less fragile by dynamically
resizing filters on sstable write completion.
The aforementioned problem is solved by skipping the adjustment when
segregation is postponed (i.e. off-strategy is enabled).
Refs #15704.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
It is possible that the sender and receiver nodes of a streaming operation have different
views on whether a table is dropped or not.
For example:
- n1, n2 and n3 in the cluster
- n4 started to join the cluster and stream data from n1, n2, n3
- a table was dropped
- n4 failed to write data from n2 to sstable because a table was dropped
- n4 ended the streaming
- n2 checked if the table was present and would ignore the error if the table was dropped
- however n2 found the table was still present and was not dropped
- n2 marked the streaming as failed
This will fail the streaming when a table is dropped. We want streaming to
ignore such dropped tables.
In this patch, a status code is sent back to the sender to notify the
table is dropped so the sender could ignore the dropped table.
Fixes #15370
Closes scylladb/scylladb#15912
After starting the associated node, ScyllaServer waits until the node
starts serving CQL requests. It does that by periodically trying to
establish a python driver session to the node.
During session establishment, the driver tries to fetch some metadata
from the system tables, and uses a pretty short timeout to do so (by
default it's 2 seconds). When running tests in debug mode, this timeout
can prove to be too short and may prevent the testing framework from
noticing that the node came up.
Fix the problem by increasing the timeout. Currently, after the session
is established, a query is sent in order to further verify that the
session works and it uses a very generous timeout of 1000 seconds to do
so - use the same timeout for internal queries in the python driver.
Fixes: scylladb/scylladb#15898
Closes scylladb/scylladb#15929
There are two test cases out there that make an sstable, write it and then
load it, but make_sstable_easy() is for that, so use it there.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This test case is pretty special in the sense that it uses custom path
for tempdir to create, write and load sstable to/from. It's better to
open-code the make_sstable() helper into the test case rather than
encourage callers to use custom tempdirs. "Good" test cases can use
make_sstable_easy() for the same purposes (in fact they already do).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's one in the utils that creates lw_shared_ptr<memtable> and
applies the provided vector of mutations to it. Lots of other test cases
do literally the same by hand.
The make_memtable() assumes that the caller is sitting in the seastar
thread, and all the test cases that can benefit from it already are.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It does nothing but call the sstable method of the same name. Callers
can do it on their own; the method is public.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The make_sstable_containing() can validate the applied mutations are
produced by the resulting sstable if the caller asks for it. To do so
the mutations are merged prior to checking and this merging should only
happen if validation is requested, otherwise it just makes no sense.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The follow-up to #15594.
We retry every automatic `migration_manager::announce` if
`group0_concurrent_modification` occurs. Concurrent operations can
happen during concurrent bootstrap in Raft-based topology, so we need
this change to enable support for concurrent bootstrap.
This PR adds retry loops in 4 places:
- `service::create_keyspace_if_missing`,
- `system_distributed_keyspace::start`,
- `redis::create_keyspace_if_not_exists_impl`,
- `table_helper::setup_keyspace` (used for creating the `system_traces` keyspace).
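A condensed sketch of the retry pattern (simplified; `announce_change` and the exception type declared here are stand-ins for the real migration-manager interface):
```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <exception>

using namespace seastar;

// Stand-ins for the real interface: the announce step performs the group 0
// read barrier and announcement, and throws on a concurrent modification.
struct group0_concurrent_modification_stub : std::exception {};
future<> announce_change();

future<> announce_with_retry() {
    while (true) {
        try {
            co_await announce_change();
            co_return;
        } catch (const group0_concurrent_modification_stub&) {
            // Someone else modified group 0 concurrently (e.g. a parallel
            // bootstrap); re-prepare against the new state and try again.
        }
    }
}
```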
Fixes #15435
Closes scylladb/scylladb#15613
* github.com:scylladb/scylladb:
table_helper: fix indentation
table_helper: retry in setup_keyspace on concurrent operation
table_helper: add logger
redis/keyspace_utils: fix indentation
redis: retry creating defualt databases on concurrent operation
db/system_distributed_keyspace: fix indentation
db/system_distributed_keyspace: retry start on concurrent operation
auth/service: retry creating system_auth on concurrent operation
Topology on raft is still an experimental feature. The RPC verbs
introduced in that mode shouldn't be used when it's disabled, otherwise
we lose the right to make breaking changes to those verbs.
First, make sure that the aforementioned verbs are not sent outside the
mode. It turns out that `raft_pull_topology_snapshot` could be sent
outside topology-on-raft mode - after the PR, it no longer can.
Second, topology-on-raft mode verbs are now not registered at all on the
receiving side when the mode is disabled.
Additionally tested by running `topology/` tests with
`consistent_cluster_management: True` but with experimental features
disabled.
Fixes: scylladb/scylladb#15862
Closes scylladb/scylladb#15917
* github.com:scylladb/scylladb:
storage_service: fix indentation
raft: topology: only register verbs in topology-on-raft mode
raft: topology: only pull topology snapshot in topology-on-raft mode
move the code which updates the third-party library closer to where
the library is found. for better readability.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15915
this mirrors what we already have in `configure.py`.
so that Seastar can report [[nodiscard]] violations as error.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15914
Currently, table_helper::setup_keyspace is used only for starting
the system_traces keyspace. We need to handle concurrent group 0
operations possible during concurrent bootstrap in the Raft-based
topology.
Some pytest-based tests start the scylla binary by hand
instead of relying on test.py's "clusters". In an automatic run (e.g. via
test.py itself) the correct scylla binary is the one pointed to by
SCYLLA environment variable, but when run from the shell via pytest directly it tries
to be smart and looks at build/*/scylla binaries picking the one with
the greatest mtime.
That guess is not very nice, because if the developer switches between
build modes with configure.py and rebuilds binaries, binaries from
"older" or "previous" builds stay on the way and confuse the guessing
code. It's better to be explicit.
refs: #15679
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#15684
This patch series adds error handling for streaming failure during
topology operations instead of an infinite retry. If streaming fails the
operation is rolled back: bootstrap/replace nodes move to the left state and
decommissioned/removed nodes move back to the normal state.
* 'gleb/streaming-failure-rollback-v4' of github.com:scylladb/scylla-dev:
raft: make sure that all operation forwarded to a leader are completed before destroying raft server
storage_service: raft topology: remove code duplication from global_tablet_token_metadata_barrier
tests: add tests for streaming failure in bootstrap/replace/remove/decomission
test/pylib: do not stop node if decommission failed with an expected error
storage_service: raft topology: fix typo in "decommission" everywhere
storage_service: raft topology: add streaming error injection
storage_service: raft topology: do not increase topology version during CDC repair
storage_service: raft topology: rollback topology operation on streaming failure.
storage_service: raft topology: load request parameters in left_token_ring state as well
storage_service: raft topology: do not report term_changed_error during global_token_metadata_barrier as an error
storage_service: raft topology: change global_token_metadata_barrier error handling to try/catch
storage_service: raft topology: make global_token_metadata_barrier node independent
storage_service: raft topology: split get_excluded_nodes from exec_global_command
storage_service: raft topology: drop unused include_local and do_retake parameters from exec_global_command which are always true
storage_service: raft topology: simplify streaming RPC failure handling
There are some schema modifications performed automatically (during
bootstrap, upgrade etc.) by Scylla that are announced by multiple calls
to `migration_manager::announce` even though they are logically one
change. Precisely, they appear in:
- `system_distributed_keyspace::start`,
- `redis::create_keyspace_if_not_exists_impl`,
- `table_helper::setup_keyspace` (for the `system_traces` keyspace).
All these places contain a FIXME telling us to `announce` only once.
There are a few reasons for this:
- calling `migration_manager::announce` with Raft is quite expensive --
taking a `read_barrier` is necessary, and that requires contacting a
leader, which then must contact a quorum,
- we must implement a retrying mechanism for every automatic `announce`
if `group0_concurrent_modification` occurs to enable support for
concurrent bootstrap in Raft-based topology. Doing it before the FIXMEs
mentioned above would be harder, and fixing the FIXMEs later would also
be harder.
This PR fixes the first two FIXMEs and improves the situation with the
last one by reducing the number of the `announce` calls to two.
Unfortunately, reducing this number to one requires a big refactor. We
can do it as a follow-up to a new, more specific issue. Also, we leave a
new FIXME.
Fixing the first two FIXMEs required enabling the announcement of a
keyspace together with its tables. Until now, the code responsible for
preparing mutations for a new table could assume the existence of the
keyspace. This assumption wasn't necessary, but removing it required
some refactoring.
Fixes scylladb/scylladb#15437
Closes scylladb/scylladb#15897
* github.com:scylladb/scylladb:
table_helper: announce twice in setup_keyspace
table_helper: refactor setup_table
redis: create_keyspace_if_not_exists_impl: fix indentation
redis: announce once in create_keyspace_if_not_exists_impl
db: system_distributed_keyspace: fix indentation
db: system_distributed_keyspace: announce once in start
tablet_allocator: update on_before_create_column_family
migration_listener: add parameter to on_before_create_column_family
alternator: executor: use new prepare_new_column_family_announcement
alternator: executor: introduce create_keyspace_metadata
migration_manager: add new prepare_new_column_family_announcement
Verbs related to topology on raft should not be sent outside the
topology on raft mode - and, after the previous commit, they aren't.
Make sure not to register handlers for those verbs if topology on raft
mode is not enabled.
Currently, during group0 snapshot transfer, the node pulling
the snapshot will send the `raft_pull_topology_snapshot` verb even if
the cluster is not in topology-on-raft mode. The RPC handler returns an
empty snapshot in that case. However, using the verb outside topology on
raft causes problems:
- It can cause issues during rolling upgrade as the snapshot transfer
will keep failing on the upgraded nodes until the leader node is
upgraded,
- Topology changes on raft are still experimental, and using the RPC
outside experimental mode will prevent us from doing breaking changes
to it.
Solve the issue by passing the "topology changes on raft enabled" flag
to group0_state_machine and send the RPC only in topology on raft mode.
We can opt out of installing suggested packages, mainly those related to Java and friends that we do not seem to need.
Fixes: #15579
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Closes scylladb/scylladb#15580
There's such a wrapper class in test_services. After #15889 this class resembles the test_env_compaction_manager and can be replaced with it. However, two users of the former wrapper class need it just to construct a table object, and the way they do it is a re-implementation of the table_for_tests class.
This PR patches the test cases to make use of table_for_tests and removes the compaction_manager_for_testing that becomes unused after it.
Closes scylladb/scylladb#15909
* github.com:scylladb/scylladb:
test_services: Ditch compaction_manager_for_testing
test/sstable_compaction_test: Make use of make_table_for_tests()
test/sstable_3_x_test: Make use of make_table_for_tests()
table_for_tests: Add const operator-> overload
sstable_test_env: Add test_env_compaction_manager() getter
sstable_test_env: Tune up maybe_start_compaction_manager() method
test/sstable_compaction_test: Remove unused tracker allocation
Now this wrapper is unused, all (both) test cases that needed it were
patched to use make_table_for_tests().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The max_ongoing_compaction_test test case constructs table object by
hand. For that it needs tracker, compaction manager and stats. Similarly
to previous patch, the test_env::make_table_for_tests() helper does
exactly that, so the test case can be simplified as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The compacted_sstable_reader() helper constructs table object and all
its "dependencies" by hand. The test_env::make_table_for_tests() helper
does the same, so the test code can be simplified.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Make it public and add `bool enable` flag so that test cases could start
the compaction manager (to call make_table_for_tests() later) but keep
it disabled for their testing purposes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The sstable_run_based_compaction_test case allocates the tracker but
doesn't use it. It was probably left over after the case was patched to use the
make_table_for_tests() helper.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
on debian derivatives librapidxml-dev installs rapidxml.h as
rapidxml/rapidxml.hpp, so let's use it as a fallback.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15814
before this change the argument passed to --date-stamp option is
ignored, as we don't reference the date-stamp specified with this option
at all. instead, we always overwrite it with the output of
`date --utc +%Y%m%d`, if we are going to reference this value.
so, in this change instead of unconditionally overwriting it, we
keep its value intact if it is already set.
the change which introduced this regression was 839d8f40e6
Fixes #15894
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15895
The object in question is used to facilitate creation of table objects for compaction tests. Currently table_for_tests carries a bunch of auxiliary objects that are needed for table creation, such as stats of all sorts and table state. However, there's also some "infrastructure" stuff on board, namely:
- reader concurrency semaphore
- cache tracker
- task manager
- compaction manager
And those four are excessive because all the tests in question run inside the sstables::test_env that has most of them.
This PR removes the mentioned objects from table_for_tests and re-uses those from test_env. While at it, it also removes the table::config object from table_for_tests so that it looks more like the core code that creates tables.
Closes scylladb/scylladb#15889
* github.com:scylladb/scylladb:
table_for_tests: Use test_env's compaction manager
sstables::test_env: Carry compaction manager on board
table_for_tests: Stop table on stop
table_for_tests: Get compaction manager from table
table_for_tests: Ditch on-board concurrency semaphore
table_for_tests: Require config argument to make table
table_for_tests: Create table config locally
table_for_tests: Get concurrency semaphore from table
table_for_tests: Get table directory from table itself
table_for_tests: Reuse cache tracker from sstables manager
table_for_tests: Remove unused constructor
tests: Split the compaction backlog test case
sstable_test_env: Coroutinize and move to .cc test_env::stop()
Replacing `restrict_replication_simplestrategy` config option with
2 config options: `replication_strategy_{warn,fail}_list`, which
allow us to impose soft limits (issue a warning) and hard limits (not
execute CQL) on replication strategy when creating/altering a keyspace.
The reason to rather replace than extend `restrict_replication_simplestrategy` config
option is that it was not used and we wanted to generalize it.
Only the soft guardrail is enabled by default, and it is set to SimpleStrategy,
which means that we'll generate a CQL warning whenever replication strategy
is set to SimpleStrategy. For new cloud deployments we'll move
SimpleStrategy from warn to the fail list.
Guardrails violations will be tracked by metrics.
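A simplified sketch of how such warn/fail lists are typically consulted (hypothetical names, not the actual guardrails code):
```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

struct replication_strategy_guardrails {
    std::set<std::string> warn_list;   // e.g. {"SimpleStrategy"} by default
    std::set<std::string> fail_list;   // empty by default; populated for cloud
};

// Returns warnings to attach to the CQL response; throws on a hard violation.
std::vector<std::string> check_replication_strategy(const replication_strategy_guardrails& g,
                                                    const std::string& strategy) {
    if (g.fail_list.count(strategy)) {
        throw std::invalid_argument(strategy + " is blocked by replication_strategy_fail_list");
    }
    std::vector<std::string> warnings;
    if (g.warn_list.count(strategy)) {
        warnings.push_back(strategy + " is discouraged (replication_strategy_warn_list)");
    }
    return warnings;
}
```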
Resolves #5224
Refs #8892 (the replication strategy part, not the RF part)
Closes scylladb/scylladb#15399
The handler of the STREAM_MUTATION_FRAGMENTS verb creates and starts a reader. The
resulting future is then checked for being exceptional and an error
message is printed in logs.
However, if the reader fails because the socket was closed by the peer, the
error looks excessive. In that case the exception is just regular
handling of the socket/stream closure and can be demoted down to debug
level.
fixes: #15891
Similar cherry-picking of log level exists in e.g. storage proxy, see
for example 56bd9b5d (service: storage_proxy: do not report abort
requests in handle_write )
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#15892
The status is not used since 2ec1f719de
which is included in scylla-4.6.0. We cannot have a mixed cluster with a
version that old, so the new version should not carry the compatibility
burden.
The moving operation was removed by 4a0b561376
and since then the state is unused. Even back then it worked only for
the case of one token, so it is safe to say we never used it. Let's
remove the remains of the code instead of carrying it forever.
before this change, we feed `build_reloc.sh` with hardwired arguments
when building the python3 submodule. but this is not flexible, and hurts
maintainability.
in this change, we mirror the behavior of `configure.py`: collect
the arguments from the output of `install-dependencies.sh`, and feed
the collected arguments to `build_reloc.sh`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15885
We have observed do_repair_ranges() receiving tens of thousands of
ranges to repair on occasion. do_repair_ranges() repairs all ranges in
parallel, with parallel_for_each(). This is normally fine, as the lambda
inside parallel_for_each() takes a semaphore and this will result in
limited concurrency.
However, in some instances, it is possible that most of these ranges are
skipped. In this case the lambda becomes synchronous, only logging a
message. This can cause stalls because there are no opportunities to
yield. Solve this by adding an explicit yield.
Fixes: #14330
Closes scylladb/scylladb#15879
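A minimal sketch of this pattern; the range type, predicate and worker are hypothetical, and seastar's coroutine-friendly parallel_for_each and maybe_yield are used here, so the real do_repair_ranges() differs in detail:
```
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <seastar/core/semaphore.hh>
#include <seastar/coroutine/maybe_yield.hh>
#include <seastar/coroutine/parallel_for_each.hh>

#include <vector>

struct range_info {};                                                    // hypothetical
bool range_needs_repair(const range_info&) { return false; }            // hypothetical predicate
seastar::future<> repair_one_range(const range_info&) {                  // hypothetical worker
    return seastar::make_ready_future<>();
}

seastar::future<> do_repair_ranges_sketch(std::vector<range_info>& ranges,
                                          seastar::semaphore& limit) {
    co_await seastar::coroutine::parallel_for_each(ranges, [&] (range_info& r) -> seastar::future<> {
        // Give the reactor a chance to run even when the range is skipped and
        // the body would otherwise complete without ever suspending.
        co_await seastar::coroutine::maybe_yield();
        if (!range_needs_repair(r)) {
            co_return;   // previously this path was fully synchronous
        }
        auto units = co_await seastar::get_units(limit, 1);  // bounds real concurrency
        co_await repair_one_range(r);
    });
}
```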
The purpose of `maybe_fix_legacy_secondary_index_mv_schema` was to deal
with legacy materialized view schemas used for secondary indexes,
schemas which were created before the notion of "computed columns" was
introduced. Back then, secondary index schemas would use a regular
"token" column. Later it became a computed column and old schemas would
be migrated during rolling upgrade.
The migration code was introduced in 2019
(db8d4a0cc6) and then fixed in 2020
(d473bc9b06).
The fix was present in Enterprise 2022.1 and in OSS 4.5. So, assuming
that users don't try crazy things like upgrading from 2021.X to 2023.X
(which we do not support), all clusters will have already executed the
migration code once they upgrade to 2023.X, meaning we can get rid of
it.
The main motivation of this PR is to get rid of the
`db::schema_tables::merge_schema` call in `parse_schema_tables`. In Raft
mode this was the only call to `merge_schema` outside "group 0 code" and
in fact it is unsafe -- it uses locally generated mutations with locally
generated timestamp (`api::new_timestamp()`), so if we actually did it,
we would permanently diverge the group 0 state machine across nodes
(the schema pulling code is disabled in Raft mode). Fortunately, this
should be dead code by now, as explained in the previous paragraph.
The migration code is now turned into a sanity check: if users
try something crazy, they will get an error instead of silent data
corruption.
Closes scylladb/scylladb#15695
* github.com:scylladb/scylladb:
view: remove unused `_backing_secondary_index`
schema_tables: turn view schema fixing code into a sanity check
schema_tables: make comment more precise
feature_service: make COMPUTED_COLUMNS feature unconditionally true
to be compatible with `configure.py` which allows us to optionally
specify the --date-stamp option for SCYLLA-VERSION-GEN. this option
is used by our CI workflow.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15896
This change moves existing suites which create clusters through the
testing infra to be stopped and uninstalled gracefully.
The motivation, besides the obvious advantage of testing our stop
sequence, is that it will pave the way for applying code coverage support
to all tests (not only standalone unit and boost test executables).
testing:
Ran all tests 10 times in a row in dev mode.
Ran all tests once in release mode
Ran all tests once in debug mode
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
An already known race (see: https://github.com/scylladb/scylladb/issues/15755)
has been found once again as part of moving all tests to stop all nodes
gracefully on teardown.
The solution was to add the lock acquisition also to `stop_gracefully`.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
We refactor table_helper::setup_keyspace so that it calls
migration_manager::announce at most twice. We achieve it by
announcing all tables at once.
The number of announcements should further be reduced to one, but
it requires a big refactor. The CQL code used in
parse_new_cf_statement assumes the keyspace has already been
created. We cannot have such an assumption if we want to announce
a keyspace and its tables together. However, we shouldn't touch
the CQL code as it would impact user requests, too.
One solution is using schema_builder instead of the CQL statements
to create tables in table_helper.
Another approach is removing table_helper completely. It is used
only for the system_traces keyspace, which Scylla creates
automatically. We could refactor the way Scylla handles this
keyspace and make table_helper unneeded.
In the following commit, we reduce migration_manager::announce
calls in table_helper::setup_keyspace by announcing all tables
together. To do it, we cannot use table_helper::setup_table
anymore, which announces a single table itself. However, the new
code still has to translate CQL statements, so we extract it to the
new parse_new_cf_statement function to avoid duplication.
We refactor system_distributed_keyspace::start so that it takes at
most one group 0 guard and calls migration_manager::announce at
most once.
We remove a catch expression together with the FIXME from
get_updated_service_levels (add_new_columns_if_missing before the
patch) because we cannot treat the service_levels update
differently anymore.
After adding the keyspace_metadata parameter to
migration_listener::on_before_create_column_family,
tablet_allocator doesn't need to load it from the database.
This change is necessary before merging migration_manager::announce
calls in the following commit.
After adding the new prepare_new_column_family_announcement that
doesn't assume the existence of a keyspace, we also need to get
rid of the same assumption in all on_before_create_column_family
calls. After all, they may be initiated before creating the
keyspace. However, some listeners require keyspace_metadata, so we
pass it as a new parameter.
We can use the new prepare_new_column_family_announcement function
that doesn't assume the existence of the keyspace instead of the
previous work-around.
We need to store a new keyspace's keyspace_metadata as a local
variable in create_table_on_shard0. In the following commit, we
use it to call the new prepare_new_column_family_announcement
function.
In the following commits, we reduce the number of
migration_manager::announce calls by merging some of them in a way
that logically makes sense. Some of these merges are similar --
we announce a new keyspace and its tables together. However,
we cannot use the current prepare_new_column_family_announcement
there because it assumes that the keyspace has already been created
(when it loads the keyspace from the database). Luckily, this
assumption is not necessary as this function only needs
keyspace_metadata. Instead of loading it from the database, we can
pass it as a parameter.
this change silences the following compiler warning, caused by using a
deprecated API, by using the recommended API in place of the deprecated
one:
```
/home/kefu/dev/scylladb/alternator/server.cc:569:27: warning: 'set_tls_credentials' is deprecated: use listen(socket_address addr, server_credentials_ptr credentials) [-Wdeprecated-declarations]
_https_server.set_tls_credentials(creds->build_reloadable_server_credentials([](const std::unordered_set<sstring>& files, std::exception_ptr ep) {
^
/home/kefu/dev/scylladb/seastar/include/seastar/http/httpd.hh:186:7: note: 'set_tls_credentials' has been explicitly marked deprecated here
[[deprecated("use listen(socket_address addr, server_credentials_ptr credentials)")]]
^
1 warning generated.
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15884
Currently the cache updaters aren't exception safe,
yet they are intended to be.
Instead of allowing exceptions from
`external_updater::execute` to escape `row_cache::update`,
abort using `on_fatal_internal_error`.
Future changes should harden all `execute` implementations
to effectively make them `noexcept`; then the pure virtual
definition can be made `noexcept` to cement that.
Fixes scylladb/scylladb#15576
Closes scylladb/scylladb#15577
* github.com:scylladb/scylladb:
row_cache: abort on exteral_updater::execute errors
row_cache: do_update: simplify _prev_snapshot_pos setup
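A minimal sketch of the approach described above, with illustrative names; the real row_cache code uses ScyllaDB's on_fatal_internal_error helper rather than a plain abort:
```
#include <cstdlib>
#include <exception>
#include <seastar/util/log.hh>

static seastar::logger clogger("cache");

// Illustrative stand-in for the updater interface.
struct external_updater_iface {
    virtual ~external_updater_iface() = default;
    virtual void execute() = 0;   // intended to eventually become noexcept
};

// Instead of letting an exception escape the cache update path (leaving the
// cache and the underlying snapshot out of sync), treat it as fatal.
void run_external_update(external_updater_iface& eu) {
    try {
        eu.execute();
    } catch (const std::exception& e) {
        clogger.error("external_updater::execute failed: {}", e.what());
        std::abort();
    } catch (...) {
        clogger.error("external_updater::execute failed with an unknown exception");
        std::abort();
    }
}
```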
Now that the sstables::test_env provides the compaction manager
instance, table_for_tests can start using it and can drop its own c.m. and
the sidecar task_manager.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Most of the test cases that use sstables::test_env do not mess with
table objects, they only need sstables. However, compaction test cases
do need table objects and, respectively, a compaction manager instance.
Today those test cases create a compaction manager instance for each table
they create, but that's a bit heavyweight and doesn't work the way core
code works. This patch prepares the sstables::test_env to provide
a compaction manager on demand by starting it as soon as it's asked to
create a table object.
For now this compaction manager is unused, but it will be used in the next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patches will stop using the compaction manager from table_for_tests in
favor of an external one (spoiler: the one from sstables::test_env), thus
the compaction manager would outlive the table_for_tests object and
the table object wrapped by it. So in order for the table_for_tests to
stop correctly, it also needs to stop the wrapped table.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a table_for_tests::get_compaction_manager() helper that's
excessive, as the compaction manager reference can be provided by the wrapped
table object itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's not used any longer and can be removed. This makes the table_for_tests
stopping code a bit shorter as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is the continuation of the previous patch. Make the caller of the
table_for_tests constructor provide the table::config. This makes the
table_for_tests constructor shorter and more self-contained.
Also, the caller now needs to provide the reference to the reader
concurrency semaphore, and that's good news, because the only caller
today is the sstables::test_env, which does have it. This makes the
semaphore sitting on table_for_tests itself unused, and it will be
removed eventually.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The table_for_tests keeps a copy of table::config on board. That's not
"idiomatic" as table config is a temporary object that should only be
needed while creating table object. Fortunately, the copy of config on
table_for_tests is no longer needed and it can be made temporary.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Making compaction permit needs a semaphore. Current code gets it from
the table_for_tests, but the very same semaphore reference sits on the
table. So get it from table, as the core code does. This will allow
removing the dedicated semaphore from table_for_tests in the future.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Making an sstable for a table requires passing the table directory as an argument.
The current table_for_tests helper gets the directory from the table config,
but the very same path sits on the table itself. This makes the testing code
that constructs sstables look closer to the core code and is also a
prerequisite for removing the table config from table_for_tests in the
future.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When making a table object, it needs the cache tracker reference. The
table_for_tests keeps one on board, but the very same object already
sits on the sstables manager, which has a public getter.
This makes the table_for_tests's cache tracker object unnecessary.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
To improve parallelism of embedded test sub-cases.
By coincidence, an indentation fix is not required.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's going to get larger, so better to move.
Also, when coroutinized it's going to be easier to extend.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
before this change, we only check the existence of compile_commands.json
before creating a symlink to build/*/compile_commands.json. but there are
chances that multiple ninja tasks are calling into `configure.py` for
updating `build.ninja`: this does not break the process, as the last one
wins: we just unconditionally `mv build.ninja.new build.ninja` for
updating this file. but this could break the build of
`compile_commands.json`: we create a symlink with Python, and if it
fails the Python script errors out.
in this change, we just ignore the `FileExistsError` when creating
the symlink to `compile_commands.json`. because, if this symlink exists,
we've achieved the goal, and should not consider it a failure.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15870
This PR implements the following new nodetool commands:
* cleanup
* clearsnapshots
* listsnapshots
All commands come with tests and all tests pass with both the new and the current nodetool implementations.
Refs: https://github.com/scylladb/scylladb/issues/15588
Closes scylladb/scylladb#15843
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the listsnapshots command
tools/scylla-nodetool: implement clearsnapshot command
tools/scylla-nodetool: implement the cleanup command
test/nodetool: rest_api_mock: add more options for multiple requests
tools/scylla-nodetool: log responses with trace level
* seastar 17183ed4e4...830ce86738 (6):
> coroutine: fix use-after-free in parallel_for_each
> build: do not provide zlib as an ingredient
> http: do not use req.content_length as both input parameter
> io_tester: disable -Wuninitialized when including boost.accumulators
> scheduling: revise the doxygen comment of create_scheduling_group()
> Merge 'Added ability to configure different credentials per HTTP listeners' from Michał Maślanka
Closes scylladb/scylladb#15871
While working on https://github.com/scylladb/scylladb/issues/15588, I noticed problems with the existing documentation, when comparing it with the actual code.
This PR contains fixes for nodetool compact, stop and scrub.
Closes scylladb/scylladb#15636
* github.com:scylladb/scylladb:
docs: nodetool compact: remove common arguments
docs: nodetool stop: fix compaction types and examples
docs: nodetool compact: remove unsupported partition option
There's one that doesn't need the tempdir path argument, since it gets one
from the env's onboard tempdir anyway.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#15825
The goal is to make the available defaults safe for future use, as they
are often taken from existing config files or documentation verbatim.
Referenced issue: #14290
Closes scylladb/scylladb#15856
When task_manager is constructed without a config (tests), its task_ttl is
left uninitialized (i.e. a random number gets in there). This results
in tasks staying registered for an infinite amount of time,
making a long-living task manager look hung.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#15859
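An illustrative sketch of the bug class; the struct and field below are stand-ins, not the actual task_manager config:
```
#include <chrono>

struct task_manager_config_sketch {
    // Without a default member initializer, a config created with plain
    // "task_manager_config_sketch cfg;" (as in tests that pass no config)
    // leaves task_ttl with an indeterminate value, so finished tasks can
    // stay registered practically forever.
    std::chrono::seconds task_ttl = std::chrono::seconds{0};
};
```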
we enable sanitizers only in the Debug and Sanitize build modes. if we pass
`-fno-sanitize-address-use-after-scope` to the compiler when the sanitizer
is not enabled, Clang complains like:
```
clang-16: error: argument unused during compilation: '-fno-sanitize-address-use-after-scope' [-Werror,-Wunused-command-line-argument]
```
this breaks the build on the build modes where sanitizers are not
enabled.
so, in this change, we only disable the sanitize-address-use-after-scope
sanitizer if the sanitizers are enabled.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15868
While it may not be explicitly documented, DynamoDB sometimes enriches error
messages with additional fields. For instance, when ConditionalCheckFailedException
occurs while ReturnValuesOnConditionCheckFailure is set, it will add an Item object;
similarly, for TransactionCanceledException it will add a CancellationReasons object.
There may be more cases like this, so a generic json field is added to our error class.
The change will be used by a future commit implementing the ReturnValuesOnConditionCheckFailure
feature.
Uses a single db::config + extensions, allowing both handling
of enterprise-only scylla.yaml keys and loading sstables
utilizing extensions in that universe.
This reverts commit 4b80130b0b, reversing
changes made to a5519c7c1f. It's suspected
of causing dtest failures due to a bug in coroutine::parallel_for_each.
Currently, when we calculate the number of deactivated segments
in test_commitlog_delete_when_over_disk_limit, we only count the
segments that were active during the first flush. However, during
the test, there may have been more than one flush, and a segment
could have been created between them. This segment would sometimes
get deactivated and even destroyed, and as a result, the count of
destroyed segments would appear larger than the count of deactivated
ones.
This patch fixes this behavior by accounting for all segments that
were active during any flush instead of just segments active during
the first flush.
Fixes #10527
Closes scylladb/scylladb#14610
The copy assignment operator of _ck can throw
after _type and _bound_weight have already been changed.
This leaves position_in_partition in an inconsistent state,
potentially leading to various weird symptoms.
The problem was witnessed by test_exception_safety_of_reads.
Specifically: in cache_flat_mutation_reader::add_to_buffer,
which requires the assignment to _lower_bound to be exception-safe.
The easy fix is to perform the only potentially-throwing step first.
Fixes #15822
Closes scylladb/scylladb#15864
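A minimal sketch of the pattern, with hypothetical member names rather than the real position_in_partition: do the only potentially-throwing assignment first, so a failure leaves the object unmodified:
```
#include <string>

struct position_sketch {
    int type = 0;             // assignment never throws
    int bound_weight = 0;     // assignment never throws
    std::string ck;           // copy assignment may throw (allocation)

    position_sketch& operator=(const position_sketch& other) {
        // Potentially-throwing step first: if it throws, nothing was changed.
        ck = other.ck;
        // Only nothrow steps after that, so the object cannot be left
        // half-assigned.
        type = other.type;
        bound_weight = other.bound_weight;
        return *this;
    }
};
```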
Currently, it's possible for a test to pass even if the server crashes
during a graceful shutdown. Additionally, the server may crash in the
middle of a test, resulting in a test failure with an inaccurate
description. This commit updates the test framework to monitor the
server's return code and throw an exception in the event of an abnormal
server shutdown.
Fixes scylladb/scylla#15365
Closes scylladb/scylladb#15660
before this change, when running object_store tests with `pytest`
directly, an instance of MinIoServer is started as a function-scope
fixture, but the environment variables set by it stay with the
process, even after the fixture is torn down. So, when the 2nd test
in the same process checks these environment variables, it would be
under the impression that there is already an S3 server running, and
think it is driven by `test.py`, hence try to reuse the S3 server.
But the MinIoServer instance is already torn down at that moment, when
the first test is completed.
So the test is likely to fail when the Scylla instance tries
to read the missing conf file previously created by the MinIoServer.
after this change, the environment variables are reset, so they
won't be seen by the succeeding tests in the same pytest session.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15779
this series is one of the steps to remove global statements in `configure.py`.
not only is the script more structured this way, this also allows us to quickly identify the parts which should/can be reused when migrating to a CMake-based build system.
Refs #15379
Closes scylladb/scylladb#15818
* github.com:scylladb/scylladb:
build: move the code with side effects into a single function
build: create outdir when outdir is explictly used
build: group the code with side effects together
build: do not rely on updating global with a dict
build: extract generate_version() out
build: extract get_release_cxxflags() out
build: extract get_extra_cxxflags() out
build: move thrift_libs to where it is used
build: move pkg closer to where it is used
build: remove unused variable
build: move variable closer to where it is used
it was a copy-pasta error.
- s/CMAKE_CXX_FLAGS_RELEASE/CMAKE_CXX_FLAGS_DEV/
- s/Seastar_OptimizationLevel_RELEASE/Seastar_OptimizationLevel_DEV/
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15849
Currently, when the topology coordinator accepts a node, it moves it to bootstrap state and assigns tokens to it (either new ones during bootstrap, or the replaced node's tokens). Only then does it contact the joining node to tell it about the decision and let it perform a read barrier.
However, this means that the tokens are inserted too early. After inserting the tokens the cluster is free to route write requests to the node, but it might not have learned about all of the schema yet.
Fix the issue by inserting the tokens later, after completing the join node response RPC, which forces the receiving node to perform a read barrier.
Refs: scylladb/scylladb#15686
Fixes: scylladb/scylladb#15738
Closes scylladb/scylladb#15724
* github.com:scylladb/scylladb:
test: test_topology_ops: continuously write during the test
raft topology: assign tokens after join node response rpc
storage_service: fix indentation after previous commit
raft topology: loosen assumptions about transition nodes having tokens
When a base write triggers an mv write and it needs to be sent to another
shard, it used the same smp service group and we could end up with a
deadlock.
This fix also affects alternator's secondary indexes.
Testing was done using a not-yet-committed framework for easy alternator
performance testing: https://github.com/scylladb/scylladb/pull/13121.
I've changed the hardcoded max_nonlocal_requests config in scylla from 5000 to 500 and
then ran:
./build/release/scylla perf-alternator-workloads --workdir /tmp/scylla-workdir/ --smp 2 \
--developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write_gsi \
--duration 60 --ring-delay-ms 0 --skip-wait-for-gossip-to-settle 0 --continue-after-error true --concurrency 2000
Without the patch, when scylla is overloaded (i.e. the number of scheduled futures is close to max_nonlocal_requests), after a couple of seconds
scylla hangs, cpu usage drops to zero, and no progress is made. We can confirm we're hitting this issue by seeing under gdb:
p seastar::get_smp_service_groups_semaphore(2,0)._count
$1 = 0
With the patch I wasn't able to observe the problem, even with 2x
concurrency. I was able to make the process hang with 10x concurrency,
but I think it's hitting a different limit, as there wasn't any depleted
smp service group semaphore and it was happening also on non-mv loads.
Fixes https://github.com/scylladb/scylladb/issues/15844
Closes scylladb/scylladb#15845
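A minimal seastar sketch of the underlying idea (illustrative only, not the actual fix): cross-shard view updates get their own smp_service_group, so they no longer compete for the same per-group request slots as the base writes that spawned them:
```
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>
#include <seastar/core/smp.hh>

seastar::future<> mv_update_example() {
    seastar::smp_service_group_config cfg;
    cfg.max_nonlocal_requests = 5000;
    // Dedicated group for MV/index updates, separate from the default group
    // used by the base writes.
    seastar::smp_service_group mv_ssg = co_await seastar::create_smp_service_group(cfg);

    unsigned target_shard = 1 % seastar::smp::count;
    co_await seastar::smp::submit_to(target_shard, seastar::smp_submit_to_options{mv_ssg}, [] {
        // apply the view update on the shard that owns the view row
    });

    co_await seastar::destroy_smp_service_group(mv_ssg);
}
```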
We need to wait until the first node becomes normal in
`join_node_request_handler` to ensure that joining nodes are not
handled as the first node in the cluster.
If we placed a join request before the first node becomes normal,
the topology coordinator would incorrectly skip the join node
handshake in `handle_node_transition` (`case node_state::none`).
It would happen because the topology coordinator decides whether
a node is the first in the cluster by checking if there are no
normal nodes. Therefore, we must ensure there is at least one normal node
when the topology coordinator handles a join request for a
non-first node.
We change the previous check because it can return true if there
are no normal nodes. `topology::is_empty` would also return false
if the first node was still new or in transition.
Additionally, calling `join_node_request_handler` before the first
node sets itself as normal is frequent during concurrent bootstrap,
so we remove "unlikely" from the comment.
Fixes: scylladb/scylladb#15807
Closes scylladb/scylladb#15775
The output is changed slightly, compared to the current nodetool:
* Number columns are aligned to the right
* Number columns don't have decimal places
* There are no trailing whitespaces
With this, both requests and responses to/from the remote are logged
when trace-level logging is enabled. This should greatly simplify
debugging any problems.
This commit removes irrelevant information
about versions from the Materialized Views
page (CQL Reference).
In addition, it replaces "Scylla" with
"ScyllaDB" on MV-related pages.
use the capitalized "ALLOW FILTERING" in the error message. because the
error message is a part of the user interface, it is better to
keep it aligned with our documentation, where "ALLOW FILTERING" is used.
so, in this change, the lower-cased "allow filtering" error message is
changed to "ALLOW FILTERING", and the tests are updated accordingly.
see also a0ffbf3291
Refs #14321
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15718
actually we've created outdir when using it as the parent directory
of `tempfile.tempdir`, but there are many places where we use
`tempfile.tempdir`, for instance, for testing the compiler flags,
and these tests will be removed once we migrate to CMake, so they
do not really matter when reviewing the change which migrates to
CMake.
the point of this change is to help the reviewer understand the major
changes performed by the migration.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
we use `globals().update(vars(args))` for updating the global variables
with a dict built from `args`. this is convenient, but it hurts readability.
let's reference the parsed options explicitly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
prepare for the change to read the SCYLLA-*-FILE files in functions instead of
doing this in global scope.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
on top of per-mode cxxflags, we apply more of them based on settings
and building environment. to reduce the statements in global scope,
let's extract the related code into a function.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
`optional_packages` was introduced in 8b0a26f06d, but we don't
offer the alternative versions of libsystemd anymore, and this
variable is not used in `configure.py`, so let's drop it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This is the continuation of 19fc01be23
Registering API handlers for services needs to
* use only the required service (a sharded<> one if needed)
* get the service to handle requests via an argument, not from the http context (the http context, in turn, is eventually not going to depend on anything)
There are several endpoints scattered over storage_service and snitch that use token metadata and topology. This PR makes those endpoints work the described way and drops the api::ctx -> token_metadata dependency.
Closes scylladb/scylladb#15831
* github.com:scylladb/scylladb:
api: Remove http::context -> token_metadata dependency
api: Pass shared_token_metadata instead of storage_service
api: Move snitch endpoints that use token metadata only
api: Move storage_service endpoints that use token metadata only
Said statement keeps a reference to the erm indirectly, via a topology node pointer, but doesn't keep the erm alive. This can result in use-after-free. Furthermore, it allows for vnodes being pulled from under the query's feet as it is running.
To prevent this, keep the erm alive for the duration of the query.
Also, use `host_id` instead of `node`; the node pointer is not really needed, as the statement only uses the host id from it.
Fixes: #15802
Closes scylladb/scylladb#15808
* github.com:scylladb/scylladb:
cql3: mutation_fragments_select_statement: use host_id instead of node
cql3: mutation_fragments_select_statement: pin erm reference
Hold a gate around all operations that are forwarded to a leader to be
able to wait for them during server::abort(); otherwise abort() may
complete while those operations are still running, which may cause
use-after-free.
The CDC repair operation does not change the topology, but it goes through
the same state as bootstrap, which does. Distinguish between the two cases and
increment the topology version only in the case of bootstrap.
Currently, if streaming fails during a topology operation, the streaming
is retried until it succeeds. If it never succeeds it will be
retried forever. There is no way to stop the topology operation.
This patch introduces a rollback mechanism on streaming failure. If
streaming fails during bootstrap/replace, the bootstrapping/replacing node
is moved to the left_token_ring state (and then the left state)
and the operation has to be restarted after removing the data directory. If
streaming fails during decommission/remove, the node is moved back to
normal and the operation needs to be restarted after the failure reason
is eliminated.
Currently we get a future and check if it has failed, but with
coroutines the complication is not needed. And since we want to filter
out some errors in the next patch with try/catch, it will be more
effective.
In order to detect issues where requests are routed incorrectly during
topology changes, modify the test_topology_ops test so that it runs a
background process that continuously writes while the test performs
topology changes in the cluster.
At the end of the test check whether:
- All writes were successful (we only require CL=LOCAL_ONE)
- Whether there are any errors from the replica side logic in the nodes'
logs (which happen e.g. when node receives writes before learning
about the schema)
Currently, when the topology coordinator accepts a node, it moves it to
bootstrap state and assigns tokens to it (either new ones during
bootstrap, or the replaced node's tokens). Only then does it contact the
joining node to tell it about the decision and let it perform a read
barrier.
However, this means that the tokens are inserted too early. After
inserting the tokens the cluster is free to route write requests to it,
but it might not have learned about all of the schema yet.
Fix the issue by inserting the tokens later, after completing the join
node response RPC which forces the receiving node to perform a read
barrier.
In later commits, tokens for a joining/replacing node will not be
inserted when the node enters `bootstrapping`/`replacing` state but at
some later step of the procedure. Loosen some of the assumptions in
`storage_service::topology_state_load` and
`system_keyspace::load_topology_state` appropriately.
This commit fixes the layout of the Reference
page. Previously, the toctree level was "2",
which made the page hard to navigate.
This PR changes the level to "1".
In addition, the capitalization of page
titles is fixed.
This is a follow-up PR to the ones that
created and updated the Reference section.
It must be backported to branch-5.4.
Closes scylladb/scylladb#15830
When the upload sink is flushed, it may notice that the upload has not yet been started and fall back to a plain PUT in that case. This makes uploading small files much nicer, because a multipart upload would take 3 API calls (start, part, complete) in this case.
fixes: #13014
Closes scylladb/scylladb#15824
* github.com:scylladb/scylladb:
test: Add s3_client test for upload PUT fallback
s3/client: Add PUT fallback to upload sink
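A compact sketch of the flush-time decision described above; the client interface below is hypothetical and much simpler than the real s3::client:
```
#include <seastar/core/coroutine.hh>
#include <seastar/core/future.hh>

#include <string>
#include <vector>

// Hypothetical, simplified object-store client -- not ScyllaDB's real s3::client.
struct object_client {
    virtual ~object_client() = default;
    virtual seastar::future<> put_object(const std::string& key, std::vector<char> body) = 0;
    virtual seastar::future<> upload_part(const std::string& key, std::vector<char> body) = 0;
    virtual seastar::future<> complete_upload(const std::string& key) = 0;
};

struct upload_sink_sketch {
    object_client& client;
    std::string key;
    std::vector<char> pending;    // bytes buffered but not yet uploaded
    bool upload_started = false;  // true once the first part went out

    seastar::future<> flush() {
        if (!upload_started) {
            // Fallback: the multipart upload was never started, so one plain
            // PUT (1 API call) replaces start + part + complete (3 API calls).
            co_await client.put_object(key, std::move(pending));
        } else {
            co_await client.upload_part(key, std::move(pending));
            co_await client.complete_upload(key);
        }
    }
};
```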
Now that system.sstables has the state field, it can be changed
(UPDATEd). However, when changing the state AND the generation, this still
won't work, because the generation is the clustering key of the table in
question and cannot just be changed. This, nonetheless, is OK, as the
generation changes with the state only when moving an sstable from the upload
dir into normal/staging, and that is a separate issue for S3 (#13018). For
now, changing only the state is OK.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The state is one of <empty>(normal)/staging/quarantine. Currently, when an
sstable is moved to a non-normal state, the s3 backend state_change() call
throws, thus such sstables do not appear. The next patches are going to
change that, and for that the new field in system.sstables is needed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In fact, this FIXME had been fixed by 2c9ec6bc (sstable_directory:
Garbage collect S3 sstables on reboot) and is no longer valid. However,
it's still good to know if GC failed or misbehaved, so replace the
comment with a warning.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
A very similar one tracing the state change will appear, so it's
good to be able to tell them from one another.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The backward converter is already out there. The next code will need to
convert the string representation of the state back to the internal type.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Snitch is now a service that speaks for the local node only. In order to
get dc/rack for peers in the cluster, one needs to use topology, which, in
turn, lives on token metadata. This patch moves the dc/rack getters to
api/token_metadata.cc next to other t.m. related endpoints.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are a few of them that don't need the storage service for anything
but getting token metadata from it. Move them to their own .cc/.hh units.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
this mirrors what we have in `configure.py`: build the CqlParser with `-O1`
and disable `-fsanitize-address-use-after-scope` when compiling CqlParser.cc,
in order to prevent the compiler from emitting code which uses a large amount of stack
space at runtime.
Closes scylladb/scylladb#15819
* github.com:scylladb/scylladb:
build: cmake: avoid using large amount stack of when compiling parser
build: cmake: s/COMPILE_FLAGS/COMPILE_OPTIONS/
There are some schema modifications performed automatically (during bootstrap, upgrade etc.) by Scylla that are announced by multiple calls to `migration_manager::announce` even though they are logically one change. Precisely, they appear in:
- `system_distributed_keyspace::start`,
- `redis:create_keyspace_if_not_exists_impl`,
- `table_helper::setup_keyspace` (for the `system_traces` keyspace).
All these places contain a FIXME telling us to `announce` only once. There are a few reasons for this:
- calling `migration_manager::announce` with Raft is quite expensive -- taking a `read_barrier` is necessary, and that requires contacting a leader, which then must contact a quorum,
- we must implement a retrying mechanism for every automatic `announce` if `group0_concurrent_modification` occurs to enable support for concurrent bootstrap in Raft-based topology. Doing it before the FIXMEs mentioned above would be harder, and fixing the FIXMEs later would also be harder.
This PR fixes the first two FIXMEs and improves the situation with the last one by reducing the number of the `announce` calls to two. Unfortunately, reducing this number to one requires a big refactor. We can do it as a follow-up to a new, more specific issue. Also, we leave a new FIXME.
Fixing the first two FIXMEs required enabling the announcement of a keyspace together with its tables. Until now, the code responsible for preparing mutations for a new table could assume the existence of the keyspace. This assumption wasn't necessary, but removing it required some refactoring.
Fixes #15437
Closes scylladb/scylladb#15594
* github.com:scylladb/scylladb:
table_helper: announce twice in setup_keyspace
table_helper: refactor setup_table
redis: create_keyspace_if_not_exists_impl: fix indentation
redis: announce once in create_keyspace_if_not_exists_impl
db: system_distributed_keyspace: fix indentation
db: system_distributed_keyspace: announce once in start
tablet_allocator: update on_before_create_column_family
migration_listener: add parameter to on_before_create_column_family
alternator: executor: use new prepare_new_column_family_announcement
alternator: executor: introduce create_keyspace_metadata
migration_manager: add new prepare_new_column_family_announcement
The test case creates a non-jumbo upload sink and puts some bytes into it,
then flushes. In order to make sure the fallback did take place, the
multipart memory tracker semaphore is broken in advance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The purpose of `maybe_fix_legacy_secondary_index_mv_schema` was to deal
with legacy materialized view schemas used for secondary indexes,
schemas which were created before the notion of "computed columns" was
introduced. Back then, secondary index schemas would use a regular
"token" column. Later it became a computed column and old schemas would
be migrated during rolling upgrade.
The migration code was introduced in 2019
(db8d4a0cc6) and then fixed in 2020
(d473bc9b06).
The fix was present in Enterprise 2022.1 and in OSS 4.5. So, assuming
that users don't try crazy things like upgrading from 2021.X to 2023.X
(which we do not support), all clusters will have already executed the
migration code once they upgrade to 2023.X, meaning we can get rid of
it.
The main motivation of this patch is to get rid of the
`db::schema_tables::merge_schema` call in `parse_schema_tables`. In Raft
mode this was the only call to `merge_schema` outside "group 0 code" and
in fact it is unsafe -- it uses locally generated mutations with locally
generated timestamp (`api::new_timestamp()`), so if we actually did it,
we would permanently diverge the group 0 state machine across nodes
(the schema pulling code is disabled in Raft mode). Fortunately, this
should be dead code by now, as explained in the previous paragraph.
The migration code is now turned into a sanity check: if users
try something crazy, they will get an error instead of silent data
corruption.
`maybe_fix_legacy_secondary_index_mv_schema` function has this piece of
code:
```
// If the first clustering key part of a view is a column with name not found in base schema,
// it implies it might be backing an index created before computed columns were introduced,
// and as such it must be recreated properly.
if (!base_schema->columns_by_name().contains(first_view_ck.name())) {
    schema_builder builder{schema_ptr(v)};
    builder.mark_column_computed(first_view_ck.name(), std::make_unique<legacy_token_column_computation>());
    if (preserve_version) {
        builder.with_version(v->version());
    }
    return view_ptr(builder.build());
}
```
The comment uses the phrase "it might be".
However, the code inside the `if` assumes that it "must be": once we
determined that the first column in this materialized view does not have
a corresponding name in the base table, we set it to be computed using
`legacy_token_column_computation`, so we assumed that the column was
indeed storing the token. Doing that for a column which is not the token
column would be a small disaster.
Assuming that the code is correct, we can make the comment more precise.
I checked the documentation and I don't see any other way we could
have such a column other than the token column which is internally
created by Scylla when creating a secondary index (for example, it is
forbidden to use an alias in the select statement when creating materialized
views, which I checked experimentally).
The feature is assumed to be true; it was introduced in 2019.
It's still advertised in gossip, but it's assumed to always be present.
The `schema_feature` enum class still contains `COMPUTED_COLUMNS`,
and the `all_tables` function in schema_tables.cc still checks for the
schema feature when deciding if `computed_columns()` table should be
included. This is necessary because digest calculation tests contain
many digests calculated with the feature disabled, if we wanted to make
it unconditional in the schema_tables code we'd have to regenerate
almost all digests in the tests. It is simpler to leave the possibility
for the tests to disable the feature.
The topology coordinator should handle failures internally as long as it
remains the coordinator. The raft state monitor is not in a better
position to handle any errors thrown by it; all it can do is restart
the coordinator. The series makes topology_coordinator::run handle all
the errors internally and marks the function as noexcept to not leak
error handling complexity into the raft state monitor.
* 'gleb/15728-fix' of github.com:scylladb/scylla-dev:
storage_service: raft topology: mark topology_coordinator::run function as noexcept
storage_service: raft topology: do not throw error from fence_previous_coordinator()
it is printed when pytest passes it down as a fixture, as part of
the logging message. it would help with debugging an object_store test.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15817
When the non-jumbo sink is flushed and notices that the real upload is
not started yet, it may just go ahead and PUT the buffers into the
object with a single request.
For the jumbo sink the fallback is not implemented, as it likely doesn't make
any sense -- jumbo sinks are unlikely to produce less than 5Mb of
data, so it's going to be dead code anyway.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The function handled all exceptions internally. By making it noexcept we
make sure that the caller (raft_state_monitor_fiber) does not need to
handle any exceptions from the topology coordinator fiber.
Throwing an error kills the topology coordinator monitor fiber. Instead we
retry the operation until it succeeds or the node loses its leadership.
This is fine because for the operation to succeed a quorum is needed, and if
the quorum is not available the node should relinquish its leadership.
Fixes #15728
This query bypasses the usual read-path in storage-proxy and therefore
also misses the erm pinning done by storage-proxy. To avoid a vnode
being pulled from under its feet, do the erm pinning in the statement
itself.
this mirrors what we have in `configure.py`: build the CqlParser with -O1
and disable sanitize-address-use-after-scope when compiling CqlParser.cc,
in order to prevent the compiler from emitting code which uses a large amount of stack
at runtime.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we create a new UUID for a new sstable managed by the s3_storage, and we use the string representation of UUID defined by RFC4122 like "0aa490de-7a85-46e2-8f90-38b8f496d53b" for naming the objects stored on s3_storage. but this representation is not what we are using for storing sstables on local filesystem when the option of "uuid_sstable_identifiers_enabled" is enabled. instead, we are using a base36-based representation which is shorter.
to be consistent with the naming of the sstables created for local filesystem, and more importantly, to simplify the interaction between the local copy of sstables and those stored on object storage, we should use the same string representation of the sstable identifier.
so, in this change:
1. instead of creating a new UUID, just reuse the generation of the sstable for the object's key.
2. do not store the uuid in the sstable_registry system table. As we already have the generation of the sstable for the same purpose.
3. switch the sstable identifier representation from the one defined by the RFC4122 (implemented by fmt::formatter<utils::UUID>) to the base36-based one (implemented by fmt::formatter<sstables::generation_type>)
Fixes #14175
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#14406
* github.com:scylladb/scylladb:
sstable: remove _remote_prefix from s3_storage
sstable: switch to uuid identifier for naming S3 sstable objects
this series extracts the env-variable-related functions out and removes unused `import`s for better readability.
Closes scylladb/scylladb#15796
* github.com:scylladb/scylladb:
test/pylib: remove duplicated imports
test/pylib: extract the env variable printing into MinIoServer
test/pylib: extract _set_environ() out
default_compaction_progress_monitor returns a reference to a static
object. So, it should be read-only, but its users need to modify it.
Delete default_compaction_progress_monitor and use one's own
compaction_progress_monitor instance where it's needed.
Closes scylladb/scylladb#15800
This commit enables publishing documentation
from branch-5.4. The docs will be published
as UNSTABLE (the warning about version 5.4
being unstable will be displayed).
Closes scylladb/scylladb#15762
There are two tests, test_read_all and
test_read_with_partition_row_limits, which assert on every page as well
as at the end that there are no misses whatsoever. This is incorrect,
because it is possible that on a given page, not all shards participate
and thus there won't be a saved reader on every shard. On the subsequent
page, a shard without a reader may produce a miss. This is fine.
Refine the asserts to check that we have only as many misses as there are
shards without readers on them.
The log-structured allocator maintains memory reserves so that
operations using log-structured allocator memory can have some
working memory and can allocate. The reserves start small and are
increased if allocation failures are encountered. Before starting
an operation, the allocator first frees memory to satisfy the reserves.
One problem is that if the reserves are set to a high value and
we encounter a stall, then, first, we have no idea what value
the reserves are set to, and second, we have no idea what operation
caused the reserves to be increased.
We fix this problem by promoting the log reports of reserve increases
from DEBUG level to INFO level and by attaching a stack trace to
those reports. This isn't optimal since the messages are used
for debugging, not for informing the user about anything important
for the operation of the node, but I see no other way to obtain
the information.
Ref #13930.
Closes scylladb/scylladb#15153
Said method is called in an allocating section, which will re-try the enclosed lambda on allocation failure. `read_context()` however moves the permit parameter so on the second and later calls, the permit will be in a moved-from state, triggering a `nullptr` dereference and therefore a segfault.
We already have a unit test (`test_exception_safety_of_reads` in `row_cache_test.cc`) which was supposed to cover this, but:
* It only tests range scans, not single partition reads, which is a separate path.
* Turns out allocation failure tests are again silently broken (no error is injected at all). This is because `test/lib/memtable_snapshot_source.hh` creates a critical alloc section which accidentally covers the entire duration of tests using it.
Fixes: #15578
Closes scylladb/scylladb#15614
* github.com:scylladb/scylladb:
test/boost/row_cache_test: test_exception_safety_of_reads: also cover single-partition reads
test/lib/memtable_snapshot_source: disable critical alloc section while waiting
row_cache: make_reader_opt(): make make_context() reentrant
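A minimal sketch of the reentrancy pitfall and the fix, with stand-in types; this is not the real logalloc::allocating_section or reader_permit:
```
#include <memory>
#include <new>
#include <utility>

struct permit { std::unique_ptr<int> token = std::make_unique<int>(42); };
struct context { permit p; };

// Simplified "allocating section": re-runs the function on allocation
// failure, so the function must be safe to run more than once.
template <typename Func>
auto with_alloc_retry(Func&& f) {
    for (;;) {
        try {
            return f();
        } catch (const std::bad_alloc&) {
            // reclaim some memory here, then retry f()
        }
    }
}

// BUG: if the first attempt throws bad_alloc after the move (e.g. in the
// later allocation-sensitive setup), the retry sees a moved-from permit.
context make_context_buggy(permit p) {
    return with_alloc_retry([&] {
        context ctx{std::move(p)};
        // ... more allocation-sensitive setup that may throw bad_alloc ...
        return ctx;
    });
}

// Reentrant variant: consume the move-only argument exactly once, outside the
// retried section, and keep only re-runnable work inside it.
context make_context_fixed(permit p) {
    context ctx{std::move(p)};
    with_alloc_retry([&] {
        // allocation-sensitive setup of ctx that is safe to re-run
    });
    return ctx;
}
```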
The major compaction semantics is that all data of a table will be compacted
together, so the user can expect e.g. a recently introduced tombstone to be
compacted with the data it shadows.
Today, it can happen that all the data in the maintenance set won't be included
in major compaction until it's promoted into the main set by off-strategy.
So the user might be left wondering why major is not having the expected
effect.
To fix this, let's perform off-strategy first, so data in the maintenance
set will be made available to major. A similar approach is done for
data in the memtable, so a flush is performed before major starts.
The only exception will be data in staging, which cannot be compacted
until view building is done with it, to avoid inconsistency in view
replicas.
The serialization of reshape jobs in the compaction manager guarantees
correctness if there's an ongoing off-strategy on behalf of the
table.
Fixes #11915.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#15792
This patch adds a reproducer for a minor incompatibility between Scylla's
and Cassandra's handling of a prepared statement when a bind marker with
the same name is used more than once, e.g.,
```
SELECT * FROM tbl WHERE p=:x AND c=:x
```
It turns out that Scylla tells the driver that there is only one bind
marker, :x, whereas Cassandra tells the driver that there are two bind
markers, both named :x. This makes no difference if the user passes
a map `{'x': 3}`, but if the user passes a tuple, Scylla accepts only
`(3,)` (assigning both bind markers the same value) and Cassandra
accepts only `(3,3)`.
The test added in this patch demonstrates this incompatibility.
It fails on Scylla, passes on Cassandra, and is marked "xfail".
Refs #15559
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#15564
The exception thrown from row_level_repair::run does not show the root
cause of a failure, making it harder to debug.
Add the internal exception contents to the runtime_error message.
After the change the log will mention the real cause (last line), e.g.:
repair - repair[92db0739-584b-4097-b6e2-e71a66e40325]: 33 out of 132 ranges failed,
keyspace=system_distributed, tables={cdc_streams_descriptions_v2, cdc_generation_timestamps,
view_build_status, service_levels}, repair_reason=bootstrap, nodes_down_during_repair={}, aborted_by_user=false,
failed_because=seastar::nested_exception: std::runtime_error (Failed to repair for keyspace=system_distributed,
cf=cdc_streams_descriptions_v2, range=(8720988750842579417,+inf))
(while cleaning up after seastar::abort_requested_exception (abort requested))
Closes scylladb/scylladb#15770
This PR improves how handling failures is documented and made accessible to the user.
- The Handling Failures section is moved from Raft to Troubleshooting.
- Two new topics about failure are added to Troubleshooting with a link to the Handling Failures page (Failure to Add, Remove, or Replace a Node, Failure to Update the Schema).
- A note is added to the add/remove/replace node procedures to indicate that a quorum is required.
See individual commits for more details.
Fixes https://github.com/scylladb/scylladb/issues/13149
Closes scylladb/scylladb#15628
* github.com:scylladb/scylladb:
doc: add a note about Raft
doc: add the quorum requirement to procedures
doc: add more failure info to Troubleshooting
doc: move Handling Failures to Troubleshooting
instead of appending the options to the CMake variables, use the dedicated commands to do this. it is simpler this way, and the bonus is that the options are de-duplicated.
Closes scylladb/scylladb#15797
* github.com:scylladb/scylladb:
build: cmake: use add_link_options() when appropriate
build: cmake: use add_compile_options() when appropriate
this series is one of the steps to remove global statements in `configure.py`.
not only is the script more structured this way, this also allows us to quickly identify the parts which should/can be reused when migrating to a CMake-based build system.
Refs #15379
Closes scylladb/scylladb#15780
* github.com:scylladb/scylladb:
build: update modeval using a dict
build: pass args.test_repeat and args.test_timeout explicitly
build: pull in jsoncpp using "pkgs"
build: build: extract code fragments into functions
instead of appending to CMAKE_EXE_LINKER_FLAGS*, use
add_link_options() to add more options. as CMAKE_EXE_LINKER_FLAGS*
is a string, and typically set by user, let's use add_link_options()
instead.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
instead of appending to CMAKE_CXX_FLAGS, use add_compile_options()
to add more options. as CMAKE_CXX_FLAGS is a string, and typically
set by user, let's use add_compile_options() instead, the options
added by this command will be added before CMAKE_CXX_FLAGS, and
will have lower priority.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
since we use the sstable.generation() for the remote prefix of
the key of the object for storing the sstable component, there is
no need to set remote_prefix beforehand.
since `s3_storage::ensure_remote_prefix()` and
`system_keyspace::sstables_registry_lookup_entry()` are not used
anymore, they are removed.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we create a new UUID for a new sstable managed
by the s3_storage, and we use the string representation of UUID
defined by RFC4122 like "0aa490de-7a85-46e2-8f90-38b8f496d53b" for
naming the objects stored on s3_storage. but this representation is
not what we are using for storing sstables on local filesystem when
the option of "uuid_sstable_identifiers_enabled" is enabled. instead,
we are using a base36-based representation which is shorter.
to be consistent with the naming of the sstables created for local
filesystem, and more importantly, to simplify the interaction between
the local copy of sstables and those stored on object storage, we should
use the same string representation of the sstable identifier.
so, in this change:
1. instead of creating a new UUID, just reuse the generation of the
sstable for the object's key.
2. do not store the uuid in the sstable_registry system table. As
we already have the generation of the sstable for the same purpose.
3. switch the sstable identifier representation from the one defined
by the RFC4122 (implemented by fmt::formatter<utils::UUID>) to the
base36-based one (implemented by
fmt::formatter<sstables::generation_type>)
4. enable the `uuid_sstable_identifiers` cluster feature if it is
enabled in the `test_env_config`, so that the sstable manager
can use uuid-based generations when creating a new
sstable.
5. throw if the generation of an sstable is not UUID-based when
accessing / manipulating an sstable with the S3 storage backend, as
the S3 storage backend now relies on this option; otherwise
we'd have sstables with keys like s3://bucket/number/basename, which
is just unable to serve as a unique id for an sstable if the bucket is
shared across multiple tables.
Fixes #14175
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The following new commands are implemented:
* stop
* compactionhistory
All are associated with tests. All tests (both old and new) pass with both the scylla-native and the cassandra nodetool implementation.
Refs: https://github.com/scylladb/scylladb/issues/15588
Closes scylladb/scylladb#15649
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement compactionhistory command
tools/scylla-nodetool: implement stop command
mutation/json: extract generic streaming writer into utils/rjson.hh
test/nodetool: rest_api_mock.py: add support for error responses
This is the continuation of 8c03eeb85d
Registering API handlers for services needs to
* get the service to handle requests via argument, not from http context (http context, in turn, is going not to depend on anything)
* unset the handlers on stop so that the service is not used after it's stopped (and before API server is stopped)
This makes task manager handlers work this way
Closes scylladb/scylladb#15764
* github.com:scylladb/scylladb:
api: Unset task_manager test API handlers
api: Unset task_manager API handlers
api: Remove ctx->task_manager dependency
api: Use task_manager& argument in test API handlers
api: Push sharded<task_manager>& down the test API set calls
api: Use task_manager& argument in API handlers
api: Push sharded<task_manager>& down the API set calls
These are already documented in the nodetool index page. The list in the
nodetool index page is less informative, so copy the list from nodetool
compact over there.
Nodetool doesn't recognize RESHARD, even though ScyllaDB supports
stopping RESHARD compaction.
Remove VALIDATE from the list - ScyllaDB doesn't support it.
Add a note about the unimplemented --id option.
Fix the examples, they are broken.
Fix the entry in the nodetool command list, the command is called
`stop`, not `stop compaction`.
instead of updating `modes` with global statements, update it in
a function, for better readability and to reduce the statements in
global scope.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change adds the "jsoncpp" dependency using "pkgs". it is simpler this
way. it also helps to remove more global statements.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change extracts `get_warnings_options()` out. it helps to
remove more global statements.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The options parameter is redundant. We always use
`_metadata->strategy_options()` and
`keyspace::create_replication_strategy` already assumes that
`_metadata` is set by using its other fields.
Closes scylladb/scylladb#15776
memtable_snapshot_source starts a background fiber in its constructor,
which compacts LSA memory in a loop. The loop's inside is covered with a
critical alloc section. It also contains a wait on a condition variable
and in its present form the critical section also covers the wait,
effectively turning off allocation failure injection for any test using
the memtable_snapshot_source.
This patch disables the critical alloc section while the loop waits on
the condition variable.
Said lambda currently moves the permit parameter, so on the second and
later calls it will possibly run into use-after-move. This can happen if
the allocating section below fails and is re-tried.
The Java-related build dependencies are installed by
* tools/java/install-dependencies.sh
* tools/jmx/install-dependencies.sh
respectively, and the parent `install-dependencies.sh` always invokes
these scripts, so there is no need to repeat them in the parent
`install-dependencies.sh` anymore.
In addition to deduplicating the build deps, this change also helps to
reduce the size of the build dependencies: by default, `dnf`
installs the weak deps unless `--setopt=install_weak_deps=False`
is passed to it, so this change also reduces the traffic
and footprint of the packages installed for building scylla.
see also 9dddad27bf
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15473
test.py calls `uninstall()` and `stop()` concurrently from exit
artifacts, and `uninstall()` internally calls `stop()`. This leads to
premature releasing of IP addresses from `uninstall()` (returning IPs to
the pool) while the servers using those IPs are still stopping. Then a
server might obtain that IP from the pool and fail to start due to
"Address already in use".
Put a lock around the body of `stop()` to prevent that.
Fixes: scylladb/scylladb#15755
Closes scylladb/scylladb#15763
before this change, we print
marshaling error: Value not compatible with type org.apache.cassandra.db.marshal.AsciiType: '...'
but the wording is not very user friendly: it is a mapping of the
underlying implementation, and a user would have difficulty understanding
"marshaling" and/or "org.apache.cassandra.db.marshal.AsciiType"
when reading this error message.
so, in this change
1. change the error message to:
Invalid ASCII character in string literal: '...'
which should be more straightforward, and easier to digest.
2. update the test accordingly
Please note that the quoted non-ASCII string is preserved instead of
being printed in hex, as otherwise the user would not be able to map it
to their input.
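As an illustration of the new wording only (not the actual Scylla code path), a minimal validator that rejects non-ASCII bytes and quotes the offending literal verbatim in the error could look like this:

```cpp
#include <iostream>
#include <stdexcept>
#include <string>

// Illustrative only: reject string literals containing non-ASCII bytes and
// report them with the user-facing wording, quoting the input verbatim.
void validate_ascii_literal(const std::string& lit) {
    for (unsigned char c : lit) {
        if (c > 0x7f) {
            throw std::invalid_argument(
                "Invalid ASCII character in string literal: '" + lit + "'");
        }
    }
}

int main() {
    try {
        validate_ascii_literal("caf\xc3\xa9");  // "café" encoded as UTF-8
    } catch (const std::invalid_argument& e) {
        std::cout << e.what() << '\n';
    }
}
```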
Refs #14320
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15678
* seastar bab1625c...17183ed4 (73):
> thread_pool: Reference reactor, not point to
> sstring: inherit publicly from string_view formatter
> circleci: use conditional steps
> weak_ptr: include used header
> build: disable the -Wunused-* warnings for checkheaders
> resource: move variable into smaller lexical scope
> resource: use structured binding when appropriate
> httpd: Added server and client addresses to request structure
> io_queue: do not dereference moved-away shared pointer
> treewide: explicitly define ctor and assignment operator
> memory: use `err` for the error string
> doc: Add document describing all the math behind IO scheduler
> io_queue: Add flow-rate based self slowdown backlink
> io_queue: Make main throttler uncapped
> io_queue: Add queue-wide metrics
> io_queue: Introduce "flow monitor"
> io_queue: Count total number of dispatched and completed requests so far
> io_queue: Introduce io_group::io_latency_goal()
> tests: test the vector overload for when_all_succeed
> core: add a vector overload to when_all_succeed
> loop: Fix iterator_range_estimate_vector_capacity for random iters
> loop: Add test for iterator_range_estimate_vector_capacity
> core/posix return old behaviour using non-portable pthread_attr_setaffinity_np when present
> memory: s/throw()/noexcept/
> build: enable -Wdeprecated compiler option
> reactor: mark kernel_completion's dtor protected
> tests: always wait for promise
> http, json, net: define-generated copy ctor for polymorphic types
> treewide: do not define constexpr static out-of-line
> reactor: do not define dtor of kernel_completion
> http/exception: stop using dynamic exception specification
> metrics: replace vector with deque
> metrics: change metadata vector to deque
> utils/backtrace.hh: make simple_backtrace formattable
> reactor: Unfriend disk_config_params
> reactor: Move add_to_flush_poller() to internal namespace
> reactor: Unfriend a bunch of sched group template calls
> rpc_test: Test rpc send glitches
> net: Implement batch flush support for existing sockets
> iostream: Configure batch flushes if sink can do it
> net: Added remote address accessors
> circleci: update the image to CircleCI "standard" image
> build: do not add header check target if no headers to check
> build: pass target name to seastar_check_self_contained
> build: detect glibc features using CMake
> build: extract bits checking libc into CheckLibc.cmake
> http/exception: add formatter for httpd::base_exception
> http/client: Mark write_body() const
> http/client: Introduce request::_bytes_written
> http/client: Mark maybe_wait_for_continue() const
> http/client: Mark send_request_head() const
> http/client: Detach setup_request()
> http/api_docs: copy in api_docs's copy constructor
> script: do not inherit from object
> scripts: addr2line: change StdinBacktraceIterator to a function
> scripts: addr2line: use yield instead defining a class
> tests: skip tests that require backtrace if execinfo.h is not found
> backtrace: check for existence of execinfo.h
> core: use ino_t and off_t as glibc sets these to 64bit if 64bit api is used
> core: add sleep_abortable instantiation for manual_clock
> tls: Return EPIPE exception when writing to shutdown socket
> http/client: Don't cache connection if server advertises it
> http/client: Mark connection as "keep in cache"
> core: fix strerror_r usage from glibc extension
> reactor: access sigevent.sigev_notify_thread_id with a macro
> posix: use pthread_setaffinity_np instead of pthread_attr_setaffinity_np
> reactor: replace __mode_t with mode_t
> reactor: change sys/poll.h to posix poll.h
> rpc: Add unit test for per-domain metrics
> rpc: Report client connections metrics
> rpc: Count dead client stats
> rpc: Add seastar::rpc::metrics
> rpc: Make public queues length getters
io-scheduler fixes
refs: #15312
refs: #11805
http client fixes
refs: #13736
refs: #15509
rpc fixes
refs: #15462
Closes scylladb/scylladb#15774
After "repair: Get rid of the gc_grace_seconds", the sstable's schema (mode,
gc period if applicable, etc) is used to estimate the amount of droppable
data (or determine full expiration = max_deletion_time < gc_before).
It could happen that the user switched from timeout to repair mode, but
sstables will still use the old mode, even though the user asked for a new one.
Another example is tuning the value of the grace period to prevent
data resurrection if repair cannot run in a timely manner.
The problem persists until all sstables using the old GC settings are recompacted
or the node is restarted.
To fix this, we have to feed the latest schema into the sstable procedures used
for expiration purposes.
Fixes #15643.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#15746
messaging_service.cc depends on idl, but many source files in
scylla-main do not depend on idl, so let's
* move "message/*" into its own directory and add an inter-library
dependency between it and the "idl" library.
* rename the target of "message" under test/manual to "message_test"
to avoid the name collision
this should address the compilation failure of
```
FAILED: CMakeFiles/scylla-main.dir/message/messaging_service.cc.o
/usr/bin/clang++ -DBOOST_NO_CXX98_FUNCTION_BASE -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_BROKEN_SOURCE_LOCATION -DSEASTAR_DEBUG -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/cmake/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/cmake/seastar/gen/include -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-mismatched-tags -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -Wno-missing-field-initializers -Wno-deprecated-copy -Wno-ignored-qualifiers -march=westmere -Og -g -gz -std=gnu++20 -fvisibility=hidden -U_FORTIFY_SOURCE -Wno-error=unused-result "-Wno-error=#warnings" -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT CMakeFiles/scylla-main.dir/message/messaging_service.cc.o -MF CMakeFiles/scylla-main.dir/message/messaging_service.cc.o.d -o CMakeFiles/scylla-main.dir/message/messaging_service.cc.o -c /home/kefu/dev/scylladb/message/messaging_service.cc
/home/kefu/dev/scylladb/message/messaging_service.cc:81:10: fatal error: 'idl/join_node.dist.hh' file not found
^~~~~~~~~~~~~~~~~~~~~~~
```
where the compiler failed to find the included `idl/join_node.dist.hh`,
which is exposed by the idl library as part of its public interface.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15657
This series is one of the steps to remove global statements in `configure.py`.
Not only is the script more structured this way; it also allows us to quickly identify the parts that should or can be reused when migrating to a CMake-based build system.
Refs #15379
Closes scylladb/scylladb#15668
* github.com:scylladb/scylladb:
build: move check for NIX_CC into dynamic_linker_option()
build: extract dynamic_linker_option(): out
build: move `headers` into write_build_file()
to match the behavior of `configure.py`.
Closes scylladb/scylladb#15667
* github.com:scylladb/scylladb:
build: cmake: pass -dynamic-linker to ld
build: cmake: set CMAKE_EXE_LINKER_FLAGS in mode.common.cmake
The run() method of task_manager::task::impl does not have to throw when
a task is aborted via the task manager API. Thus, a user may see that
the task finished successfully, which is inconsistent.
Finish a task with a failure if it was aborted via the task manager API.
Set top-level compaction tasks as abortable.
Compaction tasks which have no children, i.e. compaction task
executors, have their abort method overridden to stop compacting data.
`expression` is a std::variant with 16 alternatives
that represent different types of AST nodes.
Let's add documentation that explains what each of these
16 types represents: for people who are not familiar with the expression
code it might not be clear what each of them does, so add
clear descriptions for all of them.
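A toy example of this documentation style, with a deliberately tiny, made-up set of alternatives rather than the 16 real ones: each alternative of the variant carries a comment saying what kind of AST node it represents.

```cpp
#include <string>
#include <variant>
#include <vector>

namespace toy_expr {

struct expression;

// A column reference, e.g. `price` in `WHERE price > 10`.
struct column_value { std::string name; };

// A literal constant, e.g. `10`.
struct constant { std::string raw; };

// A binary comparison or arithmetic operator applied to sub-expressions.
struct binary_operator {
    std::string op;                    // e.g. ">", "+"
    std::vector<expression> children;  // exactly two in practice
};

// The expression itself: exactly one of the documented alternatives above.
struct expression {
    std::variant<column_value, constant, binary_operator> v;
};

} // namespace toy_expr

int main() {
    using namespace toy_expr;
    expression e{binary_operator{">", {expression{column_value{"price"}},
                                       expression{constant{"10"}}}}};
    (void)e;
}
```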
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Closes scylladb/scylladb#15767
Scylla returns more values from the DeleteTable operation than DynamoDB does.
This patch adds a table status check when generating the output: when the table is being deleted, the KeySchema, AttributeDefinitions and CreationDateTime values are not returned.
The test has also been modified to check that these attributes are not returned.
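A rough sketch of the idea in plain C++; the field names follow DynamoDB's DeleteTable response, while the map-based "JSON" and the `table_status` enum are simplified stand-ins for the real Alternator types:

```cpp
#include <iostream>
#include <map>
#include <string>

enum class table_status { active, deleting };

// Build the TableDescription part of a DeleteTable response. When the table
// is being deleted, only the identifying fields and the status are emitted;
// KeySchema, AttributeDefinitions and CreationDateTime are omitted.
std::map<std::string, std::string> make_delete_table_output(const std::string& name,
                                                            table_status status) {
    std::map<std::string, std::string> out;
    out["TableName"] = name;
    out["TableStatus"] = status == table_status::deleting ? "DELETING" : "ACTIVE";
    if (status != table_status::deleting) {
        out["KeySchema"] = "...";
        out["AttributeDefinitions"] = "...";
        out["CreationDateTime"] = "...";
    }
    return out;
}

int main() {
    for (auto& [k, v] : make_delete_table_output("orders", table_status::deleting)) {
        std::cout << k << ": " << v << '\n';
    }
}
```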
Fixes scylladb#14132
Closes scylladb/scylladb#15707
This series
1. lets sstable tests that use test_env use uuid-based sstable identifiers by default
2. lets the tests that require integer-based identifiers keep using them
This should enable us to run the s3-related tests after enforcing the uuid-based identifier for the s3 backend; otherwise those tests would fail, as they also use `test_env`.
Closes scylladb/scylladb#14553
* github.com:scylladb/scylladb:
test: set use_uuid to true by default in sstables::test_env
test: enable test to set uuid_sstable_identifiers
Now the task manager's API (and test API) use the argument and this
explicit dependency is no longer required
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
A memtable object contains two logalloc::allocating_section members
that track memory allocation requirements during reads and writes.
Because these are local to the memtable, each time we seal a memtable
and create a new one, these statistics are forgotten. As a result
we may have to re-learn the typical size of reads and writes, incurring
a small performance penalty.
The solution is to move the allocating_section object to the memtable_list
container. The workload is the same across all memtables of the same
table, so we don't lose discrimination here.
The performance penalty may increase later if we start logging changes to
memory reserve thresholds (including a backtrace), so this change reduces the
odds of incurring such a penalty.
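A sketch of the general pattern with hypothetical names, not the actual dirty-memory code: the object that learns allocation sizes lives in the long-lived list, and each short-lived memtable borrows it, so the learned state survives sealing.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for logalloc::allocating_section: it remembers how much memory it
// had to reserve last time so later operations start from a good estimate.
struct allocating_section {
    std::size_t learned_reserve = 0;
    template <typename Fn>
    void with_reserve(std::size_t need, Fn&& fn) {
        if (need > learned_reserve) { learned_reserve = need; }  // "learning"
        fn();
    }
};

struct memtable {
    // Borrowed from the owning list; not reset when this memtable is sealed.
    allocating_section& read_section;
    explicit memtable(allocating_section& s) : read_section(s) {}
};

struct memtable_list {
    allocating_section read_section;  // lives as long as the table
    std::vector<std::unique_ptr<memtable>> memtables;

    memtable& add_memtable() {
        memtables.push_back(std::make_unique<memtable>(read_section));
        return *memtables.back();
    }
    void seal_active_memtable() {
        // The old memtable goes away, but read_section.learned_reserve persists.
        add_memtable();
    }
};

int main() {
    memtable_list list;
    auto& mt = list.add_memtable();
    mt.read_section.with_reserve(4096, [] {});
    list.seal_active_memtable();
    // The newly created memtable still benefits from the learned estimate.
    return list.read_section.learned_reserve == 4096 ? 0 : 1;
}
```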
Closes scylladb/scylladb#15737
`deletion_time` is a part of the `partition_header`, which is in turn
a part of `partition`. and `data_file` is a sequence of `partition`.
`data_file` represents *-Data.db component of an SSTable.
see docs/architecture/sstable3/sstables-3-data-file-format.rst.
we always parse the data component via `flat_mutation_reader_v2`, which is in turn
implemented with mx/reader.cc or kl/reader.cc depending on
the version of SSTable to be read.
in other words, we decode `deletion_time` in mx/reader.cc or
kl/reader.cc, not in sstable.cc. so let's drop the overload
parse() for deletion_time. it's not necessary and more importantly,
confusing.
Refs #15116
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15756
Currently distributed_loader starts sharded<sstable_directory> with four sharded parameters. That's quite bulky and can be made much shorter.
Closes scylladb/scylladb#15653
* github.com:scylladb/scylladb:
distributed_loader: Remove explicit sharded<erms>
distributed_loader: Brush up start_subdir()
sstable_directory: Add enlightened construction
table: Add global_table_ptr::as_sharded_parameter()
This commit:
- Removes upgrade guides for versions older than 5.0.
The oldest one is from version 4.6 to 5.0.
- Adds the redirections for the removed pages.
Closes scylladb/scylladb#15709
pytest changes the test's sys.stdout and sys.stderr to the
captured fds when it captures the test's output, so we
are not able to get the C-level STDOUT_FILENO and STDERR_FILENO
by querying `sys.stdout.fileno()` and `sys.stderr.fileno()`:
their return values are not 1 and 2 anymore, unless pytest
is started with "-s".
So, to ensure that we always redirect the child process's
output to the log file, we need to use 1 and 2 to access
the well-known fds, which are the ones the child process uses
when it writes to stdout and stderr.
This change should address the problem that the log file is
always empty unless "-s" is specified.
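The underlying point is a POSIX one, independent of pytest: the child inherits whatever is installed at file descriptors 1 and 2, not whatever object currently backs the parent's stdout wrapper. A small C++ sketch of redirecting the well-known fds to a log file before spawning a child (paths and command are illustrative):

```cpp
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

int main() {
    // Open the log file that should capture the child's output.
    int log_fd = ::open("/tmp/child.log", O_CREAT | O_WRONLY | O_APPEND, 0644);
    if (log_fd < 0) { return 1; }

    pid_t pid = ::fork();
    if (pid == 0) {
        // In the child: install the log file at the well-known fds 1 and 2.
        // The child writes to fd 1/2 directly, so redirecting anything else
        // (e.g. whatever the parent's "stdout" object currently points at)
        // would not capture its output.
        ::dup2(log_fd, STDOUT_FILENO);
        ::dup2(log_fd, STDERR_FILENO);
        ::execlp("echo", "echo", "hello from the child", (char*)nullptr);
        _exit(127);  // only reached if exec failed
    }
    ::close(log_fd);
    int status = 0;
    ::waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}
```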
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15560
compaction_read_monitor_generator is an existing mechanism
for monitoring progress of sstables reading during compaction.
In this change information gathered by compaction_read_monitor_generator
is utilized by task manager compaction tasks of the lowest level,
i.e. compaction executors, to calculate task progress.
compaction_read_monitor_generator has a flag, which decides whether
monitored changes will be registered by compaction_backlog_tracker.
This allows us to pass the generator to all compaction readers without
impacting the backlog.
Task executors have access to compaction_read_monitor_generator_wrapper,
which protects the internals of compaction_read_monitor_generator
and provides only the necessary functionality.
Closes scylladb/scylladb#14878
* github.com:scylladb/scylladb:
compaction: add get_progress method to compaction_task_impl
compaction: find total compaction size
compaction: sstables: monitor validation scrub with compaction_read_generator
compaction: keep compaction_progress_monitor in compaction_task_executor
compaction: use read monitor generator for all compactions
compaction: add compaction_progress_monitor
compaction: add flag to compaction_read_monitor_generator
This is a follow-up for #15279 and it fixes two problems.
First, we restore flushes on writes for the tables that were switched to the schema commitlog if the `SCHEMA_COMMITLOG` feature is not yet enabled; otherwise durability is not guaranteed.
Second, we address the problem with truncation records, which could refer to the old commitlog if any of the switched tables were truncated in the past. If the node crashes later, and we replay schema commitlog, we may skip some mutations since their `replay_position`s will be smaller than the `replay_position`s stored for the old commitlog in the `truncated` table.
It turned out that this problem exists even if we don't switch commitlogs for tables. If the node was rebooted the segment ids will start from some small number - they use `steady_clock` which is usually bound to boot time. This means that if the node crashed we may skip the mutations because their RPs will be smaller than the last truncation record RP.
To address this problem we delete truncation records as soon as commitlog is replayed. We also include a test which demonstrates the problem.
Fixes #15354
Closes scylladb/scylladb#15532
* github.com:scylladb/scylladb:
add test_commitlog
system.truncated: Remove replay_position data from truncated on start
main.cc: flush only local memtables when replaying schema commitlog
main.cc: drop redundant supervisor::notify
system_keyspace: flush if schema commitlog is not available
The status is no longer used. The function that referenced it was
removed by 5a96751534, and it had already been unused
for a while back then.
Message-Id: <ZS92mcGE9Ke5DfXB@scylladb.com>
When populating the system keyspace, sstable_directory forgets to create the upload/ subdir in the tables' datadir because of the way it's invoked from the distributed loader. For non-system keyspaces, directories are created in table::init_storage(), which is self-contained and just creates the whole layout regardless.
This PR makes the system keyspace's tables use table::init_storage() as well, so that the datadir layout is the same for all on-disk tables.
Test included.
Fixes: #15708
Closes: scylladb/scylla-manager#3603
Closes scylladb/scylladb#15723
* github.com:scylladb/scylladb:
test: Add test for datadir/ layout
sstable_directory: Indentation fix after previous patch
db,sstables: Move storage init for system keyspace to table creation
Check that commitlog provides durability in case
of a node reboot:
* truncate table T, truncation_record RP=1000;
* clean shutdown node/reboot machine/restart node, now RP=~0
since segment ids count from boot time;
* write some data to T; crash/restart
* check data is retained
Once we've started clean and all replaying is done, the commit log replay
positions stored in the truncation records are invalid. We should exorcise
them as soon as possible. Note that we cannot remove the truncation data
completely though, since the stored timestamps are used by things like the
batch log to determine whether it should use or discard old batch data.
Later in the code we have 'replaying schema commit log',
which duplicates this one. Also,
maybe_init_schema_commitlog may skip schema commitlog
initialization if the SCHEMA_COMMITLOG feature is
not yet supported by the cluster, so this notification
can be misleading.
In PR #15279 we removed flushes when writing to a number
of tables from the system keyspace. This was made possible
by switching these tables to the schema commitlog.
Schema commitlog is enabled only when the SCHEMA_COMMITLOG
feature is supported by all nodes in the cluster. Before that
these tables will use the regular commitlog, which is not
durable because it uses db::commitlog::sync_mode::PERIODIC. This
means that we may lose data if a node crashes during upgrade
to the version with schema commitlog.
In this commit we fix this problem by restoring flushes
after writes to the tables if the schema commitlog
is not enabled yet.
The patch also contains a test that demonstrates the
problem. We need the flush_schema_tables_after_modification
option, since otherwise schema changes are not durable
and the node fails after restart.
The test checks that
- for non-system keyspace datadir and its staging/ and upload/ subdirs
are created when the table is created _and_ that the directory is
re-populated on boot in case it was explicitly removed
- for system non-virtual tables it checks that the same directory layout
is created on boot
- for system virtual tables it checks that the directory layout doesn't
exist
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
User and system keyspaces are created and populated slightly
differently.
System keyspace is created via system_keyspace::make() which eventually
calls add_column_family(). Then it's populated via
init_system_keyspace() which calls sstable_directory::prepare() which,
in turn, optionally creates directories in datadir/ or checks the
directory permissions if it exists
User keyspaces are created with the help of
add_column_family_and_make_directory() call which calls the
add_column_family() mentioned above _and_ calls table::init_storage() to
create directories. When it's populated with init_non_system_keyspaces()
it also calls sstable_directory::prepare() which notices that the
directory exists and then checks the permissions.
As a result, sstable_directory::prepare() initializes storage for system
keyspace only and there's a BUG (#15708) that the upload/ subdir is not
created.
This patch makes directory creation for _all_ keyspaces go through
table::init_storage(). The change only touches the system keyspace by moving
the creation of directories from sstable_directory::prepare() into
system_keyspace::make().
Indentation is deliberately left broken.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We refactor table_helper::setup_keyspace so that it calls
migration_manager::announce at most twice. We achieve it by
announcing all tables at once.
The number of announcements should further be reduced to one, but
it requires a big refactor. The CQL code used in
parse_new_cf_statement assumes the keyspace has already been
created. We cannot have such an assumption if we want to announce
a keyspace and its tables together. However, we shouldn't touch
the CQL code as it would impact user requests, too.
One solution is using schema_builder instead of the CQL statements
to create tables in table_helper.
Another approach is removing table_helper completely. It is used
only for the system_traces keyspace, which Scylla creates
automatically. We could refactor the way Scylla handles this
keyspace and make table_helper unneeded.
In the following commit, we reduce migration_manager::announce
calls in table_helper::setup_keyspace by announcing all tables
together. To do it, we cannot use table_helper::setup_table
anymore, which announces a single table itself. However, the new
code still has to translate CQL statements, so we extract it to the
new parse_new_cf_statement function to avoid duplication.
We refactor system_distributed_keyspace::start so that it takes at
most one group 0 guard and calls migration_manager::announce at
most once.
We remove a catch expression together with the FIXME from
get_updated_service_levels (add_new_columns_if_missing before the
patch) because we cannot treat the service_levels update
differently anymore.
After adding the keyspace_metadata parameter to
migration_listener::on_before_create_column_family,
tablet_allocator doesn't need to load it from the database.
This change is necessary before merging migration_manager::announce
calls in the following commit.
After adding the new prepare_new_column_family_announcement that
doesn't assume the existence of a keyspace, we also need to get
rid of the same assumption in all on_before_create_column_family
calls. After all, they may be initiated before creating the
keyspace. However, some listeners require keyspace_metadata, so we
pass it as a new parameter.
We can use the new prepare_new_column_family_announcement function
that doesn't assume the existence of the keyspace instead of the
previous work-around.
We need to store a new keyspace's keyspace_metadata as a local
variable in create_table_on_shard0. In the following commit, we
use it to call the new prepare_new_column_family_announcement
function.
In the following commits, we reduce the number of
migration_manager::announce calls by merging some of them in a way
that logically makes sense. Some of these merges are similar --
we announce a new keyspace and its tables together. However,
we cannot use the current prepare_new_column_family_announcement
there because it assumes that the keyspace has already been created
(when it loads the keyspace from the database). Luckily, this
assumption is not necessary as this function only needs
keyspace_metadata. Instead of loading it from the database, we can
pass it as a parameter.
Validation scrub bypasses the usual compaction machinery, though it
still needs to be tracked with compaction_progress_monitor so that
its progress can be reached from the compaction task executor.
Track sstable scrub in validate mode with read monitors.
Keep compaction_progress_monitor in compaction_task_executor and pass a reference
to it further, so that the compaction progress could be retrieved out of it.
Compaction read monitor generators are used in all compaction types.
Classes which did not use _monitor_generator so far create it with
_use_backlog_tracker set to no, so as not to impact the backlog tracker.
In the following patches compaction_read_monitor_generator will be used
to find the progress of compaction_task_executors. To avoid unnecessary lifetime
prolongation and exposing the internals of the class outside of compaction.cc,
compaction_progress_monitor is created.
The compaction class keeps a reference to the compaction_progress_monitor.
Inheriting classes which actually use compaction_read_monitor_generator
need to set it with the set_generator method.
Following patches will use compaction_read_monitor_generator
to track progress of all types of compaction. Some of them should
not be registered in compaction_backlog_tracker.
The _use_backlog_tracker flag, which is set to true by default, is
added to compaction_read_monitor_generator and passed to all
compaction_read_monitors created by this generator.
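A rough sketch of the flag's effect, using hypothetical, heavily simplified types rather than the real compaction classes: the generator produces read monitors either way, but only registers them with the backlog tracker when the flag is on.

```cpp
#include <memory>
#include <vector>

// Heavily simplified stand-ins for the real compaction types.
struct backlog_tracker {
    int registered = 0;
    void register_monitor() { ++registered; }
};

struct read_monitor {
    long bytes_read = 0;  // in the real code: progress of one input sstable
};

enum class use_backlog_tracker { no, yes };

class read_monitor_generator {
    backlog_tracker& _tracker;
    use_backlog_tracker _use_tracker;
    std::vector<std::unique_ptr<read_monitor>> _monitors;
public:
    read_monitor_generator(backlog_tracker& t, use_backlog_tracker u)
        : _tracker(t), _use_tracker(u) {}

    read_monitor& operator()(/* const sstable& sst */) {
        _monitors.push_back(std::make_unique<read_monitor>());
        if (_use_tracker == use_backlog_tracker::yes) {
            // Only "real" compactions feed the backlog controller.
            _tracker.register_monitor();
        }
        return *_monitors.back();
    }

    long total_bytes_read() const {
        long total = 0;
        for (auto& m : _monitors) { total += m->bytes_read; }
        return total;  // progress is available regardless of the flag
    }
};

int main() {
    backlog_tracker tracker;
    read_monitor_generator scrub_gen(tracker, use_backlog_tracker::no);
    scrub_gen();  // progress is tracked, backlog is untouched
    return tracker.registered == 0 ? 0 : 1;
}
```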
`employ_ld_trickery` is only used by `dynamic_linker_option()`, so
move it into this function.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
So that CMakeLists.txt is less cluttered, as we will append the
`--dynamic-linker` option to the LDFLAGS.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
For better coverage of the uuid-based sstable identifier. Since this
option is enabled by default, this also matches our tests with the
default behavior of scylladb.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Some of the tests still rely on the integer-based sstable
identifier, so let's add a method to test_env so that the tests
relying on it can opt out. We will change the default setting
of sstables::test_env to use the uuid-based sstable identifier in the
next commit. This change does not change the existing behavior;
it just adds a new knob to test_env_config and lets the tests
relying on the integer-based identifier customize test_env_config
to disable use_uuid.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The procedure in main already does this.
Processing of tablet metadata on schema changes relies on
this. Without this, creating a tablet-based table will fail on missing
tablet map in token metadata because the listener in storage service
does not fire.
This is necessary for using tablets with cql_test_env in tools like
perf-simple-query.
Otherwise, the test will fail with:
Shard count not known for node c06a7e7f-ee6c-44e5-9257-09cdc5b2bb10
The existing tablets_test works because it creates its own topology
bypassing the one in storage_service.
This commit adds a note to specify
that the information on the Handling
Failures page only refers to clusters
with Raft enabled.
A comment is also included so that the note
can be removed in future versions.
The sharded replication map was needed to provide a sharded parameter for the sstable
directory. Now the directory gets it via the table reference, and thus the sharded erms
becomes unused.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Drop some local references to class members and line up the arguments for
starting the distributed sstable directory. Purely a clean-up patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The existing constructor is pretty heavyweight for the distributed
loader to use -- it needs to pass it 4 sharded parameters, which looks
pretty bulky in the text editor. However, 5 constructor arguments are
obtained directly from the table, so the distributed loader code, with a global
table pointer at hand, can pass _it_ as a sharded parameter and let the
sstable directory extract what it needs.
The sad news is that sstable_directory cannot be switched to just use a table
reference: the tools code doesn't have a table at hand, but needs the
facilities sstable_directory provides.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method returns a seastar::sharded_parameter<> for the global table
that evaluates to a local table reference.
This commit adds new pages with reference to
Handling Node Failures to Troubleshooting.
The pages are:
- Failure to Add, Remove, or Replace a Node
(in the Cluster section)
- Failure to Update the Schema
(in the Data Modeling section)
This commit moves the content of the Handling
Failures section on the Raft page to the new
Handling Node Failures page in the Troubleshooting
section.
Background:
When Raft was experimental, the Handling Failures
section was only applicable to clusters
where Raft was explicitly enabled.
Now that Raft is the default, the information
about handling failures is relevant to
all users.
Currently the cache updaters aren't exception safe,
yet they are intended to be.
Instead of allowing exceptions from
`external_updater::execute` to escape `row_cache::update`,
abort using `on_fatal_internal_error`.
Future changes should harden all `execute` implementations
to effectively make them `noexcept`; then the pure virtual
declaration can be made `noexcept` to cement that.
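A minimal sketch of the intended shape, assuming Seastar's `on_fatal_internal_error(logger&, std::string_view)` as named above; `external_updater` here is a simplified stand-in rather than the real cache interface:

```cpp
#include <seastar/core/on_internal_error.hh>
#include <seastar/util/log.hh>

static seastar::logger clogger("cache");

// Simplified stand-in for the real interface; the intent is that execute()
// eventually becomes noexcept in every implementation.
struct external_updater {
    virtual ~external_updater() = default;
    virtual void execute() = 0;
};

// Wrapper used on the cache update path: an exception escaping execute()
// would leave the cache in an inconsistent state, so abort instead of
// letting it propagate out of row_cache::update().
inline void run_update_step(external_updater& eu) noexcept {
    try {
        eu.execute();
    } catch (...) {
        seastar::on_fatal_internal_error(clogger,
            "cache updater failed: the update path is not exception safe");
    }
}
```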
Fixes scylladb/scylladb#15576
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
ring_position::min() is noexcept since 6d7ae4ead1,
so there is no need to call it outside of the critical noexcept block.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We stop using fallocate for allocating swap since it does not work on
xfs (#6650).
However, dd is much slower than fallocate since it fills the file with
data, so let's use fallocate when the filesystem is ext4, where it actually
works and is faster.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
We should run the error check before running dd; otherwise it will leave a
swapfile on disk without completing the swap setup.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
parser.add_argument('--commits',default=None,type=str,help='Range of promoted commits.')
parser.add_argument('--pull-request',type=int,help='Pull request number to be backported')
parser.add_argument('--head-commit',type=str,required=is_pull_request(),help='The HEAD of target branch after the pull request specified by --pull-request is merged')
The `--name` argument can be specified to run a particular test.
Alternatively, you can execute the test executable directly. For example,
@@ -199,7 +212,7 @@ The `scylla.yaml` file in the repository by default writes all database data to
Scylla has a number of requirements for the file-system and operating system to operate ideally and at peak performance. However, during development, these requirements can be relaxed with the `--developer-mode` flag.
Additionally, when running on under-powered platforms like portable laptops, the `--overprovisined` flag is useful.
Additionally, when running on under-powered platforms like portable laptops, the `--overprovisioned` flag is useful.
[](https://github.com/scylladb/scylladb/actions/workflows/seastar.yaml) [](https://github.com/scylladb/scylladb/actions/workflows/reproducible-build.yaml) [](https://github.com/scylladb/scylladb/actions/workflows/clang-nightly.yaml)
See [test.py manual](docs/dev/testing.md).
## Scylla APIs and compatibility
By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and
Thrift. There is also support for the API of Amazon DynamoDB™,
By default, Scylla is compatible with Apache Cassandra and its API - CQL.
There is also support for the API of Amazon DynamoDB™,
which needs to be enabled and configured in order to be used. For more
information on how to enable the DynamoDB™ API in Scylla,
and the current compatibility of this feature as well as Scylla-specific extensions, see
@@ -82,11 +84,11 @@ Documentation can be found [here](docs/dev/README.md).
Seastar documentation can be found [here](http://docs.seastar.io/master/index.html).
User documentation can be found [here](https://docs.scylladb.com/).
## Training
Training material and online courses can be found at [Scylla University](https://university.scylladb.com/).
The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling,
administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions,
multi-datacenters and how Scylla integrates with third-party applications.
seastar::metrics::description("number of operations via Alternator API"),{op(CamelCaseName)}),
seastar::metrics::description("number of operations via Alternator API"),{op(CamelCaseName)}).set_skip_when_empty(),
#define OPERATION_LATENCY(name, CamelCaseName) \
seastar::metrics::make_histogram("op_latency", \
seastar::metrics::description("Latency histogram of an operation via Alternator API"),{op(CamelCaseName)},[this]{returnto_metrics_histogram(api_operations.name);}),
seastar::metrics::description("Latency histogram of an operation via Alternator API"),{op(CamelCaseName)},[this]{returnto_metrics_histogram(api_operations.name.histogram());}).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(), \
seastar::metrics::description("Latency summary of an operation via Alternator API"),[this]{returnto_metrics_summary(api_operations.name.summary());})(op(CamelCaseName)).set_skip_when_empty(),
seastar::metrics::description("number of rows read and dropped during filtering operations")),
seastar::metrics::make_counter("batch_item_count",seastar::metrics::description("The total number of items processed across all batches"),{op("BatchWriteItem")},
seastar::metrics::make_counter("batch_item_count",seastar::metrics::description("The total number of items processed across all batches"),{op("BatchGetItem")},
"description":"The plugin ID, describe the component the metric belongs to. Examples are cache, thrift, etc'. Regex are supported.The plugin ID, describe the component the metric belong to. Examples are: cache, thrift etc'. regex are supported",
"description":"The plugin ID, describe the component the metric belongs to. Examples are cache and alternator, etc'. Regex are supported.",
"description":"Set to \"true\" to flush all memtables and force tombstone garbage collection to check only the sstables being compacted (false by default). The memtable, commitlog and other uncompacted sstables will not be checked during tombstone garbage collection.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"split_output",
"description":"true if the output of the major compaction should be split in several sstables",
@@ -211,7 +219,7 @@
"operations":[
{
"method":"POST",
"summary":"Sets the minumum and maximum number of sstables in queue before compaction kicks off",
"summary":"Sets the minimum and maximum number of sstables in queue before compaction kicks off",
"summary":"Returns host ID of the current leader of the given Raft group",
"type":"string",
"nickname":"get_leader_host",
"produces":[
"application/json"
],
"parameters":[
{
"name":"group_id",
"description":"The ID of the group. When absent, group0 is used.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path":"/raft/read_barrier",
"operations":[
{
"method":"POST",
"summary":"Triggers read barrier for the given Raft group to wait for previously committed commands in this group to be applied locally. For example, can be used on group 0 to wait for the node to obtain latest schema changes.",
"type":"string",
"nickname":"read_barrier",
"produces":[
"application/json"
],
"parameters":[
{
"name":"group_id",
"description":"The ID of the group. When absent, group0 is used.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"timeout",
"description":"Timeout in seconds after which the endpoint returns a failure. If not provided, 60s is used.",
"description":"The name of table to fetch information about",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
@@ -436,6 +449,14 @@
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"cf",
"description":"Column family name",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
@@ -720,11 +741,123 @@
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"consider_only_existing_data",
"description":"Set to \"true\" to flush all memtables and force tombstone garbage collection to check only the sstables being compacted (false by default). The memtable, commitlog and other uncompacted sstables will not be checked during tombstone garbage collection.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/backup",
"operations":[
{
"method":"POST",
"summary":"Starts copying SSTables from a specified keyspace to a designated bucket in object storage",
"type":"string",
"nickname":"start_backup",
"produces":[
"application/json"
],
"parameters":[
{
"name":"endpoint",
"description":"ID of the configured object storage endpoint to copy sstables to",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"bucket",
"description":"Name of the bucket to backup sstables to",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"keyspace",
"description":"Name of a keyspace to copy sstables from",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"snapshot",
"description":"Name of a snapshot to copy sstables from",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/restore",
"operations":[
{
"method":"POST",
"summary":"Starts copying SSTables from a designated bucket in object storage to a specified keyspace",
"type":"string",
"nickname":"start_restore",
"produces":[
"application/json"
],
"parameters":[
{
"name":"endpoint",
"description":"ID of the configured object storage endpoint to copy SSTables from",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"bucket",
"description":"Name of the bucket to read SSTables from",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"snapshot",
"description":"Name of a snapshot to copy SSTables from",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"keyspace",
"description":"Name of a keyspace to copy SSTables to",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"table",
"description":"Name of a table to copy SSTables to",
"description":"Set to \"true\" to flush all memtables and force tombstone garbage collection to check only the sstables being compacted (false by default). The memtable, commitlog and other uncompacted sstables will not be checked during tombstone garbage collection.",
"description":"If the value is the string 'true' with any capitalization, perform small table optimization. When this option is enabled, user can send the repair request to any of the nodes in the cluster. There is no need to send repair requests to multiple nodes. All token ranges for the table will be repaired automatically.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
},
@@ -1502,6 +1666,15 @@
"type":"string",
"enum":["all","user","non_local_strategy"],
"paramType":"query"
},
{
"name":"replication",
"description":"Filter keyspaces for the replication used: vnodes or tablets (default: all)",
"required":false,
"allowMultiple":false,
"type":"string",
"enum":["all","vnodes","tablets"],
"paramType":"query"
}
]
}
@@ -1636,33 +1809,11 @@
{
"path":"/storage_service/rpc_server",
"operations":[
{
"method":"DELETE",
"summary":"Allows a user to disable thrift",
"type":"void",
"nickname":"stop_rpc_server",
"produces":[
"application/json"
],
"parameters":[
]
},
{
"method":"POST",
"summary":"allows a user to reenable thrift",
"type":"void",
"nickname":"start_rpc_server",
"produces":[
"application/json"
],
"parameters":[
]
},
{
"method":"GET",
"summary":"Determine if thrift is running",
"type":"boolean",
"nickname":"is_rpc_server_running",
"nickname":"is_thrift_server_running",
"produces":[
"application/json"
],
@@ -1860,6 +2011,14 @@
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"force",
"description":"Enforce the source_dc option, even if it unsafe to use for rebuild",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
@@ -2017,7 +2176,7 @@
"operations":[
{
"method":"POST",
"summary":"Enables/Disables tracing for the whole system. Only thrift requests can start tracing currently",
"summary":"Enables/Disables tracing for the whole system.",
"type":"void",
"nickname":"set_trace_probability",
"produces":[
@@ -2457,6 +2616,254 @@
}
]
},
{
"path":"/storage_service/tablets/move",
"operations":[
{
"nickname":"move_tablet",
"method":"POST",
"summary":"Moves a tablet replica",
"type":"void",
"produces":[
"application/json"
],
"parameters":[
{
"name":"ks",
"description":"Keyspace name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"table",
"description":"Table name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"token",
"description":"Token owned by the tablet to move",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"src_host",
"description":"Source host id",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"dst_host",
"description":"Destination host id",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"src_shard",
"description":"Source shard number",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"dst_shard",
"description":"Destination shard number",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"force",
"description":"When set to true, replication strategy constraints can be broken (false by default)",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/tablets/add_replica",
"operations":[
{
"nickname":"add_tablet_replica",
"method":"POST",
"summary":"Adds replica to tablet",
"type":"void",
"produces":[
"application/json"
],
"parameters":[
{
"name":"ks",
"description":"Keyspace name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"table",
"description":"Table name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"token",
"description":"Token owned by the tablet to add replica to",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"dst_host",
"description":"Destination host id",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"dst_shard",
"description":"Destination shard number",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"force",
"description":"When set to true, replication strategy constraints can be broken (false by default)",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/tablets/del_replica",
"operations":[
{
"nickname":"del_tablet_replica",
"method":"POST",
"summary":"Deletes replica from tablet",
"type":"void",
"produces":[
"application/json"
],
"parameters":[
{
"name":"ks",
"description":"Keyspace name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"table",
"description":"Table name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"token",
"description":"Token owned by the tablet to delete replica from",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"host",
"description":"Host id to remove replica from",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"shard",
"description":"Shard number to remove replica from",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"force",
"description":"When set to true, replication strategy constraints can be broken (false by default)",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/tablets/balancing",
"operations":[
{
"nickname":"tablet_balancing_enable",
"method":"POST",
"summary":"Controls tablet load-balancing",
"type":"void",
"produces":[
"application/json"
],
"parameters":[
{
"name":"enabled",
"description":"When set to false, tablet load balancing is disabled",
"required":true,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/quiesce_topology",
"operations":[
{
"nickname":"quiesce_topology",
"method":"POST",
"summary":"Waits until there are no ongoing topology operations. Guarantees that topology operations which started before the call are finished after the call. This doesn't consider requested but not started operations. Such operations may start after the call succeeds.",
"type":"void",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/storage_service/metrics/total_hints",
"operations":[
@@ -2558,6 +2965,33 @@
]
}
]
},
{
"path":"/storage_service/raft_topology/upgrade",
"operations":[
{
"method":"POST",
"summary":"Trigger the upgrade to topology on raft.",
"type":"void",
"nickname":"upgrade_to_raft_topology",
"produces":[
"application/json"
],
"parameters":[
]
},
{
"method":"GET",
"summary":"Get information about the current upgrade status of topology on raft.",
"summary":"Dump llvm profile data (raw profile data) that can later be used for coverage reporting or PGO (no-op if the current binary is not instrumented)",
"description":"Controls flushing of memtables before compaction (true by default). Set to \"false\" to skip automatic flushing of memtables before compaction, e.g. when tables were flushed explicitly before invoking the compaction api.",
"summary":"Perform offstrategy compaction, if needed, in a single keyspace asynchronously, returns uuid which can be used to check progress with task manager",
"summary":"Scrub (deserialize + reserialize at the latest version, resolving corruptions if any) the given keyspace asynchronously, returns uuid which can be used to check progress with task manager. If columnFamilies array is empty, all CFs are scrubbed. Scrubbed CFs will be snapshotted first, if disableSnapshot is false. Scrub has the following modes: Abort (default) - abort scrub if corruption is detected; Skip (same as `skip_corrupted=true`) skip over corrupt data, omitting them from the output; Segregate - segregate data into multiple sstables if needed, such that each sstable contains data with valid order; Validate - read (no rewrite) and validate data, logging any problems found.",
"type":"string",
"nickname":"scrub_async",
"produces":[
"application/json"
],
"parameters":[
{
"name":"disable_snapshot",
"description":"When set to true, disable snapshot",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"skip_corrupted",
"description":"When set to true, skip corrupted",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"scrub_mode",
"description":"How to handle corrupt data (overrides 'skip_corrupted'); ",
"required":false,
"allowMultiple":false,
"type":"string",
"enum":[
"ABORT",
"SKIP",
"SEGREGATE",
"VALIDATE"
],
"paramType":"query"
},
{
"name":"quarantine_mode",
"description":"Controls whether to scrub quarantined sstables (default INCLUDE)",
"summary":"Rewrite all sstables to the latest version. Unlike scrub, it doesn't skip bad rows and do not snapshot sstables first asynchronously, returns uuid which can be used to check progress with task manager.",
"type":"string",
"nickname":"upgrade_sstables_async",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"exclude_current_version",
"description":"When set to true exclude current version",