scylladb

Author	SHA1	Message	Date
Kefu Chai	3146a09638	dist: systemd: use default KillMode Before this change, we explicitly set the KillMode of the scylla-service service unit to "process". According to https://www.freedesktop.org/software/systemd/man/latest/systemd.kill.html, > If set to process, only the main process itself is killed (not recommended!). The document suggests using "control-group" over "process". Scylla server is not a multi-process server but a multi-threaded one, so switching to the recommended "control-group" should not make any difference. However, since we have been seeing "defunct" scylla processes after stopping the scylla service using systemd, we want to try changing `KillMode` to "control-group", which is the default value of this setting. In this change, we just drop the setting so that systemd stops the service by stopping all processes in the control group of this unit. Fixes scylladb/scylladb#21507 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21508 (cherry picked from commit `961a53f716`) Closes scylladb/scylladb#23177	2025-04-04 17:56:15 +03:00
Botond Dénes	b567d60624	Update seastar submodule * seastar 882ed7ac...af8ae075 (1): > util/backtrace: Optimize formatter to reduce memory allocation overhead	2025-02-11 10:27:00 +02:00
Jenkins Promoter	50c4e91d4e	Update ScyllaDB version to: 6.1.6	2025-02-10 11:56:06 +02:00
Botond Dénes	7b80816721	service: query_pager: fix last-position for filtering queries On short-pages, cut short because of a tombstone prefix. When page-results are filtered and the filter drops some rows, the last-position is taken from the page visitor, which does the filtering. This means that last partition and row position will be that of the last row the filter saw. This will not match the last position of the replica, when the replica cut the page due to tombstones. When fetching the next page, this means that all the tombstone suffix of the last page, will be re-fetched. Worse still: the last position of the next page will not match that of the saved reader left on the replica, so the saved reader will be dropped and a new one created from scratch. This wasted work will show up as elevated tail latencies. Fix by always taking the last position from raw query results. Fixes: #22620 Closes scylladb/scylladb#22622 (cherry picked from commit `7ce932ce01`) Closes scylladb/scylladb#22717	2025-02-06 13:30:18 +02:00
Avi Kivity	acd5bd924f	Update seastar submodule (hwloc failure on some AWS instances) * seastar 908ccd936a...882ed7ac3c (1): > resource: fallback to sysconf when failed to detect memory size from hwloc Fixes #22382.	2025-02-04 16:32:05 +02:00
Michael Litvak	3337ca35e4	view_builder: fix loop in view builder when tokens are moved The view builder builds a view by going over the entire token ring, consuming the base table partitions, and generating view updates for each partition. A view is considered as built when we complete a full cycle of the token ring. Suppose we start to build a view at a token F. We will consume all partitions with tokens starting at F until the maximum token, then go back to the minimum token and consume all partitions until F, and then we detect that we pass F and complete building the view. This happens in the view builder consumer in `check_for_built_views`. The problem is that we check if we pass the first token F with the condition `_step.current_token() >= it->first_token` whenever we consume a new partition or the current_token goes back to the minimum token. But suppose that we don't have any partitions with a token greater than or equal to the first token (this could happen if the partition with token F was moved to another node for example), then this condition will never be satisfied, and we don't detect correctly when we pass F. Instead, we go back to the minimum token, building the same token ranges again, in a possibly infinite loop. To fix this we add another step when reaching the end of the reader's stream. When this happens it means we don't have any more fragments to consume until the end of the range, so we advance the current_token to the end of the range, simulating a partition, and check for built views in that range. Fixes scylladb/scylladb#21829 Closes scylladb/scylladb#22493 (cherry picked from commit `6d34125eb7`) Closes scylladb/scylladb#22605	2025-02-03 19:22:01 +01:00
Avi Kivity	b94b6bce4b	seastar: point submodule at scylla-seastar.git This allows backporting commits to seastar.	2025-01-31 19:49:58 +02:00
Aleksandra Martyniuk	003e3f212e	repair: add repair_service gate In main.cc storage_service is started before and stopped after repair_service. storage_service keeps a reference to sharded repair_service and calls its methods, but nothing ensures that repair_service's local instance would be alive for the whole execution of the method. Add a gate to repair_service and enter it in storage_service before executing methods on local instances of repair_service. Fixes: #21964. Closes scylladb/scylladb#22145 (cherry picked from commit `32ab58cdea`) Closes scylladb/scylladb#22317	2025-01-30 11:40:22 +02:00
Aleksandra Martyniuk	9c035f810f	repair: check tasks local to given shard Currently task_manager_module::is_aborted checks the tasks local to caller's shard on a given shard. Fix the method to check the task map local to the given shard. Fixes: #22156. Closes scylladb/scylladb#22161 (cherry picked from commit `a91e03710a`) Closes scylladb/scylladb#22196	2025-01-30 07:46:40 +02:00
Botond Dénes	753c603f40	tools/scylla-sstable: dump-statistics: fix handling of {min,max}_column_names Said fields in statistics are of type `disk_array<uint32_t, disk_string<uint16_t>>` and currently are handled as array of regular strings. However these fields store exploded clustering keys, so the elements store binary data and converting to string can yield invalid UTF-8 characters that certain JSON parsers (jq, or python's json) can choke on. Fix this by treating them as binary and using `to_hex()` to convert them to string. This requires some massaging of the json_dumper: passing field offset to all visit() methods and using a caller-provided disk-string to sstring converter to convert disk strings to sstring, so in the case of statistics, these fields can be intercepted and properly handled. While at it, the type of these fields is also fixed in the documentation. Before: "min_column_names": [ "��Z��\u0011�\u0012ŷ4^��<", "�2y\u0000�}\u007f" ], "max_column_names": [ "��Z��\u0011�\u0012ŷ4^��<", "}��B\u0019l%^" ], After: "min_column_names": [ "9dd55a92bc8811ef12c5b7345eadf73c", "80327900e2827d7f" ], "max_column_names": [ "9dd55a92bc8811ef12c5b7345eadf73c", "7df79242196c255e" ], Fixes: #22078 Closes scylladb/scylladb#22225 (cherry picked from commit `f899f0e411`) Closes scylladb/scylladb#22295	2025-01-29 20:27:31 +02:00
Botond Dénes	a1ab46d54d	replica: remove noexcept from token -> tablet resolution path The methods to resolve a key/token/range to a table are all noexcept. Yet the method below all of these, `storage_group_for_id()` can throw. This means that if due to any mistake a tablet without local replica is attempted to be looked up, it will result in a crash, as the exception bubbles up into the noexcept methods. There is no value in pretending that looking up the tablet replica is noexcept, remove the noexcept specifiers so that any bad lookup only fails the operation at hand and doesn't crash the node. This is especially relevant to replace, which still has a window where writes can arrive for tablets that don't (yet) have a local replica. Currently, this results in a crash. After this patch, this will only fail the writes and the replace can move on. Fixes: #21480 Closes scylladb/scylladb#22251 (cherry picked from commit `55963f8f79`) Closes scylladb/scylladb#22378	2025-01-29 20:26:23 +02:00
Kefu Chai	4f8d2f48af	compress: fix compressor initialization order by making namespace_prefix a function Fixes a race condition where COMPRESSOR_NAME in zstd.cc could be initialized before compressor::namespace_prefix due to undefined global variable initialization order across translation units. This was causing ZstdCompressor to be unregistered in release builds, making it impossible to create tables with Zstd compression. Replace the global namespace_prefix variable with a function that returns the fully qualified compressor name. This ensures proper initialization order and fixes the registration of the ZstdCompressor. Fixes scylladb/scylladb#22444 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22451 (cherry picked from commit `4a268362b9`) Closes scylladb/scylladb#22509	2025-01-29 20:25:53 +02:00
Avi Kivity	6d7ce6890a	Merge '[Backport 6.1] repair: handle no_such_keyspace in repair preparation phase' from null Currently, data sync repair handles most no_such_keyspace exceptions, but it omits the preparation phase, where the exception could be thrown during make_global_effective_replication_map. Skip the keyspace repair if no_such_keyspace is thrown during preparations. Fixes: #22073. Requires backport to 6.1 and 6.2 as they contain the bug - (cherry picked from commit `bfb1704afa`) - (cherry picked from commit `54e7f2819c`) Parent PR: #22473 Closes scylladb/scylladb#22540 * github.com:scylladb/scylladb: test: add test to check if repair handles no_such_keyspace repair: handle keyspace dropped	2025-01-29 19:53:06 +02:00
Michael Litvak	b395d35c6c	cdc: fix handling of new generation during raft upgrade During raft upgrade, a node may gossip about a new CDC generation that was propagated through raft. The node that receives the generation by gossip may not have applied the raft update yet, and it will not find the generation in the system tables. We should consider this error non-fatal and retry the read until it succeeds or becomes obsolete. Another issue is that when we fail with a "fatal" exception and do not retry the read, the cdc metadata is left in an inconsistent state that causes further attempts to insert this CDC generation to fail. What happens is we complete preparing the new generation by calling `prepare`, we insert an empty entry for the generation's timestamp, and then we fail. The next time we try to insert the generation, we skip inserting it because we see that it already has an entry in the metadata and we determine that there's nothing to do. But this is wrong, because the entry is empty, and we should continue to insert the generation. To fix it, we change `prepare` to return `true` when the entry already exists but is empty, indicating we should continue to insert the generation. Fixes scylladb/scylladb#21227 Closes scylladb/scylladb#22093 (cherry picked from commit `4f5550d7f2`) Closes scylladb/scylladb#22544	2025-01-29 19:50:30 +02:00
Aleksandra Martyniuk	044841ef9c	test: add test to check if repair handles no_such_keyspace (cherry picked from commit `54e7f2819c`)	2025-01-28 21:49:47 +00:00
Aleksandra Martyniuk	8fbfabaac4	repair: handle keyspace dropped Currently, data sync repair handles most no_such_keyspace exceptions, but it omits the preparation phase, where the exception could be thrown during make_global_effective_replication_map. Skip the keyspace repair if no_such_keyspace is thrown during preparations. (cherry picked from commit `bfb1704afa`)	2025-01-28 21:49:46 +00:00
Kamil Braun	7bdccd8b49	Merge '[Backport 6.1] raft: Handle non-critical config update errors when changing voter status.' from Sergey Z When a node is bootstrapped, joins a cluster as a non-voter, and changes its role to a voter, errors can occur while committing a new Raft record, for instance, if the Raft leader changes during this time. These errors are not critical and should not cause a node crash, as the action can be retried. Fixes scylladb/scylladb#20814 Backport: This issue occurs frequently and disrupts the CI workflow to some extent. Backports are needed for versions 6.1 and 6.2. - (cherry picked from commit `775411ac56`) - (cherry picked from commit `16053a86f0`) - (cherry picked from commit `8c48f7ad62`) - (cherry picked from commit `3da4848810`) - (cherry picked from commit `228a66d030`) Parent PR: #22253 Closes scylladb/scylladb#22357 * github.com:scylladb/scylladb: raft: refactor `remove_from_raft_config` to use a timed `modify_config` call. raft: Refactor functions using `modify_config` to use a common wrapper for retrying. raft: Handle non-critical config update errors when changing status to voter. test: Add test to check that a node does not fail on unknown commit status error when starting up. raft: Add run_op_with_retry in raft_group0.	2025-01-24 17:07:03 +01:00
Sergey Zolotukhin	7f75a5c7d8	raft: refactor `remove_from_raft_config` to use a timed `modify_config` call. To avoid potential hangs during the `remove_from_raft_config` operation, use a timed `modify_config` call. This ensures the operation doesn't get stuck indefinitely. (cherry picked from commit `228a66d030`)	2025-01-22 09:41:29 +01:00
Sergey Zolotukhin	dfc8559bea	raft: Refactor functions using `modify_config` to use a common wrapper for retrying. There are several places in `raft_group0` where almost identical code is used for retrying `modify_config` in case of `commit_status_unknown` error. To avoid code duplication all these places were changed to use a new wrapper `run_op_with_retry`. (cherry picked from commit `3da4848810`)	2025-01-22 09:41:26 +01:00
Kefu Chai	56644f1a22	docs: fix monospace formatting for `rm` command Add missing space before `rm` to ensure proper rendering in monospace font within documentation. Fixes scylladb/scylladb#22255 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21576 (cherry picked from commit `6955b8238e`) Closes scylladb/scylladb#22256	2025-01-20 11:27:42 +02:00
Michael Litvak	c847806182	view_builder: write status to tables before starting to build When adding a new view for building, first write the status to the system tables and then add the view building step that will start building it. Otherwise, if we start building it before the status is written to the table, it may happen that we complete building the view, write the SUCCESS status, and then overwrite it with the STARTED status. The view_build_status table will remain in incorrect state indicating the view building is not complete. Fixes scylladb/scylladb#20638 (cherry picked from commit `b1be2d3c41`) Closes scylladb/scylladb#22355	2025-01-19 18:22:07 +02:00
Sergey Zolotukhin	81169eda19	raft: Handle non-critical config update errors when changing status to voter. When a node is bootstrapped and joins a cluster as a non-voter, errors can occur while committing a new Raft record, for instance, if the Raft leader changes during this time. These errors are not critical and should not cause a node crash, as the action can be retried. Fixes scylladb/scylladb#20814 (cherry picked from commit `8c48f7ad62`)	2025-01-16 20:09:04 +00:00
Sergey Zolotukhin	76be0f8a1e	test: Add test to check that a node does not fail on unknown commit status error when starting up. Test that a node is starting successfully if while joining a cluster and becoming a voter, it receives an unknown commit status error. Test for scylladb/scylladb#20814 (cherry picked from commit `16053a86f0`)	2025-01-16 20:09:04 +00:00
Sergey Zolotukhin	4c962bbc54	raft: Add run_op_with_retry in raft_group0. Calls to `modify_config` often need to be retried. To avoid code duplication, a function wrapper was added that calls a function with automatic retries in case of failure. (cherry picked from commit `775411ac56`)	2025-01-16 20:09:03 +00:00
Kamil Braun	116857c7ed	Merge '[Backport 6.1] Fix possible data corruption due to token keys clashing in read repair.' from Sergey This update addresses an issue in the mutation diff calculation algorithm used during read repair. Previously, the algorithm used `token` as the hashmap key. Since `token` is calculated based on the Murmur3 hash function, it could generate duplicate values for different partition keys, causing corruption in the affected rows' values. Fixes scylladb/scylladb#19101 Since the issue affects all the relevant scylla versions, backport to: 6.1, 6.2 - (cherry picked from commit `e577f1d141`) - (cherry picked from commit `39785c6f4e`) - (cherry picked from commit `155480595f`) Parent PR: #21996 Closes scylladb/scylladb#22297 * github.com:scylladb/scylladb: storage_proxy/read_repair: Remove redundant 'schema' parameter from `data_read_resolver::resolve` function. storage_proxy/read_repair: Use `partition_key` instead of `token` key for mutation diff calculation hashmap. test: Add test case for checking read repair diff calculation when having conflicting keys.	2025-01-16 17:15:09 +01:00
Sergey Zolotukhin	8f8b8e902c	test: Include parent test name in `ScyllaClusterManager` log file names. Add the test file name to `ScyllaClusterManager` log file names alongside the test function name. This avoids race conditions when tests with the same function names are executed simultaneously. Fixes scylladb/scylladb#21807 Backport: not needed since this is a fix in the testing scripts. Closes scylladb/scylladb#22192 (cherry picked from commit `2f1731c551`) Closes scylladb/scylladb#22248	2025-01-14 16:34:15 +01:00
Sergey Zolotukhin	d5dd364b49	storage_proxy/read_repair: Remove redundant 'schema' parameter from `data_read_resolver::resolve` function. The `data_read_resolver` class inherits from `abstract_read_resolver`, which already includes the `schema_ptr _schema` member. Therefore, using a separate function parameter in `data_read_resolver::resolve` initialized with the same variable in `abstract_read_executor` is redundant. (cherry picked from commit `1554805`)	2025-01-14 14:48:25 +01:00
Sergey Zolotukhin	e65d1a3665	storage_proxy/read_repair: Use `partition_key` instead of `token` key for mutation diff calculation hashmap. This update addresses an issue in the mutation diff calculation algorithm used during read repair. Previously, the algorithm used `token` as the hashmap key. Since `token` is calculated based on the Murmur3 hash function, it could generate duplicate values for different partition keys, causing corruption in the affected rows' values. Fixes scylladb/scylladb#19101 (cherry picked from commit `39785c6`)	2025-01-14 11:25:49 +01:00
Sergey Zolotukhin	63d58022a6	test: Add test case for checking read repair diff calculation when having conflicting keys. The test updates two rows with keys that result in a Murmur3 hash collision, which is used to generate Scylla tokens. These tokens are involved in read repair diff calculations. Due to the identical token values, a hash map key collision occurs. Consequently, an incorrect value from the second row (with a different primary key) is then sent for writing as 'repaired', causing data corruption. (cherry picked from commit `e577f1d141`)	2025-01-13 22:05:06 +00:00
Kamil Braun	52a09a2f2d	Merge '[Backport 6.1] cache_algorithm_test: fix flaky failures' from Michał Chojnowski This series attempts to get rid of flakiness in cache_algorithm_test by solving two problems. Problem 1: The test needs to create some arbitrary partition keys of a given size. It intends to create keys of the form: 0x0000000000000000000000000000000000000000... 0x0100000000000000000000000000000000000000... 0x0200000000000000000000000000000000000000... But instead, unintentionally, it creates partially initialized keys of the form: 0x0000000000000000garbagegarbagegarbagegar... 0x0100000000000000garbagegarbagegarbagegar... 0x0200000000000000garbagegarbagegarbagegar... Each of these keys is created several times and -- for the test to pass -- the result must be the same each time. By coincidence, this is usually the case, since the same allocator slots are used. But if some background task happens to overwrite the allocator slot during a preemption, the keys used during "SELECT" will be different than the keys used during "INSERT", and the test will fail due to extra cache misses. Problem 2: Cache stats are global, so there's no good way to reliably verify that e.g. a given read causes 0 cache misses, because something done by Scylla in a background can trigger a cache miss. This can cause the test to fail spuriously. With how the test framework and the cache are designed, there's probably no good way to test this properly. It would require ensuring that cache stats are per-read, or at least per-table, and that Scylla's background activity doesn't cause enough memory pressure to evict the tested rows. This patch tries to deal with the flakiness without deleting the test altogether by letting it retry after a failure if it notices that it can be explained by a read which wasn't done by the test. (Though, if the test can't be written well, maybe it just shouldn't be written...) Fixes scylladb/scylladb#21536 (cherry picked from commit `1fffd976a4`) (cherry picked from commit `6caaead4ac`) Parent PR: scylladb/scylladb#21948 Closes scylladb/scylladb#22227 * github.com:scylladb/scylladb: cache_algorithm_test: harden against stats being confused by background activity cache_algorithm_test: fix a use of an uninitialized variable	2025-01-09 14:30:54 +01:00
Anna Stuchlik	98dfb50c99	doc: add troubleshooting removal with --autoremove-ubuntu This commit adds a troubleshooting article on removing ScyllaDB with the --autoremove option. Fixes https://github.com/scylladb/scylladb/issues/21408 Closes scylladb/scylladb#21697 (cherry picked from commit `8d824a564f`) Closes scylladb/scylladb#22230	2025-01-08 13:11:28 +02:00
Yaron Kaikov	03a19d586e	.github/scripts/auto-backport.py: Add comment to PR when conflicts apply When we open a PR with conflicts, the PR owner gets a notification about the assignment but has no idea whether the PR has conflicts (in Scylla this is important, since CI will not start on a draft PR). Add a comment to notify the user that there are conflicts. Closes scylladb/scylladb#21939 (cherry picked from commit `2e6755ecca`) Closes scylladb/scylladb#22189	2025-01-08 13:11:00 +02:00
Botond Dénes	af2cb66cfc	Merge 'sstables_manager: do not reclaim unlinked sstables' from Lakshmi Narayanan Sreethar When an sstable is unlinked, it remains in the _active list of the sstable manager. Its memory might be reclaimed and later reloaded, causing issues since the sstable is already unlinked. This patch updates the on_unlink method to reclaim memory from the sstable upon unlinking, remove it from memory tracking, and thereby prevent the issues described above. Added a testcase to verify the fix. Fixes #21887 This is a bug fix in the bloom filter reload/reclaim mechanism and should be backported to older versions. Closes scylladb/scylladb#21895 * github.com:scylladb/scylladb: sstables_manager: reclaim memory from sstables on unlink sstables_manager: introduce reclaim_memory_and_stop_tracking_sstable() sstables: introduce disable_component_memory_reload() sstables_manager: log sstable name when reclaiming components (cherry picked from commit `d4129ddaa6`) Closes scylladb/scylladb#21997	2025-01-08 13:10:30 +02:00
Michał Chojnowski	e10def2f2a	cache_algorithm_test: harden against stats being confused by background activity Cache stats are global, so there's no good way to reliably verify that e.g. a given read causes 0 cache misses, because something done by Scylla in a background can trigger a cache miss. This can cause the test to fail spuriously. With how the test framework and the cache are designed, there's probably no good way to test this properly. It would require ensuring that cache stats are per-read, or at least per-table, and that Scylla's background activity doesn't cause enough memory pressure to evict the tested rows. This patch tries to deal with the flakiness without deleting the test altogether by letting it retry after a failure if it notices that it can be explained by a read which wasn't done by the test. (Though, if the test can't be written well, maybe it just shouldn't be written...) (cherry picked from commit `6caaead4ac`)	2025-01-08 11:43:15 +01:00
Michał Chojnowski	f0f2749c5c	cache_algorithm_test: fix a use of an uninitialized variable The test needs to create some arbitrary partition keys of a given size. It intends to create keys of the form: 0x0000000000000000000000000000000000000000... 0x0100000000000000000000000000000000000000... 0x0200000000000000000000000000000000000000... But instead, unintentionally, it creates partially initialized keys of the form: 0x0000000000000000garbagegarbagegarbagegar... 0x0100000000000000garbagegarbagegarbagegar... 0x0200000000000000garbagegarbagegarbagegar... Each of these keys is created several times and -- for the test to pass -- the result must be the same each time. By coincidence, this is usually the case, since the same allocator slots are used. But if some background task happens to overwrite the allocator slot during a preemption, the keys used during "SELECT" will be different than the keys used during "INSERT", and the test will fail due to extra cache misses. (cherry picked from commit `1fffd976a4`)	2025-01-08 11:43:04 +01:00
Patryk Jędrzejczak	c5f28d8099	[Backport 6.1] raft: improve logs for abort while waiting for apply New logs allow us to easily distinguish two cases in which waiting for apply times out: - the node didn't receive the entry it was waiting for, - the node received the entry but didn't apply it in time. Distinguishing these cases simplifies reasoning about failures. The first case indicates that something went wrong on the leader. The second case indicates that something went wrong on the node on which waiting for apply timed out. As it turns out, many different bugs result in the `read_barrier` (which calls `wait_for_apply`) timeout. This change should help us in debugging bugs like these. We want to backport this change to all supported branches so that it helps us in all tests. Fixes scylladb/scylladb#22160 Closes scylladb/scylladb#22157	2025-01-07 17:03:17 +01:00
Kamil Braun	9618e9b0d3	Merge '[Backport 6.1] Do not reset quarantine list in non raft mode' from Gleb The series contains small fixes to the gossiper, one of which fixes #21930. I noticed the others while debugging the issue. Fixes: #21930 - (cherry picked from commit `91cddcc17f`) Parent PR: #21956 Closes scylladb/scylladb#21990 * github.com:scylladb/scylladb: gossiper: do not reset _just_removed_endpoints in non raft mode gossiper: do not call apply for the node's old state	2025-01-07 17:00:35 +01:00
Abhinav	1e54ee19ce	Fix gossiper orphan node floating problem by adding a remover fiber Currently, if a node crashes during startup after initiating gossip and before joining group0, it keeps floating in the gossiper forever, because the raft-based gossiper purging logic is only effective once the node joins group0. This orphan node prevents a successor node with the same IP from joining the cluster, since it collides with it during the gossiper shadow round. This commit fixes the issue by adding a background thread which periodically checks for such orphan entries in the gossiper and removes them. A test is also added to verify this logic. The test fails without this background thread enabled, hence verifying the behavior. Fixes: scylladb/scylladb#20082 Closes scylladb/scylladb#21600 (cherry picked from commit `6c90a25014`) Closes scylladb/scylladb#21821	2025-01-02 14:59:28 +01:00
Gleb Natapov	2163839c6d	gossiper: do not reset _just_removed_endpoints in non raft mode By the time the function is called during start it may already be populated. Fixes: scylladb/scylladb#21930 (cherry picked from commit `e318dfb83a`)	2024-12-25 11:47:26 +02:00
Gleb Natapov	951bfd4203	gossiper: do not call apply for the node's old state If a node changed its address, an old state may still be in the gossiper, so ignore it. (cherry picked from commit `e80355d3a1`)	2024-12-23 11:46:47 +02:00
Piotr Dulikowski	e3b1216cba	Merge '[Backport 6.1] service_levels: increase timeout of internal queries and update cache on startup' from Michael Litvak Backport of two service level related fixes: service/qos/service_level_controller: update cache on startup Fixes scylladb/scylladb#21763 Parent PR: scylladb/scylladb#21773 service/qos: increase timeout of internal get_service_levels queries Fixes scylladb/scylladb#20483 Parent PR: scylladb/scylladb#21748 Closes scylladb/scylladb#21889 * github.com:scylladb/scylladb: service/qos/service_level_controller: update cache on startup service/qos: increase timeout of internal get_service_levels queries	2024-12-17 11:21:19 +01:00
Yaron Kaikov	dae61b51a8	github: check if PR is closed instead of merge In Scylla, PRs can be either `closed` or `merged`. Based on that state, we decide whether to start the backport process when the label is added after the PR is closed (or merged). In https://github.com/scylladb/scylladb/pull/21876, adding the proper backport label didn't trigger the backport automation. https://github.com/scylladb/scylladb/pull/21809/ caused this; we should have kept `state=closed` (which includes both closed and merged PRs). Fix it. Closes scylladb/scylladb#21906 (cherry picked from commit `b4b7617554`) Closes scylladb/scylladb#21921	2024-12-16 14:08:03 +02:00
Anna Stuchlik	8b3f5d277b	doc: remove wrong image upgrade info (5.2-to-2023.1) This commit removes the information about the recommended way of upgrading ScyllaDB images - by updating ScyllaDB and OS packages in one step. This upgrade procedure is not supported (it was implemented, but then reverted). Refs https://github.com/scylladb/scylladb/issues/15733 Closes scylladb/scylladb#21876 Fixes https://github.com/scylladb/scylla-enterprise/issues/5041 Fixes https://github.com/scylladb/scylladb/issues/21898 (cherry picked from commit `98860905d8`)	2024-12-12 15:23:30 +02:00
Michael Litvak	39186f76c7	service/qos/service_level_controller: update cache on startup Update the service level cache in the node startup sequence, after the service level and auth service are initialized. The cache update depends on the service level data accessor being set and the auth service being initialized. Before the commit, it may happen that a cache update is not triggered after the initialization. The commit adds an explicit call to update the cache where it is guaranteed to be ready. Fixes scylladb/scylladb#21763 Closes scylladb/scylladb#21773 (cherry picked from commit `373855b493`)	2024-12-11 16:55:38 +02:00
Michael Litvak	93e3e256c1	service/qos: increase timeout of internal get_service_levels queries The function get_service_levels is used to retrieve all service levels and it is called from multiple different contexts. Importantly, it is called internally from the context of group0 state reload, where it should be executed with a long timeout, similarly to other internal queries, because a failure of this function affects the entire group0 client, and a longer timeout can be tolerated. The function is also called in the context of the user command LIST SERVICE LEVELS, and perhaps other contexts, where a shorter timeout is preferred. The commit introduces a function parameter to indicate whether the context is internal or not. For internal context, a long timeout is chosen for the query. Otherwise, the timeout is shorter, the same as before. When the distinction is not important, a default value is chosen which maintains the same behavior. The main purpose is to fix the case where the timeout is too short and causes a failure that propagates and fails the group0 client. Fixes scylladb/scylladb#20483 Closes scylladb/scylladb#21748 (cherry picked from commit `53224d90be`)	2024-12-11 15:23:53 +02:00
Tomasz Grabiec	6ce18dca32	Merge '[Backport 6.1] utils: cached_file: Mark permit as awaiting on page miss' from ScyllaDB Otherwise, the read will be considered as on-cpu during promoted index search, which will severely underutilize the disk because by default on-cpu concurrency is 1. I verified this patch on the worst case scenario, where the workload reads missing rows from a large partition. So partition index is cached (no IO) and there is no data file IO (relies on https://github.com/scylladb/scylladb/pull/20522). But there is IO during promoted index search (via cached_file). Before the patch this workload was doing 4k req/s, after the patch it does 30k req/s. The problem is much less pronounced if there is data file or partition index IO involved because that IO will signal read concurrency semaphore to invite more concurrency. Fixes #21325 (cherry picked from commit `868f5b59c4`) (cherry picked from commit `0f2101b055`) Refs #21323 Closes scylladb/scylladb#21359 * github.com:scylladb/scylladb: utils: cached_file: Mark permit as awaiting on page miss utils: cached_file: Push resource_unit management down to cached_file	2024-12-09 22:32:01 +01:00
Tomasz Grabiec	86ebca4621	utils: cached_file: Mark permit as awaiting on page miss Otherwise, the read will be considered as on-cpu during promoted index search, which will severely underutilize the disk because by default on-cpu concurrency is 1. I verified this patch on the worst case scenario, where the workload reads missing rows from a large partition. So partition index is cached (no IO) and there is no data file IO. But there is IO during promoted index search (via cached_file). Before the patch this workload was doing 4k req/s, after the patch it does 30k req/s. The problem is much less pronounced if there is data file or index file IO involved because that IO will signal read concurrency semaphore to invite more concurrency. (cherry picked from commit `0f2101b055`)	2024-12-09 17:45:04 +01:00
Tomasz Grabiec	6e2f5c2bd9	utils: cached_file: Push resource_unit management down to cached_file It saves us permit operations on the hot path when we hit in cache. Also, it will lay the ground for marking the permit as awaiting later. (cherry picked from commit `868f5b59c4`)	2024-12-09 17:45:02 +01:00
Kefu Chai	b7dcf7420a	github: do not nest ${{}} inside condition In commit `2596d157`, we added a condition to run auto-backport.py only when the GitHub Action is triggered by a push to the default branch. However, this introduced an unexpected error due to incorrect condition handling. Problem: - `github.event.before` evaluates to an empty string - GitHub Actions' single-pass expression evaluation system causes the step to always execute, regardless of `github.event_name` Despite GitHub's documentation suggesting that ${{ }} can be omitted, it recommends using explicit ${{}} expressions for compound conditions. Changes: - Use explicit ${{}} expression for compound conditions - Avoid string interpolation in conditional statements Root Cause: The previous implementation failed because of how GitHub Actions evaluates conditional expressions, leading to an unintended script execution and a 404 error when attempting to compare commits. Example Error: ``` python .github/scripts/auto-backport.py --repo scylladb/scylladb --base-branch refs/heads/master --commits ..2b07d93beac7bc83d955dadc20ccc307f13f20b6 shell: /usr/bin/bash -e {0} env: DEFAULT_BRANCH: master GITHUB_TOKEN: *** Traceback (most recent call last): File "/home/runner/work/scylladb/scylladb/.github/scripts/auto-backport.py", line 201, in <module> main() File "/home/runner/work/scylladb/scylladb/.github/scripts/auto-backport.py", line 162, in main commits = repo.compare(start_commit, end_commit).commits File "/usr/lib/python3/dist-packages/github/Repository.py", line 888, in compare headers, data = self._requester.requestJsonAndCheck( File "/usr/lib/python3/dist-packages/github/Requester.py", line 353, in requestJsonAndCheck return self.__check( File "/usr/lib/python3/dist-packages/github/Requester.py", line 378, in __check raise self.__createException(status, responseHeaders, output) github.GithubException.UnknownObjectException: 404 {"message": "Not Found", "documentation_url": 
"https://docs.github.com/rest/commits/commits#compare-two-commits", "status": "404"} ``` Fixes scylladb/scylladb#21808 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21809 (cherry picked from commit `e04aca7efe`) Closes scylladb/scylladb#21819	2024-12-06 16:35:30 +02:00
Avi Kivity	4be2d3d6c0	Merge 'compaction: update maintenance sstable set on scrub compaction completion' from Lakshmi Narayanan Sreethar Scrub compaction can pick up input sstables from maintenance sstable set but on compaction completion, it doesn't update the maintenance set leaving the original sstable in set after it has been scrubbed. To fix this, on compaction completion has to update the maintenance sstable if the input originated from there. This PR solves the issue by updating the correct sstable_sets on compaction completion. Fixes #20030 This issue has existed since the introduction of main and maintenance sstable sets into scrub compaction. It would be good to have the fix backported to versions 6.1 and 6.2. Closes scylladb/scylladb#21582 * github.com:scylladb/scylladb: compaction: remove unused `update_sstable_lists_on_off_strategy_completion` compaction_group: replace `update_sstable_lists_on_off_strategy_completion` compaction_group: rename `update_main_sstable_list_on_compaction_completion` compaction_group: update maintenance sstable set on scrub compaction completion compaction_group: store table::sstable_list_builder::result in replacement_desc table::sstable_list_builder: remove old sstables only from current list table::sstable_list_builder: return removed sstables from build_new_list (cherry picked from commit `58baeac0ad`) Closes scylladb/scylladb#21789	2024-12-06 10:37:23 +02:00
Michael Pedersen	2bbb0859e1	docs: correct the storage size for n2-highmem-32 to 9000GB updated storage size for n2-highmem-32 to 9000GB as this is default in SC Fixes scylladb/scylladb#21785 Closes scylladb/scylladb#21537 (cherry picked from commit `309f1606ae`) Closes scylladb/scylladb#21594	2024-12-05 09:51:51 +02:00
Avi Kivity	4c6ddcf6c1	Merge 'sstables: Fix use-after-free on page cache buffer when parsing promoted index entries across pages' from Tomasz Grabiec This fixes a use-after-free bug when parsing clustering key across pages. Also includes a fix for allocating section retry, which is potentially not safe (not in practice yet). Details of the first problem: Clustering key index lookup is based on the index file page cache. We do a binary search within the index, which involves parsing index blocks touched by the algorithm. Index file pages are 4 KB chunks which are stored in LSA. To parse the first key of the block, we reuse clustering_parser, which is also used when parsing the data file. The parser is stateful and accepts consecutive chunks as temporary_buffers. The parser is supposed to keep its state across chunks. In `93482439`, the promoted index cursor was optimized to avoid a full page copy when parsing index blocks. Instead, the parser is given a temporary_buffer which is a view on the page. A bit earlier, in `b1b5bda`, the parser was changed to keep shared fragments of the buffer passed to the parser in its internal state (across pages) rather than copy the fragments into a new buffer. This is problematic when buffers come from page cache because LSA buffers may be moved around or evicted. So the temporary_buffer which is a view on the LSA buffer is valid only around the duration of a single consume() call to the parser. If the blob which is parsed (e.g. variable-length clustering key component) spans pages, the fragments stored in the parser may be invalidated before the component is fully parsed. As a result, the parsed clustering key may have incorrect component values. This never causes parsing errors because the "length" field is always parsed from the current buffer, which is valid, and component parsing will end at the right place in the next (valid) buffer. 
The problematic path for clustering_key parsing is the one which calls primitive_consumer::read_bytes(), which is called for example for text components. Fixed-size components are not parsed like this, they store the intermediate state by copying data. This may cause incorrect clustering keys to be parsed when doing binary search in the index, diverting the search to an incorrect block. Details of the solution: We adapt page_view to a temporary_buffer-like API. For this, a new concept is introduced called ContiguousSharedBuffer. We also change parsers so that they can be templated on the type of the buffer they work with (page_view vs temporary_buffer). This way we don't introduce indirection to existing algorithms. We use page_view instead of temporary_buffer in the promoted index parser which works with page cache buffers. page_view can be safely shared via share() and stored across allocating sections. It keeps hold to the LSA buffer even across allocating sections by the means of cached_file::page_ptr. 
Fixes #20766 Closes scylladb/scylladb#20837 * github.com:scylladb/scylladb: sstables: bsearch_clustered_cursor: Add trace-level logging sstables: bsearch_clustered_cursor: Move definitions out of line test, sstables: Verify parsing stability when allocating section is retried test, sstables: Verify parsing stability when buffers cross page boundary sstables: bsearch_clustered_cursor: Switch parsers to work with page_view cached_file: Adapt page_view to ContiguousSharedBuffer cached_file: Change meaning of page_view::_size to be relative to _offset rather than page start sstables, utils: Allow parsers to work with different buffer types sstables: promoted_index_block_parser: Make reset() always bring parser to initial state sstables: bsearch_clustered_cursor: Switch read_block_offset() to use the read() method sstables: bsearch_clustered_cursor: Fix parsing when allocating section is retried (cherry picked from commit `fb8743b2d6`) Closes scylladb/scylladb#20906	2024-12-05 09:50:07 +02:00
Tomasz Grabiec	159c1b0847	utils: UUID: Make get_time_UUID() respect the clock offset schema_change_test currently fails due to failure to start a cql test env in unit tests after the point where this is called (in one of the test cases): forward_jump_clocks(std::chrono::seconds(60*60*24*31)); The problem manifests with a failure to join the cluster due to missing_column exception ("missing_column: done") being thrown from system_keyspace::get_topology_request_state(). It's a symptom of join request being missing in system.topology_requests. It's missing because the row is expired. When request is created, we insert the mutations with intended TTL of 1 month. The actual TTL value is computed like this: ttl_opt topology_request_tracking_mutation_builder::ttl() const { return std::chrono::duration_cast<std::chrono::seconds>(std::chrono::microseconds(_ts)) + std::chrono::months(1) - std::chrono::duration_cast<std::chrono::seconds>(gc_clock::now().time_since_epoch()); } _ts comes from the request_id, which is supposed to be a timeuuid set from current time when request starts. It's set using utils::UUID_gen::get_time_UUID(). It reads the system clock without adding the clock offset, so after forward_jump_clocks(), _ts and gc_clock::now() may be far off. In some cases the accumulated offset is larger than 1 month and the ttl becomes negative, causing the request row to expire immediately and failing the boot sequence. The fix is to use db_clock, which respects offsets and is consistent with gc_clock. The test doesn't fail in CI because there each test case runs in a separate process, so there is no bootstrap attempt (by new cql test env) after forward_jump_clocks(). Closes scylladb/scylladb#21558 (cherry picked from commit `1d0c6aa26f`) Closes scylladb/scylladb#21583 Fixes #21581	2024-12-04 14:19:47 +01:00
Aleksandra Martyniuk	b487931396	repair: implement tablet_repair_task_impl::release_resources tablet_repair_task_impl keeps a vector of tablet_repair_task_meta, each of which keeps an effective_replication_map_ptr. So, after the task completes, the token metadata version will not change for task_ttl seconds. Implement tablet_repair_task_impl::release_resources method that clears tablet_repair_task_meta vector when the task finishes. Set task_ttl to 1h in test_tablet_repair to check whether the test won't time out. Fixes: #21503. Closes scylladb/scylladb#21504 (cherry picked from commit `572b005774`) Closes scylladb/scylladb#21621	2024-12-04 13:58:16 +02:00
Kefu Chai	cf71fd3977	test: topology_custom: ensure node visibility before keyspace creation Building upon commit `69b47694`, this change addresses a subtle synchronization weakness in node visibility checks during recovery mode testing. Previous Approach: - Waited only for the first node to see its peers - Insufficient to guarantee full cluster consistency Current Solution: 1. Implement comprehensive node visibility verification 2. Ensure all nodes mutually recognize each other 3. Prevent potential schema propagation race conditions Key Improvements: - Robust cluster state validation before keyspace creation - Eliminate partial visibility scenarios Fixes scylladb/scylladb#21724 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21726 (cherry picked from commit `65949ce607`) Closes scylladb/scylladb#21733	2024-12-04 13:57:55 +02:00
André LFA	9cd356d66c	Update report-scylla-problem.rst removing references to old Health Check Report Closes scylladb/scylladb#21467 Fixes scylladb/scylladb#21599 (cherry picked from commit `703e6f3b1f`) Closes scylladb/scylladb#21590	2024-12-04 13:55:31 +02:00
Jenkins Promoter	dd9dcb28a3	Update ScyllaDB version to: 6.1.5	2024-12-01 15:58:56 +02:00
Botond Dénes	3771405482	Merge 'repair: fix task_manager_module::abort_all_repairs' from Aleksandra Martyniuk Currently, task_manager_module::abort_all_repairs marks top-level repairs as aborted (but does not abort them) and aborts all existing shard tasks. A running repair checks whether its id isn't contained in _aborted_pending_repairs and then proceeds to create shard tasks. If abort_all_repairs is executed after _aborted_pending_repairs is checked but before shard tasks are created, then those new tasks won't be aborted. The issue is the most severe for tablet_repair_task_impl that checks the _aborted_pending_repairs content from different shards, that do not see the top-level task. Hence the repair isn't stopped but it creates shard repair tasks on all shards but the one that initialized repair. Abort top-level tasks in abort_all_repairs. Fix the shard on which the task abort is checked. Fixes: #21612. Needs backport to 6.1 and 6.2 as they contain the bug. Closes scylladb/scylladb#21616 * github.com:scylladb/scylladb: test: add test to check if repair is properly aborted repair: add shard param to task_manager_module::is_aborted repair: use task abort source to abort repair repair: drop _aborted_pending_repairs and utilize tasks abort mechanism repair: fix task_manager_module::abort_all_repairs (cherry picked from commit `5ccbd500e0`) Closes scylladb/scylladb#21641	2024-11-25 11:01:12 +02:00
Nadav Har'El	506b366e5d	alternator: fix "/localnodes" to not return down nodes Alternator's "/localnodes" HTTP requests is supposed to return the list of nodes in the local DC to which the user can send requests. Before commit `bac7c33313` we used the gossiper is_alive() method to determine if a node should be returned. That commit changed the check to is_normal() - because a node can be alive but in non-normal (e.g., joining) state and not ready for requests. However, it turns out that checking is_normal() is not enough, because if node is stopped abruptly, other nodes will still consider it "normal", but down (this is so-called "DN" state). So we need to check both is_alive() and is_normal(). This patch also adds a test reproducing this case, where a node is shut down abruptly. Before this patch, the test failed ("/localnodes" continued to return the dead node), and after it it passes. Fixes #21538 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#21540 (cherry picked from commit `7607f5e33e`) Closes scylladb/scylladb#21633	2024-11-21 08:50:44 +02:00
Anna Stuchlik	f2bed0f362	doc: add the 6.0-to-2024.2 upgrade guide-from-6 This commit adds an upgrade guide from ScyllaDB 6.0 to ScyllaDB Enterprise 2024.2. Fixes https://github.com/scylladb/scylladb/issues/20063 Fixes https://github.com/scylladb/scylladb/issues/20062 Refs https://github.com/scylladb/scylla-enterprise/issues/4544 (cherry picked from commit `3d4b7e41ef`) Closes scylladb/scylladb#21619	2024-11-18 17:22:12 +02:00
Raphael S. Carvalho	b0bb40e8d4	replica: Fix schema change during migration cleanup During migration cleanup, there's a small window in which the storage group was stopped but not yet removed from the list. So concurrent operations traversing the list could work with stopped groups. During a test which emitted schema changes during migrations, a failure happened when updating the compaction strategy of a table, but since the group was stopped, the compaction manager was unable to find the state for that group. In order to fix it, we'll skip stopped groups when traversing the list since they're unused at this stage of migration and going away soon. Fixes #20699. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `b8d6f864bc`) Closes scylladb/scylladb#21203	2024-11-15 10:40:21 +02:00
Calle Wilund	5058d6af41	cql_test_env/gossip: Prevent double shutdown call crash Fixes #21159 When an exception is thrown in sstable write etc such that storage_manager::isolate is initiated, we start a shutdown chain for message service, gossip etc. These are synced (properly) in storage_manager::stop, but if we somehow call gossiper::shutdown outside the normal service::stop cycle, we can end up running the method simultaneously, intertwined (missing the guard because of the state change between check and set). We then end up co_awaiting an invalid future (_failure_detector_loop_done) - a second wait. Fixed by a.) Remove superfluous gossiper::shutdown in cql_test_env. This was added in `20496ed`, ages ago. However, it should not be needed nowadays. b.) Ensure _failure_detector_loop_done is always waitable. Just to be sure. (cherry picked from commit `c28a5173d9`) Closes scylladb/scylladb#21394	2024-11-15 10:40:04 +02:00
Emil Maskovsky	730d39df40	test/topology_custom: fix the flaky test_raft_recovery_stuck The test is only sending a subset of the running servers for the rolling restart. The rolling restart is checking the visibility of the restarted node against the other nodes, but if that set is incomplete some of the running servers might not have seen the restarted node yet. Improved the manager client rolling restart method to consider all the running nodes for checking the restarted node visibility. Fixes: scylladb/scylladb#19959 Closes scylladb/scylladb#21477 (cherry picked from commit `92db2eca0b`) Closes scylladb/scylladb#21555	2024-11-15 10:39:18 +02:00
Botond Dénes	78ad345f7f	Merge 'scylla_raid_setup: fix failure on SELinux package installation' from Takuya ASADA After merging `5a470b2bfb`, we found that scylla_raid_setup fails on offline mode installation. This is because pkg_install() just prints an error and exits the script in offline mode instead of installing packages, since offline mode is not supposed to be able to connect to the internet. It seems to occur because of the missing "policycoreutils-python-utils" package, which is the package for the "semanage" command. So we need to implement the relabeling patch without using the command. Fixes https://github.com/scylladb/scylladb/issues/21441 Also, since Amazon Linux 2 has a different package name for semanage, we need to adjust the package name. Fixes https://github.com/scylladb/scylladb/issues/21351 Closes scylladb/scylladb#21474 * github.com:scylladb/scylladb: scylla_raid_setup: support installing semanage on Amazon Linux 2 scylla_raid_setup: fix failure on SELinux package installation (cherry picked from commit `1c212df62d`) Closes scylladb/scylladb#21546	2024-11-14 15:57:47 +02:00
Botond Dénes	4610dde4da	streaming: stream-session: switch to tracking permit The stream-session is the receiving end of streaming, it reads the mutation fragment stream from an RPC stream and writes it onto the disk. As such, this part does no disk IO and therefore, using a permit with count resources is superfluous. Furthermore, after `d98708013c`, the count resources on this permit can cause a deadlock on the receiver end, via the `db::view::check_view_update_path()`, which wants to read the content of a system table and therefore has to obtain a permit of its own. Switch to a tracking-only permit, primarily to resolve the deadlock, but also because admission is not necessary for a read which does no IO. Refs: scylladb/scylladb#20885 (partial fix, solves only one of the deadlocks) Fixes: scylladb/scylladb#21264 Fixes: scylladb/scylladb#21570 Closes scylladb/scylladb#21059 (cherry picked from commit `7c75fc599f`) Closes scylladb/scylladb#21571	2024-11-14 12:45:03 +02:00
Botond Dénes	ecb9cb374e	Merge '[Backport 6.1] compaction_manager: stop_tasks, stop_ongoing_compactions: ignore errors' from ScyllaDB stop() methods, like destructors must always succeed, and returning errors from them is futile as there is nothing else we can do with them but continue with shutdown. stop_ongoing_compactions, in particular, currently returns the status of stopped compaction tasks from `stop_tasks`, but still all tasks must be stopped after it, even if they failed, so assert that and ignore the errors. Fixes scylladb/scylladb#21159 * Needs backport to 6.2 and 6.1, as commit `8cc99973eb` causes handles storage that might cause compaction tasks to fail and eventually terminate on shutdown when the exceptions are thrown in noexcept context in the deferred stop destructor body (cherry picked from commit `e942c074f2`) (cherry picked from commit `d8500472b3`) (cherry picked from commit `c08ba8af68`) (cherry picked from commit `a7a55298ea`) (cherry picked from commit `6cce67bec8`) Refs #21299 Closes scylladb/scylladb#21435 * github.com:scylladb/scylladb: compaction_manager: stop: await _stop_future if engaged compaction_manager: really_do_stop: assert that no tasks are left behind compaction_manager: stop_tasks, stop_ongoing_compactions: ignore errors compaction/compaction_manager: stop_tasks(): unlink stopped tasks compaction/compaction_manager: make _tasks an intrusive list	2024-11-14 07:00:28 +02:00
Benny Halevy	5f9b3b08f4	compaction_manager: stop: await _stop_future if engaged The current condition that consults the compaction manager state for awaiting `_stop_future` works since _stop_future is assigned after the state is set to `stopped`, but it is incidental. What matters is that `_stop_future` is engaged. While at it, exchange _stop_future with a ready future so that stop() can be safely called multiple times. And dropped the superfluous co_return. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `6cce67bec8`)	2024-11-12 15:21:04 +02:00
Benny Halevy	fe03c9b724	compaction_manager: really_do_stop: assert that no tasks are left behind stop_ongoing_compactions now ignores any errors returned by tasks, and it should leave no task left behind. Assert that here, before the compaction_manager is destroyed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `a7a55298ea`)	2024-11-12 15:21:00 +02:00
Benny Halevy	cbddf18727	compaction_manager: stop_tasks, stop_ongoing_compactions: ignore errors stop() methods, like destructors must always succeed, and returning errors from them is futile as there is nothing else we can do with them but continue with shutdown. Leaked errors on the stop path may cause termination on shutdown, when called in a deferred action destructor. Fixes scylladb/scylladb#21298 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `c08ba8af68`)	2024-11-12 15:14:21 +02:00
Botond Dénes	2a32e2ae82	compaction/compaction_manager: stop_tasks(): unlink stopped tasks Stopped tasks currently linger in _tasks until the fiber that created the task is scheduled again and unlinks the task. This window between stop and remove prevents reliable checks for empty _tasks list after all tasks are stopped. Unlink the task early so really_do_stop() can safely check for an empty _tasks list (next patch). (cherry picked from commit `d8500472b3`)	2024-11-12 15:13:32 +02:00
Botond Dénes	d63b9efa7e	compaction/compaction_manager: make _tasks an intrusive list _tasks is currently std::list<shared_ptr<compaction_task_executor>>, but it has no role in keeping the instances alive, this is done by the fibers which create the task (and pin a shared ptr instance). This lends itself to an intrusive list, avoiding that extra allocation upon push_back(). Using an intrusive list also makes it simpler and much cheaper (O(1) vs. O(N)) to remove tasks from the _tasks list. This will be made use of in the next patch. Code using _task has to be updated because the value_type changes from shared_ptr<compaction_task_executor> to compaction_task_executor&. (cherry picked from commit `e942c074f2`)	2024-11-12 11:42:34 +02:00
Yaron Kaikov	a1fea6b225	./github/workflows/add-label-when-promoted.yaml: Run auto-backport only on default branch In https://github.com/scylladb/scylladb/pull/21496#event-15221789614 ``` scylladbbot force-pushed the backport/21459/to-6.1 branch from 414691c to `59a4ccd` Compare 2 days ago ``` Backport automation triggered by `push` but also should either start from `master` branch (or `enterprise` branch from Enterprise), we need to verify it by checking also the default branch. Fixes: https://github.com/scylladb/scylladb/issues/21514 Closes scylladb/scylladb#21515 (cherry picked from commit `2596d1577b`) Closes scylladb/scylladb#21530	2024-11-11 17:44:41 +02:00
Michał Chojnowski	04b3d96259	mvcc_test: fix a benign failure of test_apply_to_incomplete_respects_continuity For performance reasons, mutation_partition_v2::maybe_drop(), and by extension also mutation_partition_v2::apply_monotonically(mutation_partition_v2&&) can evict empty row entries, and hence change the continuity of the merged entry. For checking that apply_to_incomplete respects continuity, test_apply_to_incomplete_respects_continuity obtains the continuity of the partition entry before and after apply_to_incomplete by calling e.squashed().get_continuity(). But squashed() uses apply_monotonically(), so in some circumstances the result of squashed() can have smaller continuity than the argument of squashed(), which messes with the thing that the test is trying to check, and causes spurious failures. This patch changes the method of calculating the continuity set, so that it matches the entry exactly, fixing the test failures. Fixes scylladb/scylladb#13757 Closes scylladb/scylladb#21459 (cherry picked from commit `35921eb67e`) Closes scylladb/scylladb#21496	2024-11-08 15:33:20 +01:00
Yaron Kaikov	236b235a89	.github/scripts/auto-backport.py: update method to get closed prs `commit.get_pulls()` in PyGithub returns pull requests that are directly associated with the given commit. Since in a closed PR the relevant commit is an event type, the backport automation didn't get the PR info for backporting Ref: https://github.com/scylladb/scylladb/issues/18973 Closes scylladb/scylladb#21468 (cherry picked from commit `ef104b7b96`) Closes scylladb/scylladb#21482	2024-11-08 10:26:44 +02:00
Yaron Kaikov	3ddb61c90e	.github/script/auto-backport.py: push backport PR to `scylladbbot` fork Since Scylla is a public repo, when we create a fork, it doesn't fork the team and permissions (unlike private repos where it does). When we have a backport PR with conflicts, the developers need to be able to update the branch to fix the conflicts. To do so, we modified the logic of the backport automation as follows: - Every backport PR (with and without conflicts) will be open directly on the `scylladbbot` fork repo - When there are conflicts, an email will be sent to the original PR author with an invitation to become a contributor in the `scylladbbot` fork with `push` permissions. This will happen only once if the author is not a contributor. - Together with sending the invite, all backport labels will be removed and a comment will be added to the original PR with instructions - The PR author must add the backport labels after the invitation is accepted Fixes: https://github.com/scylladb/scylladb/issues/18973 Closes scylladb/scylladb#21401 (cherry picked from commit `77604b4ac7`) Closes scylladb/scylladb#21465	2024-11-07 15:05:56 +02:00
Yaron Kaikov	160823ccaf	github: add script for backports automation instead of Mergify Adding an auto-backport.py script to handle backport automation instead of Mergify. The rules of backport are as follows: * Merged or Closed PRs with any backport/x.y label (one or more) and promoted-to-master label * Backport PR will be automatically assigned to the original PR author * In case of conflicts the backport PR will be open in the original author's fork in draft mode. This will give the PR owner the option to resolve conflicts and push those changes to the PR branch (Today in Scylla when we have conflicts, the developers are forced to open another PR and manually close the backport PR opened by Mergify) * Fixing cherry-pick the wrong commit SHA. With the new script, we always take the SHA from the stable branch * Support backport for enterprise releases (from Enterprise branch) Fixes: https://github.com/scylladb/scylladb/issues/18973 (cherry picked from commit `f9e171c7af`) Closes scylladb/scylladb#21470	2024-11-07 06:58:16 +02:00
Jenkins Promoter	9ff31c6c4e	Update ScyllaDB version to: 6.1.4	2024-11-06 16:08:17 +02:00
Botond Dénes	6a66faab41	Merge '[Backport 6.1] repair: Fix finished ranges metrics for removenode' from ScyllaDB The skipped ranges should be multiplied by the number of tables Otherwise the finished ranges ratio will not reach 100%. Fixes #21174 (cherry picked from commit `cffe3dc49f`) (cherry picked from commit `1392a6068d`) (cherry picked from commit `9868ccbac0`) Refs #21252 Closes scylladb/scylladb#21314 * github.com:scylladb/scylladb: test: Add test_node_ops_metrics.py repair: Make the ranges more consistent in the log repair: Fix finished ranges metrics for removenode	2024-11-05 09:44:29 +02:00
Tzach Livyatan	c1e42cacac	Update os-support-info.rst - add CentOS ScyllaDB support RHEL 9 and derivatives, including CentOS 9. Fix https://github.com/scylladb/scylladb/issues/21309 (cherry picked from commit `1878af9399`) Closes scylladb/scylladb#21333	2024-11-05 09:43:51 +02:00
Benny Halevy	baa4d1a6e7	compaction_manager: compaction_disabled: return true if not in compaction_state When a compaction_group is removed via `compaction_manager::remove`, it is erased from `_compaction_state`, and therefore compaction is definitely not enabled on it. This triggers an internal error if tablets are cleaned up during drop/truncate, which checks that compaction is disabled in all compaction groups. Note that the callers of `compaction_disabled` aren't really interested in compaction being actively disabled on the compaction_group, but rather if it's enabled or not. A follow-up patch can be considered to reverse the logic and expose `compaction_enabled` rather than `compaction_disabled`. Fixes scylladb/scylladb#20060 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `78ceaeabca`) Closes scylladb/scylladb#21405	2024-11-05 09:42:01 +02:00
Kamil Braun	b057168dd0	Merge '[Backport 6.1] cql/tablets: fix retrying ALTER tablets KEYSPACE' from Marcin Maliszkiewicz ALTER tablets-enabled KEYSPACES (KS) may fail due to group0_concurrent_modification, in which case it's repeated by a for loop surrounding the code. But because raft's add_entry consumes the raft's guard (by std::move'ing the guard object), retries of ALTER KS will use a moved-from guard object, which is UB, potentially a crash. The fix is to remove the before mentioned for loop altogether and rethrow the exception, as the rf_change event will be repeated by the topology state machine if it receives the concurrent modification exception, because the event will remain present in the global requests queue, hence it's going to be executed as the very next event. Note: refactor is implemented in the follow-up commit. Fixes: https://github.com/scylladb/scylladb/issues/21102 Should be backported to every 6.x branch, as it may lead to a crash. (cherry picked from commit `de511f56ac`) (cherry picked from commit `3f4c8a30e3`) (cherry picked from commit `522bede8ec`) Refs https://github.com/scylladb/scylladb/pull/21121 Closes scylladb/scylladb#21340 * github.com:scylladb/scylladb: test: topology: add disable_schema_agreement_wait utility function test: add UT to test retrying ALTER tablets KEYSPACE cql/tablets: fix indentation in `rf_change` event handler cql/tablets: fix retrying ALTER tablets KEYSPACE	2024-11-04 12:23:47 +01:00
Benny Halevy	7dbe39a9a5	storage_service: on_change: update_peer_info only if peer info changed Return an optional peer_info from get_peer_info_for_update when the `app_state_map` arg does not change peer_info, so that we can skip calling update_peer_info, if it didn't change. Fixes scylladb/scylladb#20991 Refs scylladb/scylladb#16376 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#21152 (cherry picked from commit `04d741bcbb`)	2024-11-04 11:44:05 +02:00
Tomasz Grabiec	eec3e22c6a	node-exporter: Disable hwmon collector This collector reads nvme temperature sensor, which was observed to cause bad performance on Azure cloud following the reading of the sensor for ~6 seconds. During the event, we can see elevated system time (up to 30%) and softirq time. CPU utilization is high, with nvm_queue_rq taking several orders of magnitude more time than normally. There are signs of contention, we can see __pv_queued_spin_lock_slowpath in the perf profile, called. This manifests as latency spikes and potentially also throughput drop due to reduced CPU capacity. By default, the monitoring stack queries it once every 60s. (cherry picked from commit `93777fa907`) Closes scylladb/scylladb#21305	2024-10-31 14:05:38 +01:00
Marcin Maliszkiewicz	7d87f744ea	test: topology: add disable_schema_agreement_wait utility function Code extracted from `fa45fdf5f7` as it's being used by test_alter_tablets_keyspace_concurrent_modification and we're backporting it.	2024-10-30 16:57:19 +01:00
Piotr Smaron	d8e36873cf	test: add UT to test retrying ALTER tablets KEYSPACE The newly added testcase is based on the already existing `test_alter_dropped_tablets_keyspace`. A new error injection is created, which stops the ALTER execution just before the changes are submitted to RAFT. In the meantime, a new schema change is performed using the 2nd node in the cluster, thus causing the 1st node to retry the ALTER statement. (cherry picked from commit `522bede8ec`)	2024-10-30 16:49:33 +01:00
Piotr Smaron	1dddd2a8ca	cql/tablets: fix indentation in `rf_change` event handler Just moved the code that previously was under a `for` loop by 1 tab, i.e. 4 spaces, to the left. (cherry picked from commit `3f4c8a30e3`)	2024-10-30 16:49:33 +01:00
Piotr Smaron	ab333f2453	cql/tablets: fix retrying ALTER tablets KEYSPACE ALTER tablets-enabled KEYSPACES (KS) may fail due to `group0_concurrent_modification`, in which case it's repeated by a `for` loop surrounding the code. But because raft's `add_entry` consumes the raft's guard (by `std::move`'ing the guard object), retries of ALTER KS will use a moved-from guard object, which is UB, potentially a crash. The fix is to remove the before mentioned `for` loop altogether and rethrow the exception, as the `rf_change` event will be repeated by the topology state machine if it receives the concurrent modification exception, because the event will remain present in the global requests queue, hence it's going to be executed as the very next event. `topology_coordinator::handle_topology_coordinator_error` handling the case of `group0_concurrent_modification` has been extended with logging in order not to write catch-log-throw boilerplate. Note: refactor is implemented in the follow-up commit. Fixes: scylladb/scylladb#21102 (cherry picked from commit `de511f56ac`)	2024-10-30 16:49:33 +01:00
Gleb Natapov	0b502a2610	topology coordinator: take a copy of a replication state in raft_topology_cmd_handler Current code takes a reference and holds it past preemption points. And while the state itself is not supposed to change, the reference may become stale because the state is re-created on each raft topology command. Fix it by taking a copy instead. This is a slow path anyway. Fixes: scylladb/scylladb#21220 (cherry picked from commit `fb38bfa35d`) Closes scylladb/scylladb#21373	2024-10-30 14:12:44 +01:00
Kamil Braun	51f7ff8697	Merge '[Backport 6.1] storage_proxy: Add conditions checking to avoid UB in speculating read executors.' from ScyllaDB During the investigation of scylladb/scylladb#20282, it was discovered that implementations of speculating read executors have undefined behavior when called with an incorrect number of read replicas. This PR introduces two levels of condition checking: - Condition checking in speculating read executors for the number of replicas. - Checking the consistency of the Effective Replication Map in filter_for_query(): the map is considered incorrect if the list of replicas contains a node from a data center whose replication factor is 0. Please note: This PR does not fix the issue found in scylladb/scylladb#20282; it only adds condition checks to prevent undefined behavior in cases of inconsistent inputs. Refs scylladb/scylladb#20625 As this issue applies to the releases versions and can affect clients, we need backports to 6.0, 6.1, 6.2. (cherry picked from commit `132358dc92`) (cherry picked from commit `ae23d42889`) (cherry picked from commit `ad93cf5753`) (cherry picked from commit `8db6d6bd57`) (cherry picked from commit `c373edab2d`) Refs #20851 Closes scylladb/scylladb#21068 * github.com:scylladb/scylladb: Add conditions checking for get_read_executor Avoid an extra call to block_for in db::filter_for_query. Improve code readability in consistency_level.cc and storage_proxy.cc tools: Add build_info header with functions providing build type information tests: Add tests for alter table with RF=1 to RF=0	2024-10-29 12:32:48 +01:00
Asias He	9fdc596ff7	test: Add test_node_ops_metrics.py It tests the node_ops_metrics_done metric reaches 100% when a node ops is done. Refs: #21174 (cherry picked from commit `9868ccbac0`)	2024-10-28 09:54:30 +00:00
Asias He	5a2196b94a	repair: Make the ranges more consistent in the log Consider the number of tables for the number of ranges logging. Make it more consistent with the log when the ops starts. (cherry picked from commit `1392a6068d`)	2024-10-28 09:54:30 +00:00
Asias He	34cb594dd5	repair: Fix finished ranges metrics for removenode The skipped ranges should be multiplied by the number of tables. Otherwise the finished ranges ratio will not reach 100%. Fixes #21174 (cherry picked from commit `cffe3dc49f`)	2024-10-28 09:54:30 +00:00
Lakshmi Narayanan Sreethar	91c693bf93	[Backport 6.1] replica/table: check memtable before discarding tombstone during read On the read path, the compacting reader is applied only to the sstable reader. This can cause an expired tombstone from an sstable to be purged from the request before it has a chance to merge with deleted data in the memtable leading to data resurrection. Fix this by checking the memtables before deciding to purge tombstones from the request on the read path. A tombstone will not be purged if a key exists in any of the table's memtables with a minimum live timestamp that is lower than the maximum purgeable timestamp. Fixes #20916 `perf-simple-query` stats before and after this fix : `build/Dev/scylla perf-simple-query --smp=1 --flush` : ``` // Before this Fix // --------------- 94941.79 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59393 insns/op, 24029 cycles/op, 0 errors) 97551.14 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59376 insns/op, 23966 cycles/op, 0 errors) 96599.92 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59367 insns/op, 23998 cycles/op, 0 errors) 97774.91 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59370 insns/op, 23968 cycles/op, 0 errors) 97796.13 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59368 insns/op, 23947 cycles/op, 0 errors) throughput: mean=96932.78 standard-deviation=1215.71 median=97551.14 median-absolute-deviation=842.13 maximum=97796.13 minimum=94941.79 instructions_per_op: mean=59374.78 standard-deviation=10.78 median=59369.59 median-absolute-deviation=6.36 maximum=59393.12 minimum=59367.02 cpu_cycles_per_op: mean=23981.67 standard-deviation=32.29 median=23967.76 median-absolute-deviation=16.33 maximum=24029.38 minimum=23947.19 // After this Fix // -------------- 95313.53 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59392 insns/op, 24058 cycles/op, 0 errors) 97311.48 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59375 insns/op, 24005 
cycles/op, 0 errors) 98043.10 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59381 insns/op, 23941 cycles/op, 0 errors) 96750.31 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59396 insns/op, 24025 cycles/op, 0 errors) 93381.21 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59390 insns/op, 24097 cycles/op, 0 errors) throughput: mean=96159.93 standard-deviation=1847.88 median=96750.31 median-absolute-deviation=1151.55 maximum=98043.10 minimum=93381.21 instructions_per_op: mean=59386.60 standard-deviation=8.78 median=59389.55 median-absolute-deviation=6.02 maximum=59396.40 minimum=59374.73 cpu_cycles_per_op: mean=24025.13 standard-deviation=58.39 median=24025.17 median-absolute-deviation=32.67 maximum=24096.66 minimum=23941.22 ``` This PR fixes a regression introduced in `ce96b472d3` and should be backported to older versions. Closes scylladb/scylladb#20985 * github.com:scylladb/scylladb: topology-custom: add test to verify tombstone gc in read path replica/table: check memtable before discarding tombstone during read compaction_group: track maximum timestamp across all sstables (cherry picked from commit `519e167611`) Backported from #20985 to 6.1. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#21250	2024-10-25 11:13:54 +03:00
Piotr Dulikowski	77f0533a01	SCYLLA-VERSION-GEN: correct the logic for skipping SCYLLA--FILE The SCYLLA-VERSION-GEN file skips updating the SCYLLA--FILE files if the commit hash from SCYLLA-RELEASE-FILE is the same. The original reason for this was to prevent the date in the version string from changing if multiple modes are built across midnight (scylladb/scylla-pkg#826). However - intentionally or not - it serves another purpose: it prevents an infinite loop in the build process. If the build.ninja file needs to be rebuilt, the configure.py script unconditionally calls ./SCYLLA-VERSION-GEN. On the other hand, if one of the SCYLLA-*-FILE files is updated then this triggers rebuild of build.ninja. Apparently, this is sufficient for ninja to enter an infinite loop. However, the check assumes that the RELEASE is in the format <build identifier>.<date>.<commit hash> and assumes that none of the components have a dot inside - otherwise it breaks and just works incorrectly. Specifically, when building a private version, it is recommended to set the build identifier to `count.yourname`. Previously, before `85219e9`, this problem wasn't noticed most likely because reconfigure process was broken and stopped overwriting the build.ninja file after the first iteration. Fix the problem by fixing the logic that extracts the commit hash - instead of looking at the third dot-separated field counting from the left side, look at the last field. Fixes: scylladb/scylladb#21027 (cherry picked from commit `64ca58125e`) Closes scylladb/scylladb#21104	2024-10-25 11:09:51 +03:00
Benny Halevy	145230e032	storage_service: rebuild: warn about tablets-enabled keyspaces Until we automatically support rebuild for tablets-enabled keyspaces, warn the user about them. The reason this is not an error, is that after increasing RF in a new datacenter, the current procedure is to run `nodetool rebuild` on all nodes in that dc to rebuild the new vnode replicas. This is not required for tablets, since the additional replicas are rebuilt automatically as part of ALTER KS. However, `nodetool rebuild` is also run after local data loss (e.g. due to corruption and removal of sstables). In this case, rebuild is not supported for tablets-enabled keyspaces, as tablet replicas that had lost data may have already been migrated to other nodes, and rebuilding the requested node will not know about it. It is advised to repair all nodes in the datacenter instead. Refs scylladb/scylladb#17575 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `ed1e9a1543`) Closes scylladb/scylladb#20723	2024-10-25 11:06:38 +03:00
Tomasz Grabiec	39c1a448f6	Merge '[Backport 6.1] replica: Fix tombstone GC during tablet split preparation' from Raphael Raph Carvalho During split prepare phase, there will be more than 1 compaction group with overlapping token range for a given replica. Assume tablet 1 has sstable A containing deleted data, and sstable B containing a tombstone that shadows data in A. Then split starts: sstable B is split first, and moved from main (unsplit) group to a split-ready group now compaction runs in split-ready group before sstable A is split tombstone GC logic today only looks at underlying group, so compaction is step 2 will discard the deleted data in A, since it belongs to another group (the unsplit one), and so the tombstone can be purged incorrectly. To fix it, compaction will now work with all uncompacting sstables that belong to the same replica, since tombstone GC requires all sstables that possibly contain shadowed data to be available for correct decision to be made. Fixes https://github.com/scylladb/scylladb/issues/20044. Please replace this line with justification for the backport/* labels added to this PR Branches 6.0, 6.1 and 6.2 are vulnerable, so backport is needed. (cherry picked from commit `bcd358595f`) (cherry picked from commit `93815e0649`) Refs https://github.com/scylladb/scylladb/pull/20939 Closes scylladb/scylladb#21205 * github.com:scylladb/scylladb: replica: Fix tombstone GC during tablet split preparation service: Improve error handling for split	2024-10-23 11:41:36 +02:00
Botond Dénes	03f370e971	Merge '[Backport 6.1] Check system.tablets update before putting it into the table' from ScyllaDB Having tablet metadata with more than 1 pending replica will prevent this metadata from being (re)loaded due to sanity check on load. This patch fails the operation which tries to save the wrong metadata with a similar sanity check. For that, changes submitted to raft are validated, and if it's topology_change that affects system.tablets, the new "replicas" and "new_replicas" values are checked similarly to how they will be on (re)load. fixes #20043 (cherry picked from commit `f09fe4f351`) (cherry picked from commit `e5bf376cbc`) (cherry picked from commit `1863ccd900`) Refs #21020 Closes scylladb/scylladb#21110 * github.com:scylladb/scylladb: tablets: Validate system.tablets update group0_client: Introduce change validation group0_client: Add shared_token_metadata dependency replica/tablets: Add to_tablet_metadata_(row_)?key helpers replica/tablets: extract tablet_replica_set_from_cell()	2024-10-23 10:02:13 +03:00
Pavel Emelyanov	c52e5a8a87	tablets: Validate system.tablets update Implement change validation for raft topology_change command. For now the only check is that the "pending replicas" contains at most one entry. The check mirrors similar one in `process_one_row` function. If not passed, this prevents system.tablets from being updated with the mutation(s) that will not be loaded later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-22 13:17:00 +03:00
Pavel Emelyanov	337c777635	group0_client: Introduce change validation Add validate_change() methods (well, a template and an overload) that are called by prepare_command() and are supposed to validate the proposed change before it hits persistent storage Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-22 13:16:56 +03:00
Pavel Emelyanov	881ec8600f	group0_client: Add shared_token_metadata dependency It will be needed later to get tablet_metadata from. The dependency is "OK", shared_token_metadata is low-level sharded service. Client already references db::system_keyspace, which in turn references replica::database which, finally, references token_metadata Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-22 13:16:52 +03:00
Pavel Emelyanov	4bed029b56	replica/tablets: Add to_tablet_metadata_(row_)?key helpers Extracted from larger patch `f5976aa87b` (replica/tablets: add get_tablet_metadata_change_hint() and update_tablet_metadata_change_hint()) by Botond. The helpers are needed to decode mutations with tablets update to validate them later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-22 13:16:47 +03:00
Kefu Chai	751f1fda16	replica/tablets: extract tablet_replica_set_from_cell() so it can be reused to implement a low-level tool which reads tablets data from sstables Refs scylladb/scylladb#16488 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-22 13:16:44 +03:00
Botond Dénes	0d41447e1a	Merge '[Backport 6.1] atomic_delete: allow deletion of sstables from several prefixes' from ScyllaDB Allow create_pending_deletion_log to delete a bunch of sstables potentially resides in different prefixes (e.g. in the base directory and under staging/). The motivation arises from table::cleanup_tablet that calls compaction_group::cleanup on all cg:s via cleanup_compaction_groups. Cleanup, in turn, calls delete_sstables_atomically on all sstables in the compaction_group, in all states, including the normal state as well as staging - hence the requirement to support deleting sstables in different sub-directories. Also, apparently truncate calls delete_atomically for all sstables too, via table::discard_sstables, so if it happened to be executed during view update generation, i.e. when there are sstables in staging, it should hit the assertion failure reported in https://github.com/scylladb/scylladb/issues/18862 as well (although I haven't seen it yet, but I see no reason why it would happen). So the issue was apparently present since the initial implementation of the pending_delete_log. It's just that with tablet migration it is more likely to be hit. Fixes scylladb/scylladb#18862 Needs backport to 6.0 since tablets require this capability (cherry picked from commit `a7b92d7b6f`) (cherry picked from commit `027e64876a`) (cherry picked from commit `44bd183187`) (cherry picked from commit `f47b5e60bc`) Refs #19555 Closes scylladb/scylladb#20644 * github.com:scylladb/scylladb: sstable_directory: create_pending_deletion_log: place pending_delete log under the base directory sstables: storage: keep base directory in base class sstables: storage: define opened_directory in header file sstable_directory: use only dirlog	2024-10-22 09:17:26 +03:00
Benny Halevy	71d90b2fbc	view: check_needs_view_update_path: get token_metadata_ptr check_needs_view_update_path is async and might yield so the token_metadata reference passed to it must be kept alive throughout the call. Fixes scylladb/scylladb#20979 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `d34878e96c`) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#21039	2024-10-22 09:16:40 +03:00
Daniel Reis	a22486a6d3	docs: fix redirect from cert-based auth to security/enable-auth page (cherry picked from commit `28a265ccd8`) Closes scylladb/scylladb#21123	2024-10-22 09:13:42 +03:00
Raphael S. Carvalho	5106d40577	replica: Fix tombstone GC during tablet split preparation During split prepare phase, there will be more than 1 compaction group with overlapping token range for a given replica. Assume tablet 1 has sstable A containing deleted data, and sstable B containing a tombstone that shadows data in A. Then split starts: 1) sstable B is split first, and moved from main (unsplit) group to a split-ready group 2) now compaction runs in split-ready group before sstable A is split. Tombstone GC logic today only looks at the underlying group, so compaction in step 2 will discard the deleted data in A, since it belongs to another group (the unsplit one), and so the tombstone can be purged incorrectly. To fix it, compaction will now work with all uncompacting sstables that belong to the same replica, since tombstone GC requires all sstables that possibly contain shadowed data to be available for a correct decision to be made. Fixes #20044. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `93815e0649`)	2024-10-20 20:44:44 -03:00
Benny Halevy	a8e472178f	sstable_directory: create_pending_deletion_log: place pending_delete log under the base directory To be able to atomically delete sstables both in base table directory and in its sub-directories, like `staging/`, use a shared pending_delete_dir under the base directory. Note that this requires loading and processing the base directory first. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `f47b5e60bc`) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> # Conflicts: # sstables/sstable_directory.hh	2024-10-20 09:10:47 +03:00
Benny Halevy	8c646c2942	sstables: storage: keep base directory in base class so we can use the base (table) directory for e.g. pending_delete logs, in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `44bd183187`)	2024-10-20 09:09:06 +03:00
Benny Halevy	334d56fcfd	sstables: storage: define opened_directory in header file So it can be used outside the storage module in the following patches. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `027e64876a`)	2024-10-20 09:09:00 +03:00
Benny Halevy	e141e97f2d	sstable_directory: use only dirlog Currently, there are leftover log messages using sstlog rather than dirlog, that was introduced in `aebd965f0e`, and that makes debugging harder. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `a7b92d7b6f`) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> # Conflicts: # sstables/sstable_directory.cc	2024-10-20 09:08:49 +03:00
Botond Dénes	7367544ea2	Merge '[Backport 6.1] tablet: Fix single-sstable split when attaching new unsplit sstables' from ScyllaDB To fix a race between split and repair here `c1de4859d8`, a new sstable generated during streaming can be split before being attached to the sstable set. That's to prevent an unsplit sstable from reaching the set after the tablet map is resized. So we can think this split is an extension of the sstable writer. A failure during split means the new sstable won't be added. Also, the duration of split is also adding to the time erm is held. For example, repair writer will only release its erm once the split sstable is added into the set. This single-sstable split is going through run_custom_job(), which serializes with other maintenance tasks. That was a terrible decision, since the split may have to wait for ongoing maintenance task to finish, which means holding erm for longer. Additionally, if split monitor decides to run split on the entire compaction group, it can cause single-sstable split to be aborted since the former wants to select all sstables, propagating a failure to the streaming writer. That results in new sstable being leaked and may cause problems on restart, since the underlying tablet may have moved elsewhere or multiple splits may have happened. We have some fragility today in cleaning up leaked sstables on streaming failure, but this single-sstable split made it worse since the failure can happen during normal operation, when there's e.g. no I/O error. It makes sense to kill run_custom_job() usage, since the single-sstable split is offline and an extension of sstable writing, therefore it makes no sense to serialize with maintenance tasks. It must also inherit the sched group of the process writing the new sstable. The inheritance happens today, but is fragile. Fixes #20626. 
(cherry picked from commit `999f1f1318`) (cherry picked from commit `38ce2c605d`) Refs #20737 Closes scylladb/scylladb#20802 * github.com:scylladb/scylladb: tablet: Fix single-sstable split when attaching new unsplit sstables replica: Fix tablet split execute after restart	2024-10-17 19:36:47 +03:00
Piotr Smaron	f8d6215242	test: fix flaky `test_multidc_alter_tablets_rf` The testcase is flaky due to a known python driver issue: https://github.com/scylladb/python-driver/issues/317. This issue causes the `CREATE KEYSPACE` statement to be sometimes executed twice in a row, and the 2nd CREATE statement causes the test to fail. In order to work around it, it's enough to add `if not exists` when creating a ks. Fixes: #21034 Needs to be backported to all 6.x branches, as the PR introducing this flakiness is backported to every 6.x branch. (cherry picked from commit `3969ffb39f`) Closes scylladb/scylladb#21106	2024-10-17 10:59:52 +03:00
Piotr Smaron	750ff26371	cql/tablets: handle MVs in ALTER tablets KEYSPACE ALTERing tablets-enabled KEYSPACES (KS) didn't account for materialized views (MV), and only produced tablets mutations changing tables. With this patch we're producing tablets mutations for both tables and MVs, hence when e.g. we change the replication factor (RF) of a KS, both the tables' RFs and MVs' RFs are updated along with tablets replicas. The `test_tablet_rf_change` testcase has been extended to also verify that MVs' tablets replicas are updated when RF changes. Fixes: #20240 (cherry picked from commit `5ac16e29e6`) Closes scylladb/scylladb#21023	2024-10-16 10:39:07 +03:00
Kefu Chai	e22d8a3de3	install.sh: install seastar/scripts/addr2line.py as well seastar extracted `addr2line` python module out back in e078d7877273e4a6698071dc10902945f175e8bc. but `install.sh` was not updated accordingly. it still installs `seastar-addr2line` without installing its new dependency. this leaves us with a broken `seastar-addr2line` in the relocatable tarball. ```console $ /opt/scylladb/scripts/seastar-addr2line Traceback (most recent call last): File "/opt/scylladb/scripts/libexec/seastar-addr2line", line 26, in <module> from addr2line import BacktraceResolver ModuleNotFoundError: No module named 'addr2line' ``` in this change, we redistribute `addr2line.py` as well. this should address the issue above. Fixes scylladb/scylladb#21077 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `da433aad9d`) Closes scylladb/scylladb#21087	2024-10-14 13:31:17 +03:00
Sergey Zolotukhin	6e15c244ec	Add conditions checking for get_read_executor During the investigation of scylladb/scylladb#20282, it was discovered that implementations of speculating read executors have undefined behavior when called with an incorrect number of read replicas. This PR introduces two levels of condition checking: - Condition checking in speculating read executors for the number of replicas. - Checking the consistency of the Effective Replication Map in get_endpoints_for_reading(): the map is considered incorrect if the number of read replica nodes is higher than the replication factor. The check is applied only when built in non-release mode. Please note: This PR does not fix the issue found in scylladb/scylladb#20282; it only adds condition checks to prevent undefined behavior in cases of inconsistent inputs. Refs scylladb/scylladb#20625 (cherry picked from commit `c373edab2d`)	2024-10-11 18:20:43 +00:00
Sergey Zolotukhin	5357e492ca	Avoid an extra call to block_for in db::filter_for_query. (cherry picked from commit `8db6d6bd57`)	2024-10-11 18:20:43 +00:00
Sergey Zolotukhin	09330bf597	Improve code readability in consistency_level.cc and storage_proxy.cc Add const correctness and rename some variables to improve code readability. (cherry picked from commit `ad93cf5753`)	2024-10-11 18:20:42 +00:00
Sergey Zolotukhin	116661a05b	tools: Add build_info header with functions providing build type information A new header provides `constexpr` functions to retrieve build type information: `get_build_type()`, `is_release_build()`, and `is_debug_build()`. These functions are useful when adding changes that should be enabled at compile time only for specific build types. (cherry picked from commit `ae23d42889`)	2024-10-11 18:20:42 +00:00
Sergey Zolotukhin	52955e940a	tests: Add tests for alter table with RF=1 to RF=0 Adding Vnodes and Tablets tests for alter keyspace operation that decreases replication factor from 1 to 0 for one of two data centers. Tablet version fails due to issue described in scylladb/scylladb#20625. Test for scylladb/scylladb#20625 (cherry picked from commit `132358dc92`)	2024-10-11 18:20:42 +00:00
Michał Chojnowski	9f0b19b7f7	reader_concurrency_semaphore: in stats, fix swapped count_resources and memory_resources can_admit_read() returns reason::memory_resources when the permit is queued due to lack of count resources, and it returns reason::count_resources when the permit is queued due to lack of memory resources. It's supposed to be the other way around. This bug is causing the two counts to be swapped in the stat dumps printed to the logs when semaphores time out. (cherry picked from commit `c2ba300f1c`) Closes scylladb/scylladb#21031	2024-10-11 14:45:31 +03:00
Botond Dénes	1e847d0253	Merge '[Backport 6.1] cql: improve validating RF's change in ALTER tablets KS' from ScyllaDB This patch series fixes a couple of bugs around validating if RF is not changed by too much when performing ALTER tablets KS. RF cannot change by more than 1 in total, because tablets load balancer cannot handle more work at once. Fixes: #20039 Should be backported to 6.0 & 6.1 (wherever tablets feature is present), as this bug may break the cluster. (cherry picked from commit `042825247f`) (cherry picked from commit `adf453af3f`) (cherry picked from commit `9c5950533f`) (cherry picked from commit `47acdc1f98`) (cherry picked from commit `93d61d7031`) (cherry picked from commit `6676e47371`) (cherry picked from commit `2aabe7f09c`) (cherry picked from commit `ee56bbfe61`) Refs #20208 Closes scylladb/scylladb#21010 * github.com:scylladb/scylladb: cql: sum of abs RFs diffs cannot exceed 1 in ALTER tablets KS cql: join new and old KS options in ALTER tablets KS cql: fix validation of ALTERing RFs in tablets KS cql: harden `alter_keyspace_statement.cc::validate_rf_difference` cql: validate RF change for new DCs in ALTER tablets KS cql: extend test_alter_tablet_keyspace_rf cql: refactor test_tablets::test_alter_tablet_keyspace cql: remove unused helper function from test_tablets	2024-10-11 14:44:48 +03:00
Botond Dénes	b32304bdda	repair/row_level: remove reader timeout This timeout was added to catch reader related deadlocks. We have not seen such deadlocks for a long time, but we did see false-timeouts caused by this, see explanation below. Since the cost now outweighs the benefit, remove the timeout altogether. The false timeout happens during mixed-shard repair. The `reader_permit::set_timeout()` call is made on the top-level permit which repair has a handle on. In the case of the mixed-shard repair, this belongs to the multishard reader. Calling set_timeout() on the multishard reader has no effect on the actual shard readers, except in one case: when the shard reader is created, it inherits the multishard reader's current timeout. As the shard reader can be alive for a long time, this timeout is not refreshed and ultimately causes a timeout and fails the repair. Refs: #18269 (cherry picked from commit `3ebb124eb2`) Closes scylladb/scylladb#20956	2024-10-11 14:42:06 +03:00
Kefu Chai	ef549dbeac	auth: capture boost::regex_error not std::regex_error in `a3db5401`, we introduced the TLS certi authenticator, which is configured using `auth_certificate_role_queries` option . the value of this option contains a regular expression. so there are chances the regular expression is malformatted. in that case, when converting its value presenting the regular expression to an instance of `boost::regex`, Boost.Regex throws a `boost::regex_error` exception, not `std::regex_error`. since we decided to use Boost.Regex, let's catch `boost::regex_error`. Refs `a3db5401` Fixes scylladb/scylladb#20941 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `c7eafc4dc1`) Closes scylladb/scylladb#20953	2024-10-11 14:38:28 +03:00
Anna Stuchlik	65c047f911	doc: document the option to run ScyllaDB in Docker on macOS This commit adds a description of a workaround to create a multi-node ScyllaDB cluster with Docker on macOS. Refs https://github.com/scylladb/scylladb/issues/16806 See https://forum.scylladb.com/t/running-3-node-scylladb-in-docker/1057/4 (cherry picked from commit `7eb1dc2ae5`) Closes scylladb/scylladb#20932	2024-10-11 14:37:55 +03:00
Calle Wilund	6ea4e4a289	database: Also force new schema commitlog segment on user initiated memtable flush Refs #20686 Refs #15607 In #15060 we added forced new commitlog segment on user initiated flush, mainly so that tests can verify tombstone gc and other compaction related things, without having to wait for "organic" segment deletion. Schema commitlog was not included, mainly because we did not have tests featuring compaction checks of schema related tables, but also because it was assumed to have lower general throughput. There is however no real reason to not include it, and it will make some testing much quicker and more predictable. (cherry picked from commit `60f8a9f39d`) Closes scylladb/scylladb#20704	2024-10-11 14:36:26 +03:00
Avi Kivity	e31d6c278f	Merge '[Backport 6.1] scylla_raid_setup: configure SELinux file context' from ScyllaDB On RHEL9, systemd-coredump fails to coredump on /var/lib/scylla/coredump because the service only has write access with systemd_coredump_var_lib_t. To make it writable, we need to add a file context rule for /var/lib/scylla/coredump, and run restorecon on /var/lib/scylla. Fixes #19325 (cherry picked from commit `56c971373c`) (cherry picked from commit `0ac450de05`) Refs #20528 Closes scylladb/scylladb#20871 * github.com:scylladb/scylladb: scylla_raid_setup: configure SELinux file context scylla_coredump_setup: fix SELinux configuration for RHEL9	2024-10-10 19:01:40 +03:00
Gleb Natapov	592d925516	storage_proxy: make sure there is no end iterator in _live_iterators array storage_proxy::cancellable_write_handlers_list::update_live_iterators assumes that iterators in _live_iterators can be dereferenced, but the code does not make any attempt to make sure this is the case. The iterator can be the end iterator which cannot be dereferenced. The patch makes sure that there is no end iterator in _live_iterators. Fixes scylladb/scylladb#20874 (cherry picked from commit `da084d6441`) Closes scylladb/scylladb#21004	2024-10-09 20:16:53 +03:00
Piotr Smaron	08165851fb	cql: sum of abs RFs diffs cannot exceed 1 in ALTER tablets KS Tablets load balancer is unable to process more than a single pending replica, thus ALTER tablets KS cannot accept an ALTER statement which would result in creating 2+ pending replicas, hence it has to validate that the sum of absolute differences of RFs specified in the statement is not greater than 1. (cherry picked from commit `ee56bbfe61`)	2024-10-08 18:06:54 +00:00
Piotr Smaron	1f6befe16d	cql: join new and old KS options in ALTER tablets KS A bug has been discovered while trying to ALTER tablets KS and specifying only 1 out of 2 DCs - the unspecified DC's RF has been zeroed. This is because ALTER tablets KS updated the KS only with the RF-per-DC mapping specified in the ALTER tablets KS statement, so if a DC was omitted, it was assigned a value of RF=0. This commit fixes that plus additionally passes all the KS options, not only the replication options, to the topology coordinator, where the KS update is performed. `initial_tablets` is a special case, which requires a special handling in the source code, as we cannot simply update old initial_tablet's settings with the new ones, because if only ` and TABLETS = {'enabled': true}` is specified in the ALTER tablets KS statement, we should not zero the `initial_tablets`, but rather keep the old value - this is tested by the `test_alter_preserves_tablets_if_initial_tablets_skipped` testcase. Other than that, the above mentioned testcase started to fail with these changes, and it appeared to be an issue with the test not waiting until ALTER is completed, and thus reading the old value, hence the test's body has been modified to wait for ALTER to complete before performing validation. (cherry picked from commit `2aabe7f09c`)	2024-10-08 18:06:53 +00:00
Piotr Smaron	97b37fbbd0	cql: fix validation of ALTERing RFs in tablets KS The validation has been corrected with: 1. Checking if a DC specified in ALTER exists. 2. Removing `REPLICATION_STRATEGY_CLASS_KEY` key from a map of RFs that needs their RFs to be validated. (cherry picked from commit `6676e47371`)	2024-10-08 18:06:47 +00:00
Piotr Smaron	7c837837eb	cql: harden `alter_keyspace_statement.cc::validate_rf_difference` This function assumed that strings passed as arguments will be of integer types, but that wasn't the case, and we missed that because this function didn't have any validation, so this change adds proper validation and error logging. Arguments passed to this function were forwarded from a call to `ks_prop_defs::get_replication_options`, which, among rf-per-dc mapping, returns also `class:replication_strategy` pair. The second member of that pair was cast to an `int` type and somehow the code was still running fine, but only extra testing added later discovered a bug in here. (cherry picked from commit `93d61d7031`)	2024-10-08 18:06:47 +00:00
Piotr Smaron	0e0fe4d756	cql: validate RF change for new DCs in ALTER tablets KS ALTER tablets KS validated that RF is not changed by more than 1 for DCs that already had replicas, but not for DCs that didn't have them yet, so specifying an RF jump from 0 to 2 was possible when listing a new DC in ALTER tablets KS statement, which violated internal invariants of tablets load balancer. This PR fixes that bug and adds multi-DC testcases to check that adding replicas to a new DC and removing replicas from a DC honors the RF change constraints. Refs: #20039 (cherry picked from commit `47acdc1f98`)	2024-10-08 18:06:46 +00:00
Piotr Smaron	78bf036419	cql: extend test_alter_tablet_keyspace_rf Added cases to also test decreasing RF and setting the same RF. Also added extra explanatory comments. (cherry picked from commit `9c5950533f`)	2024-10-08 18:06:45 +00:00
Piotr Smaron	4fc45b6fa6	cql: refactor test_tablets::test_alter_tablet_keyspace 1. Renamed the testcase to emphasize that it only focuses on testing changing RF - there are other tests that test ALTER tablets KS in general. 2. Fixed whitespaces according to PEP8 (cherry picked from commit `adf453af3f`)	2024-10-08 18:06:44 +00:00
Piotr Smaron	dbb912c8dd	cql: remove unused helper function from test_tablets `change_default_rf` is not used anywhere, moreover it uses `replication_factor` tag, which is forbidden in ALTER tablets KS statement. (cherry picked from commit `042825247f`)	2024-10-08 18:06:42 +00:00
Raphael S. Carvalho	684b16d709	service: Improve error handling for split Retry wasn't really happening since the loop was broken and sleep part was skipped on error. Also, we were treating abort of split during shutdown as if it were an actual error and that confused longevity tests that parse for logs with error level. The fix is about demoting the level of logs when we know the exception comes from shutdown. Fixes #20890. (cherry picked from commit `bcd358595f`)	2024-10-04 11:17:37 +00:00
Pavel Emelyanov	190385ee2b	cql: Check that CREATEing tablets/vnodes is consistent with the CLI There are two bits that control whether replication strategy for a keyspace will use tablets or not -- the configuration option and CQL parameter. This patch tunes its parsing to implement the logic shown below: if (strategy.supports_tablets) { if (cql.with_tablets) { if (cfg.enable_tablets) { return create_keyspace_with_tablets(); } else { throw "tablets are not enabled"; } } else if (cql.with_tablets == off) { return create_keyspace_without_tablets(); } else { // cql.with_tablets is not specified if (cfg.enable_tablets) { return create_keyspace_with_tablets(); } else { return create_keyspace_without_tablets(); } } } else { // strategy doesn't support tablets if (cql.with_tablets == on) { throw "invalid cql parameter"; } else if (cql.with_tablets == off) { return create_keyspace_without_tablets(); } else { // cql.with_tablets is not specified return create_keyspace_without_tablets(); } } closes: #20088 In order to enable tablets "by default" for NetworkTopologyStrategy there's explicit check near ks_prop_defs::get_initial_tablets(), that's not very nice. It needs more care to fix it, e.g. provide feature service reference to abstract_replication_strategy constructor. But since ks_prop_defs code already hijacks options specifically for that strategy type (see prepare_options() helper), it's OK for now. There's also #20768 misbehavior that's preserved in this patch, but should be fixed eventually as well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20928	2024-10-03 17:09:21 +03:00
Calle Wilund	4a1e83d6be	commitlog: Fix buffer_list_bytes not updated correctly Fixes #20862 With the change in `60af2f3cb2` the bookkeeping for buffer memory was changed subtly. The problem is that we would shrink the buffer size before using said buffer's size, after flush, to decrement the buffer_list_bytes value, which was previously incremented by the full, allocated size. I.e. we would slowly grow this value instead of adjusting properly to actual used bytes. Test included. (cherry picked from commit `ee5e71172f`) Closes scylladb/scylladb#20914	2024-10-03 09:11:40 +03:00
Kamil Braun	a96654bea3	Merge '[Backport 6.1] Populate raft address map from gossiper on raft configuration change' from ScyllaDB For each new node added to the raft config populate its ID to IP mapping in raft address map from the gossiper. The mapping may have expired if a node is added to the raft configuration long after it first appears in the gossiper. Fixes scylladb/scylladb#20600 Backport to all supported versions since the bug may cause bootstrapping failure. (cherry picked from commit `bddaf498df`) (cherry picked from commit `9e4cd32096`) Refs #20601 Closes scylladb/scylladb#20848 * github.com:scylladb/scylladb: test: extend existing test to check that a joining node can map addresses of all pre-existing nodes during join group0: make sure that address map has an entry for each new node in the raft configuration	2024-09-30 17:03:03 +02:00
Takuya ASADA	295993d7f9	scylla_raid_setup: configure SELinux file context On RHEL9, systemd-coredump fails to coredump on /var/lib/scylla/coredump because the service only has write access with systemd_coredump_var_lib_t. To make it writable, we need to add a file context rule for /var/lib/scylla/coredump, and run restorecon on /var/lib/scylla. Fixes #20573 (cherry picked from commit `0ac450de05`)	2024-09-29 13:23:03 +00:00
Takuya ASADA	bd7e1cfc5f	scylla_coredump_setup: fix SELinux configuration for RHEL9 It seems that a specific version of the systemd package on RHEL9 has a bug in its SELinux configuration: it introduced the "systemd-container-coredump" module to provide rules for systemd-coredump, but the module is not enabled by default. We have to manually load it, otherwise it causes a permission error. Fixes #19325 (cherry picked from commit `56c971373c`)	2024-09-29 13:23:03 +00:00
Kamil Braun	79119f58e8	Merge '[Backport 6.1] mark node as being replaced earlier' from Gleb Natapov Before `17f4a151ce` the node was marked as being replaced in join_group0 state, before it actually joins the group0, so by the time it actually joins and starts transferring snapshot/log no traffic is sent to it. The commit changed this to mark the node as being replaced after the snapshot/log is already transferred so we can get the traffic to the node while it still has not caught up with a leader and this may cause problems since the state is not complete. Mark the node as being replaced earlier, but still add the new node to the topology later as the commit above intended. Fixes: https://github.com/scylladb/scylladb/issues/20629 Needs to be backported since this is a regression (cherry picked from commit `644e7a2012`) (cherry picked from commit `c0939d86f9`) (cherry picked from commit `1b4c255ffd`) Closes scylladb/scylladb#20834 * github.com:scylladb/scylladb: test: amend test_replace_reuse_ip test to check that there is no stale writes after snapshot transfer starts topology coordinator:: mark node as being replaced earlier topology coordinator: do metadata barrier before calling finish_accepting_node() during replace	2024-09-27 16:10:07 +02:00
Andrei Chekun	392d95d2cd	test.py: Increase workers for cluster cleaning Increase the number of workers used in the method async_rmtree(), which is used for cleaning directories. This should help to reduce flakiness. Increasing the workers count was introduced in `f54b7f5427` but there is no need to backport the whole commit. Closes scylladb/scylladb#20795	2024-09-27 14:47:08 +02:00
Kamil Braun	be76d6f9d9	service: raft: fix rpc error message What it called "leader" is actually the destination of the RPC. Trivial fix, should be backported to all affected versions. (cherry picked from commit `84dd0e922b`) Closes scylladb/scylladb#20827	2024-09-27 11:22:02 +02:00
Gleb Natapov	39a8203160	test: extend existing test to check that a joining node can map addresses of all pre-existing nodes during join (cherry picked from commit `9e4cd32096`)	2024-09-26 21:13:39 +00:00
Gleb Natapov	d2d1ed92c2	group0: make sure that address map has an entry for each new node in the raft configuration ID->IP mapping is added to the raft address map when the mapping first appears in the gossiper, but it is added as expiring entry. It becomes non expiring when a node is added to raft configuration. But when a node joins those two events may be distant in time (since the node's request may sit in the topology coordinator queue for a while) and mappings may expire already from the map. This patch makes sure to transfer the mapping from the gossiper for a node that is added to the raft configuration instead of assuming that the mapping is already there. (cherry picked from commit `bddaf498df`)	2024-09-26 21:13:39 +00:00
Gleb Natapov	c7be05cc50	test: amend test_replace_reuse_ip test to check that there is no stale writes after snapshot transfer starts (cherry picked from commit `1b4c255ffd`)	2024-09-26 12:34:18 +03:00
Gleb Natapov	88712782de	topology coordinator:: mark node as being replaced earlier Before `17f4a151ce` the node was marked as being replaced in join_group0 state, before it actually joins the group0, so by the time it actually joins and starts transferring snapshot/log no traffic is sent to it. The commit changed this to mark the node as being replaced after the snapshot/log is already transferred so we can get the traffic to the node while it still has not caught up with a leader and this may cause problems since the state is not complete. Mark the node as being replaced earlier, but still add the new node to the topology later as the commit above intended. (cherry picked from commit `c0939d86f9`)	2024-09-26 12:34:04 +03:00
Gleb Natapov	eaade2f0ef	topology coordinator: do metadata barrier before calling finish_accepting_node() during replace During replace with the same IP a node may get queries that were intended for the node it was replacing since the new node declares itself UP before it advertises that it is a replacement. But after the node starts the replacing procedure the old node is marked as "being replaced" and queries are no longer sent there. It is important to do so before the new node starts to get the raft snapshot since the snapshot application is not atomic and queries that run in parallel with it may see partial state and fail in weird ways. Queries that are sent before that will fail because schema is empty, so they will not find any tables in the first place. This is pre-existing and not addressed by this patch. (cherry picked from commit `644e7a2012`)	2024-09-26 12:33:06 +03:00
Kefu Chai	ef32ba704d	docs: explain precedence of configure options to explain for instance which setting takes effect if both command line options and `scylla.yaml` configures the same parameter. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `1aa030a8cd`) Closes scylladb/scylladb#20775	2024-09-26 10:47:42 +03:00
Anna Stuchlik	10d71d2f4b	doc: update the unified installer instructions This commit updates the unified installer instructions to avoid specifying a given version. At the moment, we're technically unable to use variables in URLs, so we need to update the page each release. Fixes https://github.com/scylladb/scylladb/issues/20677 (cherry picked from commit `400a14eefa`) Closes scylladb/scylladb#20709	2024-09-26 10:45:53 +03:00
Anna Stuchlik	9afb3daf98	doc: fix a broken link This commit fixes a link to the Manager by adding a missing underscore to the external link. (cherry picked from commit `aa0c95c95c`) Closes scylladb/scylladb#20707	2024-09-26 10:45:17 +03:00
Tzach Livyatan	82e7cb5bf5	Update client-node-encryption: OpenSSL is FIPS enabled (cherry picked from commit `cb864b11d8`) Closes scylladb/scylladb#20651	2024-09-26 10:42:12 +03:00
Lakshmi Narayanan Sreethar	58da8fdbbc	[Backport 6.1]: database::get_all_tables_flushed_at: fix return value The `database::get_all_tables_flushed_at` method returns a variable without setting the computed all_tables_flushed_at value. This causes its caller, `maybe_flush_all_tables`, to flush all the tables every time regardless of when they were last flushed. Fix this by returning the computed value from `database::get_all_tables_flushed_at`. Fixes #20301 Closes scylladb/scylladb#20471 * github.com:scylladb/scylladb: cql-pytest: add test to verify compaction_flush_all_tables_before_major_seconds config database::get_all_tables_flushed_at: fix return value (cherry picked from commit `0e5b444777`) Backported from #20471 to 6.1. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#20581	2024-09-26 10:40:48 +03:00
Kamil Braun	92156e7930	test: fix `topology_custom/test_raft_recovery_stuck` flakiness The test performs consecutive schema changes in RECOVERY mode. The second change relies on the first. However the driver might route the changes to different servers and we don't have group 0 to guarantee linearizability. We must rely on the first change coordinator to push the schema mutations to other servers before returning, but that only happens when it sees other servers as alive when doing the schema change. It wasn't guaranteed in the test. Fix this. Fixes scylladb/scylladb#20791 Should be backported to all branches containing this test to reduce flakiness. (cherry picked from commit `f390d4020a`) Closes scylladb/scylladb#20809	2024-09-25 15:11:50 +02:00
Abhinav	33b50a9d3a	raft topology: add error for removal of non-normal nodes Currently, we check if a node being removed is normal on the node initiating the removenode request. However, we don't have a similar check on the topology coordinator. The node being removed could be normal when we initiate the request, but it doesn't have to be normal when the topology coordinator starts handling the request. For example, the topology coordinator could have removed this node while handling another removenode request that was added to the request queue earlier. This commit intends to fix this issue by adding more checks in the enqueuing phase and returning errors for duplicate requests for node removal. This PR fixes a bug. Hence we need to backport it. Fixes: scylladb/scylladb#20271 (cherry picked from commit `b25b8dccbd`) Closes scylladb/scylladb#20800	2024-09-25 11:35:27 +02:00
Raphael S. Carvalho	153a54626b	tablet: Fix single-sstable split when attaching new unsplit sstables To fix a race between split and repair here `c1de4859d8`, a new sstable generated during streaming can be split before being attached to the sstable set. That's to prevent an unsplit sstable from reaching the set after the tablet map is resized. So we can think this split is an extension of the sstable writer. A failure during split means the new sstable won't be added. Also, the duration of split is also adding to the time erm is held. For example, repair writer will only release its erm once the split sstable is added into the set. This single-sstable split is going through run_custom_job(), which serializes with other maintenance tasks. That was a terrible decision, since the split may have to wait for ongoing maintenance task to finish, which means holding erm for longer. Additionally, if split monitor decides to run split on the entire compaction group, it can cause single-sstable split to be aborted since the former wants to select all sstables, propagating a failure to the streaming writer. That results in new sstable being leaked and may cause problems on restart, since the underlying tablet may have moved elsewhere or multiple splits may have happened. We have some fragility today in cleaning up leaked sstables on streaming failure, but this single-sstable split made it worse since the failure can happen during normal operation, when there's e.g. no I/O error. It makes sense to kill run_custom_job() usage, since the single-sstable split is offline and an extension of sstable writing, therefore it makes no sense to serialize with maintenance tasks. It must also inherit the sched group of the process writing the new sstable. The inheritance happens today, but is fragile. Fixes #20626. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `38ce2c605d`)	2024-09-25 02:13:42 +00:00
Raphael S. Carvalho	c0b2e89d35	replica: Fix tablet split execute after restart let's assume there are 2 nodes, n1, n2. n1 is the coordinator. 1) n1 emits split 2) n1 and n2 complete split work 3) n1 becomes aware all replicas are ready for split 4) n2 restarts, but places split sstable into main group[1] 5) n1 executes split 6) n2 handles split completion, but see the main group is not empty [1]: During split, main group should only contain unsplit sstables. If all sstables are split, main must be empty. This is a result of replica not setting storage group to split mode on restart (using tablet map) and therefore sstables are incorrectly placed on main group. The fix is about looking at tablet map and setting group to split mode before sstables are populated into it. Refs #20626. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `999f1f1318`)	2024-09-25 02:13:42 +00:00
Gleb Natapov	43f9b3b997	test: skip test_lwt_semaphore::test_cas_semaphore in aarch64 debug mode The test configures the write timeout to a much smaller value to make the test run faster, since for some writes a sleep is inserted to hit the timeout, but this makes aarch64 debug flaky since the timeout happens when it should not because of natural slowness. (cherry picked from commit `71a5b1c6dd`) Closes scylladb/scylladb#20777	2024-09-24 15:20:09 +02:00
Botond Dénes	7ed2f87414	Merge '[Backport 6.1] cql3: add option to not unify bind variables with the same' from Avi Kivity Bind variables in CQL have two formats: positional (?) where a variable is referred to by its relative position in the statement, and named (:var), where the user is expected to supply a name->value mapping. In `19a6e69001` we identified the case where a named bind variable appears twice in a query, and collapsed it to a single entry in the statement metadata. Without this, a driver using the named variable syntax cannot disambiguate which variable is referred to. However, it turns out that users can use the positional call form even with the named variable syntax, by using the positional API of the driver. To support this use case, we add a configuration variable to disable the same-variable detection. Because the detection has to happen when the entire statement is visible, we have to supply the configuration to the parser. We call it the dialect and pass it from all callers. The alternative would be to add a pre-prepare call similar to fill_prepare_context that rewrites all expressions in a statement to deduplicate variables. A unit test is added. Fixes https://github.com/scylladb/scylladb/issues/15559 This may be useful to users transitioning from Cassandra, so merits a backport. (cherry picked from commit `f9322799af`) (cherry picked from commit `d69bf4f010`) (cherry picked from commit `ea8441dfa3`) Refs https://github.com/scylladb/scylladb/pull/19493 Closes scylladb/scylladb#20590 * github.com:scylladb/scylladb: cql3: add option to not unify bind variables with the same name cql3: introduce dialect infrastructure cql3: prepared_statement_cache: drop cache key default constructor Merge 'config: round-trip boolean configuration variables' from Avi Kivity	2024-09-24 15:15:05 +03:00
Jenkins Promoter	f4ad3436cb	Update ScyllaDB version to: 6.1.3	2024-09-24 15:07:23 +03:00
Benny Halevy	d13c77e1eb	time_window_compaction_strategy: get_reshaping_job: restrict sort of multi_window vector to its size Currently the function calls boost::partial_sort with a middle iterator that might be out of bounds and cause undefined behavior. Check the vector size, and do a partial sort only if it's longer than `max_sstables`, otherwise sort the whole vector. Fixes scylladb/scylladb#20608 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `39ce358d82`) Closes scylladb/scylladb#20663	2024-09-23 15:38:35 +03:00
Piotr Dulikowski	bf6dd16071	Merge '[Backport 6.1] message/messaging_service: guard adding maintenance tenant under cluster feature' from Michał Jadwiszczak In https://github.com/scylladb/scylladb/pull/18729, we introduced a new statement tenant $maintenance, but the change wasn't protected by any cluster feature. This wasn't a problem for OSS, since an unknown isolation cookie just uses the default scheduling group. However, in enterprise that leads to creating a service level on not-upgraded nodes, which may end up in an error if the user creates the maximum number of service levels. This patch adds a cluster feature to guard adding the new tenant. It's done in a way that handles two upgrade scenarios: (1) version without $maintenance tenant -> version with $maintenance tenant guarded by a feature; (2) version with $maintenance tenant but not guarded by a feature -> version with $maintenance tenant guarded by a feature. The PR adds an enabled flag to statement tenants. This way, when the tenant is disabled, it cannot be used to create a connection, but it can be used to accept an incoming connection. The $maintenance tenant is added to the config as disabled and it gets enabled once the corresponding feature is enabled. Fixes https://github.com/scylladb/scylladb/issues/20070 Refs https://github.com/scylladb/scylla-enterprise/issues/4403 (cherry picked from commit `d44844241d`) (cherry picked from commit `71a03ef6b0`) (cherry picked from commit `b4b91ca364`) Refs https://github.com/scylladb/scylladb/pull/19802 Closes scylladb/scylladb#20674 * github.com:scylladb/scylladb: message/messaging_service: guard adding maintenance tenant under cluster feature message/messaging_service: add feature_service dependency message/messaging_service: add `enabled` flag to statement tenants	2024-09-23 13:18:45 +02:00
Botond Dénes	f987afb2e1	Merge '[Manual Backport 6.1] generic_server: convert connection tracking to seastar::gate' from Laszlo Ersek This is a manual backport of #20212 to 6.1, superseding #20345 (which had run into conflicts). Please see the individual commit messages for backport notes. Fixes #10305 Closes scylladb/scylladb#20355 * github.com:scylladb/scylladb: generic_server: make server::stop() idempotent generic_server: coroutinize server::shutdown() generic_server: make server::shutdown() idempotent test/generic_server: add test case configure, cmake: sort the lists of boost unit tests generic_server: convert connection tracking to seastar::gate	2024-09-18 15:52:32 +03:00
Michał Jadwiszczak	7e14df5ba7	message/messaging_service: guard adding maintenance tenant under cluster feature Set `enabled` flag for `$maintenance` tenant to false and enable it when `MAINTENANCE_TENANT` feature is enabled. (cherry-picked from `b4b91ca364`)	2024-09-18 11:31:26 +02:00
Michał Jadwiszczak	d11df0fcbc	message/messaging_service: add feature_service dependency (cherry-picked from `71a03ef6b0`)	2024-09-18 11:26:56 +02:00
Michał Jadwiszczak	f928bb7967	message/messaging_service: add `enabled` flag to statement tenants Adding a new tenant needs to be done under cluster feature protection. However it wasn't the case for adding `$maintenance` statement tenant and to fix it we need to support an upgrade from node which doesn't know about maintenance tenant at all and from one which uses it without any cluster feature protection. This commit adds `enabled` flag to statement tenants. This way, when the tenant is disabled, it cannot be used to create a connection, but it can be used to accept an incoming connection. (cherry-picked from `d44844241d`)	2024-09-18 11:23:02 +02:00
Tomasz Grabiec	edea822bd7	Merge '[Backport 6.1] tablets: Fix race between repair and split' from Raphael "Raph" Carvalho Consider the following: ``` T 0 split prepare starts 1 repair starts 2 split prepare finishes 3 repair adds unsplit sstables 4 repair ends 5 split executes ``` If repair produces sstable after split prepare phase, the replica will not split that sstable later, as prepare phase is considered completed already. That causes split execution to fail as replicas weren't really prepared. This also can be triggered with load-and-stream which shares the same write (consumer) path. The approach to fix this is the same employed to prevent a race between split and migration. If migration happens during prepare phase, it can happen source misses the split request, but the tablet will still be split on the destination (if needed). Similarly, the repair writer becomes responsible for splitting the data if underlying table is in split mode. That's implemented in replica::table for correctness, so if node crashes, the new sstable missing split is still split before added to the set. Fixes https://github.com/scylladb/scylladb/issues/19378. Fixes https://github.com/scylladb/scylladb/issues/19416. Please replace this line with justification for the backport/* labels added to this PR (cherry picked from commit `239344ab55`) (cherry picked from commit `74612ad358`) Refs https://github.com/scylladb/scylladb/pull/19427 Closes scylladb/scylladb#20595 * github.com:scylladb/scylladb: tablets: Fix race between repair and split compaction: Allow "offline" sstable to be split	2024-09-17 13:25:03 +02:00
Avi Kivity	fb98d6f832	Merge '[Backport 6.1] replica: ignore cleanup of deallocated storage group' from Aleksandra Martyniuk Cleanup of a deallocated tablet throws an exception. Since failed cleanup is retried, we end up in an infinite loop. Ignore cleanup of deallocated storage groups. Fixes: https://github.com/scylladb/scylladb/issues/19752. Needs to be backported to all branches with tablets (6.0 and later) (cherry picked from commit `20d6cf55f2`) (cherry picked from commit `2c4b1d6b45`) Refs https://github.com/scylladb/scylladb/pull/20584 Closes scylladb/scylladb#20627 * github.com:scylladb/scylladb: test: check if cleanup of deallocated sg is ignored replica: ignore cleanup of deallocated storage group	2024-09-17 12:22:00 +03:00
Gleb Natapov	d2e9007442	paxos_state: release semaphore units before checking if a semaphore can be dropped To drop a semaphore it should not be held by anyone, so we need to release our units before checking if a semaphore can be dropped. Fixes: scylladb/scylladb#20602 (cherry picked from commit `9cc54932ae`) Closes scylladb/scylladb#20621	2024-09-16 22:08:45 +03:00
Aleksandra Martyniuk	032c9146d5	test: check if cleanup of deallocated sg is ignored (cherry picked from commit `2c4b1d6b45`)	2024-09-16 16:22:29 +02:00
Aleksandra Martyniuk	120ff5aeb8	replica: ignore cleanup of deallocated storage group Currently, an attempt to clean up a deallocated storage group throws an exception. Failed tablet cleanup is retried, so we get stuck in an endless loop. Ignore cleanup of deallocated storage group. (cherry picked from commit `20d6cf55f2`)	2024-09-16 12:44:36 +00:00
Raphael S. Carvalho	fe56fa39c0	tablets: Fix race between repair and split Consider the following: T 0 split prepare starts 1 repair starts 2 split prepare finishes 3 repair adds unsplit sstables 4 repair ends 5 split executes If repair produces sstable after split prepare phase, the replica will not split that sstable later, as prepare phase is considered completed already. That causes split execution to fail as replicas weren't really prepared. This also can be triggered with load-and-stream which shares the same write (consumer) path. The approach to fix this is the same employed to prevent a race between split and migration. If migration happens during prepare phase, it can happen source misses the split request, but the tablet will still be split on the destination (if needed). Similarly, the repair writer becomes responsible for splitting the data if underlying table is in split mode. That's implemented in replica::table for correctness, so if node crashes, the new sstable missing split is still split before added to the set. Fixes #19378. Fixes #19416. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `74612ad358`)	2024-09-13 21:32:01 -03:00
Avi Kivity	8ddfd0d70d	cql3: add option to not unify bind variables with the same name Bind variables in CQL have two formats: positional (`?`) where a variable is referred to by its relative position in the statement, and named (`:var`), where the user is expected to supply a name->value mapping. In `19a6e69001` we identified the case where a named bind variable appears twice in a query, and collapsed it to a single entry in the statement metadata. Without this, a driver using the named variable syntax cannot disambiguate which variable is referred to. However, it turns out that users can use the positional call form even with the named variable syntax, by using the positional API of the driver. To support this use case, we add a configuration variable to disable the same-variable detection. Because the detection has to happen when the entire statement is visible, we have to supply the configuration to the parser. We call it the `dialect` and pass it from all callers. The alternative would be to add a pre-prepare call similar to fill_prepare_context that rewrites all expressions in a statement to deduplicate variables. A unit test is added. Fixes #15559 (cherry picked from commit `ea8441dfa3`) (cherry picked from commit `edb3068ecf`)	2024-09-13 18:17:15 +03:00
Avi Kivity	92dd47c6d6	cql3: introduce dialect infrastructure A dialect is a different way to interpret the same CQL statement. Examples: - how duplicate bind variable names are handled (later in this series) - whether `column = NULL` in LWT can return true (as is now) or whether it always returns NULL (as in SQL) Currently, dialect is an empty structure and will be filled in later. It is passed to query_processor methods that also accept a CQL string, and from there to the parser. It is part of the prepared statement cache key, so that if the dialect is changed online, previous parses of the statement are ignored and the statement is prepared again. The patch is careful to pick up the dialect at the entry point (e.g. CQL protocol server) so that the dialect doesn't change while a statement is parsed, prepared, and cached. (cherry picked from commit `d69bf4f010`)	2024-09-13 18:11:11 +03:00
Avi Kivity	4bf81f54b4	cql3: prepared_statement_cache: drop cache key default constructor It's unnecessary, and interferes with the following patch where we change the cache key type. (cherry picked from commit `f9322799af`)	2024-09-13 17:56:06 +03:00
Nadav Har'El	d9ba5423bb	Merge 'config: round-trip boolean configuration variables' from Avi Kivity When you SELECT a boolean from system.config, it reads as true/false, but this isn't accepted on UPDATE (instead, we accept 1/0). This is surprising and annoying, so accept true/false in both directions. Not a regression, so a backport isn't strictly necessary. Closes scylladb/scylladb#19792 * github.com:scylladb/scylladb: config: specialize from-string conversion for bool config: wrap boost::lexical_cast<> when converting from strings (cherry picked from commit `9eb47b3ef0`)	2024-09-13 17:54:37 +03:00
Piotr Smaron	b60f9ef4c2	cql: fix exception when validating KS in CREATE TABLE `c70f321c6f` added an extra check if KS exists. This check can throw `data_dictionary::no_such_keyspace` exception, which is supposed to be caught and a more user-friendly exception should be thrown instead. This commit fixes the above problem and adds a testcase to validate it doesn't appear ever again. Also, I moved the check for the keyspace outside of the `for` loop, as it doesn't need to be checked repeatedly. Additionally, I added an extra comment to both `no_such_keyspace` and `no_such_column_family` exceptions explaining they should not be returned directly to the caller, as they lack error code, which may not trigger correct exceptions handling mechanisms on the driver side. Fixes: #20097 (cherry picked from commit `f1e8976fbe`) Closes scylladb/scylladb#20553	2024-09-13 11:36:51 +03:00
Piotr Dulikowski	00e96d4b70	Merge '[Backport 6.1]: hints: send hints with CL=ALL if target is leaving' from Piotr Dulikowski Currently, when attempting to send a hint, we might choose its recipients in one of two ways: - If the original destination is a natural endpoint of the hint, we only send the hint to that node and none other, - Otherwise, we send the hint to all current replicas of the mutation. There is a problem when we decommission a node: while data is streamed away from that node, it is still considered to be a natural endpoint of the data that it used to own. Because of that, it might happen that a hint is sent directly to it but streaming will miss it, effectively resulting in the hint being discarded. As sending the hint _only_ to the leaving replica is a rather bad idea, send the hint to all replicas also in the case when the original destination of the hint is leaving. Note that this is a conservative fix written only with the decommission + vnode-based keyspaces combo in mind. In general, such "data loss" can occur in other situations where the replica set is changing and we go through a streaming phase, i.e. other topology operations in case of vnodes and tablet load balancing. However, the consistency guarantees of hinted handoff in the face of topology changes are not defined and it is not clear what they should be, if there should be any at all. The picture is further complicated by the fact that hints are used by materialized views, and sending view updates to more replicas than necessary can introduce inconsistencies in the form of "ghost rows". This fix was developed in response to a failing test which checked the hint replay + decommission scenario, and it makes it work again. Fixes scylladb/scylladb#20558 Fixes scylladb/scylla-dtest#4582 Refs scylladb/scylladb#19835 This is a backport of the original PR without the tests, done to avoid the need of resolving merge conflicts in that area.
Closes scylladb/scylladb#20557 * github.com:scylladb/scylladb: hints: send hints with CL=ALL if target is leaving hints: inline do_send_one_mutation	2024-09-13 09:39:36 +02:00
Abhi	848054079b	raft: Add descriptions for requested abort errors Fixes: scylladb/scylladb#18902 This PR only improves error messages, no need to backport it. (cherry picked from commit `9b09439065`) Closes scylladb/scylladb#20526	2024-09-13 10:13:49 +03:00
Botond Dénes	c80cefe422	docs/cql/ddl.rst: fix description of sstable_compression ScyllaDB doesn't support custom compressors. The available compressors are the only available ones, not the default ones. Adjust the text to reflect this. (cherry picked from commit `08f109724b`) Closes scylladb/scylladb#20524	2024-09-13 10:12:59 +03:00
Takuya ASADA	b07c74a65c	install.sh: fix more incorrect permission on strict umask Even after `13caac7`, we still have more files incorrect permission, since we use "cp -r" and creating new file with redirect. To fix this, we need to replace "cp -r" with "cp -pr", and "chmod <perm>" on newly created files. Fixes #14383 Related #19775 (cherry picked from commit `9d7fed40b5`) Closes scylladb/scylladb#20432	2024-09-13 10:12:22 +03:00
Piotr Dulikowski	2556c7a0dc	hints: send hints with CL=ALL if target is leaving Currently, when attempting to send a hint, we might choose its recipients in one of two ways: - If the original destination is a natural endpoint of the hint, we only send the hint to that node and none other, - Otherwise, we send the hint to all current replicas of the mutation. There is a problem when we decommission a node: while data is streamed away from that node, it is still considered to be a natural endpoint of the data that it used to own. Because of that, it might happen that a hint is sent directly to it but streaming will miss it, effectively resulting in the hint being discarded. As sending the hint _only_ to the leaving replica is a rather bad idea, send the hint to all replicas also in the case when the original destination of the hint is leaving. Note that this is a conservative fix written only with the decommission + vnode-based keyspaces combo in mind. In general, such "data loss" can occur in other situations where the replica set is changing and we go through a streaming phase, i.e. other topology operations in case of vnodes and tablet load balancing. However, the consistency guarantees of hinted handoff in the face of topology changes are not defined and it is not clear what they should be, if there should be any at all. The picture is further complicated by the fact that hints are used by materialized views, and sending view updates to more replicas than necessary can introduce inconsistencies in the form of "ghost rows". This fix was developed in response to a failing test which checked the hint replay + decommission scenario, and it makes it work again. Fixes scylladb/scylla-dtest#4582 Refs scylladb/scylladb#19835 (cherry picked from commit `61ac0a336d`)	2024-09-12 10:55:29 +02:00
Piotr Dulikowski	132d77f447	hints: inline do_send_one_mutation It's a small method and it is only used once in send_one_mutation. Inlining it lets us get rid of its declaration in the header - now, if one needs to change the variables passed from one function to another, it is no longer necessary to change the header. (cherry picked from commit `8abb06ab82`)	2024-09-12 10:55:21 +02:00
Gleb Natapov	bb9249f055	db/consistency_level: do not use result from hit weighted load balancer if it contains duplicates Because of https://github.com/scylladb/scylladb/issues/9285 the hit weighted load balancer may sometimes return the same node twice. It may cause wrong data to be read or unexpected errors to be returned to a client. Since the original bug is not easy to fix and it is rare, let's introduce a workaround. We will check for duplicates and will use the non-HWLB result if one is found. (cherry picked from commit `e06a772b87`) Closes scylladb/scylladb#20468	2024-09-10 17:18:47 +03:00
Kamil Braun	e4a18b0858	test: test_raft_no_quorum: increase raft timeout in debug mode The test cases in this file use an error injection to reduce raft group 0 timeouts (from the default 1 minute), in order to speed up the tests; the scenarios expect these timeouts to happen, so we want them to happen as quick as possible, but we don't want to reduce timeouts so much that it will make other operations fail when we don't expect them to (e.g. when the test wants to add a node to the cluster). Unfortunately the selected 5 seconds in debug mode was not enough and made the tests flaky: scylladb/scylladb#20111. Increase it to 10 seconds. This unfortunately will slow down these tests as they have to sometimes wait for 10 seconds for the timeout to happen. But better to have this than a flaky test. Fixes: scylladb/scylladb#20111 (cherry picked from commit `52fdf5b4c9`) Closes scylladb/scylladb#20477	2024-09-10 08:48:06 +03:00
Kefu Chai	105293b2ab	docs: do not install scylla/ppa repo when performing upgrade for the following reasons: 1. the ppa in question does not provide the build for the latest ubuntu's LTS release. it only builds for trusty, xenial, bionic and jammy. according to https://wiki.ubuntu.com/Releases, the latest LTS release is ubuntu noble at the time of writing. 2. the ppa in question does not provide the packages used in production. it does provide the packages for building scylla 3. after we introduced the relocatable package, there is no need to provide extra user space dependencies apart from scylla packages. so, in this change, we remove all references to enabling the Scylla/PPA repository. Fixes scylladb/scylladb#20449 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `fe0e961856`) Closes scylladb/scylladb#20453	2024-09-10 08:46:47 +03:00
Nadav Har'El	ad47c0e2f9	alternator ttl: fix use-after-free The Alternator TTL scanning code uses an object "scan_ranges_context" to hold the scanning context. One of the members of this object is a service::query_state, and that in turn holds a reference to a service::client_state. The existing constructor created a temporary client_state object and saved a reference to it - which can result in use after free as the temporary object is freed as soon as the constructor ends. The fix is to save a client_state in the scan_ranges_context object, instead of a temporary object. Fixes #19988 Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `15f8046fcb`) Closes scylladb/scylladb#20436	2024-09-10 08:43:14 +03:00
Kefu Chai	0eb66cbee5	sstables: correct the debugging message printed when removing temp dir in `372a4d1b79`, we introduced a change which was for debugging the logging message. but the logging message intended for printing the temp_dir now prints an `optional<int>`. this is both confusing, and more importantly, it hurts the debuggability. in this change, the related change is reverted. Fixes scylladb/scylladb#20408 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `d26bb9ae30`) Closes scylladb/scylladb#20434	2024-09-10 08:42:29 +03:00
Kefu Chai	a2458f07d7	dist: drop %pretrans section before this change, if user does not have `/bin/sh` around, when installing scylla packages, the script in `%pretrans` is executed, and fails due to missing `/bin/sh`. per https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#pretrans > Note that the %pretrans scriptlet will, in the particular case of > system installation, run before anything at all has been installed. > This implies that it cannot have any dependencies at all. For this > reason, %pretrans is best avoided, but if used it MUST (by necessity) > be written in Lua. See > https://rpm-software-management.github.io/rpm/manual/lua.html for more > information. but we were trying to warn users upgrading from scylla < 1.7.3, which was released 7 years ago at the time of writing. in this change, we drop the `%pretrans` section. hopefully they will find their way out if they still exist. Fixes scylladb/scylladb#20321 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `6970c502c9`) Closes scylladb/scylladb#20384	2024-09-10 08:40:11 +03:00
Avi Kivity	b484effcad	docs: cql: document ZstdCompressor for CREATE TABLE Adjust the wording slightly to be less awkward. (cherry picked from commit `60acfd8c08`) Closes scylladb/scylladb#20380	2024-09-10 08:39:08 +03:00
Raphael S. Carvalho	4c4d1cce14	storage_service: avoid processing same table unnecessarily in split monitor If there's a token metadata for a given table, and it is in split mode, it will be registered such that split monitor can look at it, for example, to start split work, or do nothing if table completed it. during topology change, e.g. drain, split is stalled since it cannot take over the state machine. It was noticed that the log is being spammed with a message saying the table completed split work, since every tablet metadata update, means waking up the monitor on behalf of a table. So it makes sense to demote the logging level to debug. That persists until drain completes and split can finally complete. Another thing that was noticed is that during drain, a table can be submitted for processing faster than the monitor can handle, so the candidate queue may end up with multiple duplicated entries for same table, which means unnecessary work. That is fixed by using a sequenced set, which keeps the current FIFO behavior. Fixes #20339. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `26facd807e`) Closes scylladb/scylladb#20343	2024-09-10 08:37:20 +03:00
Botond Dénes	c64ae3f839	Merge '[Backport 6.1] repair: throw if batchlog manager isn't initialized' from ScyllaDB repair_service::repair_flush_hints_batchlog_handler may access batchlog manager while it is uninitialized. Throw if batchlog manager isn't initialized. Fixes: #20236. Needs backport to 6.0 and 6.1 as they suffer from the uninitialized bm access. (cherry picked from commit `d8e4393418`) (cherry picked from commit `f38bb6483a`) Refs #20251 Closes scylladb/scylladb#20351 * github.com:scylladb/scylladb: test: add test to ensure repair won't fail with uninitialized bm repair: throw if batchlog manager isn't initialized	2024-09-04 07:02:18 +03:00
Kamil Braun	f77686cefb	Merge '[Backport 6.1] Fix node replace with inter-dc encryption enabled.' from Gleb Natapov Currently if a coordinator and a node being replaced are in the same DC while inter-dc encryption is enabled (connections between nodes in the same DC should not be encrypted) the replace operation will fail. It fails because a coordinator uses non encrypted connection to push raft data to the new node, but the new node will not accept such connection until it knows which DC the coordinator belongs to and for that the raft data needs to be transferred. The series adds the test for this scenario and the fix for the chicken&egg problem above. The series (or at least the fix itself) needs to be backported because this is a serious regression. Fixes: https://github.com/scylladb/scylladb/issues/19025 (cherry picked from commit `84757a4ed3`) (cherry picked from commit `b98282a976`) (cherry picked from commit `2f1b1fd45e`) (cherry picked from commit `17f4a151ce`) (cherry picked from commit `32a59ba98f`) Refs https://github.com/scylladb/scylladb/pull/20290 Closes scylladb/scylladb#20374 * github.com:scylladb/scylladb: topology coordinator: fix indentation after the last patch topology coordinator: do not add replacing node without a ring to topology test: add test for replace in clusters with encryption enabled test.py: add server encryption support to cluster manager .gitignore: fix pattern for resources to match only one specific directory	2024-09-02 16:14:37 +02:00
Gleb Natapov	d6a1a55d6c	topology coordinator: fix indentation after the last patch (cherry picked from commit `32a59ba98f`)	2024-09-01 11:57:34 +03:00
Gleb Natapov	9db819763b	topology coordinator: do not add replacing node without a ring to topology When only inter dc encryption is enabled a non encrypted connection between two nodes is allowed only if both nodes are in the same dc. If a node that initiates the connection knows that dst is in the same dc and hence uses a non encrypted connection, but the dst does not yet know the topology of the src, such connection will not be allowed since dst cannot guarantee that src is in the same dc. Currently, when topology coordinator is used, a replacing node will appear in the coordinator's topology immediately after it is added to the group0. The coordinator will try to send raft message to the new node and (assuming only inter dc encryption is enabled and replacing node and the coordinator are in the same dc) it will try to open regular, non encrypted, connection to it. But the replacing node will not have the coordinator in its topology yet (it needs to sync the raft state for that), so it will reject such connection. To solve the problem the patch does not add a replacing node that was just added to group0 to the topology. It will be added later, when tokens will be assigned to it. At this point a replacing node will already make sure that its topology state is up-to-date (since it will execute a raft barrier in join_node_response_params handler) and it knows coordinator's topology. This aligns replace behaviour with bootstrap since bootstrap also does not add a node without a ring to the topology. The patch effectively reverts `b8ee8911ca` Fixes: scylladb/scylladb#19025 (cherry picked from commit `17f4a151ce`)	2024-09-01 11:57:25 +03:00
Gleb Natapov	4769e694d1	test: add test for replace in clusters with encryption enabled (cherry picked from commit `2f1b1fd45e`)	2024-09-01 11:56:37 +03:00
Gleb Natapov	74012c562a	test.py: add server encryption support to cluster manager (cherry picked from commit `b98282a976`)	2024-09-01 11:56:25 +03:00
Gleb Natapov	51215fb7f7	.gitignore: fix pattern for resources to match only one specific directory (cherry picked from commit `84757a4ed3`)	2024-09-01 11:54:42 +03:00
Laszlo Ersek	370bf14872	generic_server: make server::stop() idempotent After server::shutdown(), make server::stop() more robust too, by allowing callers (internal or external) to call it several times (not concurrently though, just yet; see <https://github.com/scylladb/scylladb/issues/20309>). Suggested-by: Benny Halevy <bhalevy@scylladb.com> Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> (cherry picked from commit `49bff3b1ab`)	2024-08-30 16:17:44 +02:00
Laszlo Ersek	860a1872bc	generic_server: coroutinize server::shutdown() By turning server::shutdown() into a coroutine, we need not dynamically allocate "nr_conn". Verified as follows: (1) In terminal #1: build/Dev/scylla --overprovisioned --developer-mode=yes \ --memory=2G --smp=1 --default-log-level error \ --logger-log-level cql_server=debug:cql_server_controller=debug > INFO [...] cql_server_controller - Starting listening for CQL clients > on 127.0.0.1:9042 (unencrypted, > non-shard-aware) > INFO [...] cql_server_controller - Starting listening for CQL clients > on 127.0.0.1:19042 (unencrypted, > shard-aware) (2) In terminals #2 and #3: tools/cqlsh/bin/cqlsh.py (3) Press ^C in terminal #1: > DEBUG [...] cql_server - abort accept nr_total=2 > DEBUG [...] cql_server - abort accept 1 out of 2 done > DEBUG [...] cql_server - abort accept 2 out of 2 done > DEBUG [...] cql_server - shutdown connection nr_total=4 > DEBUG [...] cql_server - shutdown connection 1 out of 4 done > DEBUG [...] cql_server - shutdown connection 2 out of 4 done > DEBUG [...] cql_server - shutdown connection 3 out of 4 done > DEBUG [...] cql_server - shutdown connection 4 out of 4 done > INFO [...] cql_server_controller - CQL server stopped This patch is best viewed with "git show --word-diff=color". Suggested-by: Benny Halevy <bhalevy@scylladb.com> Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> (cherry picked from commit `1138347e7e`)	2024-08-30 16:17:44 +02:00
Laszlo Ersek	9e224136ab	generic_server: make server::shutdown() idempotent Make server::shutdown() more robust by allowing callers (internal or external) to call it several times (not concurrently though, just yet; see <https://github.com/scylladb/scylladb/issues/20309>). Suggested-by: Benny Halevy <bhalevy@scylladb.com> Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> (cherry picked from commit `2216275ebd`)	2024-08-30 16:17:44 +02:00
Laszlo Ersek	16321fc243	test/generic_server: add test case Check whether we can stop a generic server without first asking it to listen. The test fails currently; the failure mode is a hang, which triggers the 5 minute timeout set in the test: > unknown location(0): fatal error: in "stop_without_listening": > seastar::timed_out_error: timedout > seastar/src/testing/seastar_test.cc(43): last checkpoint > test/boost/generic_server_test.cc(34): Leaving test case > "stop_without_listening"; testing time: 300097447us Backport notes for 6.1: - Replace #include "utils/assert.hh" SCYLLA_ASSERT(false); with #include <cassert> assert(false); due to 6.1 lacking commit `aa1270a00c` ("treewide: change assert() to SCYLLA_ASSERT()", 2024-08-05). The header file "utils/assert.hh" wouldn't be difficult to backport, but separating it from the treewide changes in commit `aa1270a00c` might not be the best idea. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> (cherry picked from commit `dbc0ca6354`)	2024-08-30 16:17:44 +02:00
Laszlo Ersek	8f0f362a30	configure, cmake: sort the lists of boost unit tests Both lists were obviously meant to be sorted originally, but by today we've introduced many instances of disorder -- thus, inserting a new test in the proper place leaves the developer scratching their head. Sort both lists. Backport notes for 6.1: - Conflicts in "configure.py", unsurprisingly. For the backport, I sorted the boost unit test list manually, from scratch. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> (cherry picked from commit `931f2f8d73`)	2024-08-30 16:16:53 +02:00
Laszlo Ersek	a8131a99ed	generic_server: convert connection tracking to seastar::gate If we call server::stop() right after "server" construction, it hangs: With the server never listening (never accepting connections and never serving connections), nothing ever calls server::maybe_stop(). Consequently, co_await _all_connections_stopped.get_future(); at the end of server::stop() deadlocks. Such a server::stop() call does occur in controller::do_start_server() [transport/controller.cc], when - cserver->start() (sharded<cql_server>::start()) constructs a "server"-derived object, - start_listening_on_tcp_sockets() throws an exception before reaching listen_on_all_shards() (for example because it fails to set up client encryption -- certificate file is inaccessible etc.), - the "deferred_action" cserver->stop().get(); is invoked during cleanup. (The cserver->stop() call exposing the connection tracking problem dates back to commit `ae4d5a60ca` ("transport::controller: Shut down distributed object on startup exception", 2020-11-25), and it's been triggerable through the above code path since commit `6b178f9a4a` ("transport/controller: split configuring sockets into separate functions", 2024-02-05).) Tracking live connections and connection acceptances seems like a good fit for "seastar::gate", so rewrite the tracking with that. "seastar::gate" can be closed (and the returned future can be waited for) without anyone ever having entered the gate. NOTE: this change makes it quite clear that neither server::stop() nor server::shutdown() must be called multiple times. The permitted sequences are: - server::shutdown() + server::stop() - or just server::stop(). 
Fixes #10305 Backport notes for 6.1: - Conflict in "generic_server.hh", due to 6.1 not having commit `324b3c43c0` ("generic_server: use async function in `for_each_gently()`", 2024-08-08), which is part of the feature series "service levels: update connections parameters automatically" <https://github.com/scylladb/scylladb/pull/19085>. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> (cherry picked from commit `5a04743663`)	2024-08-30 16:03:51 +02:00
Aleksandra Martyniuk	93fbe3af12	test: add test to ensure repair won't fail with uninitialized bm (cherry picked from commit `f38bb6483a`)	2024-08-30 13:55:48 +00:00
Aleksandra Martyniuk	b164ea4a68	repair: throw if batchlog manager isn't initialized repair_service::repair_flush_hints_batchlog_handler may access batchlog manager while it is uninitialized. Batchlog manager cannot be initialized before repair as we have the dependencies chain: repair_service -> storage_service::join_cluster -> batchlog_manager. Throw if batchlog manager isn't initialized. That won't cause repair to fail. (cherry picked from commit `d8e4393418`)	2024-08-30 13:55:48 +00:00
Jenkins Promoter	2db808e364	Update ScyllaDB version to: 6.1.2	2024-08-29 15:13:24 +03:00
Botond Dénes	e6d2d29dd1	Merge '[Backport 6.1] repair: do_rebuild_replace_with_repair: use source_dc only when safe' from ScyllaDB It is unsafe to restrict the sync nodes for repair to the source data center if it has too low replication factor in network_topology_replication_strategy, or if other nodes in that DC are ignored. Also, this change restricts the usage of source_dc to `network_topology` and `everywhere_topology` strategies, as with simple replication strategy there is no guarantee that there would be any more replicas in that data center. Fixes #16826 Reproducer submitted as https://github.com/scylladb/scylla-dtest/pull/3865 It fails without this fix and passes with it. * Requires backport to live versions. Issue hit in the field with 2022.2.14 (cherry picked from commit `8b1877f3ca`) (cherry picked from commit `0419b1d522`) (cherry picked from commit `b5d0ab092c`) (cherry picked from commit `9729dd21c3`) (cherry picked from commit `8665eef98c`) (cherry picked from commit `5f655e41e3`) Refs #16827 Closes scylladb/scylladb#20228 * github.com:scylladb/scylladb: raft_rebuild: propagate source_dc force option to rebuild_option repair: do_rebuild_replace_with_repair: use source_dc only when safe repair: replace_with_repair: pass the replace_node downstream repair: replace_with_repair: pass ignore_nodes as a set of host_id:s repair: replace_rebuild_with_repair: pass ks_erms from caller nodetool: rebuild: add force option Add and use utils::optional_param to pass source_dc	2024-08-29 07:35:05 +03:00
Lakshmi Narayanan Sreethar	01661e1eaa	test/pylib: fix keyspace_compaction method The `keyspace_compaction` method incorrectly appends the column family parameter to the URL using a regular string, `"?cf={table}"`, instead of an f-string, `f"?cf={table}"`. As a result, the column family name is sent as `{table}` to the server, causing the compaction request to fail. Fix this issue by passing the parameter to the POST request using a dictionary instead of appending it to the URL. Fixes #20264 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> (cherry picked from commit `dc5c45e803`) Closes scylladb/scylladb#20273	2024-08-28 20:08:58 +03:00
Botond Dénes	6232982772	Merge '[Backport 6.1] select from mutation_fragments() + tablets: handle reads for non-owned partitions' from ScyllaDB Attempting to read a partition via `SELECT * FROM MUTATION_FRAGMENTS()`, which the node doesn't own, from a table using tablets causes a crash. This is because when using tablets, the replica side simply doesn't handle requests for un-owned tokens and this triggers a crash. We should probably improve how this is handled (an exception is better than a crash), but this is outside the scope of this PR. This PR fixes this and also adds a reproducer test. Fixes: https://github.com/scylladb/scylladb/issues/18786 Fixes a regression introduced in 6.0, so needs backport to 6.0 and 6.1 (cherry picked from commit `de5329157c`) (cherry picked from commit `46563d719f`) (cherry picked from commit `4e2d7aa2a2`) Refs #20109 Closes scylladb/scylladb#20313 * github.com:scylladb/scylladb: test/tablets: Test that reading tablets' mutations from MUTATION_FRAGMENTS works replica/mutation_dump: enforce pinning of effective replication map replica/mutation_dump: handle un-owned tokens (with tablets)	2024-08-28 06:23:45 +03:00
Botond Dénes	6418787ee0	Merge '[Backport 6.1] Make Summary support histogram with infinite bucket values' from ScyllaDB This series fixes an issue where histogram Summaries return an infinite value. It updates the quantile calculation logic to address cases where values fall into the infinite bucket of a histogram. Now, instead of returning infinite (max int), the calculation will return the last bucket limit, ensuring finite outputs in all cases. The series adds a test for summaries with a specific test case for this scenario. Fixes #20255 Need backport to 6.0, 6.1 and 2023.1 and above (cherry picked from commit `011aa91a8c`) (cherry picked from commit `644e6f0121`) Refs #20257 Closes scylladb/scylladb#20303 * github.com:scylladb/scylladb: test/estimated_histogram_test Add summary tests utils/histogram.hh: Make summary support infinite bucket.	2024-08-28 06:23:03 +03:00
Botond Dénes	06d6cf5608	Merge '[Backport 6.1] abstract_replication_strategy: make get_ranges async' from ScyllaDB To prevent stalls due to large number of tokens. For example, large cluster with say 70 nodes can have more than 16K tokens. Fixes #19757 (cherry picked from commit `d385219a12`) (cherry picked from commit `333c0d7c88`) (cherry picked from commit `b2abbae24b`) (cherry picked from commit `824bdf99d2`) (cherry picked from commit `ea5a0cca10`) (cherry picked from commit `2bbbe2a8bc`) (cherry picked from commit `686a8f2939`) Refs #19758 Closes scylladb/scylladb#20297 * github.com:scylladb/scylladb: abstract_replication_strategy: make get_ranges async database: get_keyspace_local_ranges: get vnode_effective_replication_map_ptr param compaction: task_manager_module: open code maybe_get_keyspace_local_ranges alternator: ttl: token_ranges_owned_by_this_shard: let caller make the ranges_holder alternator: ttl: can pass const gms::gossiper& to ranges_holder alternator: ttl: ranges_holder_primary: unconstify _token_ranges member alternator: ttl: refactor token_ranges_owned_by_this_shard	2024-08-28 06:22:33 +03:00
Botond Dénes	1f8d8fd3db	Merge '[Backport 6.1] replica: fix copy constructor of tablet_sstable_set' from ScyllaDB Commit `9f93dd9fa3` changed `tablet_sstable_set::_sstable_sets` to be a `absl::flat_hash_map` and in addition, `std::set<size_t> _sstable_set_ids` was added. `_sstable_set_ids` is set up in the `tablet_sstable_set(schema_ptr s, const storage_group_manager& sgm, const locator::tablet_map& tmap)` constructor, but it is not copied in `tablet_sstable_set(const tablet_sstable_set& o)`. This affects the `tablet_sstable_set::tablet_sstable_set` method as it depends on the copy constructor. Since sstable set can be cloned when a new sstable set is added, the issue will cause ids not being copied into the new sstable set. It's healed only after compaction, since the sstable set is rebuilt from scratch there. This PR fixes this issue by removing the existing copy constructor of `tablet_sstable_set` to enable the implicit default copy constructor. Fixes #19519 (cherry picked from commit `44583eed9e`) (cherry picked from commit `ec47b50859`) Refs #20115 Closes scylladb/scylladb#20201 * github.com:scylladb/scylladb: boost/sstable_set_test: add testcase to test tablet_sstable_set copy constructor replica: fix copy constructor of tablet_sstable_set	2024-08-28 06:20:12 +03:00
Pavel Emelyanov	bc03d13c76	test/tablets: Test that reading tablets' mutations from MUTATION_FRAGMENTS works Currently it doesn't, one of the node crashes with std::out_of_range exception and meaningless calltrace [Botond]: this test checks the case of reading a partition via MUTATION_FRAGMENTS from a node which doesn't own said partition. refs: #18786 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `4e2d7aa2a2`)	2024-08-27 23:43:14 +00:00
Botond Dénes	4b4dbc1112	replica/mutation_dump: enforce pinning of effective replication map By making it a required argument, making sure the topology version is pinned for the duration of the query. This is needed because mutation dump queries bypass the storage proxy, where this pinning usually takes place. So it has to be enforced here. (cherry picked from commit `46563d719f`)	2024-08-27 23:43:14 +00:00
Botond Dénes	739be17801	replica/mutation_dump: handle un-owned tokens (with tablets) When using tablets, the replica-side doesn't handle un-owned tokens. table::shard_for_reads() will just return 0 for un-owned tokens, and a later attempt at calling table::storage_group_for_token() with said un-owned token will cause a crash (std::terminate due to std::out_of_range thrown in noexcept context). The replicas rely on the coordinator to not send stray requests, but for select from mutation_fragments(table) queries, there is no coordinator side who could do the correct dispatching. So do this in mutation_dump(), just creating empty readers for un-owned tokens. (cherry picked from commit `de5329157c`)	2024-08-27 23:43:14 +00:00
Tomasz Grabiec	7fc15ce200	Merge '[Backport 6.1] schema_tables: calculate_schema_digest: prevent stalls due to large m…' from ScyllaDB …utations vector With a large number of tables the schema mutations vector might get big enough to cause reactor stalls when freed. For example, the following stall was hit on 2023.1.0~rc1-20230208.fe3cc281ec73 with 5000 tables: ``` (inlined by) ~vector at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_vector.h:730 (inlined by) db::schema_tables::calculate_schema_digest(seastar::sharded<service::storage_proxy>&, enum_set<super_enum<db::schema_feature, (db::schema_feature)0, (db::schema_feature)1, (db::schema_feature)2, (db::schema_feature)3, (db::schema_feature)4, (db::schema_feature)5, (db::schema_feature)6, (db::schema_feature)7> >, seastar::noncopyable_function<bool (std::basic_string_view<char, std::char_traits<char> >)>) at ./db/schema_tables.cc:799 ``` This change returns a mutations generator from the `map` lambda coroutine so we can process the mutations one at a time and destroy them one at a time, thereby reducing the memory footprint and preventing reactor stalls. Fixes #18173 (cherry picked from commit `95a5fba0ea`) (cherry picked from commit `52234214e5`) Refs #18174 Closes scylladb/scylladb#20246 * github.com:scylladb/scylladb: schema_tables: calculate_schema_digest: filter the key earlier schema_tables: calculate_schema_digest: prevent stalls due to large mutations vector	2024-08-27 21:42:35 +02:00
Benny Halevy	164d58b0d5	raft_rebuild: propagate source_dc force option to rebuild_option Currently, the `force` property of the `source_dc` rebuild option is lost and `raft_topology_cmd_handler` has no way to know if it was given or not. This in turn can cause rebuild to fail, even when `--force` is set by the user, where it would succeed with gossip topology changes, based on the source_dc --force semantics. \Fixes scylladb/scylladb#20242 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> \Closes scylladb/scylladb#20249 (cherry picked from commit `18c45f7502`) Closes scylladb/scylladb#20311	2024-08-27 22:20:48 +03:00
Aleksandra Martyniuk	0839df3dbf	replica: add/remove table atomically Currently, database::tables_metadata::add_table needs to hold a write lock before adding a table. So, if we update other classes keeping track of tables before calling add_table, and the method yields, the table's metadata will be inconsistent. Set all table-related info in tables_metadata::add_table_helper (called by add_table) so that the operation is atomic. Analogously for remove_table. Fixes: #19833. (cherry picked from commit `483d89ed6d`) Closes scylladb/scylladb#20244	2024-08-27 20:46:48 +03:00
Amnon Heiman	64befbca61	test/estimated_histogram_test Add summary tests This patch adds tests for summary calculation. It adds two tests. The first is a basic calculation of P50, P95, and P99 after adding 100 elements into 20 buckets. The second test checks that if elements are found in the infinite bucket, the result is the lower limit (33s) and not infinite. Relates to #20255 Signed-off-by: Amnon Heiman <amnon@scylladb.com> (cherry picked from commit `644e6f0121`)	2024-08-27 12:12:39 +00:00
Amnon Heiman	8ee09f4353	utils/histogram.hh: Make summary support infinite bucket. This patch handles an edge case related to the infinite bucket limit. Summaries are the P50, P95, and P99 quantiles. The quantiles are calculated from a histogram; we find the bucket and return its upper limit. In classic histograms, there is a notion of the infinite bucket; anything that does not fall into the last bucket is considered to be infinite; with quantiles, that does not make sense. So instead of reporting infinite we'll report the bucket lower limit. Fixes #20255 Signed-off-by: Amnon Heiman <amnon@scylladb.com> (cherry picked from commit `011aa91a8c`)	2024-08-27 12:12:39 +00:00
Botond Dénes	e84d8b1205	Merge '[Backport 6.1] cql: process LIMIT for GROUP BY select queries' from ScyllaDB This change fixes #17237, fixes #5361 and fixes #5362 by passing the limit value down the call chain in cql3. A test is also added. fixes: #17237 fixes: #5361 fixes: #5362 The regression happened in 5.4 as we changed the way GROUP BY is processed in `432cb02` - to force aggregation when it is used. The LIMIT value was not passed to aggregations and thus we failed to adhere to it. We want to backport this fix to 5.4 and 6.0 to have continuous correct results for the test case from #17237 This patch consists of 4 commits: - fa4225ea0fac2057b7a9976f57dc06bcbd900cd4 - cql3: respect the user-defined page size in aggregate queries - a precondition for this patch to be implementable - 8fbe69e74dca16ed8832d9a90489ca47ba271d0b - cql3/select_statement: simplify the get_limit function - the `do_get_limit()` function did a lot of legwork that should not be associated with it. This change makes it trivial and makes its callers do additional checks (for unset guards, or for an aggregate query) - 162828194a2b88c22fbee335894ff045dcc943c9 - cql3: process LIMIT for GROUP BY queries - pass the limit value down the chain and make use of it. This is the actual fix to #17237 - b3dc6de6d6cda8f5c09b01463bb52f827a6a00b4 - test/cql-pytest: Add test for GROUP BY queries with LIMIT - tests (cherry picked from commit `08f3219cb8`) (cherry picked from commit `3838ad64b3`) (cherry picked from commit `e7ae7f3662`) (cherry picked from commit `9db272c949`) Refs: #18842 Closes scylladb/scylladb#20154 * github.com:scylladb/scylladb: test/cql-pytest: Add test for GROUP BY queries with LIMIT cql3: process LIMIT for GROUP BY queries cql3/select_statement: simplify the get_limit function cql3: respect the user-defined page size in aggregate queries	2024-08-27 14:52:18 +03:00
Benny Halevy	6692c1702d	abstract_replication_strategy: make get_ranges async To prevent stalls due to large number of tokens. For example, large cluster with say 70 nodes can have more than 16K tokens. Fixes #19757 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `686a8f2939`)	2024-08-26 21:50:39 +00:00
Benny Halevy	415bdf3160	database: get_keyspace_local_ranges: get vnode_effective_replication_map_ptr param Prepare for making the function async. Then, it will need to hold on to the erm while getting the token_ranges asynchronously. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `2bbbe2a8bc`)	2024-08-26 21:50:39 +00:00
Benny Halevy	6b2d0f5934	compaction: task_manager_module: open code maybe_get_keyspace_local_ranges It is used only here and can be simplified by checking if the keyspace replication strategy is per table by the caller. Prepare for making get_keyspace_local_ranges async. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `ea5a0cca10`)	2024-08-26 21:50:39 +00:00
Benny Halevy	0f990a8dc5	alternator: ttl: token_ranges_owned_by_this_shard: let caller make the ranges_holder Add static `make` methods to ranges_holder_{primary,secondary} and use them to make the ranges objects and pass them to `token_ranges_owned_by_this_shard`, rather than letting token_ranges_owned_by_this_shard invoke the right constructor of the ranges_holder class. Prepare for making `make` async. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `824bdf99d2`)	2024-08-26 21:50:39 +00:00
Benny Halevy	5f8b199253	alternator: ttl: can pass const gms::gossiper& to ranges_holder There's no need to pass a mutable reference to the gossiper. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `b2abbae24b`)	2024-08-26 21:50:38 +00:00
Benny Halevy	2288f98d83	alternator: ttl: ranges_holder_primary: unconstify _token_ranges member To allow the class to be nothrow_move_constructable. Prepare for returning it as a future value. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `333c0d7c88`)	2024-08-26 21:50:38 +00:00
Benny Halevy	3ed214a728	alternator: ttl: refactor token_ranges_owned_by_this_shard Rather than holding a variant member (and defining both ranges_holder_{primary,secondary} in both specializations of the class), just make the internal ranges_holder classes first-class citizens and parameterize the `token_ranges_owned_by_this_shard` template by this class type. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `d385219a12`)	2024-08-26 21:50:38 +00:00
Michał Jadwiszczak	b7e6f22999	cql3/statements/create_service_level: forbid creating SL starting with `$` Tenant names starting with `$` are reserved for internal ones. Forbid creating new service level which name starts with `$` and log a warning for existing service levels with `$` prefix. (cherry picked from commit `d729d1b272`) Closes scylladb/scylladb#20156	2024-08-26 13:03:16 +03:00
Benny Halevy	31f3ff37f4	schema_tables: calculate_schema_digest: filter the key earlier Currently, each frozen mutation we get from system_keyspace::query_mutations is unfrozen in whole to a mutation and only then do we check its key with the provided `accept_keyspace` function. This is wasteful, since the key can be processed directly from the frozen mutation, before taking the toll of unfreezing it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `52234214e5`)	2024-08-22 09:06:26 +00:00
Benny Halevy	828595786a	schema_tables: calculate_schema_digest: prevent stalls due to large mutations vector With a large number of tables the schema mutations vector might get big enough to cause reactor stalls when freed. For example, the following stall was hit on 2023.1.0~rc1-20230208.fe3cc281ec73 with 5000 tables: ``` (inlined by) ~vector at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_vector.h:730 (inlined by) db::schema_tables::calculate_schema_digest(seastar::sharded<service::storage_proxy>&, enum_set<super_enum<db::schema_feature, (db::schema_feature)0, (db::schema_feature)1, (db::schema_feature)2, (db::schema_feature)3, (db::schema_feature)4, (db::schema_feature)5, (db::schema_feature)6, (db::schema_feature)7> >, seastar::noncopyable_function<bool (std::basic_string_view<char, std::char_traits<char> >)>) at ./db/schema_tables.cc:799 ``` This change returns a mutations generator from the `map` lambda coroutine so we can process the mutations one at a time and destroy them one at a time, thereby reducing the memory footprint and preventing reactor stalls. Fixes #18173 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `95a5fba0ea`)	2024-08-22 09:06:25 +00:00
Benny Halevy	fdbb0cdef3	repair: do_rebuild_replace_with_repair: use source_dc only when safe It is unsafe to restrict the sync nodes for repair to the source data center if we cannot guarantee a quorum in the data center with network-topology replication strategy. This change restricts the usage of source_dc in the following cases: 1. For SimpleStrategy - source_dc is ignored since there is no guarantee that it contains remaining replicas for all tokens. 2. For EverywhereStrategy - use source_dc if there are remaining live nodes in the datacenter. 3. For NetworkTopologyStrategy: a. It is considered unsafe to use source_dc if number of nodes lost in that DC (replaced/rebuilt node + additional ignored nodes) is greater than 1, or it has 1 lost node and rf <= 1 in the DC. b. If the source_dc arg is forced, as with the new `nodetool rebuild --force <source_dc>` option, we use it anyway, even if it's considered to be unsafe. A warning is printed in this case. c. If the source_dc arg is user-provided, (using nodetool rebuild), an error exception is thrown, advising to use an alternative dc, if available, omit source_dc to sync with all nodes, or use the --force option to use the given source_dc anyhow. d. Otherwise, we look for an alternative source datacenter, that has not lost any node. If such datacenter is found we use it as source_dc for the keyspace, and log a warning. e. If no alternative dc is found (and source_dc is implicit), then: log a warning and fall back to using replicas from all nodes in the cluster. Fixes #16826 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `5f655e41e3`)	2024-08-21 16:09:25 +03:00
Benny Halevy	912c46e07f	repair: replace_with_repair: pass the replace_node downstream To be used by the next patch to count how many nodes are lost in each datacenter. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `8665eef98c`)	2024-08-21 15:49:39 +03:00
Benny Halevy	e80c587da3	repair: replace_with_repair: pass ignore_nodes as a set of host_id:s The callers already pass ignore_nodes as host_id:s and we translate them into inet_address only for repair, so delay the translation as much as possible. Refs scylladb/scylladb#6403 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `9729dd21c3`)	2024-08-21 15:41:42 +03:00
Benny Halevy	485a508cb3	repair: replace_rebuild_with_repair: pass ks_erms from caller The keyspaces replication maps must be in sync with the token_metadata_ptr passed already to the functions, so instead of getting it in the callee, let the caller get the ks_erms along with retrieving the tmptr. Note that it's already done on the rebuild path for streaming based rebuild. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `b5d0ab092c`)	2024-08-21 14:42:09 +03:00
Anna Stuchlik	1683b07d2e	doc: extract the info about tablets default to a separate file This commit extracts the information about the default for tables in keyspace creation to a separate file in the _common folder. The file is then included using the scylladb_include_flag directive. The purpose of this commit is to make it possible to include a different file in the scylla-enterprise repo - with a different default. Refs https://github.com/scylladb/scylla-enterprise/issues/4585 (cherry picked from commit `107708434c`) Closes scylladb/scylladb#20220	2024-08-21 11:07:19 +03:00
David Garcia	853d2ec76f	docs: improve include flag directive The include flag directive now treats missing content as info logs instead of warnings. This prevents build failures when the enterprise-specific content isn't yet available. If the enterprise content is undefined, the directive automatically loads the open-source content. This ensures the end user has access to some content. address comments (cherry picked from commit `30887d096f`) Closes scylladb/scylladb#20226	2024-08-21 10:20:21 +03:00
Botond Dénes	0b1dbb3a64	Update tools/java submodule * tools/java 33938ec1...27999135 (1): > cassandra-stress: Make default repl. strategy NetworkTopologyStrategy Fixes: scylladb/scylla-tools-java#400 Closes scylladb/scylladb#20199	2024-08-21 10:02:59 +03:00
Benny Halevy	e13d5ee834	nodetool: rebuild: add force option To be used to force usage of source_dc, even when it is unsafe for rebuild. Update docs and add test/nodetool/test_rebuild.py Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `0419b1d522`)	2024-08-21 09:37:14 +03:00
Benny Halevy	505cad64ad	Add and use utils::optional_param to pass source_dc Clearly indicate if a source_dc is provided, and if so, was it explicitly given by the user, or was implicitly selected by scylla. This will become useful in the next patches that will use that to either reject the operation if it's unsafe to use the source_dc and the dc was explicitly given by the user, or whether to fallback to using all nodes otherwise. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `8b1877f3ca`)	2024-08-21 09:35:13 +03:00
Raphael S. Carvalho	d65961d8cf	compaction: Allow "offline" sstable to be split In order to fix the race between split and repair, we must introduce the ability to split an "offline" sstable, one that wasn't added to any of the table's sstable set yet. It's not safe to split a sstable after adding it to the set, because a failure to split can result in unsplit data left in the set, causing split to fail down the road, since the coordinator thinks this replica has only split data in the set. Refs #19378. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `239344ab55`)	2024-08-20 10:38:36 +00:00
Anna Stuchlik	4b88ec4722	doc: fix a link on the RBAC page This commit fixes an external link on the Role Based Access Control page. Fixes https://github.com/scylladb/scylladb/issues/20166 (cherry picked from commit `c56c3ce469`) Closes scylladb/scylladb#20202	2024-08-19 15:29:54 +03:00
Lakshmi Narayanan Sreethar	13aa97a00f	boost/sstable_set_test: add testcase to test tablet_sstable_set copy constructor Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> (cherry picked from commit `ec47b50859`)	2024-08-19 12:11:50 +00:00
Lakshmi Narayanan Sreethar	c336ee63a3	replica: fix copy constructor of tablet_sstable_set Remove the existing copy constructor to enable the use of the implicit copy constructor. This fixes the issue of `_sstable_set_ids` not being copied in the current copy constructor. Fixes #19519 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> (cherry picked from commit `44583eed9e`)	2024-08-19 12:11:50 +00:00
Dawid Medrek	8d90b81766	db/hints: Make commitlog use commitlog IO scheduling group Before these changes, we didn't specify which I/O scheduling group commitlog instances in hinted handoff should use. In this commit, we set it explicitly to the commitlog scheduling group. The rationale for this choice is the fact we don't want to cause a bottleneck on the write path -- if hints are written too slowly, new incoming mutations (NOT hints) might be rejected due to a too high number of hints currently being written to disk; see `storage_proxy::create_write_response_handler_helper()` for more context. (cherry picked from commit `6a7fb18b52`) Closes scylladb/scylladb#20093	2024-08-14 22:14:43 +03:00
Raphael S. Carvalho	bc0097688f	replica: Fix race between split compaction and migration After removal of rwlock (`53a6ec05ed`), the race was introduced because the order in which compaction groups of a tablet are closed is no longer deterministic. Some background first: Split compaction runs in main (unsplit) group, and adds sstable to left and right groups on completion. The race works as follows: 1) split compaction starts on main group of tablet X 2) tablet X reaches cleanup stage, so its compaction groups are closed in parallel 3) left or right group are closed before main (more likely when only main has flush work to do) 4) split compaction completes, and adds sstable to left and right 5) if e.g. left is closed, adjusting backlog tracker will trigger an exception, and since that happens in row cache update's execute(), node crashes. The problem manifested as follows: [shard 0: gms] raft_topology - Initiating tablet cleanup of 5739b9b0-49d4-11ef-828f-770894013415:15 on 102a904a-0b15-4661-ba3f-f9085a5ad03c:0 ... [shard 0:strm] compaction - [Split keyspace1.standard1 009e2f80-49e5-11ef-85e3-7161200fb137] Splitting [/var/lib/scylla/data/keyspace1/...] ... [shard 0:strm] cache - Fatal error during cache update: std::out_of_range (Compaction state for table [0x600007772740] not found), at: ... -------- seastar::continuation<seastar::internal::promise_base_with_type<void>, row_cache::do_update(... -------- seastar::internal::do_with_state<std::tuple<row_cache::external_updater, std::function<seastar::future<void> ()> >, seastar::future<void> > -------- seastar::internal::coroutine_traits_base<void>::promise_type -------- seastar::internal::coroutine_traits_base<void>::promise_type -------- seastar::(anonymous namespace)::thread_wake_task -------- seastar::continuation<seastar::internal::promise_base_with_type<sstables::compaction_result>, seastar::async<sstables::compaction::run(... 
seastar::continuation<seastar::internal::promise_base_with_type<sstables::compaction_result>, seastar::future<sstables::compaction_resu... From the log above, it can be seen cache update failure happens under streaming sched group and during compaction completion, which was good evidence to the cause. Problem was reproduced locally with the help of tablet shuffling. Fixes: #19873. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `5af1f41ecd`) Closes scylladb/scylladb#20107	2024-08-14 22:13:53 +03:00
Aleksandra Martyniuk	69c1a0e2ca	repair: use find_column_family in insert_repair_meta repair_service::insert_repair_meta gets the reference to a table and passes it to continuations. If the table is dropped in the meantime, the reference becomes invalid. Use find_column_family at each table occurrence in insert_repair_meta instead. Fixes: #20057 (cherry picked from commit `719999b34c`) Refs #19953 Closes scylladb/scylladb#20076	2024-08-14 20:54:12 +03:00
Avi Kivity	c382e19e5e	Merge '[Backport 6.1] Prevent ALTERing non-existing KS with tablets' from ScyllaDB ALTER tablets KS executes in 2 steps: 1. ALTER KS's cql handler forms a global topo req, and saves data required to execute this req, 2. global topo req is executed by topo coordinator, which reads data attached to the req. The KS name is among the data attached to the req. There's a time window between these steps where a to-be-altered KS could have been DROPped, which results in topo coordinator forever trying to ALTER a non-existing KS. In order to avoid it, the code has been changed to first check if a to-be-altered KS exists, and if it's not the case, it doesn't perform any schema/tablets mutations, but just removes the global topo req from the coordinator's queue. BTW. just adding this extra check resulted in broader than expected changes, which is due to the fact that the code is written badly and needs to be refactored - an effort that's already planned under #19126 (I suggest to disable displaying whitespace differences when reviewing this PR). Fixes: #19576 Requires 6.0 backport (cherry picked from commit `5b089d8e10`) (cherry picked from commit `0ea2128140`) (cherry picked from commit `ddb5204929`) Refs #19666 Closes scylladb/scylladb#20143 * github.com:scylladb/scylladb: tests: ensure ALTER tablets KS doesn't crash if KS doesn't exist cql: refactor rf_change indentation Prevent ALTERing non-existing KS with tablets	2024-08-14 20:16:55 +03:00
Michał Chojnowski	b786e6a39a	cql_test_env: ensure shutdown() before stop() for system_keyspace If system_keyspace::stop() is called before system_keyspace::shutdown(), it will never finish, because the uncleared shared pointers will keep it alive indefinitely. Currently this can happen if an exception is thrown before the construction of the shutdown() defer. This patch moves the shutdown() call to immediately before stop(). I see no reason why it should be elsewhere. Fixes scylladb/scylla-enterprise#4380 (cherry picked from commit `eeaf4c3443`) Closes scylladb/scylladb#20145	2024-08-14 20:16:29 +03:00
Paweł Zakrzewski	3286c14d76	test/cql-pytest: Add test for GROUP BY queries with LIMIT Remove xfail from all tests for #5361, as the issue is fixed. Remove xfail from test_group_by_clustering_prefix_with_limit It references #5362, but is fixed by #17237. Refs #17237 (cherry picked from commit `9db272c949`)	2024-08-14 16:56:20 +00:00
Paweł Zakrzewski	1773dd5632	cql3: process LIMIT for GROUP BY queries Currently LIMIT is not passed to the query executor at all and it was just an accident that it worked for the case referenced in #17237. This change passes the limit value down the chain. (cherry picked from commit `e7ae7f3662`)	2024-08-14 16:56:20 +00:00
Paweł Zakrzewski	c1292c69cf	cql3/select_statement: simplify the get_limit function The get_limit() function performed tasks outside of its scope - for example checked if the statement was an aggregate. This change moves the onus of the check to the caller. (cherry picked from commit `3838ad64b3`)	2024-08-14 16:56:20 +00:00
Paweł Zakrzewski	f27edaa19c	cql3: respect the user-defined page size in aggregate queries The comment in the code already states that we should use the user-defined page size if it's provided. To avoid OOM conditions we'll use the internally defined limit as the upper bound or if no page size is provided. This change lays the groundwork for fixing #5362 and is necessary to pass the test introduced in #19392 once it is implemented. (cherry picked from commit `08f3219cb8`)	2024-08-14 16:56:19 +00:00
Piotr Smaron	706761d8ec	tests: ensure ALTER tablets KS doesn't crash if KS doesn't exist Using the error injection framework, we inject a sleep into the processing path of ALTER tablets KS, so that the topology coordinator of the leader node sleeps after the rf_change event has been scheduled, but before it starts executing. During that time the second node executes a DROP KS statement, which is propagated to the leader node. Once the leader node wakes up and resumes processing of ALTER tablets KS, the KS won't exist, and the node must not crash as it did before. (cherry picked from commit `ddb5204929`)	2024-08-14 10:37:25 +00:00
Piotr Smaron	41e4c39087	cql: refactor rf_change indentation (cherry picked from commit `0ea2128140`)	2024-08-14 10:37:24 +00:00
Piotr Smaron	d5bdef9ee5	Prevent ALTERing non-existing KS with tablets ALTER tablets KS executes in 2 steps: 1. ALTER KS's cql handler forms a global topo req, and saves data required to execute this req, 2. global topo req is executed by topo coordinator, which reads data attached to the req. The KS name is among the data attached to the req. There's a time window between these steps where a to-be-altered KS could have been DROPped, which results in topo coordinator forever trying to ALTER a non-existing KS. In order to avoid it, the code has been changed to first check if a to-be-altered KS exists, and if it's not the case, it doesn't perform any schema/tablets mutations, but just removes the global topo req from the coordinator's queue. BTW. just adding this extra check resulted in broader than expected changes, which is due to the fact that the code is written badly and needs to be refactored - an effort that's already planned under #19126 Fixes: #19576 (cherry picked from commit `5b089d8e10`)	2024-08-14 10:37:24 +00:00
Jenkins Promoter	a4dcf3956e	Update ScyllaDB version to: 6.1.1	2024-08-14 12:28:43 +03:00
Anna Stuchlik	858fa914b1	doc: update Raft info in 6.1 This commit updates the Raft information regarding the Raft verification procedure. In 6.1, the procedure is no longer related to the upgrade. Fixes https://github.com/scylladb/scylladb/issues/19932 (cherry picked from commit `705e53d223`) Closes scylladb/scylladb#20083	2024-08-11 11:37:05 +03:00
Kamil Braun	ec923171a6	storage_service: raft topology: warn when `raft_topology_cmd_handler` fails due to abort Currently we print an ERROR on all exceptions in `raft_topology_cmd_handler`. This log level is too high, in some cases exceptions are expected -- like during shutdown. And it causes dtest failures. Turn exceptions from aborts into WARN level. Also improve logging by printing the command that failed. Fixes scylladb/scylladb#19754 (cherry picked from commit `7506709573`) Closes scylladb/scylladb#20071	2024-08-08 18:13:53 +02:00
Tomasz Grabiec	0144549cd6	tablets: Do not allocate tablets on nodes being decommissioned If a tablet-based table is created concurrently with a node being decommissioned after tablets are already drained, the new table may be permanently left with replicas on the node which is no longer in the topology. That creates an immediate availability risk because we are running with one replica down. This also violates invariants about replica placement and this state cannot be fixed by topology operations. One effect is that this will lead to load balancer failure which will inhibit progress of any topology operations: load_balancer - Replica 154b0380-1dd2-11b2-9fdd-7156aa720e1a:0 of tablet 7e03dd40-537b-11ef-9fdd-7156aa720e1a:1 not found in topology, at: ... Fixes #20032 (cherry picked from commit `f5c74a5df2`) Closes scylladb/scylladb#20066	2024-08-08 11:56:13 +03:00
Kamil Braun	0f246bfbc9	raft topology: improve logging Add more logging for raft-based topology operations in INFO and DEBUG levels. Improve the existing logging, adding more details. Fix a FIXME in test_coordinator_queue_management (by readding a log message that was removed in the past -- probably by accident -- and properly awaiting for it to appear in test). Enable group0_state_machine logging at TRACE level in tests. These logs are relatively rare (group 0 commands are used for metadata operations) and relatively small, mostly consist of printing `system.group0_history` mutation in the applied command, for example: ``` TRACE 2024-08-02 18:47:12,238 [shard 0: gms] group0_raft_sm - apply() is called with 1 commands TRACE 2024-08-02 18:47:12,238 [shard 0: gms] group0_raft_sm - cmd: prev_state_id: optional(dd9d47c6-50ee-11ef-d77f-500b8e1edde3), new_state_id: dd9ea5c6-50ee-11ef-ae64-dfbcd08d72c3, creator_addr: 127.219.233.1, creator_id: 02679305-b9d1-41ef-866d-d69be156c981 TRACE 2024-08-02 18:47:12,238 [shard 0: gms] group0_raft_sm - cmd.history_append: {canonical_mutation: table_id 027e42f5-683a-3ed7-b404-a0100762063c schema_version c9c345e1-428f-36e0-b7d5-9af5f985021e partition_key pk{0007686973746f7279} partition_tombstone {tombstone: none}, row tombstone {range_tombstone: start={position: clustered, ckp{0010b4ba65c64b6e11ef8080808080808080}, 1}, end={position: clustered, ckp{}, 1}, {tombstone: timestamp=1722617232237511, deletion_time=1722617232}}{row {position: clustered, ckp{0010dd9ea5c650ee11efae64dfbcd08d72c3}, 0} tombstone {row_tombstone: none} marker {row_marker: 1722617232237511 0 0}, column description atomic_cell{ create system_distributed keyspace; create system_distributed_everywhere keyspace; create and update system_distributed(_everywhere) tables,ts=1722617232237511,expiry=-1,ttl=0}}} ``` note that the mutation contains a human-readable description of the command -- like "create system_distributed keyspace" above. 
These logs might help debugging various issues (e.g. when `apply` hangs waiting for read_apply mutex, or takes too long to apply a command). Ref: scylladb/scylladb#19105 Ref: scylladb/scylladb#19945 (cherry picked from commit `e8d5974961`) Closes scylladb/scylladb#20048	2024-08-07 13:39:30 +02:00
Anna Stuchlik	1a1583a5b6	doc: add post-installation configuration to the Web Installer page This commit extracts the information about the configuration the user should do right after installation (especially running scylla_setup) to a separate file. The file is included in the relevant pages, i.e., installing with packages and installing with Web Installer. In addition, the examples on the Web Installer page are updated with supported versions of ScyllaDB. Fixes https://github.com/scylladb/scylladb/issues/19908 (cherry picked from commit `849856b964`) Closes scylladb/scylladb#20050	2024-08-07 10:14:13 +03:00
Botond Dénes	f78b88b59b	Merge '[Backport 6.1] db/view: drop view updates to replaced node marked as left' from ScyllaDB When a node that is permanently down is replaced, it is marked as "left" but it still can be a replica of some tablets. We also don't keep IPs of nodes that have left and the `node` structure for such node returns an empty IP (all zeros) as the address. This interacts badly with the view update logic. The base replica paired with the left node might decide to generate a view update. Because storage proxy still uses IPs and not host IDs, it needs to obtain the view replica's IP and tell the storage proxy to write a view update to that node - so, it chooses 0.0.0.0. Apparently, storage proxy decides to write a hint towards this address - hinted handoff on the other hand operates on host IDs and not IPs, so it attempts to translate the IP back, which triggers an assertion as there is no replica with IP 0.0.0.0. As a quick workaround for this issue just drop view updates towards nodes which seem to have IPs that are all zeros. It would be more proper to keep the view updates as hints and replay them later to the new paired replica, but achieving this right now would require much more significant changes. For now, fixing a crash is more important than keeping views consistent with base replicas. In addition to the fix, this PR also includes a regression test heavily based on the test that @kbr-scylla prepared during his investigation of the issue. Fixes: scylladb/scylladb#19439 This issue can cause multiple nodes to crash at once and the fix is quite small, so I think this justifies backporting it to all affected versions. 6.0 and 6.1 are affected. No need to backport to 5.4 as this issue only happens with tablets, and tablets are experimental there. 
(cherry picked from commit `6af7882c59`) (cherry picked from commit `5ec8c06561`) Refs #19765 Closes scylladb/scylladb#19895 * github.com:scylladb/scylladb: test: regression test for MV crash with tablets during decommission db/view: drop view updates to replaced node marked as left	2024-08-07 09:18:26 +03:00
Tzach Livyatan	73d46ec548	Improve tombstone_compaction_interval description (cherry picked from commit `861a1cedea`) Closes scylladb/scylladb#20025	2024-08-07 09:06:56 +03:00
Tzach Livyatan	dcee7839d4	Update tracing.rst - fix table node_slow_log_time name (cherry picked from commit `858fd4d183`) Closes scylladb/scylladb#20023	2024-08-07 09:05:50 +03:00
Anna Stuchlik	75477f5661	doc: add OS support for version 6.1 This commit adds OS support for version 6.1 and removes OS support for 5.4 (according to our support policy for versions). (cherry picked from commit `eca2dfd8c3`) Closes scylladb/scylladb#20019	2024-08-07 09:04:13 +03:00
Nadav Har'El	78d7c953b0	test: increase timeouts for /localnodes test In commit `bac7c33313` we introduced a new test for the Alternator "/localnodes" request, checking that a node that is still joining does not get returned. The tests used what I thought were "very high" timeouts - we had a timeout of 10 seconds for starting a single node, and injected a 20 second sleep to leave us 10 seconds after the first sleep. But the test failed in one extremely slow run (a debug build on aarch64), where starting just a single node took more than 15 seconds! So in this patch I increase the timeouts significantly: We increase the wait for the node to 60 seconds, and the sleeping injection to 120 seconds. These should definitely be enough for anyone (famous last words...). The test doesn't actually wait for these timeouts, so the ridiculously high timeouts shouldn't affect the normal runtime of this test. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `ca8b91f641`) Closes scylladb/scylladb#19940	2024-08-07 08:55:23 +03:00
Nadav Har'El	753fc87efa	alternator: exclude CDC log table from ListTables The Alternator command ListTables is supposed to list actual tables created with CreateTable, and should not list things like materialized views (created for GSI or LSI) or CDC log tables. We already properly excluded materialized views from the list - and had the tests to prove it - but forgot both the exclusion and the testing for CDC log tables - so creating a table xyz with streams enabled would cause ListTables to also list "xyz_scylla_cdc_log". This patch fixes both oversights: It adds the code to exclude CDC logs from the output of ListTables, and adds a test which reproduces the bug before this fix, and verifies the fix works. Fixes #19911. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `d293a5787f`) Closes scylladb/scylladb#19938	2024-08-07 08:54:08 +03:00
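The exclusion described in the ListTables commit above can be sketched as a simple name filter. This is a minimal illustration in Python, not Alternator's actual C++ code: the function `list_user_tables`, its input shape, and the `is_view` flag are hypothetical, and only the `_scylla_cdc_log` suffix is taken from the commit message.

```python
# Suffix of CDC log tables, as quoted in the commit message ("xyz_scylla_cdc_log").
CDC_LOG_SUFFIX = "_scylla_cdc_log"


def list_user_tables(tables):
    """Return only the names a ListTables-style call should expose.

    `tables` is a hypothetical iterable of (name, is_view) pairs.
    Materialized views and CDC log tables are both filtered out.
    """
    return [
        name
        for name, is_view in tables
        if not is_view and not name.endswith(CDC_LOG_SUFFIX)
    ]
```

With a table `xyz` that has streams enabled, the filter keeps `xyz` and drops both its CDC log table and any view backing a GSI or LSI.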
Benny Halevy	c75dbc1f9c	sstable_directory: delete_atomically: allow sstables from multiple prefixes Currently, delete_atomically can be called with a list of sstables from mixed prefixes in two cases: 1. truncate: where we delete all the sstables in the table directory 2. tablet cleanup: similar to truncate but restricted to sstables in a single tablet replica In both cases, it is possible that sstables in staging (or quarantine) are mixed with sstables in the base directory. Until a more comprehensive fix is in place (see https://github.com/scylladb/scylladb/pull/19555), this change just lifts the ban on atomic deletion of sstables from different prefixes, while acknowledging that the implementation is not atomic across prefixes. This is better than crashing for now, and can be backported more easily to branches that support tablets so tablet migration can be done safely in the presence of repair of tables with views. Refs scylladb/scylladb#18862 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `26abad23d9`) Closes scylladb/scylladb#19919	2024-08-06 16:27:57 +03:00
Lakshmi Narayanan Sreethar	96e5ebe28c	boost/bloom_filter_test: wait for total memory reclaimed update The testcase `test_bloom_filter_reclaim_during_reload` checks the SSTable manager's `_total_memory_reclaimed` against an expected value to verify that a Bloom filter was reloaded. However, it does not wait for the manager to update the variable, causing the check to fail if the update has not occurred yet. Fix it by making the testcase wait until the variable is updated to the expected value. Fixes #19879 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> (cherry picked from commit `27b305b9d1`) Closes scylladb/scylladb#19897	2024-08-06 16:26:36 +03:00
Takuya ASADA	c45e92142e	scylla_raid_setup: install update-initramfs when it's not available scylla_raid_setup may fail on the Ubuntu minimal image since it calls update-initramfs without installing it. (cherry picked from commit `02b20089cb`) Closes scylladb/scylladb#19869	2024-08-06 16:24:27 +03:00
Aleksandra Martyniuk	d69f0e529a	test: tasks: adjust tests to new wait_task behavior After `c1b2b8cb2c` /task_manager/wait_task/ does not unregister tasks anymore. Delete the check if the task was unregistered from test_task_manager_wait. Check task status in drain_module_tasks to ensure that the task is removed from task manager. Fixes: #19351. (cherry picked from commit `dfe3af40ed`) Closes scylladb/scylladb#19839	2024-08-06 16:23:02 +03:00
Łukasz Paszkowski	86ff3c2aa3	api/system: add highest_supported_sstable_format path The current upgrade dtests rely on a ccm node function, get_highest_supported_sstable_version(), that looks for r'Feature (.*)_SSTABLE_FORMAT is enabled' in the log files. Starting from scylla-6.0, ME_SSTABLE_FORMAT is enabled by default and there is no cluster feature for it. Thus get_highest_supported_sstable_version() returns an empty list, resulting in upgrade test failures. This change introduces a separate API path that returns the highest sstable format (one of la, mc, md, me) supported by a scylla node. Fixes scylladb/scylladb#19772 Backports to 6.0 and 6.1 required. The current upgrade test in dtest checks scylla upgrades up to version 5.4 only. This patch is a prerequisite to backport the upgrade tests fix in dtest. (cherry picked from commit `781eb7517c`) Closes scylladb/scylladb#19814	2024-08-06 16:21:48 +03:00
Avi Kivity	efac73109e	Merge '[Backport 6.1] doc: add the 6.0-to-6.1 upgrade guide' from ScyllaDB This PR adds the 6.0-to-6.1 upgrade guide (including metrics) and removes the 5.4-to-6.0 upgrade guide. Compared to the 5.4-to-6.0 guide, the 6.0-to-6.1 guide: - Added the "Ensure Consistent Topology Changes Are Enabled" prerequisite. - Removed the "After Upgrading Every Node" section. Both Raft-based schema changes and topology updates are mandatory in 6.1 and don't require any user action after upgrading to 6.1. - Removed the "Validate Raft Setup" section. Raft was enabled in all 6.0 clusters (for schema management), so now there's no scenario that would require the user to follow the validation procedure. - Removed the references to the Enable Consistent Topology Updates page (which was in version 6.0 and is removed with this PR) across the docs. See the individual commits for more details. Fixes https://github.com/scylladb/scylladb/issues/19853 Fixes https://github.com/scylladb/scylladb/issues/19933 This PR must be backported to branch-6.1 as it is critical in version 6.1. (cherry picked from commit `9972e50134`) (cherry picked from commit `32fa5aa938`) Refs #19983 Closes scylladb/scylladb#20038 * github.com:scylladb/scylladb: doc: remove the 5.4-to-6.0 upgrade guide doc: add the 6.0-to-6.1 upgrade guide	2024-08-06 13:28:24 +03:00
Anna Stuchlik	8c975712d3	doc: remove the 5.4-to-6.0 upgrade guide This commit removes the 5.4-to-6.0 upgrade guide and all references to it. It mainly removes references to the Enable Consistent Topology Updates page, which was added as enabling the feature was optional. In rare cases, when a reference to that page is necessary, the internal link is replaced with an external link to version 6.0. Especially the Handling Cluster Membership Change Failures page was modified for troubleshooting purposes rather than removed. (cherry picked from commit `32fa5aa938`)	2024-08-06 10:20:09 +00:00
Anna Stuchlik	1fdfe11bb0	doc: add the 6.0-to-6.1 upgrade guide This commit adds the 6.0-to-6.1 upgrade guide. Compared to the previous upgrade guide: - Added the "Ensure Consistent Topology Changes Are Enabled" prerequisite. - Removed the "After Upgrading Every Node" section. Both Raft-based schema changes and topology updates are mandatory in 6.1 and don't require any user action after upgrading to 6.1. - Removed the "Validate Raft Setup" section. Raft was enabled in all 6.0 clusters (for schema management), so now there's no scenario that would require the user to follow the validation procedure. (cherry picked from commit `9972e50134`)	2024-08-06 10:20:09 +00:00
Botond Dénes	58c06819d7	Update ./tools/python3 submodule * ./tools/python3 18fa79ee...ea49f0ca (1): > install.sh: fix incorrect permission on strict umask Fixes: https://github.com/scylladb/scylladb/issues/19775 Closes scylladb/scylladb#20022	2024-08-06 10:02:07 +03:00
Michael Litvak	5b604509ce	db: fix waiting for counter update operations on table stop When a table is dropped it should wait for all pending operations in the table before the table is destroyed, because the operations may use the table's resources. With counter update operations, currently this is not the case. The table may be destroyed while there is a counter update operation in progress, causing an assert to be triggered due to a resource being destroyed while it's in use. The reason the operation is not waited for is a mistake in the lifetime management of the object representing the write in progress. The commit fixes it so the object lives for the duration of the entire counter update operation, by moving it to the `do_with` list. Fixes scylladb/scylla-enterprise#4475 Closes scylladb/scylladb#20018	2024-08-05 12:54:19 +02:00
Jenkins Promoter	abbf0b24a6	Update ScyllaDB version to: 6.1.0	2024-08-04 14:31:47 +03:00
Kamil Braun	347857e5e5	Merge '[Backport 6.1] raft: fix the shutdown phase being stuck' from ScyllaDB Some of the calls inside the `raft_group0_client::start_operation()` method were missing the abort source parameter. This caused the repair test to be stuck in the shutdown phase - the abort source has been triggered, but the operations were not checking it. This was in particular the case of operations that try to take the ownership of the raft group semaphore (`get_units(semaphore)`) - these waits should be cancelled when the abort source is triggered. This should fix the following tests that were failing in some percentage of dtest runs (about 1-3 of 100): * TestRepairAdditional::test_repair_kill_1 * TestRepairAdditional::test_repair_kill_3 Fixes scylladb/scylladb#19223 (cherry picked from commit `2dbe9ef2f2`) (cherry picked from commit `5dfc50d354`) Refs #19860 Closes scylladb/scylladb#19970 * github.com:scylladb/scylladb: raft: fix the shutdown phase being stuck raft: use the abort source reference in raft group0 client interface	2024-08-02 11:24:34 +02:00
Emil Maskovsky	cd2ca5ef57	raft: fix the shutdown phase being stuck Some of the calls inside the `raft_group0_client::start_operation()` method were missing the abort source parameter. This caused the repair test to be stuck in the shutdown phase - the abort source has been triggered, but the operations were not checking it. This was in particular the case of operations that try to take the ownership of the raft group semaphore (`get_units(semaphore)`) - these waits should be cancelled when the abort source is triggered. This should fix the following tests that were failing in some percentage of dtest runs (about 1-3 of 100): * TestRepairAdditional::test_repair_kill_1 * TestRepairAdditional::test_repair_kill_3 Fixes scylladb/scylladb#19223 (cherry picked from commit `5dfc50d354`)	2024-07-31 20:52:23 +00:00
Emil Maskovsky	5a4065ecd5	raft: use the abort source reference in raft group0 client interface Most callers of the raft group0 client interface are passing a real source instance, so we can use the abort source reference in the client interface. This change makes the code simpler and more consistent. (cherry picked from commit `2dbe9ef2f2`)	2024-07-31 20:52:23 +00:00
Kamil Braun	ed4f2ecca4	docs: extend "forbidden operations" section for Raft-topology upgrade The Raft-topology upgrade procedure must not be run concurrently with version upgrade. (cherry picked from commit `bb0c3cdc65`) Closes scylladb/scylladb#19836	2024-07-29 16:52:40 +02:00
Jenkins Promoter	8f80a84e93	Update ScyllaDB version to: 6.1.0-rc2	2024-07-29 15:50:26 +03:00
Piotr Dulikowski	95abb6d4a7	test: regression test for MV crash with tablets during decommission Regression test for scylladb/scylladb#19439. Co-authored-by: Kamil Braun <kbraun@scylladb.com> (cherry picked from commit `5ec8c06561`)	2024-07-26 14:02:51 +00:00
Piotr Dulikowski	30b0cb4f5d	db/view: drop view updates to replaced node marked as left When a node that is permanently down is replaced, it is marked as "left" but it still can be a replica of some tablets. We also don't keep IPs of nodes that have left and the `node` structure for such node returns an empty IP (all zeros) as the address. This interacts badly with the view update logic. The base replica paired with the left node might decide to generate a view update. Because storage proxy still uses IPs and not host IDs, it needs to obtain the view replica's IP and tell the storage proxy to write a view update to that node - so, it chooses 0.0.0.0. Apparently, storage proxy decides to write a hint towards this address - hinted handoff on the other hand operates on host IDs and not IPs, so it attempts to translate the IP back, which triggers an assertion as there is no replica with IP 0.0.0.0. As a quick workaround for this issue just drop view updates towards nodes which seem to have IPs that are all zeros. It would be more proper to keep the view updates as hints and replay them later to the new paired replica, but achieving this right now would require much more significant changes. For now, fixing a crash is more important than keeping views consistent with base replicas. Fixes: scylladb/scylladb#19439 (cherry picked from commit `6af7882c59`)	2024-07-26 14:02:50 +00:00
Nadav Har'El	97ae704f99	alternator: do not allow authentication with a non-"login" role Alternator allows authentication into the existing CQL roles, but roles which have the flag "login=false" should be refused in authentication, and this patch adds the missing check. The patch also adds a regression test for this feature in the test/alternator test framework, in a new test file test/alternator/cql_rbac.py. This test file will later include more tests of how the CQL RBAC commands (CREATE ROLE, GRANT, REVOKE) affect authentication and authorization in Alternator. In particular, these tests need to use not just the DynamoDB API but also CQL, so this new test file includes the "cql" fixture that allows us to run CQL commands, to create roles, to retrieve their secret keys, and so on. Fixes #19735 (cherry picked from commit `14cd7b5095`) Closes scylladb/scylladb#19863	2024-07-25 12:45:27 +03:00
Nadav Har'El	738e4c3681	alternator: fix "/localnodes" to not return nodes still joining Alternator's "/localnodes" HTTP request is supposed to return the list of nodes in the local DC to which the user can send requests. The existing implementation incorrectly used gossiper::is_alive() to check for which nodes to return - but "alive" nodes include nodes which are still joining the cluster and not really usable. These nodes can remain in the JOINING state for a long time while they are copying data, and an attempt to send requests to them will fail. The fix for this bug is trivial: change the call to is_alive() to a call to is_normal(). But the hard part of this patch is the testing: 1. An existing multi-node test for "/localnodes" assumed that right after a new node was created, it appears on "/localnodes". But after this patch, it may take a bit more time for the bootstrapping to complete and the new node to appear in /localnodes - so I had to add a retry loop. 2. I added a test that reproduces the bug fixed here, and verifies its fix. The test is in the multi-node topology framework. It adds an injection which delays the bootstrap, which leaves a new node in JOINING state for a long time. The test then verifies that the new node is alive (as checked by the REST API), but is not returned by "/localnodes". 3. The new injection for delaying the bootstrap is unfortunately not very pretty - I had to do it in three places because we have several code paths of how bootstrap works without repair, with repair, without Raft and with Raft - and I wanted to delay all of them. Fixes #19694. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `0d1aa399f9`) Closes scylladb/scylladb#19855	2024-07-24 11:04:54 +03:00
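The retry loop mentioned in item 1 of the "/localnodes" commit above follows a common polling pattern. The sketch below only illustrates that pattern and is not the code from the ScyllaDB test suite: the helper name `wait_until`, its parameters, and its default values are assumptions.

```python
import time


def wait_until(predicate, timeout=60.0, period=0.1):
    """Poll `predicate` until it returns a truthy value or `timeout` seconds pass.

    Returns True as soon as the predicate succeeds, False on timeout.
    The predicate is always evaluated at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(period)
```

In a test, the predicate would be a function that fetches "/localnodes" and checks whether the new node's address is in the response. Because the loop returns as soon as the condition holds, a generous timeout does not slow down the normal case.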
Lakshmi Narayanan Sreethar	ee74fe4e0e	[Backport 6.1] sstables: do not reload components of unlinked sstables The SSTable is removed from the reclaimed memory tracking logic only when its object is deleted. However, there is a risk that the Bloom filter reloader may attempt to reload the SSTable after it has been unlinked but before the SSTable object is destroyed. Prevent this by removing the SSTable from the reclaimed list maintained by the manager as soon as it is unlinked. The original logic that updated the memory tracking in `sstables_manager::deactivate()` is left in place as (a) the variables have to be updated only when the SSTable object is actually deleted, as the memory used by the filter is not freed as long as the SSTable is alive, and (b) the `_reclaimed.erase(sst)` is still useful during shutdown, for example, when the SSTable is not unlinked but just destroyed. Fixes https://github.com/scylladb/scylladb/issues/19722 Closes scylladb/scylladb#19717 github.com:scylladb/scylladb: boost/bloom_filter_test: add testcase to verify unlinked sstables are not reloaded sstables: do not reload components of unlinked sstables sstables/sstables_manager: introduce on_unlink method (cherry picked from commit `591876b44e`) Backported from #19717 to 6.1 Closes scylladb/scylladb#19828	2024-07-24 09:03:52 +03:00
Jenkins Promoter	b2ea946837	Update ScyllaDB version to: 6.1.0-rc1	2024-07-23 10:33:48 +03:00
Avi Kivity	92e725c467	Merge '[Backport 6.1] Fix lwt semaphore guard accounting' from ScyllaDB Currently the guard does not account correctly for ongoing operation if semaphore acquisition fails. It may signal a semaphore when it is not held. Should be backported to all supported versions. (cherry picked from commit `87beebeed0`) (cherry picked from commit `4178589826`) Refs #19699 Closes scylladb/scylladb#19819 * github.com:scylladb/scylladb: test: add test to check that coordinator lwt semaphore continues functioning after locking failures paxos: do not signal semaphore if it was not acquired	2024-07-22 17:41:30 +03:00
Kamil Braun	e57d48253f	Merge '[Backport 6.1] test: raft: fix the flaky `test_raft_recovery_stuck`' from ScyllaDB Use the rolling restart to avoid spurious driver reconnects. This can be eventually reverted once the scylladb/python-driver#295 is fixed. Fixes scylladb/scylladb#19154 (cherry picked from commit `ef3393bd36`) (cherry picked from commit `a89facbc74`) Refs #19771 Closes scylladb/scylladb#19820 * github.com:scylladb/scylladb: test: raft: fix the flaky `test_raft_recovery_stuck` test: raft: code cleanup in `test_raft_recovery_stuck`	2024-07-22 14:12:26 +02:00
Emil Maskovsky	47df9f9b05	test: raft: fix the flaky `test_raft_recovery_stuck` Use the rolling restart to avoid spurious driver reconnects. This can be eventually reverted once the scylladb/python-driver#295 is fixed. Fixes scylladb/scylladb#19154 (cherry picked from commit `a89facbc74`)	2024-07-22 09:17:05 +00:00
Emil Maskovsky	193dc87bd0	test: raft: code cleanup in `test_raft_recovery_stuck` Cleaning up the imports. (cherry picked from commit `ef3393bd36`)	2024-07-22 09:17:04 +00:00
Gleb Natapov	11d1950957	test: add test to check that coordinator lwt semaphore continues functioning after locking failures (cherry picked from commit `4178589826`)	2024-07-22 09:01:34 +00:00
Gleb Natapov	6317325ed5	paxos: do not signal semaphore if it was not acquired The guard signals a semaphore during destruction if it is marked as locked, but currently it may be marked as locked even if locking failed. Fix this by using semaphore_units instead of managing the locked flag manually. Fixes: https://github.com/scylladb/scylladb/issues/19698 (cherry picked from commit `87beebeed0`)	2024-07-22 09:01:34 +00:00
Anna Mikhlin	14222ad205	Update ScyllaDB version to: 6.1.0-rc0	2024-07-18 16:05:23 +03:00
Avi Kivity	c93e2662ae	build: regenerate toolchain for optimized clang Generate a profile-guided-optimization build of clang and install it. See `bd34f2fe46`. The optimized clang package can be found in https://devpkg.scylladb.com/clang/clang-18.1.6-Fedora-40-x86_64.tar.gz https://devpkg.scylladb.com/clang/clang-18.1.6-Fedora-40-aarch64.tar.gz Closes scylladb/scylladb#19685	2024-07-18 12:57:45 +03:00
Botond Dénes	8cc99973eb	Merge 'Apply sstable io error handler to exceptions generated when opening file' from Calle Wilund Fixes #19753 SSTable file open provides an `io_error_handler` instance which is applied to a file-wrapper to process any IO errors happening during read/write via the handler in `storage_service`, which in turn will effectively disable the node. However, this is not applied to the actual open operation itself, i.e. any exception generated by the file open call itself will instead just escape to the caller. This PR adds filtering via the `error_handler` to sstable open + makes the `storage_service` "isolate" mechanism non-module-static (thus making it testable) and adds tests to check we exhibit the same behaviour in both cases. The main motivation for this issue is discussions that secondary level IO issues (i.e. caused by extensions) should trigger the same behaviour as, for example, running out of disk space. Closes scylladb/scylladb#19766 * github.com:scylladb/scylladb: memtable_test: Add test for isolate behaviour on exceptions during flush cql_test_env: Expose storage service storage_service: Make isolate guard non-static and add test accessor sstable: apply error_handler on open exceptions	2024-07-18 08:14:40 +03:00
Avi Kivity	d5af86bd8a	test: cql-pytest: config_value_context: remove strange ast.literal_eval call cql-pytest's config_value_context is used to run a code sequence with different ScyllaDB configuration applied for a while. When it reads the original value (in order to restore it later), it applies ast.literal_eval() to it. This is strange, since the config variable isn't a Python literal. It was added in `8c464b2ddb` ("guardrails: restrict replication strategy (RS)"). Presumably, as a workaround for #19604 - it sufficiently massaged the input we read via SELECT to be acceptable later via UPDATE. Now that #19604 is fixed, we can remove the call to ast.literal_eval, but have to fix up the parameters to config_value_context to something that will be accepted without further massaging. This is a step towards fixing #15559, where we want to run some tests with a boolean configuration variable changed, and literal_eval is transforming the string representation of integers to integers and confusing the driver. Closes scylladb/scylladb#19696	2024-07-18 08:11:26 +03:00
Dawid Medrek	414ea68cac	exceptions/exceptions.hh: Wrap `#include <concepts>` within an `#ifdef` `GitHub Actions / Analyze #includes in source files` keeps reporting that the include shouldn't be present in the file. The reason is that we use FMT with version >10, so the fragment of the code that uses the include is not compiled. We move the include to a place where it's used, which should fix the warnings. Closes scylladb/scylladb#19776	2024-07-17 22:09:41 +03:00
Yaron Kaikov	ddcc6ec1e4	dist/docker/debian/build_docker.sh: Build container based on Ubuntu24.04 Now that we have added support for Ubuntu24.04 and are also migrating our images to be based on it (https://github.com/scylladb/scylla-machine-image/pull/530), we should also modify our docker image. Fixes: https://github.com/scylladb/scylladb/issues/19738 Closes scylladb/scylladb#19764	2024-07-17 18:45:48 +03:00
Calle Wilund	91b1be6736	memtable_test: Add test for isolate behaviour on exceptions during flush Tests that certain exceptions thrown during flush to sstable do not crash the node, but do trigger io_error_handler and cause node isolation	2024-07-17 09:36:28 +00:00
Calle Wilund	f996dfc4fa	cql_test_env: Expose storage service So tests can play with it.	2024-07-17 09:36:28 +00:00
Calle Wilund	de728958d1	storage_service: Make isolate guard non-static and add test accessor Makes storage service isolate repeatable in same process and more testable. Note, since the test var now is shard-local we need to check twice: once on error, once on reaching shard zero for actual shutdown.	2024-07-17 09:36:28 +00:00
Calle Wilund	7918ec2e39	sstable: apply error_handler on open exceptions	2024-07-17 09:36:27 +00:00
Emil Maskovsky	21c67a5a64	test: raft: fix the flaky `test_change_ip` The python driver might currently trigger spurious reconnects that cause `NoHostAvailable` to be thrown, which is not expected. This patch adds a retry mechanism to the test to skip this failure if it occurs, as a work-around. The proper fix is expected to be done in scylladb/python-driver#295; once fixed there, this work-around can be reverted. Fixes: scylladb/scylla#18547 Closes scylladb/scylladb#19759	2024-07-16 15:46:16 +02:00
Botond Dénes	1be6cfb16e	Update tools/java submodule * tools/java 01ba3c19...33938ec1 (1): > cassandra-stress: delay before retry	2024-07-16 16:29:51 +03:00
Avi Kivity	dde209390f	Merge 'sstables: fix some mixups between the writer's schema and the sstable's schema' from Michał Chojnowski There are two schemas associated with a sstable writer: the sstable's schema (i.e. the schema of the table at the time when the sstable object was created), and the writer's schema (equal to the schema of the reader which is feeding into the writer). It's easy to mix up the two and break something as a result. The writer's schema is needed to correctly interpret and serialize the data passing through the writer, and to populate the on-disk metadata about the on-disk schema. The sstable's schema is used to configure some parameters for the newly created sstable, such as bloom filter false positive ratio, or compression. This series fixes the known mixups between the two: when setting up compression, and when setting up the bloom filters. Fixes #16065 The bug is present in all supported versions, so the patch has to be backported to all of them. Closes scylladb/scylladb#19695 * github.com:scylladb/scylladb: sstables/mx/writer: when creating local_compression, use the sstables's schema, not the writer's sstables/mx/writer: when creating filter, use the sstables's schema, not the writer's sstables: for i_filter downcasts, use dynamic_cast instead of static_cast	2024-07-16 12:17:41 +03:00
Raphael S. Carvalho	c061ec8d1c	test: Fix max_ongoing_compaction_test test ``` DEBUG 2024-07-03 00:59:58,291 [shard 0:main] compaction_manager - Compaction task 0x51800002a480 for table tests.3 compaction_group=0 [0x503000062050]: switch_state: none -> pending: pending=2 active=0 done=0 errors=0 DEBUG 2024-07-03 01:00:02,868 [shard 0:main] compaction - Checking droppable sstables in tests.3, candidates=0 DEBUG 2024-07-03 01:00:02,868 [shard 0:main] compaction - time_window_compaction_strategy::newest_bucket: now 1720314000000000 buckets = { key=1720314000000000, size=2 key=1720310400000000, size=2 1720314000000000: GMT: Sunday, July 7, 2024 1:00:00 AM 1720310400000000: GMT: Sunday, July 7, 2024 12:00:00 AM ``` the test failed to complete when run across different clock hours, as it expected all sstables produced to belong to the same window of 1h size. let's fix it by reusing timestamps, so it's always consistent. Fixes #13280. Fixes #18564. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#19749	2024-07-16 07:29:10 +03:00
Emil Maskovsky	144794a952	raft: Fix crash in leader_host API handler The leader_host API handler was eventually using the `req` unique_ptr after it had already been destroyed (passed down to the future lambda by reference). This was causing an occasional crash in some tests. Reworked the leader_host handler to use the req only outside of the future lambda. Also updated the code to handle the possibility that the non-default leader group (other than Group 0) might reside on a different shard than shard 0 - using the same concept of calling on all shards via `invoke_on_all()` as done for the other requests. Fixes scylladb/scylladb#19714 Closes scylladb/scylladb#19715	2024-07-15 11:06:56 +02:00
Avi Kivity	c11f2c9bcd	Merge 'scylla-housekeeping: fix exception on parsing version string v2' from Takuya ASADA This reverts `65fbf72ed0` and introduces a new version of the patch which fixes the SCT breakage that appeared after the commit was merged. ---- Since Python 3.12, version parsing is strict: parse_version() does not accept a version string like '6.1.0~dev'. To fix this, we need to pass an acceptable version string to parse_version(), like '6.1.0.dev0', which is allowed by the Python version scheme. reference: https://packaging.python.org/en/latest/specifications/version-specifiers/ Fixes https://github.com/scylladb/scylladb/issues/19564 Closes https://github.com/scylladb/scylladb/pull/19572 Closes scylladb/scylladb#19670 * github.com:scylladb/scylladb: scylla-housekeeping: fix exception on parsing version string Revert "scylla-housekeeping: fix exception on parsing version string"	2024-07-14 16:24:41 +03:00
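The version-string conversion described in the scylla-housekeeping commit above can be illustrated with a small normalizer. This is a sketch under stated assumptions, not the code from the patch: the function name `normalize_version` is hypothetical, the handling of suffixes other than `dev` (such as `rc1`) is a guess at a reasonable mapping, and only the `'6.1.0~dev'` to `'6.1.0.dev0'` example comes from the commit message.

```python
def normalize_version(raw):
    """Rewrite a '~'-style version into a form accepted by the Python version scheme.

    '6.1.0~dev' -> '6.1.0.dev0'  (example taken from the commit message)
    '6.1.0~rc1' -> '6.1.0rc1'    (assumed mapping for release candidates)
    Strings without '~' are returned unchanged.
    """
    base, sep, suffix = raw.partition("~")
    if not sep:
        return raw
    if suffix == "dev":
        # A development release needs an explicit number in this scheme.
        return f"{base}.dev0"
    # Assumption: other suffixes are pre-release tags that can follow the
    # release segment directly.
    return f"{base}{suffix}"
```

The normalized string can then be handed to a strict parser such as parse_version(), which rejects the raw `~` form.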
Botond Dénes	53a6ec05ed	Merge 'replica: remove rwlock for protecting iteration over storage group map' from Raphael "Raph" Carvalho rwlock was added to protect iterations against concurrent updates to the map. the updates can happen when allocating a new tablet replica or removing an old one (tablet cleanup). the rwlock is very problematic because it can result in topology changes blocked, as updating token metadata takes the exclusive lock, which is serialized with table wide ops like split / major / explicit flush (and those can take a long time). to get rid of the lock, we can copy the storage group map and guard individual groups with a gate (not a problem since map is expected to have a maximum of ~100 elements). so cleanup can close that gate (carefully closed after stopping individual groups such that migrations aren't blocked by long-running ops like major), and ongoing iterations (e.g. triggered by nodetool flush) can skip a group that was closed, as such a group is being migrated out. Fixes #18821. 
``` WRITE ===== ./build/release/scylla perf-simple-query --smp 1 --memory 2G --initial-tablets 10 --tablets --write - BEFORE 65559.52 tps ( 59.6 allocs/op, 16.4 logallocs/op, 14.3 tasks/op, 52841 insns/op, 30946 cycles/op, 0 errors) 67408.05 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53018 insns/op, 30874 cycles/op, 0 errors) 67714.72 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53026 insns/op, 30881 cycles/op, 0 errors) 67825.57 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53015 insns/op, 30821 cycles/op, 0 errors) 67810.74 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53009 insns/op, 30828 cycles/op, 0 errors) throughput: mean=67263.72 standard-deviation=967.40 median=67714.72 median-absolute-deviation=547.02 maximum=67825.57 minimum=65559.52 instructions_per_op: mean=52981.61 standard-deviation=79.09 median=53014.96 median-absolute-deviation=36.54 maximum=53025.79 minimum=52840.56 cpu_cycles_per_op: mean=30869.90 standard-deviation=50.23 median=30874.06 median-absolute-deviation=42.11 maximum=30945.94 minimum=30820.89 - AFTER 65448.76 tps ( 59.5 allocs/op, 16.4 logallocs/op, 14.3 tasks/op, 52788 insns/op, 31013 cycles/op, 0 errors) 67290.83 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53025 insns/op, 30950 cycles/op, 0 errors) 67646.81 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53025 insns/op, 30909 cycles/op, 0 errors) 67565.90 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53058 insns/op, 30951 cycles/op, 0 errors) 67537.32 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 52983 insns/op, 30963 cycles/op, 0 errors) throughput: mean=67097.93 standard-deviation=931.44 median=67537.32 median-absolute-deviation=467.97 maximum=67646.81 minimum=65448.76 instructions_per_op: mean=52975.85 standard-deviation=108.07 median=53024.55 median-absolute-deviation=49.45 maximum=53057.99 minimum=52788.49 cpu_cycles_per_op: mean=30957.17 standard-deviation=37.43 median=30951.31 
median-absolute-deviation=7.51 maximum=31013.01 minimum=30908.62 READ ===== ./build/release/scylla perf-simple-query --smp 1 --memory 2G --initial-tablets 10 --tablets - BEFORE 79423.36 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41840 insns/op, 26820 cycles/op, 0 errors) 81076.70 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41837 insns/op, 26583 cycles/op, 0 errors) 80927.36 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41829 insns/op, 26629 cycles/op, 0 errors) 80539.44 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41841 insns/op, 26735 cycles/op, 0 errors) 80793.10 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41864 insns/op, 26662 cycles/op, 0 errors) throughput: mean=80551.99 standard-deviation=661.12 median=80793.10 median-absolute-deviation=375.37 maximum=81076.70 minimum=79423.36 instructions_per_op: mean=41842.20 standard-deviation=13.26 median=41840.14 median-absolute-deviation=5.68 maximum=41864.50 minimum=41829.29 cpu_cycles_per_op: mean=26685.88 standard-deviation=93.31 median=26662.18 median-absolute-deviation=56.47 maximum=26820.08 minimum=26582.68 - AFTER 79464.70 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41799 insns/op, 26761 cycles/op, 0 errors) 80954.58 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41803 insns/op, 26605 cycles/op, 0 errors) 81160.90 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41811 insns/op, 26555 cycles/op, 0 errors) 81263.10 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41814 insns/op, 26527 cycles/op, 0 errors) 81162.97 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41806 insns/op, 26549 cycles/op, 0 errors) throughput: mean=80801.25 standard-deviation=755.54 median=81160.90 median-absolute-deviation=361.72 maximum=81263.10 minimum=79464.70 instructions_per_op: mean=41806.47 standard-deviation=5.85 median=41806.05 median-absolute-deviation=4.05 maximum=41813.86 minimum=41799.36 cpu_cycles_per_op: mean=26599.22 
standard-deviation=94.84 median=26554.54 median-absolute-deviation=50.51 maximum=26761.06 minimum=26527.05 ``` Closes scylladb/scylladb#19469 * github.com:scylladb/scylladb: replica: remove rwlock for protecting iteration over storage group map replica: get rid of fragile compaction group intrusive list	2024-07-12 15:45:36 +03:00
Piotr Dulikowski	3cdf549da2	Merge 'remove utils::in' from Avi Kivity utils::in uses std::aligned_storage, which is deprecated. Rather than fixing it, replace its only user with simpler code and remove it. No backport needed as this isn't fixing a bug. Closes scylladb/scylladb#19683 * github.com:scylladb/scylladb: utils: remove utils/in.hh gossiper: remove initializer-list overload of add_local_application_state()	2024-07-12 12:06:09 +02:00
Takuya ASADA	373a7825b5	scylla-housekeeping: fix exception on parsing version string Since Python 3.12, version parsing has become strict: parse_version() does not accept a version string like '6.1.0~dev'. To fix this, we need to pass an acceptable version string to parse_version(), like '6.1.0.dev0', which is allowed by the Python version scheme. Also, a release candidate version like '6.0.0~rc3' has the same issue; it should be replaced with '6.0.0rc3' for comparison in parse_version(). reference: https://packaging.python.org/en/latest/specifications/version-specifiers/ Fixes #19564 Closes scylladb/scylladb#19572	2024-07-12 03:23:34 +09:00
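The rewrite described above can be sketched as a small helper. This is an illustration only: `normalize_for_parse_version` is a hypothetical name and the actual patch may be structured differently. It assumes that only the `~dev` and `~rcN` suffixes need rewriting, and it returns a new string just for parsing, leaving the real version string untouched.

```python
import re


def normalize_for_parse_version(version: str) -> str:
    """Return a copy of `version` that a strict version parser accepts.

    Hypothetical helper: only the two suffix forms named in the commit
    message are handled.
    """
    # '6.1.0~dev' -> '6.1.0.dev0'
    version = re.sub(r'~dev', '.dev0', version)
    # '6.0.0~rc3' -> '6.0.0rc3'
    version = re.sub(r'~rc(\d+)', r'rc\1', version)
    return version


dev_version = normalize_for_parse_version('6.1.0~dev')
rc_version = normalize_for_parse_version('6.0.0~rc3')
release_version = normalize_for_parse_version('6.0.1')
```

Only the value handed to the parser is normalized here; anything that reports or stores the version would keep using the original string.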
Takuya ASADA	db04f8b16e	Revert "scylla-housekeeping: fix exception on parsing version string" This reverts commit `65fbf72ed0`, since it breaks scylla-housekeeping and SCT because the patch modified the version string. We shouldn't modify the version string directly; we need to pass a modified string just to parse_version() instead.	2024-07-12 03:23:34 +09:00
Emil Maskovsky	b9abad0515	test: raft: fix the topology failure recovery test flakiness Setting the error condition for all nodes in the cluster to avoid having to check which one is the coordinator. This should make the test more stable and avoid the flakiness observed when the coordinator node is the one that got the error condition injected. Randomizing the retrieved running servers to reproduce the issue more frequently and to avoid making any assumptions about the order of the servers. Note that only the "raft_topology_barrier_fail" needs to run on a non-coordinator node, the other error "stream_ranges_fail" can be injected on any node (including the coordinator). Fixes: scylladb/scylladb#18614 Closes scylladb/scylladb#19663	2024-07-11 16:23:26 +02:00
Piotr Dulikowski	188b4ac0fc	Merge 'service_level_controller: update configuration on raft change' from Michał Jadwiszczak This patch is a follow-up to scylladb/scylladb#16585. Once we have service levels on raft, we can get rid of the update loop, which updates the configuration at a configured interval (default is 10s). Instead, this PR introduces methods to `group0_state_machine` which look through table ids in mutations in `write_mutation` and update submodules based on those ids. Fixes: scylladb/scylladb#18060 Closes scylladb/scylladb#18758 * github.com:scylladb/scylladb: test: remove `sleep()`s which were required to reload service levels configuration test/cql_test_env: remove unit test service levels data accessors service/storage_service: reload SL cache on topology_state_load() service/qos/service_level_controller: move semaphore breaking to stop service/qos/service_level_controller: maybe start and stop legacy update loop service/qos/service_level_controller: make update loop legacy raft/group0_state_machine: update submodules based on table_id service/storage_service: add a proxy method to reload sl cache	2024-07-11 16:18:48 +02:00
Kefu Chai	2a1c9ed7cb	github: use needs.read-toolchain.outputs.image for iwyu's container in `9a71543fd2`, we introduced a regression, which failed to use the proper value for the container image in which the iwyu workflow is run. in this change, we pass the correct value, as we do in clang-tidy.yaml workflow. Refs `9a71543fd2` Fixes scylladb/scylladb#19704 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19697	2024-07-11 17:17:37 +03:00
Michał Chojnowski	1a8ee69a43	sstables/mx/writer: when creating local_compression, use the sstables's schema, not the writer's There are two schemas associated with a sstable writer: the sstable's schema (i.e. the schema of the table at the time when the sstable object was created), and the writer's schema (equal to the schema of the reader which is feeding into the writer). It's easy to mix up the two and break something as a result. The writer's schema is needed to correctly interpret and serialize the data passing through the writer, and to populate the on-disk metadata about the on-disk schema. The sstable's schema is used to configure some parameters for the newly created sstable, such as bloom filter false positive ratio, or compression. The problem fixed by this patch is that the writer was wrongly creating the compressor objects based on its own schema, but using them based on the sstable's schema. This patch forces the writer to use the sstable's schema for both.	2024-07-11 12:53:54 +02:00
Michał Chojnowski	d10b38ba5b	sstables/mx/writer: when creating filter, use the sstables's schema, not the writer's There are two schemas associated with a sstable writer: the sstable's schema (i.e. the schema of the table at the time when the sstable object was created), and the writer's schema (equal to the schema of the reader which is feeding into the writer). It's easy to mix up the two and break something as a result. The writer's schema is needed to correctly interpret and serialize the data passing through the writer, and to populate the on-disk metadata about the on-disk schema. The sstable's schema is used to configure some parameters for the newly created sstable, such as bloom filter false positive ratio, or compression. The problem fixed by this patch is that the writer was wrongly creating the filter based on its own schema, while the layer outside the writer was interpreting it as if it was created with the sstable's schema. This patch forces the writer to pick the filter's parameters based on the sstable's schema instead.	2024-07-11 12:53:54 +02:00
Michał Chojnowski	a1834efd82	sstables: for i_filter downcasts, use dynamic_cast instead of static_cast As of this patch, those static_casts are actually invalid in some cases (they cast to the wrong type) because of an oversight. A later patch will fix that. But to even write a reliable reproducer for the problem, we must force the invalid casts to manifest as a crash (instead of weird results). This patch both allows writing a reproducer for the bug and serves as a bit of defensive programming for the future.	2024-07-11 12:53:54 +02:00
Tomas Nozicka	26466a3043	Allow configuring default loglevel with args for container images Closes scylladb/scylladb#19671	2024-07-11 12:37:53 +03:00
Piotr Dulikowski	19c5e1807c	Merge 'schema: fix describe of indexes on collections' from Michał Jadwiszczak If the index was created on a collection (frozen or not), its description wasn't a correct create statement. This patch fixes the bug and includes functions like `full()`, `keys()`, `values()`, ... used to create an index on collections. Fixes scylladb/scylladb#19278 Closes scylladb/scylladb#19381 * github.com:scylladb/scylladb: cql-pytest/test_describe: add a test for describe indexes schema/schema: fix column names in index description	2024-07-11 09:11:01 +02:00
Kefu Chai	9a71543fd2	github: always use the tools/toolchain/image for lint workflows instead of hardwiring the toolchain image in github workflows, read it from `tools/toolchain/image`. a dedicated reusable workflow is added to read from this file, and expose its content with an output parameter. also, switch iwyu.yaml workflow to this image, more maintainable this way. please note, before this change, we were also using the latest stable build of clang, and since fedora 40 also uses clang 18, the behavior does not change. but with this change, we lose the flexibility of using other clang versions provided by https://apt.llvm.org in the future. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19655	2024-07-10 23:45:35 +03:00
Avi Kivity	65a7fc9902	Merge 'transport, service: move definition of destructors into .cc' from Kefu Chai this changeset includes two changes: - service: move storage_service::~storage_service() into .cc - transport: move the cql_server::~cql_server() into .cc they intend to address the compile failures when building scylladb with clang-19. clang-19 is more picky when generating the defaulted destructors with incomplete types. but its behavior makes sense with regard to standard compliance. so let's update accordingly. --- it's a cleanup, hence no need to backport. Closes scylladb/scylladb#19668 * github.com:scylladb/scylladb: transport: move the cql_server::~cql_server() into .cc service: move storage_service::~storage_service() into .cc	2024-07-10 23:43:16 +03:00
Kefu Chai	06ba523818	sstable: extract file_writer out `sstables::write()` has multiple overloads, which are defined in `sstables/writer.hh`. two of these overloads are template functions, which have a template parameter named `W`, which has a type constraint requiring it to fulfill the `Writer` concept. but in `types.hh`, when the compiler tries to instantiate the template function with signature of `write(sstable_version_types v, W& out, const T& t)` with `file_writer` as the template parameter `W`, `file_writer` is only forward-declared using `class file_writer` in the same header file, so this type is still an incomplete type at that moment. that's why the compiler is not able to determine if `file_writer` fulfills the constraint or not. actually, the declaration of `file_writer` is located in `sstables/writer.hh`, which in turn includes `types.hh`. so they form a cyclic dependency. in this change, in order to break this cycle, we extract file_writer out into a separate header file, so that both `sstables/writer.hh` and `sstables/types.hh` can include it. this addresses the build failure. Fixes scylladb/scylladb#19667 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19669	2024-07-10 23:32:47 +03:00
Michał Chojnowski	fdd8b03d4b	scylla-gdb.py: add $coro_frame() Adds a convenience function for inspecting the coroutine frame of a given seastar task. Short example of extracting a coroutine argument: ``` (gdb) p $coro_frame(seastar::local_engine->_current_task) $1 = { __resume_fn = 0x2485f80 <sstables::parse(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::statistics&)>, ... PointerType_7 = 0x601008e67880, ... __coro_index = 0 '\000' ... (gdb) p $downcast_vptr($->PointerType_7) $2 = (schema *) 0x601008e67880 ``` Closes scylladb/scylladb#19479	2024-07-10 21:46:27 +03:00
Avi Kivity	45e27c0da2	config, enum_option: allow round-trip string conversion The default configuration for replication_strategy_warn_list is ["SimpleStrategy"], but one cannot set this via CQL: cqlsh> select * from system.config where name = 'replication_strategy_warn_list'; name \| source \| type \| value --------------------------------+---------+---------------------------+-------------------- replication_strategy_warn_list \| default \| replication strategy list \| ["SimpleStrategy"] (1 rows) cqlsh> update system.config set value = '[NetworkTopologyStrategy]' where name = 'replication_strategy_warn_list'; cqlsh> select * from system.config where name = 'replication_strategy_warn_list'; name \| source \| type \| value --------------------------------+--------+---------------------------+----------------------------- replication_strategy_warn_list \| cql \| replication strategy list \| ["NetworkTopologyStrategy"] (1 rows) cqlsh> update system.config set value = '["NetworkTopologyStrategy"]' where name = 'replication_strategy_warn_list'; WriteFailure: Error from server: code=1500 [Replica(s) failed to execute write] message="Operation failed for system.config - received 0 responses and 1 failures from 1 CL=ONE." info={'consistency': 'ONE', 'required_responses': 1, 'received_responses': 0, 'failures': 1} Fix by allowing quotes in enum_set parsing. Bug present since `8c464b2ddb` ("guardrails: restrict replication strategy (RS)", 6.0). Fixes #19604. Closes scylladb/scylladb#19605	2024-07-10 20:39:01 +03:00
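The round-trip property the fix above aims for can be sketched in Python. This is not the C++ `enum_option` code: `parse_enum_list` and `format_enum_list` are hypothetical names, and the sketch assumes a comma-separated list in square brackets whose items may or may not be double-quoted.

```python
def parse_enum_list(text):
    """Parse '[A, B]' or '["A", "B"]' into a list of bare names."""
    inner = text.strip()
    if not (inner.startswith('[') and inner.endswith(']')):
        raise ValueError(f"not a bracketed list: {text!r}")
    items = []
    for raw in inner[1:-1].split(','):
        item = raw.strip()
        # Accept an optional pair of surrounding double quotes.
        if len(item) >= 2 and item[0] == item[-1] == '"':
            item = item[1:-1]
        if item:
            items.append(item)
    return items


def format_enum_list(items):
    """Render names the way the config table displays them (quoted)."""
    return '[' + ', '.join(f'"{item}"' for item in items) + ']'


unquoted = parse_enum_list('[NetworkTopologyStrategy]')
quoted = parse_enum_list('["NetworkTopologyStrategy"]')
round_trip = format_enum_list(quoted)
```

Because the formatter emits quoted names, the parser has to accept them too; otherwise the displayed value cannot be written back, which is the failure shown in the `cqlsh` session above.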
Yaron Kaikov	e33126fc3e	.github/script/label_promoted_commit.py: add label only if ref is PR we got a failure during check-commit action: ``` Run python .github/scripts/label_promoted_commits.py --commit_before_merge `30e82a81e8` --commit_after_merge `f31d5e3204` --repository scylladb/scylladb --ref refs/heads/master Commit sha is: `d5a149fc01` Commit sha is: `415457be2b` Commit sha is: `d3b1ccd03a` Commit sha is: `1fca341514` Commit sha is: `f784be6a7e` Commit sha is: `80986c17c3` Commit sha is: `492d0a5c86` Commit sha is: `7b3f55a65f` Commit sha is: `78d6471ce4` Commit sha is: `7a69d9070f` Commit sha is: `a9e985fcc9` master branch, pr number is: 19213 Traceback (most recent call last): File "/home/runner/work/scylladb/scylladb/.github/scripts/label_promoted_commits.py", line 87, in <module> main() File "/home/runner/work/scylladb/scylladb/.github/scripts/label_promoted_commits.py", line 81, in main pr = repo.get_pull(pr_number) File "/usr/lib/python3/dist-packages/github/Repository.py", line 2746, in get_pull headers, data = self._requester.requestJsonAndCheck( File "/usr/lib/python3/dist-packages/github/Requester.py", line 353, in requestJsonAndCheck return self.__check( File "/usr/lib/python3/dist-packages/github/Requester.py", line 378, in __check raise self.__createException(status, responseHeaders, output) github.GithubException.UnknownObjectException: 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/pulls/pulls#get-a-pull-request", "status": "404"} Error: Process completed with exit code 1. ``` The failure occurred because one of the promoted commits (`a9e985fcc9`) contained a `Closes` reference to an issue rather than to a PR. Fixes: https://github.com/scylladb/scylladb/issues/19677 Closes scylladb/scylladb#19678	2024-07-10 15:27:12 +03:00
Botond Dénes	9bdcba7a46	Merge 'conf: scylla.yaml: update documentation for tablets' from Benny Halevy Tablets are no longer in experimental_features since `83d491a`, so remove them from the experimental_features section documentation. Also, expand the documentation for the `enable_tablets` option. Fixes #19456 Needs backport to 6.0 Closes scylladb/scylladb#19516 * github.com:scylladb/scylladb: conf: scylla.yaml: enable_tablets: expand documentation conf: scylla.yaml: remove tablets from experimental_features doc comment	2024-07-10 14:32:40 +03:00
Avi Kivity	8b7a2661c1	utils: remove utils/in.hh It uses the deprecated std::aligned_storage and had only one user (now removed). Rather than maintain it, remove it.	2024-07-10 14:11:27 +03:00
Avi Kivity	d50ba03965	gossiper: remove initializer-list overload of add_local_application_state() The initializer_list overload uses a too-clever technique to avoid copies. While copies here are unlikely to pose any real problem (we're allocating map nodes anyway), it's simple enough to provide a copy-less replacement that doesn't require questionable tricks. We replace the initializer_list<..., in<>> overload with a variadic template that constructs a temporary map.	2024-07-10 14:11:27 +03:00
Michał Jadwiszczak	375499b727	test: remove `sleep()`s which were required to reload service levels configuration Previously, some service level tests had to sleep to ensure the in-memory configuration of service levels was updated. Now that we update the configuration as the raft log is applied, doing a read barrier (for instance by executing `DROP TABLE IF EXISTS non_existing_table`) is enough and the sleeps are not needed.	2024-07-10 10:42:21 +02:00
Michał Jadwiszczak	23bebb8037	test/cql_test_env: remove unit test service levels data accessors Unit test data accessors were created to avoid starting the update loop in unit tests and to update the controller's configuration directly. With the raft data accessor and configuration updates on applying the raft log, we can get rid of the unit test data accessors and use the raft one. This also makes the unit test env a bit more like a real Scylla environment.	2024-07-10 10:42:21 +02:00
Michał Jadwiszczak	de857d9ce3	service/storage_service: reload SL cache on topology_state_load() Since SL cache is no longer updated in a loop, it needs to be initialized on startup and because we are updating the cache while applying raft commands, we can initialize it on topology_state_load().	2024-07-10 10:42:20 +02:00
Jadw1	cf29242962	service/qos/service_level_controller: move semaphore breaking to stop Before this, the notification semaphore was broken() in do_abort(), which was triggered by early abort source. However we are going to reload sl cache on topology state reload and it can happen after the early abort source is triggered, so it may throw broken_semaphore exception. We can move semaphore breaking to stop() method. Legacy update loop is still stopped in do_abort(), so it doesn't change the order of service level controller shutdown.	2024-07-10 10:33:24 +02:00
Michał Jadwiszczak	85119b90df	service/qos/service_level_controller: maybe start and stop legacy update loop In previous commit, we marked the update loop as legacy. For compatibility reasons, we need to start legacy update loop when the cluster is in recovery mode or it hasn't been upgraded to raft topology. Then, in the update loop we check if all conditions are met and stop the loop. This commit also moves start of update loop later (after topology state is loaded) in main.cc. There is no risk in doing it later.	2024-07-10 10:23:04 +02:00
Michał Jadwiszczak	b0f76db9f2	service/qos/service_level_controller: make update loop legacy Rename the method which started the update loop to better reflect what it does. Previously the method was named `update_from_distributed_data`, however it doesn't update anything but only starts the update loop, which we are making legacy.	2024-07-10 10:23:04 +02:00
Michał Jadwiszczak	5ddf5e3d7d	raft/group0_state_machine: update submodules based on table_id We want to update the service levels cache when any new mutations are applied to the service levels table. To avoid creating a new raft command type, this commit changes the design of `write_mutations` to update in-memory structures based on the mutations' table_id.	2024-07-10 10:23:04 +02:00
Michał Jadwiszczak	b61047a3f8	service/storage_service: add a proxy method to reload sl cache In this series of patches, we want to reload service levels cache when any changes to SL table are applied. Firstly we need to have a way to trigger reload of the cache from `group0_state_machines`. To not introduce another dependency, we can use `storage_service` (which has access to SL controller) and add a proxy method to it.	2024-07-10 10:23:04 +02:00
Nadav Har'El	c6cffe36dd	Merge 'cql: forbid having counter columns in tablets tables' from Piotr Smaron Counter updates break under tablet migration (#18180), and for this reason counters need to be disabled until the problem is fixed. It's enough to forbid creating a table with counters, as altering a table without counters already cannot result in the table having counters: 1) Adding a counter column to a table without counters: ``` cqlsh> ALTER TABLE temp.cf ADD (col_name counter); ConfigurationException: Cannot add a counter column (col_name) in a non counter column family ``` 2) Altering a column to be of the counter type: ``` cqlsh> ALTER TABLE temp.cf ALTER col_name TYPE counter; ConfigurationException: Cannot change col_name from type int to type counter: types are incompatible. ``` Fixes: #19449 Fixes: https://github.com/scylladb/scylladb/issues/18876 Need to backport to 6.0, as this is broken there. Closes scylladb/scylladb#19518 * github.com:scylladb/scylladb: doc: add notes to feature pages which don't support tablets cql: adjust warning about tablets cql: forbid having counter columns in tablets tables	2024-07-10 10:18:30 +03:00
Michał Jadwiszczak	b65a4c66f0	cql-pytest/test_describe: add a test for describe indexes	2024-07-10 07:14:46 +02:00
Kefu Chai	7e4e685964	transport: move the cql_server::~cql_server() into .cc because transport/server.cc has the complete definition of event_notifier, the compiler can default-generate the destructor of `cql_server` with the necessary information. otherwise, clang-19 would fail to build, like: ``` FAILED: CMakeFiles/scylla.dir/Dev/main.cc.o /home/kefu/.local/bin/clang++ -DBOOST_PROGRAM_OPTIONS_DYN_LINK -DBOOST_PROGRAM_OPTIONS_NO_LIB -DDEVEL -DFMT_SHARED -DSCYLLA_BUILD_MODE=dev -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Dev\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -I/home/kefu/dev/scylladb/build -isystem /home/kefu/dev/scylladb/build/rust -isystem /home/kefu/dev/scylladb/abseil -O2 -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. 
-march=westmere -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -MD -MT CMakeFiles/scylla.dir/Dev/main.cc.o -MF CMakeFiles/scylla.dir/Dev/main.cc.o.d -o CMakeFiles/scylla.dir/Dev/main.cc.o -c /home/kefu/dev/scylladb/main.cc In file included from /home/kefu/dev/scylladb/main.cc:11: In file included from /usr/include/yaml-cpp/yaml.h:10: In file included from /usr/include/yaml-cpp/parser.h:11: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/memory:78: /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unique_ptr.h:91:16: error: invalid application of 'sizeof' to an incomplete type 'cql_transport::cql_server::event_notifier' 91 \| static_assert(sizeof(_Tp)>0, \| ^~~~~~~~~~~ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unique_ptr.h:398:4: note: in instantiation of member function 'std::default_delete<cql_transport::cql_server::event_notifier>::operator()' requested here 398 \| get_deleter()(std::move(__ptr)); \| ^ /home/kefu/dev/scylladb/transport/server.hh:135:7: note: in instantiation of member function 'std::unique_ptr<cql_transport::cql_server::event_notifier>::~unique_ptr' requested here 135 \| class cql_server : public seastar::peering_sharded_service<cql_server>, public generic_server::server { \| ^ /home/kefu/dev/scylladb/transport/server.hh:135:7: note: in implicit destructor for 'cql_transport::cql_server' first required here /home/kefu/dev/scylladb/transport/server.hh:149:11: note: forward declaration of 'cql_transport::cql_server::event_notifier' 149 \| class event_notifier; \| ^ 1 error generated. ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-10 12:52:51 +08:00
Kefu Chai	79ffde063a	service: move storage_service::~storage_service() into .cc as repair/repair.cc has the complete definition of node_ops_meta_data, the compiler can default-generate the destructor of `storage_service` with the necessary information. otherwise, clang-19 would fail to build, like: ``` FAILED: repair/CMakeFiles/repair.dir/Dev/repair.cc.o /home/kefu/.local/bin/clang++ -DDEVEL -DFMT_SHARED -DSCYLLA_BUILD_MODE=dev -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Dev\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -O2 -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. 
-march=westmere -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -MD -MT repair/CMakeFiles/repair.dir/Dev/repair.cc.o -MF repair/CMakeFiles/repair.dir/Dev/repair.cc.o.d -o repair/CMakeFiles/repair.dir/Dev/repair.cc.o -c /home/kefu/dev/scylladb/repair/repair.cc In file included from /home/kefu/dev/scylladb/repair/repair.cc:9: In file included from /home/kefu/dev/scylladb/repair/repair.hh:11: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/unordered_map:41: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unordered_map.h:33: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable.h:35: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable_policy.h:34: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/tuple:38: /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_pair.h:291:11: error: field has incomplete type 'service::node_ops_meta_data' 291 \| _T2 second; ///< The second member \| ^ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/ext/aligned_buffer.h:93:28: note: in instantiation of template class 'std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>' requested here 93 \| : std::aligned_storage<sizeof(_Tp), __alignof__(_Tp)> \| ^ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable_policy.h:334:43: note: in instantiation of template class '__gnu_cxx::__aligned_buffer<std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>>' requested here 334 \| __gnu_cxx::__aligned_buffer<_Value> _M_storage; \| ^ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable_policy.h:373:7: note: in instantiation of template class 'std::__detail::_Hash_node_value_base<std::pair<const utils::tagged_uuid<node_ops_id_tag>, 
service::node_ops_meta_data>>' requested here 373 \| : _Hash_node_value_base<_Value> \| ^ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable.h:1662:21: note: in instantiation of template class 'std::__detail::_Hash_node_value<std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>, false>' requested here 1662 \| ._M_bucket_index(declval<const __node_value_type&>(), \| ^ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unordered_map.h:109:11: note: in instantiation of member function 'std::_Hashtable<utils::tagged_uuid<node_ops_id_tag>, std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>, std::allocator<std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>>, std::__detail::_Select1st, std::equal_to<utils::tagged_uuid<node_ops_id_tag>>, std::hash<utils::tagged_uuid<node_ops_id_tag>>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>>::~_Hashtable' requested here 109 \| class unordered_map \| ^ /home/kefu/dev/scylladb/service/storage_service.hh:109:7: note: forward declaration of 'service::node_ops_meta_data' 109 \| class node_ops_meta_data; \| ^ In file included from /home/kefu/dev/scylladb/repair/repair.cc:9: In file included from /home/kefu/dev/scylladb/repair/repair.hh:11: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/unordered_map:41: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unordered_map.h:33: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable.h:35: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable_policy.h:34: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/tuple:38: In file included from 
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_pair.h:60: ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-10 12:52:51 +08:00
Michał Jadwiszczak	253feb6811	schema/schema: fix column names in index description Previously description of index didn't include functions for indexes on collections like full(), keys(), values(), etc...	2024-07-09 22:37:05 +02:00
Raphael S. Carvalho	c539b7c861	replica: remove rwlock for protecting iteration over storage group map rwlock was added to protect iterations against concurrent updates to the map. the updates can happen when allocating a new tablet replica or removing an old one (tablet cleanup). the rwlock is very problematic because it can result in topology changes blocked, as updating token metadata takes the exclusive lock, which is serialized with table wide ops like split / major / explicit flush (and those can take a long time). to get rid of the lock, we can copy the storage group map and guard individual groups with a gate (not a problem since map is expected to have a maximum of ~100 elements). so cleanup can close that gate (carefully closed after stopping individual groups such that migrations aren't blocked by long-running ops like major), and ongoing iterations (e.g. triggered by nodetool flush) can skip a group that was closed, as such a group is being migrated out. Check documentation added to compaction_group.hh to understand how concurrent iterations and updates to the map work without the rwlock. Yielding variants that iterate over groups are no longer returning group id since id stability can no longer be guaranteed without serializing split finalization and iteration. Fixes #18821. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-07-09 16:59:24 -03:00
Raphael S. Carvalho	ad5c5bca5f	replica: get rid of fragile compaction group intrusive list It was added to make integration of storage groups easier, but it's complicated since it's another source of truth and we could have problems if it becomes inconsistent with the group map. Fixes #18506. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-07-09 16:53:35 -03:00
Piotr Smaron	531659f8dc	doc: add notes to feature pages which don't support tablets There's already a page which lists which features are not working with tablets: architecture/tablets.html#limitations-and-unsupported-features, but it's also helpful for users to be warned about this when visiting a specific feature doc page.	2024-07-09 18:18:05 +02:00
Avi Kivity	f31d5e3204	Merge 'repair/streaming: enable toggling tombstone gc with a config item' from Botond Dénes We currently disable tombstone GC for compaction done on the read path of streaming and repair, because those expired tombstones can still prevent data resurrection. With time-based tombstone GC, missing a repair for long enough can cause data resurrection because a tombstone is potentially GC'd before it could be spread to every node by repair. So repair disseminating these expired tombstones helps clusters which missed repair for long enough. It is not a guarantee because compaction could have done the GC itself, but it is better than nothing. This last resort is getting less important with repair-based tombstone GC. Furthermore, we have seen this cause huge repair amplification in a cluster, where expired tombstones triggered repair replicating otherwise identical rows. This series makes tombstone GC on the streaming/repair compaction path configurable with a config item. This new config item defaults to `false` (current behaviour), setting it to `true`, will enable tombstone GC. Fixes: https://github.com/scylladb/scylladb/issues/19015 Not a regression, no backport needed Closes scylladb/scylladb#19016 * github.com:scylladb/scylladb: test/topology_custom/test_repair: add test for enable_tombstone_gc_for_streaming_and_repair replica/table: maybe_compact_for_streaming(): toggle tombstone GC based on the control flag replica: propagate enable_tombstone_gc_for_streaming_and_repair to maybe_compact_for_streaming() db/config: introduce enable_tombstone_gc_for_streaming_and_repair	2024-07-09 19:04:11 +03:00
Piotr Smaron	5bfabff9a0	cql: adjust warning about tablets Made it shorter, simpler and mentioned also that counters aren't supported with tablets. Fixes: #18876	2024-07-09 18:01:37 +02:00
Piotr Smaron	c70f321c6f	cql: forbid having counter columns in tablets tables Counter updates break under tablet migration (#18180), and for this reason they need to be disabled until the problem is fixed. It's enough to forbid creating a table with counters, as altering a table without counters already cannot result in the table having counters: 1) Adding a counter column to a table without counters: ``` cqlsh> ALTER TABLE temp.cf ADD (col_name counter); ConfigurationException: Cannot add a counter column (col_name) in a non counter column family ``` 2) Altering a column to be of the counter type: ``` cqlsh> ALTER TABLE temp.cf ALTER col_name TYPE counter; ConfigurationException: Cannot change col_name from type int to type counter: types are incompatible. ``` Fixes: #19449	2024-07-09 18:01:31 +02:00
Patryk Wrobel	a89e3d10af	code-cleanup: add missing header guards The following command had been executed to get the list of headers that did not contain '#pragma once': 'grep -rnw . -e "#pragma once" --include *.hh -L' This change adds missing include guard to headers that did not contain any guard. Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> Closes scylladb/scylladb#19626	2024-07-09 18:31:35 +03:00
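The header scan described in a89e3d10af can also be sketched in Python. This is a minimal sketch, not the command used in the commit: the function name `headers_missing_pragma_once` is hypothetical, and it only checks for the literal text `#pragma once`, so a header protected by a classic `#ifndef` guard is reported as well.

```python
from pathlib import Path


def headers_missing_pragma_once(root):
    """Return sorted paths of *.hh files under root that never mention '#pragma once'.

    Hypothetical helper: a rough equivalent of running grep with -L over *.hh files.
    """
    missing = []
    for path in Path(root).rglob("*.hh"):
        if not path.is_file():
            continue  # skip directories that happen to end in .hh
        text = path.read_text(encoding="utf-8", errors="replace")
        if "#pragma once" not in text:
            missing.append(path)
    return sorted(missing)


if __name__ == "__main__":
    # Print every header under the current directory that lacks the pragma.
    for header in headers_missing_pragma_once("."):
        print(header)
```

Run from the repository root, this prints one path per header that would need a guard added.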
Takuya ASADA	cae999c094	toolchain: change optimized clang install method to standard one Previously, the optimized clang installation did not use the standard build script; it overwrote Fedora's preinstalled clang binaries instead. However, this breaks on clang-18.1.8 because of the libLTO versioning convention. To avoid such problems, let's switch to the standard installation method and switch the install prefix to /usr/local. Fixes #19203 Closes scylladb/scylladb#19505	2024-07-09 14:22:42 +03:00
Tomasz Grabiec	252110bc54	Merge 'mutation_partition_v2: in apply_monotonically(), avoid bad_alloc on sentinel insertion' from Michał Chojnowski apply_monotonically() is run with reclaim disabled. So with some bad luck, sentinel insertion might fail with bad_alloc even on a perfectly healthy node. We can't deal with the failure of sentinel insertion, so this will result in a crash. This patch prevents the spurious OOM by reserving some memory (1 LSA segment) and only making it available right before the critical allocations. Fixes https://github.com/scylladb/scylladb/issues/19552 Closes scylladb/scylladb#19617 * github.com:scylladb/scylladb: mutation_partition_v2: in apply_monotonically(), avoid bad_alloc on sentinel insertion logalloc: add hold_reserve logalloc: generalize refill_emergency_reserve()	2024-07-09 13:09:01 +02:00
Anna Stuchlik	948459b1ac	doc: replace a link on the CDC+Kafka page This commit replaces a link to the installation section with a link to the getting started section. Closes scylladb/scylladb#19658	2024-07-09 12:35:43 +03:00
Michael Litvak	ed33e59714	storage_proxy: remove response handler if no targets When writing a mutation, it might happen that there are no live targets to send the mutation to, yet the request can be satisfied. For example, when writing with CL=ANY to a dead node, the request is completed by storing a local hint. Currently, in that case, a write response handler is created for the request and it remains active until it times out because it is not removed anywhere, even though the write is completed successfully after storing the hint. The response handler is usually removed when responses have been received from all targets, but in this case there are no targets to trigger the removal. In this commit we check whether there are live targets to send the mutation to. If there are none, we remove the response handler immediately. Fixes scylladb/scylladb#19529 Closes scylladb/scylladb#19586	2024-07-09 12:11:05 +03:00
Kamil Braun	98c18d8904	Merge 'Add API for read barrier' from Emil Maskovsky Introduce REST API for triggering a read barrier. This is to make sure the database schema is up to date on the node where the read barrier is triggered. One of the use cases is the database backup via the Scylla Manager, which requires that the schema backed up is matching the data or newer (data can be migrated, but an older schema would cause issues). Fixes scylladb/scylladb#19213 Closes scylladb/scylladb#19597 * github.com:scylladb/scylladb: raft: add the read barrier REST API raft: use `raft_timeout` in trigger_snapshot raft: use bad_param_exception for consistency test: raft: verify schema updated after read barrier	2024-07-09 10:58:21 +02:00
Kefu Chai	6af989782c	test: sstable_directory_test: use THREADSAFE_BOOST_REQUIRE_EQUAL when appropriate for better debugging experience. before this change, we have ``` fatal error: in "sstable_directory_test_generation_sanity": critical check sst->generation() == sst1->generation() has failed ``` after this change, we have ``` fatal error: in "sstable_directory_test_generation_sanity": critical check sst->generation() == sst1->generation() has failed [3ghm_0ntw_29vj625yegw7jodysc != 3ghm_0ntw_29vj625yegw7jodysd] ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19639	2024-07-09 10:54:23 +03:00
Kefu Chai	30e82a81e8	test: do not define boost_test_print_type() for types with operator<< before this change, we provide `boost_test_print_type()` for all types which can be formatted using {fmt}. these types includes those who fulfill the concept of range, and their element can be formatted using {fmt}. if the compilation unit happens to include `fmt/ranges.h`. the ranges are formatted with `boost_test_print_type()` as well. this is what we expect. in other words, we use {fmt} to format types which do not natively support {fmt}, but they fulfill the range concept. but `boost::unit_test::basic_cstring` is one of them - it can be formatted using operator<<, but it does not provide fmt::format specialization - it fulfills the concept of range - and its element type is `char const`, which can be formatted using {fmt} that's why it's formatted like: ``` test/boost/sstable_directory_test.cc(317): fatal error: in "sstable_directory_test_generation_sanity": critical check ['s', 's', 't', '-', '>', 'g', 'e', 'n', 'e', 'r', 'a', 't', 'i', 'o', 'n', '(', ')', ' ', '=', '=', ' ', 's', 's', 't', '1', '-', '>', 'g', 'e', 'n', 'e', 'r', 'a', 't', 'i', 'o', 'n', '(', ')'] has failed` ``` where the string is formatted as a sequence-alike container. this is far from readable. so, in this change, we do not define `boost_test_print_type()` for the types which natively support `operator<<` anymore. so they can be printed with `operator<<` when boost::test prints them. Fixes scylladb/scylladb#19637 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19638	2024-07-09 10:34:37 +03:00
Botond Dénes	9544c364be	scylla-gdb.py: introduce scylla large-objects The equivalent of small-objects, but for large objects (spans). Allows listing object of a large-class, and therefore investigating a run-away class, by attempting to identify the owners of the objects in it. Written to investigate #16493 Closes scylladb/scylladb#16711	2024-07-09 10:21:09 +03:00
Emil Maskovsky	a9e985fcc9	raft: add the read barrier REST API This will allow triggering the read barrier directly via the API, instead of using workarounds (like dropping a non-existent table). The intended use case is in the Scylla Manager, to make sure that the database schema is up to date after the data has been backed up and before attempting to back up the database schema. The database schema in particular is backed up on just a single node, which might not yet have a schema at least as new as the data (data can be migrated to a newer schema, but not vice versa). The read barrier issued on the node should ensure that the node has a schema at least as new as the data. Closes #19213	2024-07-08 18:16:27 +02:00
Emil Maskovsky	7a69d9070f	raft: use `raft_timeout` in trigger_snapshot Migrate the "trigger_snapshot" to use the standardized `raft_timeout` approach.	2024-07-08 18:13:31 +02:00
Michał Chojnowski	78d6471ce4	mutation_partition_v2: in apply_monotonically(), avoid bad_alloc on sentinel insertion apply_monotonically() is run with reclaim disabled. So with some bad luck, sentinel insertion might fail with bad_alloc even on a perfectly healthy node. We can't deal with the failure of sentinel insertion, so this will result in a crash. This patch prevents the spurious OOM by reserving some memory (1 LSA segment) and only making it available right before the critical allocations. Fixes scylladb/scylladb#19552	2024-07-08 16:08:27 +02:00
Michał Chojnowski	7b3f55a65f	logalloc: add hold_reserve mutation_partition_v2::apply_monotonically() needs to perform some allocations in a destructor, to ensure that the invariants of the data structure are restored before returning. But it is usually called with reclaiming disabled, so the allocations might fail even in a perfectly healthy node with plenty of reclaimable memory. This patch adds a mechanism which allows to reserve some LSA memory (by asking the allocator to keep it unused) and make it available for allocation right when we need to guarantee allocation success.	2024-07-08 16:08:27 +02:00
Wojciech Przytuła	691e245152	storage_proxy: fix uninitialized LWT contention counter When debugging the issue of high LWT contention metric, we (the drivers team) discovered that at least 3 drivers (Go, Java, Rust) cause high numbers in that metric in LWT workloads - we doubted that all those drivers route LWT queries badly. We tried to understand that metric and its semantics. It took 3 people over 10 hours to figure out what it is supposed to count. People from the core team suspected that it was the drivers sending requests to different shards, causing contention. Then we ran the workload against a single node single shard cluster... and observed contention. Finally, we looked into the Scylla code and saw it. Uninitialized stack value. The core member was shocked. But we, the drivers people, felt we always knew it. It's yet another time that we are blamed for a server-side issue. We rebuilt scylla with the variable initialized to 0 and the metric kept being 0. To prevent such errors in the future, let's consider some lints that warn against uninitialized variables. This is such an obvious feature of e.g. Rust, and yet this has been shown to cause a painful bug in 2024. Closes scylladb/scylladb#19625	2024-07-08 16:55:46 +03:00
Emil Maskovsky	492d0a5c86	raft: use bad_param_exception for consistency Replace the `std::runtime_error` by the `bad_param_exception` that is used in other places.	2024-07-08 14:31:11 +02:00
Takuya ASADA	cbf33aba5c	scylla_coredump_setup: install systemd-coredump before has_zstd() On Ubuntu/Debian, we have to install systemd-coredump before running has_zstd(), since it detects ZSTD support by running coredumpctl. Move pkg_install('systemd-coredump') to the head of the script. Fixes #19643 Closes scylladb/scylladb#19648	2024-07-08 15:04:34 +03:00
Kefu Chai	229250ef3e	.github: use scylla-toolchain for newer fmt in `cccec07581`, we started using a feature introduced by {fmt} v10. but we are still using the {fmt} cooked using seastar, and it is 9.1.0, so this breaks the build when running the clang-tidy workflow. in this change, instead of building on ubuntu jammy, we use the scylladb/scylla-toolchain image based on fedora 40, which provides {fmt} v10.2.1. since we have clang 18 in fedora 40, this change does not sacrifice anything. after this change, clang-tidy workflow should be back to normal. Fixes scylladb/scylladb#19621 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19628	2024-07-08 11:14:02 +02:00
Emil Maskovsky	80986c17c3	test: raft: verify schema updated after read barrier Regression test for #19213.	2024-07-08 10:50:32 +02:00
Piotr Dulikowski	3c535641fd	Merge 'service/storage_proxy: Add metrics keeping track of incoming hints' from Dawid Mędrek Although Scylla already exposes metrics keeping track of various information related to hinted handoff, all of them correspond to either storing or sending hints. However, when debugging, it's also crucial to be aware of how many hints are coming to a given node and what their size is. Unfortunately, the existing metrics are not enough to obtain that information. This PR introduces the following new metrics: * `sent_bytes_total` – the total size of the hints that have been sent from a given shard, * `received_hints_total` – the total number of hints that a given shard has received, * `received_hints_bytes_total` – the total size of the hints a given shard has received. It also renames `hints_manager_sent` to `hints_manager_sent_total` to avoid conflicts of prefixes between that metric and `sent_bytes_total` in tests. Fixes scylladb/scylladb#10987 Closes scylladb/scylladb#18976 * github.com:scylladb/scylladb: db/hints: Add a metric for the size of sent hints service/storage_proxy: Add metrics for received hints	2024-07-08 10:29:53 +02:00
Botond Dénes	56c194e52c	Merge 'compaction: not include unused headers' from Kefu Chai these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. --- it's a cleanup, hence no need to backport. Closes scylladb/scylladb#19581 * github.com:scylladb/scylladb: .github: add compaction to iwyu's CLEANER_DIR compaction: not include unused headers	2024-07-08 10:03:51 +03:00
Israel Fruchter	32e6725b8e	Update tools/cqlsh submodule * tools/cqlsh 73bdbeb0...86a280a1 (1): > remove cassandra from the shiv package Ref: scylladb/scylla-cqlsh#96 Closes scylladb/scylladb#19558	2024-07-08 10:00:59 +03:00
Michael Litvak	407274e828	view: drain view builder before database The view builder is doing write operations to the database. In order for the view builder to shutdown gracefully without errors, we need to ensure the database can handle writes while it is drained. The commit changes the drain order, so that view builder is drained before the database shuts down. Fixes scylladb/scylladb#18929 Closes scylladb/scylladb#19609	2024-07-05 22:17:40 +03:00
Botond Dénes	103bd8334a	service/paxos/paxos_state: restore resilience against dropped tables Recently, the code in paxos_state::prepare(), paxos_state::accept() and paxos_state::learn() was coroutinized by `58912c2cc1`, `887a5a8f62` and `2b7acdb32c` respectively. This introduced a regression: the latency histogram updater code, was moved from a finally() to a defer(). Unlike the former, the latter runs in a noexcept context so the possible replica::no_such_column_family raised from the latency update code now crashes the node, instead of failing just the paxos operation as before. Fix by only updating the latency histogram if the table still exists. Fixes: scylladb/scylladb#19620 Closes scylladb/scylladb#19623	2024-07-05 14:58:11 +02:00
Anna Stuchlik	8759dfae96	doc: add Run in Docker page to the documentation The page was missing from the docs. I created the page based on the information in the download center (which will be closed down soon) and other ScyllaDB resources. Closes scylladb/scylladb#19577	2024-07-04 20:20:03 +03:00
Dawid Medrek	0e1cb0dc73	db/hints: Add logging when ignoring hint directories In `2446cce`, we stopped trying to attempt to create endpoint managers for invalid hint directories even when their names represented IP addresses or host IDs. In this commit, we add logging informing the user about it. Refs scylladb/scylladb#19173 Closes scylladb/scylladb#19618	2024-07-04 20:14:52 +03:00
Botond Dénes	155acbb306	reader_concurrency_semaphore: execution_loop(): move maybe_admit_waiters() to the inner loop Now that the CPU concurrency limit is configurable, new reads might be ready to execute right after the current one was executed. So move the poll for admitting new reads into the inner loop, to prevent the situation where the inner loop yields and a concurrent do_wait_admission() finds that there are waiters (queued because at the time they arrived to the semaphore, the _ready_list was not empty) but it is possible to admit a new read. When this happens the semaphore will dump diagnostics to help debug the apparent contradiction, which can generate a lot of log spam. Moving the poll into the inner loop prevents the false-positive contradiction detection from firing. Refs: scylladb/scylladb#19017 Closes scylladb/scylladb#19600	2024-07-04 17:47:52 +03:00
Avi Kivity	0626e0487d	Merge 'Add copy on write to functions schema code' from Marcin Maliszkiewicz This is the first patch from series which would allow us to unify raft command code. Property we want to achieve is that all modifications performed by a single raft command can be made visible atomically. This helps to exclude accidental dependencies across subsystem updates and make easier to reason about state. Here we alter functions schema code so that changes are first applied to a copy of declared functions and then made visible atomically. Later work will apply similar strategy to the whole schema. Relates scylladb/scylladb#19153 Closes scylladb/scylladb#19598 * github.com:scylladb/scylladb: cql3: functions: make modification functions accessible only via batch class db: replica: batch functions schema modifications cql3: functions: introduce class for batching functions modifications cql3: functions: make functions class non-static cql3: functions: remove reduntant class access specifiers cql3: functions: remove unused java snippet	2024-07-04 17:40:23 +03:00
Anna Stuchlik	822a58f964	doc: remove support for Debian 10 This PR removes support for Debian 10, which reached end of life on June 30, 2024. Refs https://github.com/scylladb/scylla-enterprise/issues/4377 Closes scylladb/scylladb#19616	2024-07-04 17:24:57 +03:00
Marcin Maliszkiewicz	3f1c2fecc2	cql3: functions: make modification functions accessible only via batch class This is to assure that all the code is using batching	2024-07-04 13:10:26 +02:00
Marcin Maliszkiewicz	32fe101f9d	db: replica: batch functions schema modifications Before, each function change was immediately visible, because the event notification logic yielded. Now we first gather the modifications and then commit them. Further work will broaden the scope of atomicity to the whole schema and even across other subsystems.	2024-07-04 13:10:26 +02:00
Michał Chojnowski	f784be6a7e	logalloc: generalize refill_emergency_reserve() In the next patch, we will want to do the same thing as refill_emergency_reserve() does, just with a quantity different than _emergency_reserve_max. So we split off the shareable part to a new function, and use it to implement refill_emergency_reserve().	2024-07-04 12:19:01 +02:00
Pavel Emelyanov	9a654730a7	tablet_allocator: Put more info into failed-to-drain exception When balancer fails to find a node to balance drained tablets into, it throws an exception with tablet id and node id, but it's also good to know more details about the balancing state that led to the failure. refs: #19504 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19588	2024-07-04 12:18:50 +02:00
Marcin Maliszkiewicz	4d937c5a17	cql3: functions: introduce class for batching functions modifications It will hold a temporary shallow copy of declared functions. Then each modification adds/removes/replaces a stored function object. At the end, the change is committed by moving the temporary copy to the main functions class instance.	2024-07-04 12:14:36 +02:00
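The copy-then-commit idea in 4d937c5a17 can be modelled in a few lines of Python. This is only a conceptual sketch: the class names `Functions` and `FunctionsBatch`, their methods, and the dictionary storage are assumptions made for illustration and do not mirror the C++ classes in cql3.

```python
class Functions:
    """Hypothetical registry of declared functions, keyed by name."""

    def __init__(self):
        self._declared = {}

    def find(self, name):
        return self._declared.get(name)


class FunctionsBatch:
    """Collects modifications on a shallow copy, then publishes them in one step."""

    def __init__(self, target):
        self._target = target
        # Shallow copy: the function objects are shared, only the mapping is new.
        self._pending = dict(target._declared)

    def add(self, name, fn):
        if name in self._pending:
            raise KeyError(f"{name} already declared")
        self._pending[name] = fn

    def replace(self, name, fn):
        if name not in self._pending:
            raise KeyError(f"{name} not declared")
        self._pending[name] = fn

    def remove(self, name):
        self._pending.pop(name, None)

    def commit(self):
        # A single assignment makes every gathered change visible at once.
        self._target._declared, self._pending = self._pending, None
```

Readers of `Functions` keep seeing the old mapping until `commit()` runs, which is the atomic-visibility property the series aims for.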
Nadav Har'El	96dff367f8	Merge 'storage_proxy: update view update backlog on correct shard when writing' from Wojciech Mitros This series is another approach of https://github.com/scylladb/scylladb/pull/18646 and https://github.com/scylladb/scylladb/pull/19181. In this series we only change where the view backlog gets updated - we do not assure that the view update backlog returned in a response is necessarily the backlog that increased due to the corresponding write, the returned backlog may be outdated up to 10ms. Because this series does not include this change, it's considerably less complex and it doesn't modify the common write patch, so no particular performance considerations were needed in that context. The issue being fixed is still the same, the full description can be seen below. When a replica applies a write on a table which has a materialized view it generates view updates. These updates take memory which is tracked by `database::_view_update_concurrency_sem`, separate on each shard. The fraction of units taken from the semaphore to the semaphore limit is the shard's view update backlog. Based on these backlogs, we want to estimate how busy a node is with its view updates work. We do that by taking the max backlog across all shards. 
To avoid excessive cross-shard operations, the node's (max) backlog isn't calculated each time we need it, but up to 1 time per 10ms (the `_interval`) with an optimization where the backlog of the calculating shard is immediately up-to-date (we don't need cross-shard operations for it): ``` update_backlog node_update_backlog::fetch() { auto now = clock::now(); if (now >= _last_update.load(std::memory_order_relaxed) + _interval) { _last_update.store(now, std::memory_order_relaxed); auto new_max = boost::accumulate( _backlogs, update_backlog::no_backlog(), [] (const update_backlog& lhs, const per_shard_backlog& rhs) { return std::max(lhs, rhs.load()); }); _max.store(new_max, std::memory_order_relaxed); return new_max; } return std::max(fetch_shard(this_shard_id()), _max.load(std::memory_order_relaxed)); } ``` For the same reason, even when we do calculate the new node's backlog, we don't read from the `_view_update_concurrency_sem`. Instead, for each shard we also store a update_backlog atomic which we use for calculation: ``` struct per_shard_backlog { // Multiply by 2 to defeat the prefetcher alignas(seastar::cache_line_size * 2) std::atomic<update_backlog> backlog = update_backlog::no_backlog(); need_publishing need_publishing = need_publishing::no; update_backlog load() const { return backlog.load(std::memory_order_relaxed); } }; std::vector<per_shard_backlog> _backlogs; ``` Due to this distinction, the update_backlog atomic need to be updated separately, when the `_view_update_concurrency_sem` changes. 
This is done by calling `storage_proxy::update_view_update_backlog`, which reads the `_view_update_concurrency_sem` of the shard (in `database::get_view_update_backlog`) and then calls `node_update_backlog::add` where the read backlog is stored in the atomic: ``` void storage_proxy::update_view_update_backlog() { _max_view_update_backlog.add(get_db().local().get_view_update_backlog()); } void node_update_backlog::add(update_backlog backlog) { _backlogs[this_shard_id()].backlog.store(backlog, std::memory_order_relaxed); _backlogs[this_shard_id()].need_publishing = need_publishing::yes; } ``` For this implementation of calculating the node's view update backlog to work, we need the atomics to be updated correctly when the semaphores of corresponding shards change. The main event where the view update backlog changes is an incoming write request. That's why when handling the request and preparing a response we update the backlog calling `storage_proxy::get_view_update_backlog` (also because we want to read the backlog and send it in the response): backlog update after local view updates (`storage_proxy::send_to_live_endpoints` in `mutate_begin`) ``` auto lmutate = [handler_ptr, response_id, this, my_address, timeout] () mutable { return handler_ptr->apply_locally(timeout, handler_ptr->get_trace_state()) .then([response_id, this, my_address, h = std::move(handler_ptr), p = shared_from_this()] { // make mutation alive until it is processed locally, otherwise it // may disappear if write timeouts before this future is ready got_response(response_id, my_address, get_view_update_backlog()); }); }; ``` backlog update after remote view updates (`storage_proxy::remote::handle_write`) ``` auto f = co_await coroutine::as_future(send_mutation_done(netw::messaging_service::msg_addr{reply_to, shard}, trace_state_ptr, shard, response_id, p->get_view_update_backlog())); ``` Now assume that on a certain node we have a write request received on shard A, which updates a row on shard B (A!=B).
As a result, shard B will generate view updates and consume units from its `_view_update_concurrency_sem`, but will not update its atomic in `_backlogs` yet. Because both shards in the example are on the same node, shard A will perform a local write calling `lmutate` shown above. In the `lmutate` call, the `apply_locally` will initiate the actual write on shard B and the `storage_proxy::update_view_update_backlog` will be called back on shard A. In no place will the backlog atomic on shard B get updated even though it increased in size due to the view updates generated there. Currently, what we calculate there doesn't really matter - it's only used for the MV flow control delays, so currently, in this scenario, we may only overload a replica causing failed replica writes which will be later retried as hints. However, when we add MV admission control, the calculated backlog will be the difference between an accepted and a rejected request. Fixes: https://github.com/scylladb/scylladb/issues/18542 Without admission control (https://github.com/scylladb/scylladb/pull/18334), this patch doesn't affect much, so I'm marking it as backport/none Closes scylladb/scylladb#19341 * github.com:scylladb/scylladb: test: add test for view backlog not being updated on correct shard test: move auxiliary methods for waiting until a view is built to util mv: update view update backlog when it increases on correct shard	2024-07-04 11:40:09 +03:00
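The cross-shard scenario in the merge description above can be reproduced with a toy model. This is a deliberately simplified sketch: the names `NodeBacklogModel`, `write_before_fix` and `write_after_fix` are hypothetical, the 10ms caching interval is ignored, and plain floats stand in for `update_backlog`.

```python
class NodeBacklogModel:
    """Toy model: each shard has a real backlog and a separately published copy."""

    def __init__(self, shards):
        self.actual = [0.0] * shards     # what the shard's semaphore would report
        self.published = [0.0] * shards  # per-shard value that fetch() reads

    def consume(self, shard, amount):
        # View updates generated on `shard` take units from its semaphore.
        self.actual[shard] += amount

    def publish(self, shard):
        # Copy the shard's real backlog into its published slot.
        self.published[shard] = self.actual[shard]

    def fetch(self):
        # The node's backlog is the maximum over the published per-shard values.
        return max(self.published)


def write_before_fix(model, coordinator_shard, owner_shard, amount):
    model.consume(owner_shard, amount)
    model.publish(coordinator_shard)  # wrong shard: the owner's slot stays stale


def write_after_fix(model, coordinator_shard, owner_shard, amount):
    model.consume(owner_shard, amount)
    model.publish(owner_shard)  # publish where the backlog actually grew
```

With shard A as coordinator and shard B as owner, the first variant leaves `fetch()` at zero even though B's backlog grew, while the second reports the increase.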
Marcin Maliszkiewicz	16b770ff1a	cql3: functions: make functions class non-static This is done to ease code reuse in the following commit. It would also help should we ever want to properly mount the functions class to the schema object instead of using static storage.	2024-07-04 10:24:57 +02:00
Marcin Maliszkiewicz	47033dce7a	cql3: functions: remove reduntant class access specifiers	2024-07-04 10:24:57 +02:00
Marcin Maliszkiewicz	e86191b19f	cql3: functions: remove unused java snippet It doesn't seem to serve any purpose now.	2024-07-04 10:24:57 +02:00
Kefu Chai	cccec07581	db: use format_as() in favor of fmt::streamed() since fedora 38 is EOL. and fedora 39 comes with fmt v10.0.0, also, we've switched to the build image based on fedora 40, which ships fmt-devel v10.2.1, there is no need to use fmt::streamed() when the corresponding format_as() is available. simpler this way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19594	2024-07-04 11:10:43 +03:00
Kefu Chai	35e7a0b36f	test/cql-pytest: use offset-aware API to avoid deprecate warning to avoid warning like ``` DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC). ``` and to be future-proof, let's use the offset-aware timestamp. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19536	2024-07-04 10:48:00 +03:00
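The replacement suggested by the deprecation warning in 35e7a0b36f looks roughly like this. A minimal sketch: the helper name `utc_from_timestamp` is hypothetical, and it uses `timezone.utc` rather than `datetime.UTC` so that it also runs on Python versions older than 3.11.

```python
from datetime import datetime, timezone


def utc_from_timestamp(ts):
    """Return an offset-aware UTC datetime for a POSIX timestamp.

    Replaces the deprecated, naive datetime.utcfromtimestamp(ts).
    """
    return datetime.fromtimestamp(ts, tz=timezone.utc)


if __name__ == "__main__":
    print(utc_from_timestamp(0).isoformat())
```

Unlike the naive value returned by `utcfromtimestamp()`, the result carries its UTC offset, so it cannot be silently compared against or mixed with local-time values.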
Kefu Chai	03e1fce7aa	zstd: include external header with brackets zstd.h is a header provided by libzstd, so let's include it with brackets, more consistent this way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19538	2024-07-04 10:42:29 +03:00
Takuya ASADA	09e22690dc	scylla_coredump_setup: enable compress by default when zstd support detected We disabled coredump compression by default because it was too slow, but recent versions of systemd-coredump support faster zstd-based compression, so let's enable compression by default when zstd support is detected. Related scylladb/scylla-machine-image#462 Closes scylladb/scylladb#18854	2024-07-04 10:38:22 +03:00
Botond Dénes	e3e5f8209d	Merge 'alternator: fix "/localnodes" to use broadcast_rpc_address' from Nadav Har'El This short series fixes Alternator's "/localnodes" request to allow a node's external IP address - configured with `broadcast_rpc_address` - to be listed instead of its usual, internal, IP address. The first patch fixes a bug in gossiper::get_rpc_address(), which the second patch needs to implement the feature. The second patch also contains regression tests. Fixes #18711. Closes scylladb/scylladb#18828 * github.com:scylladb/scylladb: alternator: fix "/localnodes" to use broadcast_rpc_address gossiper: fix get_rpc_address() for this node	2024-07-04 10:37:28 +03:00
Takuya ASADA	65fbf72ed0	scylla-housekeeping: fix exception on parsing version string Since Python 3.12, version parsing becomes strict, parse_version() does not accept the version string like '6.1.0~dev'. To fix this, we need to replace version string from '6.1.0~dev' to '6.1.0.dev0', which is allowed on Python version scheme. reference: https://packaging.python.org/en/latest/specifications/version-specifiers/ Fixes #19564 Closes scylladb/scylladb#19572	2024-07-04 10:27:51 +03:00
Avi Kivity	69450780a7	docs: explain tuning for a node that is overcommitted at the hypervisor level Closes scylladb/scylladb#19589	2024-07-04 10:23:25 +03:00
Pavel Emelyanov	8809b99736	s3/client: Unmark put-object lambdas from mutable They don't need to modify the captured objects. In fact, they must not do it in the first place, because the request can be called more than once and the buffers must not change between those invocations. For the memory_sink_buffers there must be const method to get the vector of temporary_buffers themselves. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19599	2024-07-04 10:07:48 +03:00
Lakshmi Narayanan Sreethar	c80df8504c	sstables::maybe_rebuild_filter_from_index: log sstable origin Log the sstable origin when its bloom filter is being rebuilt. The origin has to be passed to the method by the caller as it is not available in the sstable object when the filter is rebuilt. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#19601	2024-07-04 10:01:23 +03:00
Wojciech Mitros	1fdc65279d	test: add test for view backlog not being updated on correct shard This patch adds a test for reproducing issue https://github.com/scylladb/scylladb/issues/18542 The test performs writes on a table with a materialized view and checks that the view backlog increases. To get the current view update backlog, a new metric "view_update_backlog" is added to the `storage_proxy` metrics. The metric differs from the metric from `database` metric with the same name by taking the backlog from the max_view_update_backlog which keeps view update backlogs from all shards which may be a bit outdated, instead of taking the backlog by checking the view_update_semaphore which the backlog is based on directly.	2024-07-03 23:18:52 +02:00
Wojciech Mitros	c4f5659c11	test: move auxiliary methods for waiting until a view is built to util In many materialized view tests we need to wait until a view is built before actually working on it, future tests will also need it. In existing tests we use the same, duplicated method for achieving that. In this patch the method is deduplicated and moved to pylib/util.py and existing tests are modified to use it instead.	2024-07-03 23:18:52 +02:00
Wojciech Mitros	fd9c7d4d59	mv: update view update backlog when it increases on correct shard When performing a write, we should update the view update backlog on the shard where the mutation is actually applied. Instead, currently we only update it on the shard that initially received the write request (which didn't change at all) and as a result, the backlog on the correct shard and the aggregated max view update backlog are not updated at all. This patch enables updating the backlog on the correct shard. The update is now performed just after the view generation and propagation finishes, so that all backlog increases are noted and the backlog is ready to be used in the write response. Additionally, after this patch, we no longer (falsely) assume that the backlog is modified on the same shard as where we later read it to attach to a response. However, we still compare the aggregated backlog from all shards and the backlog from the shard retrieving the max, as with a shard-aware driver, it's likely the exact shard whose backlog changed.	2024-07-03 23:18:52 +02:00
Avi Kivity	3fc4e23a36	forward_service: rename to mapreduce_service forward_service is nondescriptive and misnamed, as it does more than forward requests. It's a classic map/reduce algorithm (and in fact one of its parameters is "reducer"), so name it accordingly. The name "forward" leaked into the wire protocol for the messaging service RPC isolation cookie, so it's kept there. It's also maintained in the name of the logger (for "nodetool setlogginglevel") for compatibility with tests. Closes scylladb/scylladb#19444	2024-07-03 19:29:47 +03:00
Avi Kivity	f798217293	Merge 'build: cmake: include the whole archive of zstd.a' from Kefu Chai before this change, when linking scylla-main, the linker discards the unreferenced symbols defined by zstd.cc. but we use constructor of static variable `registerator` to register the zstd compressor, this variable is not used from the linker's point of view. but we do rely on the side effect of its constructor. that's why the rules generated by CMake fails to build tests and scylla executables with zstd support. that's why we have following test failure: ``` boost.sstable_3_x_test.test_uncompressed_collections_read ... [Exception] - no_such_class: unable to find class 'org.apache.cassandra.io.compress.ZstdCompressor' == [File] - seastar/src/testing/seastar_test.cc == [Line] - 43 ``` in this change, we single out zstd.cc and build it as an archive, so that scylla-main can include as a whole. an alternative is to link scylla-main as a whole archive, but that might increase the disk foot print when building lots of tests -- some of them do not use all symbols exposed by scylla-main, and can potentially have smaller size if linker can discard the unused symbols. Refs https://github.com/scylladb/scylladb/issues/2717 --- cmake related change, hence no need to backport. Closes scylladb/scylladb#19539 * github.com:scylladb/scylladb: build: cmake: include the whole archive of zstd.a build: cmake: find libzstd before using it	2024-07-03 17:38:22 +03:00
Botond Dénes	fca0a58674	Merge 'Close output_stream in get_compaction_history() API handler' from Pavel Emelyanov If an httpd body writer is called with output_stream<>, it must close the stream on its own regardless of any exceptions it may generate while working, otherwise stream destructor may step on non-closed assertion. Stepped on with different handler, see #19541 Coroutinize the handler as the first step while at it (though the fix would have been notably shorter if done with .finally() lambda) Closes scylladb/scylladb#19543 * github.com:scylladb/scylladb: api: Close response stream of get_compaction_history() api: Flush output stream in get_compaction_history() call api: Coroutinize get_compaction_history inner function	2024-07-03 17:00:26 +03:00
Kefu Chai	fd5c04acbb	.github: use the latest dbuild image scylla does not build using scylla-toolchain:fedora-38-20240521, like: ``` FAILED: repair/CMakeFiles/repair.dir/repair.cc.o /usr/bin/clang++ -DBOOST_NO_CXX98_FUNCTION_BASE -DDEVEL -DFMT_SHARED -DSCYLLA_BUILD_MODE=dev -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/__w/scylladb/scylladb -I/__w/scylladb/scylladb/build/gen -I/__w/scylladb/scylladb/seastar/include -I/__w/scylladb/scylladb/build/seastar/gen/include -I/__w/scylladb/scylladb/build/seastar/gen/src -isystem /__w/scylladb/scylladb/abseil -O2 -std=gnu++2b -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/__w/scylladb/scylladb=. 
-march=westmere -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -MD -MT repair/CMakeFiles/repair.dir/repair.cc.o -MF repair/CMakeFiles/repair.dir/repair.cc.o.d -o repair/CMakeFiles/repair.dir/repair.cc.o -c /__w/scylladb/scylladb/repair/repair.cc In file included from /__w/scylladb/scylladb/repair/repair.cc:10: In file included from /__w/scylladb/scylladb/repair/row_level.hh:14: In file included from /__w/scylladb/scylladb/repair/task_manager_module.hh:14: In file included from /__w/scylladb/scylladb/tasks/task_manager.hh:20: In file included from /__w/scylladb/scylladb/seastar/include/seastar/coroutine/parallel_for_each.hh:24: /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/ranges:6161:14: error: requires clause differs in template redeclaration requires forward_range<_Vp> ^ /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/ranges:5860:14: note: previous template declaration is here requires input_range<_Vp> ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19547	2024-07-03 16:57:22 +03:00
Kefu Chai	a88496318b	alternator: use std::to_underlying() when appropriate now that we can use C++23 features, there is no need to hardcode the underlying type anymore. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19546	2024-07-02 18:51:29 +03:00
Kefu Chai	57def6f1e2	docs: install in `non-package` mode when running `make setup`, we could have following failure: ``` Installing the current project: scylla (4.3.0) The current project could not be installed: No file/folder found for package scylla If you do not want to install the current project use --no-root ``` because docs is not a proper python project named "scylla", and does not have a directory structure expected by poetry. what we expect from poetry, is to manage the dependencies for building the document. so, in this change, we install in the `non-package` mode when running `poetry install`, this skips the root package, which does not exist. as an alternative, we could put an empty `scylla.py` under `docs` directory, but that'd be overkill. or we could pass `--no-root` to `poetry install`, but would be ideal if we can keep the settings in a single place. see also https://python-poetry.org/docs/basic-usage/#operating-modes, and https://python-poetry.org/docs/cli/#options-2, for more details on the settings and command line options of poetry. please note this setting was added to poetry 1.8, so the required poetry version is updated. we might need to upgrade poetry in existing installation. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19498	2024-07-02 18:03:20 +03:00
Michael Litvak	08b29460fc	mv: skip building view updates on a pending replica Currently, a pending replica that applies a write on a table that has materialized views, will build all the view updates as a normal replica, only to realize at a late point, in db::view::get_view_natural_endpoint(), that it doesn't have a paired view replica to send the updates to. It will then either drop the view updates, or send them to a pending view replica, if such exists. This work is unnecessary since it may be dropped, and even if there is a pending view replica to send the updates to, the updates that are built by the pending replica may be wrong since it may have incomplete information. This commit fixes the inefficiency by skipping the view update building step when applying an update on a pending replica. The metric total_view_updates_on_wrong_node is added to count the cases that a view update is determined to be unnecessary. The test reproduces the scenario of writing to a table and applying the update on a pending replica, and verifies that the pending replica doesn't try to build view updates. Fixes scylladb/scylladb#19152 Closes scylladb/scylladb#19488	2024-07-02 13:10:18 +02:00
Nadav Har'El	d61513c41c	Merge 'reader_concurrency_semaphore: make CPU concurrency configurable' from Botond Dénes The reader concurrency semaphore restricts the concurrency of reads that require CPU (intention: they read from the cache) to 1, meaning that if there is even a single active read which declares that it needs just CPU to proceed, no new read is admitted. This is meant to keep the concurrency of reads in the cache at 1. The idea is that concurrency in the cache is not useful: it just leads to the reactor rotating between these reads, all of them finishing later than they could if they were the only active read in the cache. This was observed to backfire in the case where the reads from a single table are mostly very fast, but on some keys are very slow (hint: collection full of tombstones). In this case the slow read holds up the fast reads in the queue, increasing the 99th percentile latencies significantly. This series proposes to fix this, by making the CPU concurrency configurable. We don't like tunables like this and this is not a proper fix, but a workaround. The proper fix would be to allow to cut any page early, but we cannot cut a page in the middle of a row. We could maybe have a way of detecting slow reads and excluding them from the CPU concurrency. This would be a heuristic and it would be hard to get right. So in this series a robust and simple configurable is offered, which can be used on those few clusters which do suffer from the too strict concurrency limit. We have seen it in very few cases so far, so this doesn't seem to be wide-spread. Fixes: https://github.com/scylladb/scylladb/issues/19017 This fixes a regression introduced in 5.0, so we have to backport to all currently supported releases Closes scylladb/scylladb#19018 * github.com:scylladb/scylladb: test/boost/reader_concurrency_semaphore_test: add test for live-configurable cpu concurrency test/boost/reader_concurrency_semaphore_test: hoist require_can_admit reader_concurrency_semaphore: wire in the configurable cpu concurrency reader_concurrency_semaphore: add cpu_concurrency constructor parameter db/config: introduce reader_concurrency_semaphore_cpu_concurrency	2024-07-02 13:39:00 +03:00
Tzach Livyatan	6ea475ec76	Docs: Fix a typo in sstable-corruption.rst Closes scylladb/scylladb#19515	2024-07-02 11:58:27 +02:00
Kamil Braun	bcfdeda080	Merge 'co-routinize paxos_state functions' from Gleb Co-routinize paxos_state functions to make them more readable. * 'gleb/coroutineze-paxos-state' of github.com:scylladb/scylla-dev: paxos: simplify paxos_state::prepare code to not work with raw futures paxos: co-routinize paxos_state::learn function paxos: remove no longer used with_locked_key functions paxos: co-routinize paxos_state::accept function paxos: co-routinize paxos_state::prepare function paxos: introduce get_replica_lock() function to take RAII guard for local paxos table access	2024-07-02 11:54:13 +02:00
Tzach Livyatan	4938927fc2	Docs: fix typo in config-commands.rst This is a leftover from https://github.com/scylladb/scylladb/pull/19578, which mistakenly updated the "scylla" script name to "ScyllaDB" Closes scylladb/scylladb#19583	2024-07-02 10:54:47 +02:00
Kamil Braun	edeb266fc2	Merge 'docs, config: render logging related options' from Kefu Chai this changeset adds a filter to customize the rendering of default values, and enables the `scylladb_cc_properties` extension to display the logging message related options. it prepares for the further improvements in https://opensource.docs.scylladb.com/master/reference/configuration-parameters.html. this changeset also prepare for the improvements requested by #19463 --- it's an improvement in the document, hence no need to backport. Closes scylladb/scylladb#19483 * github.com:scylladb/scylladb: config: add descriptions for default_log_level and friends config: define log_to_syslog in a different line docs: parse log_legacy_value as declarations of config option	2024-07-02 10:44:50 +02:00
Kefu Chai	aedd145d6b	.github: add compaction to iwyu's CLEANER_DIR to avoid future violations of include-what-you-use. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-02 14:06:42 +08:00
Kefu Chai	e87b64b7bb	compaction: not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-02 14:06:42 +08:00
Tzach Livyatan	91401f7da5	docs: Update Scylla to ScyllaDB in all RST docs files v3 Closes scylladb/scylladb#19578	2024-07-01 18:04:21 +02:00
Andrei Chekun	b6aabca9a7	Add documentation how to use allure reporting Add documentation how to install and basic usage example of the allure reporting tool. Fix typo test/README.md Related: scylladb/qa-tasks#1665 Depends on: scylladb/scylladb#18169 Closes scylladb/scylladb#18710	2024-07-01 16:21:50 +02:00
Gleb Natapov	9ebdb23002	raft: add more raft metrics to make debug easier	2024-07-01 10:55:22 +02:00
Kamil Braun	94bc9d4f5b	Merge 'Do not expire local address in raft address map since the local node cannot disappear' from Gleb Natapov A node may wait in the topology coordinator queue for a while before being joined. Since the local address is added as an expiring entry to the raft address map, it may expire in the meantime and the bootstrap will fail. The series makes the entry non-expiring. Fixes scylladb/scylladb#19523 Needs to be backported to 6.0 since the bug may cause bootstrap to fail. Closes scylladb/scylladb#19557 * github.com:scylladb/scylladb: test: add test that checks that local address cannot expire between join request placement and its processing storage_service: make node's entry non expiring in raft address map	2024-07-01 09:12:48 +02:00
Kefu Chai	90be71d959	build: cmake: include the whole archive of zstd.a before this change, when linking scylla-main, the linker discards the unreferenced symbols defined by zstd.cc. but we use constructor of static variable `registerator` to register the zstd compressor, this variable is not used from the linker's point of view. but we do rely on the side effect of its constructor. that's why the rules generated by CMake fails to build tests and scylla executables with zstd support. that's why we have following test failure: ``` boost.sstable_3_x_test.test_uncompressed_collections_read ... [Exception] - no_such_class: unable to find class 'org.apache.cassandra.io.compress.ZstdCompressor' == [File] - seastar/src/testing/seastar_test.cc == [Line] - 43 ``` in this change, we single out zstd.cc and build it as an archive, so that scylla-main can include as a whole. an alternative is to link scylla-main as a whole archive, but that might increase the disk foot print when building lots of tests -- some of them do not use all symbols exposed by scylla-main, and can potentially have smaller size if linker can discard the unused symbols. Refs #2717 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-01 11:51:19 +08:00
Kefu Chai	1e0af0fb7e	build: cmake: find libzstd before using it we use libzstd in zstd.cc. so let's find this library before using it. this helps user to identify problem when preparing the building environment, instead of being greeted by a compile-time failure. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-01 11:51:19 +08:00
Kefu Chai	b71b638b2e	config: add descriptions for default_log_level and friends so that their description can be displayed in `reference/configuration-parameters/` web page. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-01 09:47:28 +08:00
Kefu Chai	b486f4ef01	config: define log_to_syslog in a different line before this change, docs/_ext/scylladb_cc_properties.py parses the options line by line, because `log_to_stdout` and `log_to_syslog` are defined in a single line, this script is not able to parse them, hence fails to display them on the `reference/configuration-parameters/` web page. after this change, these two member variables are defined on different lines. both of them can be displayed. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-01 09:47:28 +08:00
Kefu Chai	34cab80103	docs: parse log_legacy_value as declarations of config option before this change, we only consider "named_value<type>" as the declaration of option, and the "Type" field of the corresponding option is displayed if its declaration is found. otherwise, "Type" field is not rendered. but some logging related options are declared using `log_legacy_value`, so they are missing. after this change, they are displayed as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-01 09:47:28 +08:00
Kefu Chai	405f624776	cql3: define dtor of modification_statement in .cc file before this change, we rely on the compiler to use the definition of `cql3::attributes` to generate the defaulted destructor in .cc file. but with clang-19, it insists that we should have a complete definition available for defining the defaulted destructor, otherwise it fails the build: ``` /home/kefu/.local/bin/clang++ -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. 
-march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT CMakeFiles/scylla-main.dir/RelWithDebInfo/table_helper.cc.o -MF CMakeFiles/scylla-main.dir/RelWithDebInfo/table_helper.cc.o.d -o CMakeFiles/scylla-main.dir/RelWithDebInfo/table_helper.cc.o -c /home/kefu/dev/scylladb/table_helper.cc In file included from /home/kefu/dev/scylladb/table_helper.cc:10: In file included from /home/kefu/dev/scylladb/seastar/include/seastar/core/coroutine.hh:25: In file included from /home/kefu/dev/scylladb/seastar/include/seastar/core/future.hh:30: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/memory:78: /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unique_ptr.h:91:16: error: invalid application of 'sizeof' to an incomplete type 'cql3::attributes' 91 \| static_assert(sizeof(_Tp)>0, \| ^~~~~~~~~~~ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unique_ptr.h:398:4: note: in instantiation of member function 'std::default_delete<cql3::attributes>::operator()' requested here 398 \| get_deleter()(std::move(__ptr)); \| ^ /home/kefu/dev/scylladb/cql3/statements/modification_statement.hh:40:7: note: in instantiation of member function 'std::unique_ptr<cql3::attributes>::~unique_ptr' requested here 40 \| class modification_statement : public cql_statement_opt_metadata { \| ^ /home/kefu/dev/scylladb/cql3/statements/modification_statement.hh:40:7: note: in implicit destructor for 'cql3::statements::modification_statement' first required here /home/kefu/dev/scylladb/cql3/statements/modification_statement.hh:28:7: note: forward declaration of 'cql3::attributes' 28 \| class attributes; \| ^ ``` so, in this change, we define the destructor in .cc file, where the complete definition of `cql3::attributes` is available. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19545	2024-06-30 19:35:05 +03:00
Avi Kivity	0ce00ebfbd	Merge 'Close output stream in task manager's API get_tasks handler' from Pavel Emelyanov If client stops reading response early, the server-side stream throws but must be closed anyway. Seen in another endpoint and fixed by #19541 Closes scylladb/scylladb#19542 * github.com:scylladb/scylladb: api: Fix indentation after previous patch api: Close response stream on error api: Flush response output stream before closing	2024-06-30 19:34:00 +03:00
Avi Kivity	3a85d88b68	Merge 'Close output_stream in get_snapshot_details() API handler' from Pavel Emelyanov All streams used by httpd handlers are to be closed by the handler itself, caller doesn't take care of that. fixes: #19494 Closes scylladb/scylladb#19541 * github.com:scylladb/scylladb: api: Fix indentation after previous patch api: Close output_stream on error api: Flush response output stream before closing	2024-06-30 19:33:16 +03:00
Avi Kivity	2fbc532e4d	Update tools/python3 submodule * tools/python3 3e833f1...18fa79e (1): > reloc: use `--add-rpath` and not `--set-rpath`	2024-06-30 19:31:23 +03:00
Kefu Chai	77d2d5821d	build: cmake: do not mark cqlsh noarch in `3c7af287`, cqlsh's reloc package was marked as "noarch", and its filename was updated accordingly in `configure.py`, so let's update the CMake building system accordingly. this change should address the build failure of ``` 08:48:14 [3325/4124] Generating ../Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz 08:48:14 FAILED: Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz /jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz 08:48:14 cd /jenkins/workspace/scylla-master/scylla-ci/scylla/build/dist && /usr/bin/cmake -E copy /jenkins/workspace/scylla-master/scylla-ci/scylla/tools/cqlsh/build/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz /jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz 08:48:14 Error copying file "/jenkins/workspace/scylla-master/scylla-ci/scylla/tools/cqlsh/build/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz" to "/jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz". ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19544	2024-06-30 19:26:54 +03:00
Nadav Har'El	44e036c53c	alternator: fix "/localnodes" to use broadcast_rpc_address Alternator's non-standard "/localnodes" HTTP request returns a list of live nodes on this DC, to consider for load balancing. The returned node addresses should be external IP addresses usable by the clients. Scylla has a configuration parameter - broadcast_rpc_address - which defines for a node an external IP address. If such a configuration exists, we need to use those external IP addresses, not the internal ones. Finding these broadcast_rpc_address of all nodes is easy, because the gossiper already gossips them. This patch also tests the new feature: 1. The existing single-node test is extended to verify that without broadcast_rpc_address we get the usual IP address. 2. A new two-node test is added to check that when broadcast_rpc_address is configured, we get that address and not the usual internal IP addresses. Fixes #18711. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-06-30 18:38:15 +03:00
Nadav Har'El	2a2e8167c8	gossiper: fix get_rpc_address() for this node Commit `dd46a92e23` introduced a function gossiper::get_rpc_address() as a shortcut for get_application_state_ptr(endpoint, RPC_ADDRESS) - i.e., it fetches the endpoint's configured broadcast_rpc_address (despite its confusing name, this is the endpoint's external IP address that clients can use to make CQL connections). But strangely, the implementation get_rpc_address() made an exception for asking about the current host - where instead of getting this node's broadcast_rpc_address, it returns its internal address, which is not what this function was supposed to do - it's not useful for it to do one thing for this node, and a different thing for other nodes, and when I wrote code that uses this function (see the next patch), this resulted in wrong results for the current node. The fix is simple - drop the wrong if(), and get the broadcast_rpc_address stored by the gossiper unconditionally - the gossiper knows it for this node just like for other nodes. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-06-30 18:38:15 +03:00
Gleb Natapov	3f136cf2eb	test: add test that checks that local address cannot expire between join request placement and its processing	2024-06-30 15:52:23 +03:00
Gleb Natapov	5d8f08c0d7	storage_service: make node's entry non expiring in raft address map Local address map entry should never expire in the address map.	2024-06-30 15:08:50 +03:00
Kefu Chai	947e28146d	dbuild: pass --tty when running in interactive mode podman does not allocate a tty by default, so without `-t` or `--tty`, one cannot use a functional terminal when interacting with the container. a functional terminal is what one expects when running `dbuild -i --`, but instead we are greeted with: ``` bash: cannot set terminal process group (-1): Inappropriate ioctl for device bash: no job control in this shell ``` after this change, one can enjoy the good-old terminal as usual after being dropped to the container provided by `dbuild -i --`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19550	2024-06-30 12:06:55 +03:00
Pavel Emelyanov	d034cde01f	Merge 'build: update C++ standard to C++23' from Avi Kivity Switch the C++ standard from C++20 to C++23. This is straightforward, but there are a few fallouts (mostly due to std::unique_ptr that became constexpr) that need to be fixed first. Internal enhancement - no backport required Closes scylladb/scylladb#19528 * github.com:scylladb/scylladb: build: switch to C++23 config: avoid binding an lvalue reference to an rvalue reference readers: define query::partition_slice before using it in default argument test: define table_for_tests earlier compaction: define compaction_group::table_state earlier compaction: compaction_group: define destructor out-of-line compaction_manager: define compaction_manager::strategy_control earlier	2024-06-28 18:02:33 +03:00
Avi Kivity	cf66f233aa	build: remove aarch64 workarounds In `90a6c3bd7a` ("build: reduce release mode inline tuning on aarch64") we reduced inlining on aarch64, due to miscompiles. In `224a2877b9` ("build: disable -Og in debug mode to avoid coroutine asan breakage") we disabled optimization in debug mode, due to miscompiles. With clang 18.1, it appears the miscompiles are gone, and we can remove the two workarounds. Closes scylladb/scylladb#19531	2024-06-28 17:53:51 +03:00
Pavel Emelyanov	b4f9387a9d	api: Close response stream of get_compaction_history() The function must close the stream even if it throws along the way. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-28 16:56:53 +03:00
Pavel Emelyanov	6d4ba98796	api: Flush output stream in get_compaction_history() call It's currently implicitly flushed on its close, but in that case close can throw while flushing. Next patch wants close not to throw and that's possible if flushing the stream in advance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-28 16:55:58 +03:00
Pavel Emelyanov	acb351f4ee	api: Coroutinize get_compaction_history inner function The handler returns a function which is then invoked with output_stream argument to render the json into. This function is converted into coroutine. It has yet another inner lambda that's passed into compaction_manager::get_compaction_history() as consumer lambda. It's coroutinized too. The indentation looks weird as preparation for future patching. Hopefully it's still possible to understand what's going on. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-28 16:53:46 +03:00
Pavel Emelyanov	1be8b2fd25	api: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-28 16:07:21 +03:00
Pavel Emelyanov	986a04cb11	api: Close response stream on error The handler's lambda is called with && stream object and must close the stream on its own regardless of what. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-28 16:06:41 +03:00
Pavel Emelyanov	4897d8f145	api: Flush response output stream before closing The .close() method flushes the stream, but it may throw doing it. Next patch will want .close() not to throw, for that stream must be flushed explicitly before closing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-28 16:05:20 +03:00
Pavel Emelyanov	1839030e3b	api: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-28 15:41:12 +03:00
Pavel Emelyanov	a0c1552cea	api: Close output_stream on error If the get_snapshot_details() lambda throws, the output stream remains non-closed which is bad. Close it regardless of what. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-28 15:40:42 +03:00
Pavel Emelyanov	d1fd886608	api: Flush response output stream before closing Otherwise close() may throw and this is what next patch will want not to happen. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-28 15:40:00 +03:00
Piotr Dulikowski	f00c4eaf72	Merge '[test.py] add --extra-scylla-cmdline-options argument for test.py' from Artsiom Mishuta this PR has 2 commits - [test: pass Scylla extra CMD args from test.py args](`6b367a04b5`) - [test: adjust scylla_cluster.merge_cmdline_options behavior](`c60b36090a`) the main goal is to solve [test.py: provide an easy-to-remember, univeral way to run scylla with trace level logging](https://github.com/scylladb/scylladb/issues/14960) issue but also can be used to easily apply additional arguments for all UnitTests and PythonTests on the fly from the test.py CMD Closes scylladb/scylladb#19509 * github.com:scylladb/scylladb: test: adjust scylla_cluster.merge_cmdline_options behavior test: pass scylla extra CMD args from test.py args	2024-06-28 11:11:29 +02:00
Kamil Braun	6ec8143e56	Merge 'Remove dead code from migration_manager and schema_tables' from Benny Halevy This short series removed some ancient legacy code from migration_manager and schema_tables, before I make further changes in this area. We have more such code under the cql3 hierarchy but it can be dealt with as a follow up. No backport required Closes scylladb/scylladb#19530 * github.com:scylladb/scylladb: schema_tables: remove dead code migration_manager: remove dead code	2024-06-28 10:59:21 +02:00
Piotr Smaron	88eda47f13	cql: forbid switching from tablets to vnodes in ALTER KS This check is already in place, but isn't fully working, i.e. switching from a vnode KS to a tablets KS is not allowed, but this check doesn't work in the other direction. To fix the latter, `ks_prop_defs::get_initial_tablets()` has been changed to handle 3 states: (1) init_tablets is set, (2) it was skipped, (3) tablets are disabled. These couldn't fit into std::optional, so a new local struct to hold these states has been introduced. Callers of this function have been adjusted to set init_tablets to an appropriate value according to the circumstances, i.e. if tablets are globally enabled, but have been skipped in the CQL, init_tablets is automatically set to 0, but if someone executes ALTER KS and doesn't provide tablets options, they're inherited from the old KS. I tried various approaches and this one resulted in the least lines of code changed. I also provided testcases to explain how the code behaves. Fixes: #18795 Closes scylladb/scylladb#19368	2024-06-28 11:41:41 +03:00
Gleb Natapov	5c72af7a93	paxos: simplify paxos_state::prepare code to not work with raw futures	2024-06-28 07:30:45 +03:00
Gleb Natapov	2b7acdb32c	paxos: co-routinize paxos_state::learn function	2024-06-28 07:30:45 +03:00
Gleb Natapov	6bf307ffe8	paxos: remove no longer used with_locked_key functions	2024-06-28 07:30:45 +03:00
Gleb Natapov	887a5a8f62	paxos: co-routinize paxos_state::accept function	2024-06-28 07:30:45 +03:00
Benny Halevy	b7f00ba4bf	schema_tables: remove dead code Well, even after 10 years, the C++ compilers still do not compile Java... And having that legacy code lying around not only doesn't help anyone understand what's going on, but on the contrary, it's confusing and distracting. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-27 20:34:02 +03:00
Benny Halevy	5f6c411656	migration_manager: remove dead code Well, even after 10 years, the C++ compilers still do not compile Java... And having that legacy code lying around not only doesn't help anyone understand what's going on, but on the contrary, it's confusing and distracting. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-27 20:30:33 +03:00
Avi Kivity	4d85db9f39	build: switch to C++23 Set the C++ dialect to C++23, allowing us to use the new features.	2024-06-27 19:36:13 +03:00
Avi Kivity	d14eec8160	config: avoid binding an lvalue reference to an rvalue reference config_file::add_deprecated_options() returns an lvalue reference to a parameter which itself is an rvalue reference. In C++20 this is bad practice (but not a bug in this case) as rvalue references are not expected to live past the call. In C++23, it fails to compile. Fix by accepting an lvalue reference for the parameter, and adjust the caller.	2024-06-27 19:36:13 +03:00
Avi Kivity	ed816afac4	readers: define query::partition_slice before using it in default argument C++23 made std::unique_ptr constexpr. A side effect of this (presumably) is that the compiler compiles it more eagerly, requiring the full definition of the class in std::make_unique, while it previously was content with finding the definition later. One victim of this change is the default argument of make_reversing_reader; define it earlier (by including its header) to build with C++23.	2024-06-27 19:36:13 +03:00
Piotr Dulikowski	f9abe52d3b	Merge 'test: auth: add random tag to resources in test_auth_v2_migration' from Marcin Maliszkiewicz Those tests are sometimes failing on CI and we have two hypotheses: 1. Something wrong with consistency of statements 2. Interruption from another test run (e.g. same queries performed concurrently or data remained after previous run) To exclude or confirm 2, we add a random marker to avoid potential collisions; in such a case it will be clearly visible that wrong data comes from a different run. Related scylladb/scylladb#18931 Related scylladb/scylladb#18319 backport: no, just a test fix Closes scylladb/scylladb#19484 * github.com:scylladb/scylladb: test: auth: add random tag to resources in test_auth_v2_migration test: extend unique_name with random sufix	2024-06-27 17:35:14 +02:00
Gleb Natapov	58912c2cc1	paxos: co-routinize paxos_state::prepare function	2024-06-27 18:10:49 +03:00
Gleb Natapov	4f546b8b79	paxos: introduce get_replica_lock() function to take RAII guard for local paxos table access	2024-06-27 18:09:30 +03:00
Avi Kivity	e5807555bd	test: define table_for_tests earlier C++23 made std::unique_ptr constexpr. A side effect of this (presumably) is that the compiler compiles it more eagerly, requiring the full definition of the class in std::make_unique, while it previously was content with finding the definition later. One victim of this change is table_for_tests; define it earlier to build with C++23.	2024-06-27 17:54:12 +03:00
Avi Kivity	d5ba0b4041	compaction: define compaction_group::table_state earlier C++23 made std::unique_ptr constexpr. A side effect of this (presumably) is that the compiler compiles it more eagerly, requiring the full definition of the class in std::make_unique, while it previously was content with finding the definition later. One victim of this change is compaction_group::table_state; define it earlier to build with C++23.	2024-06-27 17:54:12 +03:00
Avi Kivity	9ecf4ada49	compaction: compaction_group: define destructor out-of-line Define compaction_group::~compaction_group() out-of-line to prevent problems instantiating compaction_group::_table_state, which is an std::unique_ptr. In C++23, std::unique_ptr is constexpr, which means its methods (in this case the destructor) require seeing the definition of the class at the point of instantiation.	2024-06-27 17:54:12 +03:00
Avi Kivity	050e7bbd64	compaction_manager: define compaction_manager::strategy_control earlier C++23 made std::unique_ptr constexpr. A side effect of this (presumably) is that the compiler compiles it more eagerly, requiring the full definition of the class in std::make_unique, while it previously was content with finding the definition later. One victim of this change is compaction_manager::strategy_control; define it earlier to build with C++23.	2024-06-27 17:54:12 +03:00
Andrei Chekun	561e88f00e	[test.py] Throw meaningful error when something wrong wit Scylla binary Fixes: https://github.com/scylladb/scylladb/issues/19489 There is already a check that the Scylla binary is executable, but it is done at a later stage. So the log for a specific test file will contain a message about something being wrong with the binary, but the console will show no signs of that. Moreover, there will be an error that is completely misleading about what actually happened and why the test run failed. With this check, the test fails earlier and provides the correct reason for the failure. Closes scylladb/scylladb#19491	2024-06-27 17:38:32 +03:00
Avi Kivity	581d619572	storage_proxy: trace speculative retries A speculative retry can appear out of the blue[1] and confuse people, as it looks like the consistency level was elevated. Fix by adding such a tracepoint. Sample output: ``` activity \| timestamp \| source \| source_elapsed \| client ---------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2024-06-27 14:25:58.947000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0] \| 2024-06-27 14:25:58.947918 \| 127.0.0.1 \| 2 \| 127.0.0.1 Processing a statement for authenticated user: anonymous [shard 0] \| 2024-06-27 14:25:58.948025 \| 127.0.0.1 \| 108 \| 127.0.0.1 Creating read executor for token -4069959284402364209 with all: [127.0.0.1, 127.0.0.2] targets: [127.0.0.2] repair decision: NONE [shard 0] \| 2024-06-27 14:25:58.948125 \| 127.0.0.1 \| 209 \| 127.0.0.1 Added extra target 127.0.0.1 for speculative read [shard 0] \| 2024-06-27 14:25:58.948128 \| 127.0.0.1 \| 212 \| 127.0.0.1 Creating speculating_read_executor [shard 0] \| 2024-06-27 14:25:58.948129 \| 127.0.0.1 \| 213 \| 127.0.0.1 read_data: sending a message to /127.0.0.2 [shard 0] \| 2024-06-27 14:25:58.948138 \| 127.0.0.1 \| 222 \| 127.0.0.1 Launching speculative retry for data [shard 0] \| 2024-06-27 14:25:58.948234 \| 127.0.0.1 \| 318 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2024-06-27 14:25:58.948235 \| 127.0.0.1 \| 319 \| 127.0.0.1 Start querying singular range {{-4069959284402364209, pk{000400000001}}} [shard 0] \| 2024-06-27 14:25:58.948246 \| 127.0.0.1 \| 330 \| 127.0.0.1 [reader concurrency semaphore user] admitted immediately [shard 0] \| 2024-06-27 14:25:58.948250 \| 127.0.0.1 \| 334 \| 127.0.0.1 [reader concurrency semaphore user] executing read [shard 0] \| 2024-06-27 14:25:58.948258 \| 127.0.0.1 \| 342 \| 127.0.0.1 Querying cache for range 
{{-4069959284402364209, pk{000400000001}}} and slice [(-inf, +inf)] [shard 0] \| 2024-06-27 14:25:58.948281 \| 127.0.0.1 \| 365 \| 127.0.0.1 Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead) and 0 range tombstone(s) [shard 0] \| 2024-06-27 14:25:58.948311 \| 127.0.0.1 \| 395 \| 127.0.0.1 Querying is done [shard 0] \| 2024-06-27 14:25:58.948320 \| 127.0.0.1 \| 404 \| 127.0.0.1 read_data: message received from /127.0.0.1 [shard 0] \| 2024-06-27 14:25:58.948351 \| 127.0.0.2 \| 12 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2024-06-27 14:25:58.948354 \| 127.0.0.1 \| 438 \| 127.0.0.1 Start querying singular range {{-4069959284402364209, pk{000400000001}}} [shard 0] \| 2024-06-27 14:25:58.948370 \| 127.0.0.2 \| 31 \| 127.0.0.1 [reader concurrency semaphore user] admitted immediately [shard 0] \| 2024-06-27 14:25:58.948374 \| 127.0.0.2 \| 35 \| 127.0.0.1 [reader concurrency semaphore user] executing read [shard 0] \| 2024-06-27 14:25:58.948388 \| 127.0.0.2 \| 49 \| 127.0.0.1 Querying cache for range {{-4069959284402364209, pk{000400000001}}} and slice [(-inf, +inf)] [shard 0] \| 2024-06-27 14:25:58.948405 \| 127.0.0.2 \| 66 \| 127.0.0.1 Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead) and 0 range tombstone(s) [shard 0] \| 2024-06-27 14:25:58.948424 \| 127.0.0.2 \| 85 \| 127.0.0.1 Querying is done [shard 0] \| 2024-06-27 14:25:58.948430 \| 127.0.0.2 \| 91 \| 127.0.0.1 read_data handling is done, sending a response to /127.0.0.1 [shard 0] \| 2024-06-27 14:25:58.948436 \| 127.0.0.2 \| 97 \| 127.0.0.1 read_data: got response from /127.0.0.2 [shard 0] \| 2024-06-27 14:25:58.949140 \| 127.0.0.1 \| 1224 \| 127.0.0.1 Request complete \| 2024-06-27 14:25:58.947449 \| 127.0.0.1 \| 449 \| 127.0.0.1 ``` Ref #18988 [1] not completely out of the blue, `ff29f430` indicates that a speculative read can happen. Closes scylladb/scylladb#19520	2024-06-27 17:37:36 +03:00
Botond Dénes	b4f3809ad2	test/boost/reader_concurrency_semaphore_test: add test for live-configurable cpu concurrenc Please enter the commit message for your changes. Lines starting	2024-06-27 09:57:11 -04:00
Botond Dénes	9cbdd8ef92	test/boost/reader_concurrency_semaphore_test: hoist require_can_admit This is currently a lambda in a test, hoist it into the global scope and make it into a function, so other tests can use it too (in the next patch).	2024-06-27 09:57:11 -04:00
Botond Dénes	07c0a8a6f8	reader_concurrency_semaphore: wire in the configurable cpu concurrency Before this patch, the semaphore was hard-wired to stop admission, if there is even a single permit, which is in the need_cpu state. Therefore, keeping the CPU concurrency at 1. This patch makes use of the new cpu_concurrency parameter, which was wired in in the last patches, allowing for a configurable amount of concurrent need_cpu permits. This is to address workloads where some small subset of reads are expected to be slow, and can hold up faster reads behind them in the semaphore queue.	2024-06-27 09:57:11 -04:00
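The admission rule described in `07c0a8a6f8` can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not ScyllaDB's implementation: the class `ToySemaphore` and its methods are hypothetical names, and only the `need_cpu` check mentioned in the commit message is modelled (memory and count resources are ignored).

```python
class ToySemaphore:
    """Toy admission model (hypothetical): admit a read only while fewer than
    cpu_concurrency admitted permits are in the need_cpu state."""

    def __init__(self, cpu_concurrency=1):
        self.cpu_concurrency = cpu_concurrency
        self.need_cpu = 0   # admitted permits that currently need CPU
        self.queue = []     # reads waiting for admission, in FIFO order

    def can_admit(self):
        return self.need_cpu < self.cpu_concurrency

    def submit(self, name):
        """Admit the read if allowed, otherwise queue it. Returns True if admitted."""
        if self.can_admit():
            self.need_cpu += 1
            return True
        self.queue.append(name)
        return False

    def finish_cpu(self):
        """One admitted permit stops needing CPU; admit queued reads if possible.
        Returns the names of the reads admitted as a result."""
        self.need_cpu -= 1
        admitted = []
        while self.queue and self.can_admit():
            admitted.append(self.queue.pop(0))
            self.need_cpu += 1
        return admitted
```

With `cpu_concurrency=1` a slow read holds up every read queued behind it; with a higher value a fast read can be admitted alongside it, which is the workload the commit message describes.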
Botond Dénes	59faa6d4ff	reader_concurrency_semaphore: add cpu_concurrency constructor parameter In the case of the user semaphore, this receives the new reader_concurrency_semaphore_cpu_limit config item. Not used yet.	2024-06-27 09:57:11 -04:00
Benny Halevy	7f05f95ec4	conf: scylla.yaml: enable_tablets: expand documentation The existing documentation comment for `enable_tablets` is very terse and lacks details about the effect of enabling or disabling tablets. This change adds more details about the impact of `enable_tablets` on newly created keyspaces, and how to disable tablets when keyspaces are created. Also, a note was added to warn about the irreversibility of the tablets enablement per keyspace. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-27 14:41:43 +03:00
Avi Kivity	0d23b8165e	build: update frozen toolchain to Fedora 40 with clang 18.1.6 This refreshes our dependencies to a supported distribution. Closes scylladb/scylladb#19205	2024-06-27 14:27:21 +03:00
Yaron Kaikov	efa94b06c2	.github/scripts/label_promoted_commits.py: fix adding labels when PR is closed `prs = response.json().get("items", [])` will return empty when there are no merged PRs, and this will just skip the all-label replacement process. This is a regression following the work done in #19442 Adding another part to handle closed PRs (which is the majority of the cases we have in Scylla core) Fixes: https://github.com/scylladb/scylladb/issues/19441 Closes scylladb/scylladb#19497	2024-06-27 14:00:44 +03:00
Pavel Emelyanov	6c1e5c248f	main,proxy: Drain proxy in its stop_remote Currently proxy initialization is pretty dispersed, in particular it's stopped in several steps -- first drain_on_shutdown() then stop_remote(). In between there's nothing that needs proxy in any particular state, so those two steps can be merged into one. refs: scylladb/scylladb#2737 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19344	2024-06-27 12:26:51 +02:00
Pavel Emelyanov	1a219c674c	s3/client: Always retry http requests Real S3 server is known to actively close connections, thus breaking S3 storage backend at random places. The recent http client update is more robust against that, but the needed feature is OFF by default. refs: scylladb/seastar#1883 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19461	2024-06-27 13:14:24 +03:00
Artsiom Mishuta	919d44e0c7	test: adjust scylla_cluster.merge_cmdline_options behavior adjust merge_cmdline_options behaviour to append the --logger-log-level option instead of merging it. This behaviour can be changed (if needed) back to the previous version (merge everything): merge_cmdline_options(list1, list2, appending_options=[]) or, to append different cmd options: merge_cmdline_options(list1, list2, appending_options=[option1,option2])	2024-06-27 10:03:31 +02:00
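The difference between merging and appending options in `919d44e0c7` can be sketched as follows. This is an illustrative reimplementation, not the code from `scylla_cluster`: the function names `parse_options` and `merge_cmdline_options_sketch` are hypothetical, and the assumed semantics are that a later occurrence of an option replaces the earlier one unless the option is listed in `appending_options`, in which case every occurrence is kept.

```python
def parse_options(args):
    """Group a flat argument list into (option, [values]) pairs."""
    parsed = []
    for token in args:
        if token.startswith("--"):
            parsed.append((token, []))
        elif parsed:
            parsed[-1][1].append(token)
    return parsed


def merge_cmdline_options_sketch(base, override,
                                 appending_options=("--logger-log-level",)):
    """Hypothetical merge: later options win, except appending ones, which accumulate."""
    merged = {}  # option -> list of value lists; dicts preserve insertion order
    for option, values in parse_options(base) + parse_options(override):
        if option in appending_options and option in merged:
            merged[option].append(values)   # keep every occurrence
        else:
            merged[option] = [values]       # later occurrence replaces earlier one
    result = []
    for option, occurrences in merged.items():
        for values in occurrences:
            result.append(option)
            result.extend(values)
    return result
```

Passing `appending_options=()` in this sketch corresponds to the "merge everything" behaviour mentioned in the commit message.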
Artsiom Mishuta	440785bc41	test: pass scylla extra CMD args from test.py args this commit introduces a test.py option --extra-scylla-cmdline-options to pass extra scylla cmdline options for all tests. Options should be space-separated: '--logger-log-level raft=trace --default-log-level error'	2024-06-27 10:02:55 +02:00
Artsiom Mishuta	677173bf8b	test: generate core dumps on crashes in nodetool tests The nodetool tests do not set the asan/ubsan options to abort on error and create core dumps. Fix by setting the environment variables in nodetool tests. Closes scylladb/scylladb#19503	2024-06-27 10:44:33 +03:00
Marcin Maliszkiewicz	b708c5701f	test: auth: add random tag to resources in test_auth_v2_migration Those tests are sometimes failing on CI and we have two hypotheses: 1. Something wrong with consistency of statements 2. Interruption from another test run (e.g. same queries performed concurrently or data remained after previous run) To exclude or confirm 2, we add a random marker to avoid potential collisions; in such a case it will be clearly visible that wrong data comes from a different run. Related scylladb/scylladb#18931 Related scylladb/scylladb#18319	2024-06-27 09:28:27 +02:00
Marcin Maliszkiewicz	d08a80b34f	test: extend unique_name with random sufix This reduces collision risk in an unlikely and incorrect setup where tests would be run concurrently by multiple processes.	2024-06-27 09:28:02 +02:00
Anna Stuchlik	e2994a19d5	doc: update Scylla Doctor installation This commit updates the instructions on how to download and run Scylla Doctor, following the changes in how Scylla Doctor is released. Closes scylladb/scylladb#19510	2024-06-27 10:22:08 +03:00
Botond Dénes	2fe50cda22	Merge 'chunked_vector enhancements' from Benny Halevy This short series enhances utils::chunked_vector so it could be used more easily to convert dht::partition_range_vector to chunked_vector, for example. - utils: chunked_vector: document invalidation of iterators on move - utils: chunked_vector: add ctor from std::initializer_list - utils: chunked_vector: add ctor from a single value No backport required Closes scylladb/scylladb#19462 * github.com:scylladb/scylladb: chunked_vector_test: add tests for value-initialization constructor utils: chunked_vector: add ctor from std::initializer_list utils: chunked_vector: document invalidation of iterators on move	2024-06-27 10:20:47 +03:00
Benny Halevy	92f8d219b3	conf: scylla.yaml: remove tablets from experimental_features doc comment tablets are no longer in experimental_features since `83d491af02`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-27 08:55:30 +03:00
Anna Stuchlik	072542a5cc	doc: add a page with ScyllaDB limits This commit adds a page listing the ScyllaDB limits we know today. The page can and should be extended when other limits are confirmed. Closes scylladb/scylladb#19399	2024-06-27 08:28:51 +03:00
Kefu Chai	52f1168a3d	repair: remove unused operator<< since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19508	2024-06-26 21:57:03 +03:00
Israel Fruchter	3c7af28725	cqlsh: update cqlsh submodule this change updates the cqlsh submodule: * tools/cqlsh/ ba83aea3...73bdbeb0 (4): > install.sh: replace tab with spaces > define the the debug packge is empty > tests: switch from using cqlsh bash to the test the python file > package python driver as wheels it also includes the following change to package cqlsh as a regular rpm instead of as a "noarch" rpm: so far cqlsh has bundled the python-driver, but only as source, meaning the package wasn't architecture-specific and also didn't have the libev eventloop compiled in. From python 3.12 and up, that would mean we would fall back to the asyncio eventloop (which is still experimental) or hit an error (once we sync with the driver upstream). To avoid those, we are changing the packaging of cqlsh to be architecture-specific, getting cqlsh compiled, and bundling all of its requirements as a per-architecture bundle of wheels, using `shiv`, i.e. a one-file virtualenv that we'll be packing into our artifacts Ref: https://github.com/scylladb/scylla-cqlsh/issues/90 Ref: https://github.com/scylladb/scylla-cqlsh/pull/91 Ref: https://github.com/linkedin/shiv Closes scylladb/scylladb#19385 * tools/cqlsh ba83aea...242876c (1): > Merge 'package python driver as wheels' from Israel Fruchter Update tools/cqlsh/ submodule, in which the change `define the the debug packge is empty` should address build failures like ``` Processing files: scylla-cqlsh-debugsource-6.1.0~dev-0.20240624.c7748f60c0bc.aarch64 error: Empty %files file /jenkins/workspace/scylla-master/next/scylla/tools/cqlsh/build/redhat/BUILD/scylla-cqlsh/debugsourcefiles.list RPM build errors: Empty %files file /jenkins/workspace/scylla-master/next/scylla/tools/cqlsh/build/redhat/BUILD/scylla-cqlsh/debugsourcefiles.list ``` Closes scylladb/scylladb#19473	2024-06-26 12:07:21 +03:00
Botond Dénes	1fca341514	test/topology_custom/test_repair: add test for enable_tombstone_gc_for_streaming_and_repair	2024-06-26 04:05:17 -04:00
Botond Dénes	d3b1ccd03a	replica/table: maybe_compact_for_streaming(): toggle tombstone GC based on the control flag Now enable_tombstone_gc_for_streaming_and_repair is wired in all the way to maybe_compact_for_streaming(), so we can implement the toggling of tombstone GC based on it.	2024-06-26 04:05:17 -04:00
Botond Dénes	415457be2b	replica: propagate enable_tombstone_gc_for_streaming_and_repair to maybe_compact_for_streaming() Just wiring, the new flag will be used in the next patch.	2024-06-26 04:05:17 -04:00
Botond Dénes	d5a149fc01	db/config: introduce enable_tombstone_gc_for_streaming_and_repair To control whether the compacting reader (if enabled) for streaming and repair can garbage-collect tombstones. Default is false (previous behaviour). Not wired yet.	2024-06-26 04:05:17 -04:00
Pavel Emelyanov	263668bc85	transport: Use sharded<>::invoke_on_others() When preparing statement, the server code first does it on non-local shards, then on local one. The former call is done the hard way, while there's a short sugar sharded<> class method doing it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19485	2024-06-25 22:17:59 +03:00
Kamil Braun	13fc2bd854	Merge `notify other nodes on boot` from Gleb The series adds a step during node's boot process, just before completing the initialization, in which the node sends a notification to all other normal nodes in the cluster that it is UP now. Other nodes wait for this node to be UP and in normal state before replying. This ensures that, in a healthy cluster, when a node starts serving queries the entire cluster knows its up-to-date state. The notification is best effort though. If some nodes are down or do not reply in time, the boot process continues. It is somewhat similar to the shutdown notification in this regard. * 'gleb/notify-up-v2' of github.com:scylladb/scylla-dev: gossiper: wait for a bootstrapping node to be seen as normal on all nodes before completing initialization Wait for booting node to be marked UP before complete booting. gossiper: move gossip verbs to the idl	2024-06-25 17:58:17 +02:00
Aleksandra Martyniuk	2394e3ee7a	repair: drop timeout from table_sync_and_check Delete 10s timeout from read barrier in table_sync_and_check, so that the function always considers all previous group0 changes. Fixes: #18490. Closes scylladb/scylladb#18752	2024-06-25 17:44:31 +02:00
Avi Kivity	c80dc57156	Merge 'batchlog replay: bypass tombstones generated by past replays' from Botond Dénes The `system.batchlog` table has a partition for each batch that failed to complete. After finally applying the batch, the partition is deleted. Although the table has gc_grace_seconds = 0, tombstones can still accumulate in memory, because we don't purge partition tombstones from either the memtable or the cache. This can lead to the cache and memtable of this table accumulating many thousands or even millions of tombstones, making batchlog replay very slow. We didn't notice this before, because we would only replay all failed batches on unbootstrap, which is rare and a heavy and slow operation in its own right already. With repair-based tombstone-gc however, we do a full batchlog replay at the beginning of each repair, and now this extra delay is noticeable. Fix this by making sure batchlog replays don't have to scan through all the tombstones generated by previous replays: * flush the `system.batchlog` memtable at the end of each batchlog replay, so it is cleared of tombstones * bypass the cache Fixes: https://github.com/scylladb/scylladb/issues/19376 Although this is not a regression -- replay was like this since forever -- now that repair calls into batchlog replay, every release which uses repair-based tombstone-gc should get this fix Closes scylladb/scylladb#19377 * github.com:scylladb/scylladb: db/batchlog_manager: bypass cache when scanning batchlog table db/batchlog_manager: replace open-coded paging with internal one db/batchlog_manager: implement cleanup after all batchlog replay cql3/query_processor: for_each_cql_result(): move func to the coro frame	2024-06-25 16:11:01 +03:00
Avi Kivity	371e37924f	Merge 'Rebuild bloom filters that have bad partition estimates' from Lakshmi Narayanan Sreethar The bloom filters are built with partition estimates because the actual partition count might not be available in all cases. If the estimate is inaccurate, the bloom filters might end up being too large or too small compared to their optimal sizes. This PR rebuilds bloom filters with inaccurate partition estimates using the actual partition count before the filter is written to disk. A bloom filter is considered to have an inaccurate estimate if its false positive rate based on the current bitmap size is either less than 75% or more than 125% of the configured false positive rate. Fixes #19049 A manual test was run to check the impact of rebuild on compaction. Table definition used : CREATE TABLE scylla_bench.simple_table (id int PRIMARY KEY); Setup : 3 billion random rows with id in the range [0, 1e8) were inserted as batches of 5 rows into scylla_bench.simple_table via 80 threads. Compaction statistics : scylla_bench.simple_table : (a) Total number of compactions : `1501` (b) Total time spent in compaction : `9h58m47.269s` (c) Number of compactions which rebuilt bloom filters : `16` (d) Total time taken by these 16 compactions which rebuilt bloom filters : `2h55m11.89s` (e) Total time spent by these 16 compactions to rebuild bloom filters : `8m6.221s` which is - `4.63%` of the total time taken by the compactions which rebuilt filters (d) - `1.35%` of the total compaction time (b). 
(f) Total bytes saved by rebuilding filters : `388 MB` system.compaction_history : (a) Total number of compactions : `77` (b) Total time spent in compaction : `21.24s` (c) Number of compactions which rebuilt bloom filters : `74` (d) Time taken by these 74 compactions which rebuilt bloom filters : `20.48s` (e) Time spent by these 74 compactions to rebuild bloom filters : `377ms` which is - `1.84%` of the total time taken by the compactions which rebuilt filters (d) - `1.77%` of the total compaction time (b). (f) Total bytes saved by rebuilding filters : `20 kB` The following tables also had compactions and the bloom filter was rebuilt in all those compactions. However, the time taken for every rebuild was observed as 0ms from the logs as it completed within a microsecond : system.raft : (a) Total number of compactions : `2` (b) Total time spent in compaction : `106ms` (c) Total bytes saved by rebuilding filters : `960 B` system_schema.tables : (a) Total number of compactions : `1` (b) Total time spent in compaction : `25ms` (c) Total bytes saved by rebuilding filter : `312 B` system.topology : (a) Total number of compactions : `1` (b) Total time spent in compaction : `25ms` (c) Total bytes saved by rebuilding filter : `320 B` Closes scylladb/scylladb#19190 * github.com:scylladb/scylladb: bloom_filter_test: add testcase to verify filter rebuilds test/boost: move bloom filter tests from sstable_datafile_test into a new file sstables/mx/writer: rebuild bloom filters with bad partition estimates sstables/mx/writer: add variable to track number of partitions consumed sstable: introduce sstable::maybe_rebuild_filter_from_index() sstable: add method to return filter format for the given sstable version utils/i_filter: introduce get_filter_size()	2024-06-25 15:35:09 +03:00
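The 75% / 125% rule stated in `371e37924f` can be illustrated with the textbook bloom filter formulas. This is a sketch under assumptions, not the code from the series: the function names are hypothetical, and the false positive rate is estimated as `exp(-(bits / partitions) * ln(2)^2)`, which holds for an optimally chosen number of hash functions; ScyllaDB's own calculation may differ.

```python
import math


def optimal_bits(partitions, target_fp_rate):
    """Textbook bloom filter size in bits for a target false positive rate."""
    return math.ceil(-partitions * math.log(target_fp_rate) / (math.log(2) ** 2))


def expected_fp_rate(bits, partitions):
    """Textbook false positive rate, assuming an optimal number of hash functions."""
    return math.exp(-(bits / partitions) * (math.log(2) ** 2))


def needs_rebuild(bits, actual_partitions, target_fp_rate):
    """Apply the rule from the commit message: rebuild when the rate implied by the
    current bitmap size is below 75% or above 125% of the configured rate."""
    actual = expected_fp_rate(bits, actual_partitions)
    return actual < 0.75 * target_fp_rate or actual > 1.25 * target_fp_rate
```

For example, a filter sized with `optimal_bits(1000, 0.01)` stays within the band if the sstable really holds 1000 partitions, but falls outside it if the actual count turns out to be 500 (filter too large) or 2000 (filter too small).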
Nadav Har'El	35ace0af5c	Merge 'Move some /storage_proxy API endpoints to config.cc' from Pavel Emelyanov API endpoints that need a particular service to get data from are registered next to this service (#2737). In /storage_proxy function there live some endpoints that work with config, so this PR moves them to the existing config.cc with config-related endpoints. The path these endpoints are registered with remains intact, so some tweak in proxy API registration is also here. Closes scylladb/scylladb#19417 * github.com:scylladb/scylladb: api: Use provided db::config, not the one from ctx api: Move some config endpoints from proxy to config api: Split storage_proxy api registration api: Unset config endpoints	2024-06-25 13:55:58 +03:00
Michał Chojnowski	c7dc3b9b58	scylla-gdb.py: add line information to coroutine names in `scylla fiber` For convenience. Note that this line info only points to the function as a whole, not to the current suspend point. I think there's no facility for converting the `__coro_index` to the current suspend point automatically. Before: ``` (gdb) scylla fiber seastar::local_engine->_current_task [shard 1] #0 (task) 0x0000601008e8e970 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (.resume is seastar::future<void> sstables::parse<unsigned int, std::pair<sstables::metadata_type, unsigned int> >(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::disk_array<unsigned int, std::pair<sstables::metadata_type, unsigned int> >&) [clone .resume] ) [shard 1] #1 (task) 0x00006010092acf10 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (.resume is sstables::parse(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::statistics&) [clone .resume] ) [shard 1] #2 (task) 0x0000601008e648d0 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (.resume is sstables::sstable::read_simple<(sstables::component_type)8, sstables::statistics>(sstables::statistics&)::{lambda(sstables::sstable_version_types, seastar::file&&, unsigned long)#1}::operator()(sstables::sstable_version_types, seastar::file&&, unsigned long) const [clone .resume] ) ``` After: ``` (gdb) scylla fiber seastar::local_engine->_current_task [shard 1] #0 (task) 0x0000601008e8e970 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (sstables::parse<unsigned int, std::pair<sstables::metadata_type, unsigned int> >(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::disk_array<unsigned int, std::pair<sstables::metadata_type, unsigned int> 
>&) at sstables/sstables.cc:352) [shard 1] #1 (task) 0x00006010092acf10 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (sstables::parse(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::statistics&) at sstables/sstables.cc:570) [shard 1] #2 (task) 0x0000601008e648d0 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (sstables::sstable::read_simple<(sstables::component_type)8, sstables::statistics>(sstables::statistics&)::{lambda(sstables::sstable_version_types, seastar::file&&, unsigned long)#1}::operator()(sstables::sstable_version_types, seastar::file&&, unsigned long) const at sstables/sstables.cc:992) ``` Closes scylladb/scylladb#19478	2024-06-25 13:55:10 +03:00
Kefu Chai	def432617d	docs: print out invalid branch name to help user to understand what the extension is expecting. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19477	2024-06-25 13:17:25 +03:00
Botond Dénes	31c0fa07d8	db/batchlog_manager: bypass cache when scanning batchlog table Scans should not pollute the cache with cold data, in general. In the case of the batchlog table, there is another reason to bypass the cache: this table can have a lot of partition tombstones, which currently are not purged from the cache. So in certain cases, using the cache can make batch replay very slow, because it has to scan past tombstones of already replayed batches.	2024-06-25 06:15:47 -04:00
Botond Dénes	29f610d861	db/batchlog_manager: replace open-coded paging with internal one query_processor has built-in paging support, no need to open-code paging in batchlog manager code.	2024-06-25 06:15:47 -04:00
Botond Dénes	2dd057c96d	db/batchlog_manager: implement cleanup after all batchlog replay We have a commented code snippet from Origin with cleanup and a FIXME to implement it. Origin flushes the memtables and kicks a compaction. We only implement the flush here -- the flush will trigger a compaction check and we leave it up to the compaction manager to decide when a compaction is worthwhile. This method used to be called only from unbootstrap, so a cleanup was not really needed. Now it is also called at the end of repair, if the table is using repair-based tombstone-gc. If the memtable is filled with tombstones, this can add a lot of time to the runtime of each repair. So flush the memtable at the end, so the tombstones can be purged (they aren't purged from memtables yet).	2024-06-25 06:15:47 -04:00
Botond Dénes	4e96e320b4	cql3/query_processor: for_each_cql_result(): move func to the coro frame Said method has a func parameter (called just f), which it receives as rvalue ref and just uses as a reference. This means that if the caller doesn't keep the func alive, for_each_cql_result() will run into use-after-free after the first suspension point. This is unexpected for callers, who don't expect to have to keep something alive, which they passed in with std::move(). Adjust the signature to take a value instead; value parameters are moved to the coro frame and survive suspension points. Adjust internal callers (query_internal()) the same way. There are no known vulnerable external callers.	2024-06-25 06:15:25 -04:00
Benny Halevy	3f23016cc0	perf-simple-query: add mean and standard deviation stats Currently, perf-simple-query summarizes the statistics only for the throughput, printing the median, median absolute deviation, minimum, and maximum. But the throughput is typically highly variable and its median is noisy. This patch also calculates the mean and standard deviation, and does so for instructions_per_op and cpu_cycles_per_op as well, to present a fuller picture of the performance metrics. Output example: ``` random-seed=3383668492 enable-cache=1 Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no} Disabling auto compaction Creating 10000 partitions... 95613.97 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42456 insns/op, 22117 cycles/op, 0 errors) 97538.45 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42454 insns/op, 22094 cycles/op, 0 errors) 95883.37 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42438 insns/op, 22268 cycles/op, 0 errors) 96791.45 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42433 insns/op, 22256 cycles/op, 0 errors) 97894.71 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42420 insns/op, 22010 cycles/op, 0 errors) throughput: mean=96744.39 standard-deviation=996.89 median=96791.45 median-absolute-deviation=861.02 maximum=97894.71 minimum=95613.97 instructions_per_op: mean=42440.08 standard-deviation=14.99 median=42437.59 median-absolute-deviation=13.58 maximum=42456.15 minimum=42420.10 cpu_cycles_per_op: mean=22148.98 standard-deviation=110.43 median=22117.04 median-absolute-deviation=106.89 maximum=22267.70 minimum=22010.42 ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#19450	2024-06-25 12:25:59 +03:00
Yaron Kaikov	394cba3e4b	.github/workflow: close and replace label when backport promoted Today, after Mergify opens a backport PR, it stays open until someone manually closes it. Also, we can't track using labels which backports were done, since there is no indication of that except digging into the PR and looking for a comment or a commit ref. The following changes were made in this PR: * trigger add-label-when-promoted.yaml also when the push was made to `branch-x.y` * Replace label `backport/x.y` with `backport/x.y-done` in the original PR (this will automatically update the original Issue as well) * Add a comment on the backport PR and close it Fixes: https://github.com/scylladb/scylladb/issues/19441 Closes scylladb/scylladb#19442	2024-06-25 12:11:28 +03:00
Benny Halevy	8daf755f8a	statement_restrictions: partition_ranges_from_singles: no need to default-initialize result Currently, the returned `ranges` vector is first initialized to `product_size` and then the returned partition ranges are copied into it. Instead, we can simply reserve the vector capacity, without initializing it, and then emplace all partition ranges onto it using std::back_inserter. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#19457	2024-06-25 12:11:28 +03:00
Laszlo Ersek	656a9468bb	HACKING.md: fix typo in "--overprovisioned" option name Grepped the tree for "--overprovisioned" (coming from <https://university.scylladb.com/courses/scylla-essentials-overview/lessons/high-availability/topic/consistency-level-demo-part-1/>), and noticed that this instance was not matched by grep (while another one just below was). Fixes: `4f838a82e2` Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> Closes scylladb/scylladb#19458	2024-06-25 12:11:28 +03:00
Kefu Chai	adca415245	bytes: drop unused operator<< since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. the callers in alternator/streams.cc are updated to use `fmt::print()` to format the `bytes` instances. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19448	2024-06-25 12:11:28 +03:00
Kefu Chai	94e36d4af4	auth: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. this change addresses the leftover of 850ee7e170a. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19467	2024-06-25 12:11:28 +03:00
Benny Halevy	378578b481	chunked_vector_test: add tests for value-initialization constructor Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-25 12:08:11 +03:00
Benny Halevy	5bd2ee7507	utils: chunked_vector: add ctor from std::initializer_list Prepare for using utils::chunked_vector for dht::partition_range_vector Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-25 12:08:06 +03:00
Benny Halevy	7780af2e84	utils: chunked_vector: document invalidation of iterators on move chunked_vector differs from std::vector in that the latter's move constructor is required to preserve any iterators to the moved-from vector. In contrast, chunked_vector::iterator keeps a pointer to the chunked_vector::_chunks data, which is a utils::small_vector, and when moved, it might invalidate the iterator since the moved-to _chunks might copy the contents of the internal capacity rather than moving the allocated capacity. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-25 11:44:50 +03:00
Botond Dénes	c7317be09a	db/config: introduce reader_concurrency_semaphore_cpu_concurrency To allow increasing the semaphore's CPU concurrency, which is currently hard-limited to 1. Not wired yet.	2024-06-25 04:00:11 -04:00
Piotr Dulikowski	85219e9294	configure.py: fix the 'configure' rule generated during regeneration The Ninja makefile (build.ninja) generated by the ./configure.py script is smart enough to notice when the configure.py script is modified and re-runs the script in order to regenerate itself. However, this operation is currently not idempotent and quickly breaks because information about the Ninja makefile's name is not passed properly. This is the rule used for makefile's regeneration: ``` rule configure command = {python} configure.py --out={buildfile}.new $configure_args && mv {buildfile}.new {buildfile} generator = 1 description = CONFIGURE $configure_args ``` The `buildfile` variable holds the value of the `--out` option which is set to `build.ninja` if not provided explicitly. Note that regenerating the makefile passes a name with the `.new` suffix added to the end; we want to first write the file in full and then overwrite the old file via a rename. However, notice that the script was called with `--out=build.ninja.new`; the `configure` rule in the regenerated file will have `configure.py --out=build.ninja.new.new` and then `mv build.ninja.new.new build.ninja.new`. So, second regeneration will just leave a build.ninja.new file which is not useful. Fix this by introducing an additional parameter `--out-final-name`. This parameter is only supposed to be used in the regeneration rule and its purpose is to preserve information about the original file name. After this change I no longer see `build.ninja.new` being created after a sequence of `touch configure.py && ninja` calls. Closes scylladb/scylladb#19428	2024-06-24 21:20:32 +03:00
Laszlo Ersek	a4c6ae688a	install-dependencies.sh: set file mode creation mask to 0022 The docs [1] clearly say "install-dependencies.sh" should be run as "root"; however, the script silently assumes that the umask inherited from the calling environment is 0022. That's not necessarily the case, and there's an argument to be made for "root" setting umask 0077 by default. The script behaves unexpectedly under such circumstances; files and directories it creates under /opt and /usr/local are then not accessible to unprivileged users, leading to compilation failures later on. Set the creation mask explicitly to 0022. [1] https://github.com/scylladb/scylladb/blob/master/HACKING.md#dependencies Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> Closes scylladb/scylladb#19464	2024-06-24 19:46:15 +03:00
Marcin Maliszkiewicz	a4e26585e5	git: add build.ninja.new to .gitignore For some time now, executing our ninja build targets has also generated a build.ninja.new file. Add it to .gitignore for convenience, as we won't commit this file. Closes scylladb/scylladb#19367	2024-06-24 16:48:50 +03:00
Kefu Chai	e61061d19f	test.py: improve help message on tests selection Since `3afbd21f`, we are able to selectively choose a single test in a boost test executable which represents a test suite, and to choose a single test in a pytest script with the syntax of "test_suite::test_case". it's very handy for manual testing. so let's document in the command line help message as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19454	2024-06-24 14:27:02 +03:00
Kefu Chai	e9d8c25e86	alternator: define static variable before this change, when linking an executable referencing `marker`, we could get the following error: ``` 13:58:02 ld.lld: error: undefined symbol: alternator::event_id::marker 13:58:02 >>> referenced by streams.cc 13:58:02 >>> build/dev/alternator/streams.o:(from_string_helper<rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>, alternator::event_id>::Set(rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>&, alternator::event_id, rjson::internal::throwing_allocator&)) 13:58:02 clang-16: error: linker command failed with exit code 1 (use -v to see invocation) ``` it turns out `event_id::marker` is only declared, but never defined. please note, the declaration of a non-inline static member variable in its class definition is not considered a definition, see [class.static.data](https://eel.is/c++draft/class.static.data#3) > The declaration of a non-inline static data member in its class > definition is not a definition and may be of an incomplete type > other than cv void. so, let's declare it as `constexpr` instead, which implies `inline`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19452	2024-06-24 13:15:00 +03:00
Kefu Chai	af2b0b030b	test/pylib: use raw string to avoid using escape sequence before this change, when running test like: ```console ./test.py --mode release topology_experimental_raft/test_tablets /home/kefu/dev/scylladb/test/pylib/scylla_cluster.py:333: SyntaxWarning: invalid escape sequence '$' deleted_sstable_re = f"^./{keyspace}/{table}-[0-9a-f]{{32}}/. \(deleted$$" ``` we could have the warning above. because `\(` is not a valid escape sequence, but the Python interpreter accepts it as two separated characters of `\(` after complaining. but it's still annoying. so, let's use a raw string here, as we want to match "(deleted)". Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19451	2024-06-24 11:11:44 +03:00
Lakshmi Narayanan Sreethar	a09556a49f	bloom_filter_test: add testcase to verify filter rebuilds Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-24 12:11:37 +05:30
Lakshmi Narayanan Sreethar	4aa5698f0d	test/boost: move bloom filter tests from sstable_datafile_test into a new file Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-24 12:06:02 +05:30
Lakshmi Narayanan Sreethar	21e463b108	sstables/mx/writer: rebuild bloom filters with bad partition estimates The bloom filters are built with partition estimates, as the actual partition count might not be available in all the cases. If the estimate was bad, the bloom filters might end up larger or smaller than their optimal sizes. Rebuild such bloom filters with the actual partition count before the filter is written to disk and the sstable is sealed. Fixes #19049 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-24 12:06:02 +05:30
Lakshmi Narayanan Sreethar	afc90657d6	sstables/mx/writer: add variable to track number of partitions consumed Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-24 12:06:02 +05:30
Lakshmi Narayanan Sreethar	fccb1a11e5	sstable: introduce sstable::maybe_rebuild_filter_from_index() Add method sstable::maybe_rebuild_filter_from_index() that rebuilds bloom filters which had bad partition estimates when they were built. The method checks the false positive rate based on the current bitset size against the configured false positive rate to decide whether a filter needs to be rebuilt. If the current false positive rate is within 75% to 125% of the configured false positive rate, the bloom filter will not be rebuilt. Otherwise, the filter will be rebuilt from the index entries. This method should only be called before an SSTable is sealed as the bloom filter is updated in-place. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-24 12:06:02 +05:30
Lakshmi Narayanan Sreethar	a7d77f6304	sstable: add method to return filter format for the given sstable version Extract out the filter format computing logic from sstable::read_filter into a separate function. This is done so that the subsequent patches can make use of this function. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-24 12:06:01 +05:30
Botond Dénes	6dd6f0198e	utils/i_filter: introduce get_filter_size() Currently, the only way to get the size of a filter for certain parameters is to actually create one. This requires a seastar thread context and potentially also allocates a huge amount of memory. Provide a method which just calculates the size, without any of the above-mentioned baggage. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-24 12:06:01 +05:30
Kefu Chai	a230ecc4eb	utils/murmur_hash: replace rotl64() with std::rotl() since we are now able to use C++20, there is no need to use the homebrew rotl64(). so in this change, we replace rotl64() with std::rotl(), and remove the former from the source tree. the underlying implementations of these two solutions are equivalent, so no performance changes are expected. all caller sites have been audited: all of them pass `uint64` as the first parameter. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19447	2024-06-24 08:24:43 +03:00
Marcin Maliszkiewicz	794440eb85	test: skip checking default role in test_auth_v2_migration Default role creation in auth-v1 is asynchronous and all nodes race to create it so we'd need to delay the test and wait. Checking this particular role doesn't bring much value to the test as we check other roles to demonstrate correctness. Fixes scylladb/scylladb#19039 Closes scylladb/scylladb#19424	2024-06-23 19:50:55 +03:00
Avi Kivity	0d52f0684a	Merge 'Sanitize gossiper API endpoints management' from Pavel Emelyanov Gossiper has two blocks of endpoints, both registered in a legacy/random place in main. This PR moves them next to gossiper start and adds unregistration for both. refs: #2737 Closes scylladb/scylladb#19425 * github.com:scylladb/scylladb: api: Remove dedicated failure_detector registration method api: Move failure_detector endpoints set/unset to gossiper api: Unset failure detector endpoints method api: (Un)Register gossiper API in correct place api: Unset gossiper endpoints on stop asi: Coroutinize set_server_gossip()	2024-06-23 19:35:11 +03:00
Kefu Chai	850ee7e170	auth: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19429	2024-06-23 19:25:23 +03:00
Kefu Chai	72fdee1efb	README.md: add badges for cron jobs these jobs are scheduled to verify the builds of scylla, like if it builds with the latest Seastar, if scylla can generate reproducible builds, and if it builds with the nightly build of clang. the failures of these workflows are not very visible without clicking into the corresponding workflow in https://github.com/scylladb/scylladb/actions. in this change, we add their badges in the testing section of README.md, so one can identify their test failures, if any. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19430	2024-06-23 19:24:40 +03:00
Kefu Chai	a7e38ada8e	test: remove unused operator<< since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19432	2024-06-23 18:02:52 +03:00
zhouxiang	694014591a	test/alternator/test_projection_expression.py: remove useless comparisons pytest.raises expects a block of code that will raise an exception, not a comparison of results. Closes scylladb/scylladb#19436	2024-06-23 13:53:14 +03:00
Pavel Emelyanov	d8009ed843	api/cache_service: Don't use database to perform map+reduce on The sharded<database> is used as a map_reduce0() method provider, there's no real need in database itself. Simple smp::map_reduce() would work just as good. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19364	2024-06-21 19:47:25 +03:00
Kefu Chai	f781c3babe	.github: add reproducible-build workflow to verify that scylla builds are reproducible. the new workflow builds scylla twice with master HEAD, and compares the md5sums of the built scylla executables. it fails if the md5sum:s do not match. this workflow is triggered at 5AM every Friday. its status can be found at https://github.com/scylladb/scylladb/actions/workflows/reproducible-build.yaml after it's built for the first time. Refs #19225 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19409	2024-06-21 19:39:37 +03:00
Nadav Har'El	81a02f06dd	test/cql-pytest: add more tests for SELECT's LIMIT SELECT's "LIMIT" feature is tested in combination with other features in different test/cql-pytest/*.py source files - for example, the combination of LIMIT and GROUP BY is tested in test_group_by.py. This patch adds a new test file, test_limit.py, for testing basic aspects of LIMIT usage that weren't already tested in other files. The new file also has a comment saying where we have other tests for LIMIT combined with other features. All the new tests pass (on both Scylla and Cassandra). But they can be useful as regression tests to test patches which modify the behavior of LIMIT - e.g., pull request #18842. This patch also adds another test in test_group_by.py. This adds to one of the tests for the combination of LIMIT and GROUP BY (in this case, GROUP BY of clustering prefix, no aggregation) also a check for paging, that was previously missing. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19392	2024-06-21 19:35:15 +03:00
Pavel Emelyanov	755be887a6	api: Remove dedicated failure_detector registration method It's now empty and can be dropped Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-21 19:30:54 +03:00
Pavel Emelyanov	2bfa1b3832	api: Move failure_detector endpoints set/unset to gossiper These two api functions both need gossiper service and only it, and thus should have set/unset calls next to each other. It's worth putting them into a single place Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-21 19:30:54 +03:00
Pavel Emelyanov	88a6094121	api: Unset failure detector endpoints method There's one more set of endpoints that need gossiper -- the failure_detector ones. They are registered, but not unregistered, so here's the method to do it. It's not called by any code yet, because next patch would need to rework the caller anyway. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-21 19:30:53 +03:00
Pavel Emelyanov	f84694166e	api: (Un)Register gossiper API in correct place Each service's endpoints are to be registered just after the service itself, so should gossiper's Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-21 19:30:53 +03:00
Pavel Emelyanov	19f3a9805a	api: Unset gossiper endpoints on stop Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-21 19:30:53 +03:00
Pavel Emelyanov	c7547b9c7e	asi: Coroutinize set_server_gossip() One of the next patches will add more async calls here, so not to create then-chains, convert it into a coroutine Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-21 19:30:53 +03:00
Kefu Chai	eef64a6bb8	build: cmake: do not add "absl::headers" to include dirs `absl::headers` is a library, not the path to its headers. before this change, the command lines of generated build rules look like: ``` -I/home/kefu/dev/scylladb/repair/absl::headers ``` this does not hurt, as other libraries might add the intended include dir to the compiler command line, but this is just wrong. so let's remove it. please note, `repair` target already links against `absl::headers`. so we don't need to add `absl::headers` to its linkage again. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19384	2024-06-21 19:22:17 +03:00
Kefu Chai	7b10cc8079	treewide: include seastar headers with brackets this change was created in the same spirit of `ebff5f5d`. despite that we include Seastar as a submodule, Seastar is not a part of scylla project. so we'd better include its headers using brackets. `ebff5f5d` addressed this cosmetic issue a while back. but probably clangd's header-insertion helped some contributors to insert the missing headers with `"`. so this style of `include` returned to the tree with these new changes. unfortunately, clangd does not allow us to configure the style of `include` at the time of writing. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19406	2024-06-21 19:20:27 +03:00
Kefu Chai	987fd59f21	test: correct some misspellings fix a typo in source code. this typo was identified by codespell. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19412	2024-06-21 19:16:11 +03:00
Kefu Chai	52693fc21c	Update seastar submodule * seastar 9ce62705...908ccd93 (42): > include/seastar: do not include unused headers > timer-set: Add missing sanity headers > tutorial.md: fix typos > Update tutorial.md to reflect update preemption methods > tutorial.md: remove trailing whitespace > json: Add a test for jsonable objects > json: Make formatter::write(vector/map/umap) copy their arguments > json: Make formatter call write for jsonable > test: futures: verify stream yields the consumed value > build: add pyyaml to install-dependencies.sh > stall-analyser: remove unused variable > stall-analyser: use itertools.dropwhile when appropriate > scripts: sort packages alphanumerically > docker: bind the file instead of copying during the build stage > docker: lint dockerfile > dns: use undeprecated c-ares APIs > stall-analyser: use argparse.FileType when appropriate > http/client: Retry request over fresh connection in case old one failed > http/client: Fix indentation after previous patch > http/client: Pass request and handle by reference > http/client: Introduce make_new_connection() > http/client: Fix parser result checking > http/client: Document max_connections > test/http: Generalize http connection factory > loopback_socket: Shutdown socket on EOF close > loopback_socket: Rename buffer's shutdown() to abort() > test: Add test for sharded<>::invoke_on_...() compilation > net/tls: Added additional error codes > io-tester.md: update available parameters for job description > io_tester: expose extent_allocation_size_hint via job param > file: Unfriend reactor class > memory.cc: fix cross-shard shrinking realloc > sharded: Mark invoke_on_others() helper lambda mutable > scheduling: Unfriend reactor from scheduling_group_key > reactor: Make allocate_scheduling_group_specific_data() accept key_id argument > reactor: Add local key_id variable to allocate_scheduling_group_specific_data() > timer: Unfriend reactor > reactor: Generalize timer removal > timer: Add 
type alias for timer_set > reactor: Move reactor::complete_timers() to timer_set > tests: test protobuf support in prometheus_test.py > tests: enable prometheus_test.py to test metrics without aggregation Closes scylladb/scylladb#19405	2024-06-21 18:52:58 +03:00
Dawid Medrek	2446cce272	db/hints: Initialize endpoint managers only for valid hint directories Before these changes, it could happen that Scylla initialized endpoint managers for hint directories representing * host IDs before migrating hinted handoff to using host IDs, * IP addresses after the migration. One scenario looked like this: 1. Start Scylla and upgrade the cluster to using host IDs. 2. Create, by hand, a hint directory representing an IP address. 3. Trigger changing the host filter in hinted handoff; it could be achieved by, for example, restricting the set of data centers Scylla is allowed to save hints for. When changing the host filter, we browse the hint directories and create endpoint managers if we can send hints towards the node corresponding to a given hint directory. We only accepted hint directories representing IP addresses and host IDs. However, we didn't check whether the local node has already been upgraded to host-ID-based hinted handoff or not. As a result, endpoint managers were created for both IP addresses and host IDs, no matter whether we were before or after the migration. These changes make sure that any time we browse the hint directories, we take that into account. Fixes scylladb/scylladb#19172 Closes scylladb/scylladb#19173	2024-06-21 15:59:49 +02:00
Avi Kivity	3cfb0503a9	Update tools/cqlsh submodule for v6.0.21-scylla * tools/cqlsh 0d58e5c...ba83aea (1): > requirements: update scylla-driver	2024-06-21 16:04:21 +03:00
Piotr Dulikowski	cf2b4bf721	Merge 'cdc: do not include unused headers' from Kefu Chai also add `auth` and `cdc` to iwyu's `CLEANER_DIR` setting. --- it's a cleanup, hence no need to backport. Closes scylladb/scylladb#19410 * github.com:scylladb/scylladb: .github: add auth and cdc to iwyu's CLEANER_DIR cdc: do not include unused headers	2024-06-21 13:44:40 +02:00
Pavel Emelyanov	0330640b4d	api: Use provided db::config, not the one from ctx The set_server_config() already has the db::config reference for endpoints to work with, there's no need to obtain one via ctx and database. This change kills two birds with one stone -- less users of database as config provider, less places that need http context -> database dependency. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-21 13:30:54 +03:00
Pavel Emelyanov	afb48d8ab9	api: Move some config endpoints from proxy to config Those getting (and setting, but these are not implemented) various timeouts work on config, whilst register themselves in storage_proxy function. Since the "service" they need to work with is config, move the endpoints to config endpoints code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-21 13:29:38 +03:00
Pavel Emelyanov	0aad406a2f	api: Split storage_proxy api registration The set_server_storage_proxy() does two things -- registers storage_proxy "function" and sets proxy routes, that depend on it. Next patches will move some /storage_proxy/... endpoints registration to earlier stage, so the function should be ready in advance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-21 13:28:29 +03:00
Pavel Emelyanov	473cb62a9a	api: Unset config endpoints The set_server_config() needs the stop-time peer, here it is. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-21 13:28:06 +03:00
Kefu Chai	c429a8d8ae	sstables: use "me" sstable format by default in `7952200c`, we changed the `selected_format` from `mc` to `me`, but to be backward compatible the cluster starts with "md", so when the nodes in cluster agree on the "ME_SSTABLE_FORMAT" feature, the format selector believes that the node is already using "ME", which is specified by `_selected_format`, even though it is actually still using "md", which is specified by `sstable_manager::_format`, as changed by `54d49c04`. as explained above, it was set to "md" in the hope of being backward compatible when upgrading from an existing installation which might be still using "md". but after a second thought, since we are able to read sstables persisted with older formats, this concern is not valid. in other words, `7952200c` introduced a regression which changed the "default" sstable format from `me` to `md`. to address this, we just change `sstable_manager::_format` to "me", so that all sstables are created using "me" format. a test is added accordingly. Fixes #18995 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19293	2024-06-21 12:56:01 +03:00
Yaron Kaikov	57428d373b	[actions] fix sync label from PR to linked issue In `b8c705bc54` I modified the event name to `pull_request_target`. This caused the sync process to be skipped when a PR label was added or removed. Fix it. Closes scylladb/scylladb#19408	2024-06-21 11:39:44 +03:00
Kamil Braun	627d566811	Merge 'join_token_ring, gossip topology: recalculate sync nodes in wait_alive' from Patryk Jędrzejczak The node booting in gossip topology waits until all NORMAL nodes are UP. If we removed a different node just before, the booting node could still see it as NORMAL and wait for it to be UP, which would time out and fail the bootstrap. This issue caused scylladb/scylladb#17526. Fix it by recalculating the nodes to wait for in every step of the of the `wait_alive` loop. Although the issue fixed by this PR caused only test flakiness, it could also manifest in real clusters. It's best to backport this PR to 5.4 and 6.0. Fixes scylladb/scylladb#17526 Closes scylladb/scylladb#19387 * github.com:scylladb/scylladb: join_token_ring, gossip topology: update obsolete comment join_token_ring, gossip topology: fix indendation after previous patch join_token_ring, gossip topology: recalculate sync nodes in wait_alive	2024-06-21 10:22:32 +02:00
Piotr Dulikowski	c3536015e4	Merge 'cql3/statement/select_statement: do not parallelize single-partition aggregations' from Michał Jadwiszczak This patch adds a check if aggregation query is doing single-partition read and if so, makes the query to not use forward_service and do not parallelize the request. Fixes scylladb/scylladb#19349 Closes scylladb/scylladb#19350 * github.com:scylladb/scylladb: test/boost/cql_query_test: add test for single-partition aggregation cql3/select_statement: do not parallelize single-partition aggregations	2024-06-21 08:50:00 +02:00
Kefu Chai	694fe58d6e	.github: add auth and cdc to iwyu's CLEANER_DIR to avoid future violations of include-what-you-use. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-21 14:29:48 +08:00
Kefu Chai	1a4740ddc0	cdc: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-21 14:29:48 +08:00
Avi Kivity	fdc1449392	treewide: rename flat_mutation_reader_v2 to mutation_reader flat_mutation_reader_v2 was introduced in a pair of commits in 2021: `e3309322c3` "Clone flat_mutation_reader related classes into v2 variants" `08b5773c12` "Adapt flat_mutation_reader_v2 to the new version of the API" as a replacement for flat_mutation_reader, using range_tombstone_change instead of range_tombstone to represent range tombstones. See those commits for more information. The transition was incremental; the last use of the original flat_mutation_reader was removed in 2022 in commit `026f8cc1e7` "db: Use mutation_partition_v2 in mvcc" In turn, flat_mutation_reader was introduced in 2017 in commit `748205ca75` "Introduce flat_mutation_reader" To transition from a mutation_reader that nested rows within a partition in a separate stream, to a flat reader that streamed partitions and rows in the same stream. Here, we reclaim the original name and rename the awkward flat_mutation_reader_v2 to mutation_reader. Note that mutation_fragment_v2 remains since we still use the original for compatibility, sometimes. Some notes about the transition: - files were also renamed. In one case (flat_mutation_reader_test.cc), the rename target already existed, so we rename to mutation_reader_another_test.cc. - a namespace 'mutation_reader' with two definitions existed (in mutation_reader_fwd.hh). Its contents were folded into the mutation_reader class. As a result, a few #includes had to be adjusted. Closes scylladb/scylladb#19356	2024-06-21 07:12:06 +03:00
Avi Kivity	185338c8cf	Merge 'Reduce TWCS off-strategy space overhead' from Raphael "Raph" Carvalho Normally, the space overhead for TWCS is 1/N, where N is the number of windows. But during off-strategy, the overhead is 100% because input sstables cannot be released earlier. Reshaping a TWCS table that takes ~50% of available space can result in system running out of space. That's fixed by restricting every TWCS off-strategy job to 10% of free space in disk. Tables that aren't big will not be penalized with increased write amplification, as all input (disjoint) sstables can still be compacted in a single round. Fixes #16514. Closes scylladb/scylladb#18137 * github.com:scylladb/scylladb: compaction: Reduce twcs off-strategy space overhead to 10% of free space compaction: wire storage free space into reshape procedure sstables: Allow to get free space from underlying storage replica: don't expose compaction_group to reshape task	2024-06-20 18:51:25 +03:00
Kefu Chai	42b9784650	build: cmake: mark wasm "ALL" so that "wasm" target is built. "wasm" generates the text format of wasm code. and these wasm applications are used by the test_wasm tests. the rules generated by `configure.py` adds these .wat files as a dependency of `{mode}-build`, which is in turn a dependency of `{mode}`. in this change, let's mirror this behavior by making `wasm` ALL, so it is built by the default target. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19391	2024-06-20 18:45:31 +03:00
Kefu Chai	caf1149f11	cql-pytest/test_sstable: do not import unused modules Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19389	2024-06-20 17:14:28 +03:00
Avi Kivity	02cf17f4dc	Merge 'Sanitize load_meter API handlers management' from Pavel Emelyanov The service in question is pretty small one, but it has its API endpoint that lives in /storage_service group. Currently when a service starts and has any endpoints that depend on it, the endpoint registration should follow it (#2737). Here's the PR that does it for load meter. Another goal of this change is that http context now has one less dependency onboard. Closes scylladb/scylladb#19390 * github.com:scylladb/scylladb: api: Remove ctx->load_meter dependency api: Use local load_meter reference in handlers api: Fix indentation after previous patch api: Coroutinize load_meter::get_load_map handler api: Move load meter handlers api: Add set/unset methods for load_meter	2024-06-20 17:07:19 +03:00
Gleb Natapov	7bc05c3880	gossiper: wait for a bootstrapping node to be seen as normal on all nodes before completing initialization When a node bootstraps it may happen that some nodes still see it as bootstrapping while the node itself already is in normal state and ready to serve queries. We want to delay the bootstrap completion until all nodes see the new node as normal. Piggyback on the UP notification to do so and wait for the node that sent the notification to be seen as normal. Fixes #18678	2024-06-20 16:37:56 +03:00
Anna Stuchlik	027cf3f47d	doc: remove the link to Scylladb Google group The group is no longer active and should be removed from resources. Closes scylladb/scylladb#19379	2024-06-20 15:31:03 +02:00
Yaron Kaikov	f2705b3887	[action] add github context info for better debugging It seems that we skip the sync label process between PR and linked Issues Adding those debug prints will allow us to understand why Closes scylladb/scylladb#19393	2024-06-20 16:17:04 +03:00
Gleb Natapov	28c0a27467	Wait for booting node to be marked UP before complete booting. Currently a node does not wait to be marked UP by other nodes before it completes booting, which creates a usability issue: during a rolling restart it is not enough to wait for the local CQL port to be opened before restarting the next node; it is also necessary to check that all other nodes already see this node as alive, otherwise if the next node is restarted some nodes may see two nodes as dead instead of one. This patch improves the situation by making sure that the boot process does not complete until all other nodes see the booting one as alive. This is still a best effort thing: if some nodes are unreachable or gossiper propagation takes too much time the boot process continues anyway. Fixes scylladb/scylladb#19206	2024-06-20 14:55:40 +03:00
Pavel Emelyanov	de80094815	Merge 'treewide: remove unused operator<<' from Kefu Chai since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. there are more occurrences of unused operator<< in the tree, but let's do the cleanup piecemeal. --- this is a cleanup, so no need to backport Closes scylladb/scylladb#19346 * github.com:scylladb/scylladb: types: remove unused operator<< node_ops: remove unused operator<< lang: remove unused operator<< gms: remove unused operator<< dht: remove unused operator<< test: do not use operator<< for std::optional	2024-06-20 13:18:59 +03:00
Pavel Emelyanov	873d76c02b	api: Remove ctx->load_meter dependency Now the API uses captured reference and the explicit dependency is not needed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-20 12:38:28 +03:00
Pavel Emelyanov	d85e70ef98	api: Use local load_meter reference in handlers Now it uses ctx.lm dependency, but the idiomatic way for API is to use the argument one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-20 12:37:48 +03:00
Pavel Emelyanov	bc5e360066	api: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-20 12:37:39 +03:00
Pavel Emelyanov	e54f651beb	api: Coroutinize load_meter::get_load_map handler Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-20 12:37:18 +03:00
Pavel Emelyanov	40c178bee2	api: Move load meter handlers Now they are in storage service set/unset helper, but there's the dedicated set/unset pair for meter's endpoints. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-20 12:36:38 +03:00
Pavel Emelyanov	724d62aa87	api: Add set/unset methods for load_meter The meter is a pretty small service and its API is also tiny. Still, it's a standalone top-level service, and its API should come next to it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-20 12:35:58 +03:00
Botond Dénes	b09196ac49	Merge 'tasks: fix tasks abort' from Aleksandra Martyniuk Currently if task_manager::task::impl::abort preempts before children are recursively aborted and then the task gets unregistered, we hit use after free since abort uses children vector which is no longer alive. Modify abort method so that it goes over all tasks in task manager and aborts those with the given parent. Fixes: #19304. Requires backport to all versions containing task manager Closes scylladb/scylladb#19305 * github.com:scylladb/scylladb: test: add test for abort while a task is being unregistered tasks: fix tasks abort	2024-06-20 12:09:30 +03:00
Kefu Chai	1a724f22f9	mutation: silence false alarm from clang-tidy before this change, because it seems that we move away from `p2` in each iteration, so the succeeding iterations are moving from an empty `p2`, clang-tidy warns at seeing this. but we only move from `p2._static_row` in the first iteration when the dest `mutation_partition` instance's static row is empty. and in the succeeding iterations, the dest `mutation_partition` instance's static row is not empty anymore if it is set. so, this is a false alarm. in this change, we silence this warning. another option is to extract the single-shot mutation out of the loop, and pass the `std::move(p2)` only for the single-shot mutation, but that'd be a much more intrusive change. we can revisit this later. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19331	2024-06-20 12:05:20 +03:00
Kefu Chai	9f0b60c7a0	rust: disable incremental build for release build so that the release build is reproducible. a reproducible build helps developers to perform postmortem debugging. Fixes #19225 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19374	2024-06-20 12:01:14 +03:00
Patryk Jędrzejczak	bcc0a352b7	join_token_ring, gossip topology: update obsolete comment The code mentioned in the comment has already been added. We change the comment to prevent confusion.	2024-06-20 10:59:50 +02:00
Patryk Jędrzejczak	7735bd539b	join_token_ring, gossip topology: fix indendation after previous patch	2024-06-20 10:59:50 +02:00
Patryk Jędrzejczak	017134fd38	join_token_ring, gossip topology: recalculate sync nodes in wait_alive Before this patch, if we booted a node just after removing a different node, the booting node may still see the removed node as NORMAL and wait for it to be UP, which would time out and fail the bootstrap. This issue caused scylladb/scylladb#17526. Fix it by recalculating the nodes to wait for in every step of the `wait_alive` loop.	2024-06-20 10:59:49 +02:00
Anna Stuchlik	680405b465	doc: separate Entrprise- from OSS-only content This commit adds files that contain Open Source-specific information and includes these files with the .. scylladb_include_flag:: directive. The files include a) a link and b) Table of Contents. The purpose of this update is to enable adding Open Source/Enterprise-specific information in the Reference section. Closes scylladb/scylladb#19362	2024-06-20 11:58:32 +03:00
Piotr Dulikowski	75441ee120	Merge 'mv: fix value of the gossiped view update backlog' from Wojciech Mitros Currently, when calculating the view update backlog for gossip, we start with `db::view::update_backlog()` and compare it to backlogs from all shards. However, this backlog can't be compared to other backlogs - it has size 0 and we compare the fraction current/size when comparing backlogs, causing us to compare with `NaN`. This patch fixes it by starting the comparisons with an empty backlog. The patch introducing this issue (`f70f774e40`) wasn't backported, so this one doesn't need to be either Closes scylladb/scylladb#19247 * github.com:scylladb/scylladb: mv: make the view update backlog unmofidiable mv: fix value of the gossiped view update backlog	2024-06-20 06:27:11 +02:00
Piotr Dulikowski	78a40dbe2c	Merge 'cql: remove global_req_id from schema_altering_statement' from Marcin Maliszkiewicz Such field is no longer needed as the information comes directly from group0_batch. Fixes scylladb/scylladb#19365 Backport: no, we don't backport code cleanups Closes scylladb/scylladb#19366 * github.com:scylladb/scylladb: cql: remove global_req_id from schema_altering_statement cql: switch alter keyspace prepare_schema_mutations to use group0_batch	2024-06-20 06:21:48 +02:00
Dawid Medrek	c56de90a26	test/boost/hint_test.cc: Add missing parse() callback Before these changes, compilation was failing with the following error: In file included from test/boost/hint_test.cc:12: /usr/include/fmt/ranges.h:298:7: error: no member named 'parse' in 'fmt::formatter<db::hints::sync_point::host_id_or_addr>' 298 \| f.parse(ctx); \| ~ ^ We add the missing callback. Closes scylladb/scylladb#19375	2024-06-19 23:19:33 +02:00
Wojciech Mitros	cde14a5788	mv: make the view update backlog unmofidiable Currently, a view update backlog may reach an invalid state, when its max is 0 and its relative_size() is NaN as a result. This can be achieved either by constructing the backlog with a 0 max or by modifying the max of an existing backlog. In particular, this happens when creating the backlog using the default constructor. In this patch the default constructor is deleted and a check that the max is different from 0 is added to the remaining constructor - if the check fails, we construct an empty backlog instead, to handle the possibility of getting an invalid backlog sent from a node with a version that's missing this check. Additionally, we make the backlog's members private, exposing them only through const getters.	2024-06-19 19:44:57 +02:00
Pavel Emelyanov	5fe4290f66	gitattributes: Mark swagger .js files as binary The goal is the same as in `29768a2d02` (gitattributes: Mark *.svg as binary) -- prevent grep from searching patterns in those files. Although those files are, in fact, javascript code, the way they are formatted is not suitable for human reading, so it's unlikely that anyone would be interested in grep-ing patterns in them. At the same time, those files consist of very long lines, so if a grep finds a pattern in one of those, the output is spoiled. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19357	2024-06-19 15:07:56 +03:00
Botond Dénes	9d1fa828be	Merge 'utils/large_bitset: replace reserve_partial with utils::reserve_gently' from Lakshmi Narayanan Sreethar Replace the reserve_partial loop in large_bitset constructor with a new function - reserve_gently() that can reserve memory without stalling by repeatedly calling reserve_partial() method of the passed container. Closes scylladb/scylladb#19361 * github.com:scylladb/scylladb: utils/large_bitset: replace reserve_partial with utils::reserve_gently utils/stall_free: introduce reserve_gently	2024-06-19 14:31:59 +03:00
Michał Jadwiszczak	8eb5ca8202	test/boost/cql_query_test: add test for single-partition aggregation	2024-06-19 09:24:17 +02:00
Piotr Dulikowski	7567b87e72	Merge 'auth: reuse roles select query during cache population' from Marcin Maliszkiewicz With big number of shards in the cluster (e.g. 500+) due to cache periodic refresh we experience high load on role_permissions table (e.g. 1k op/s). The load on roles table is amplified because to populate single entry in the cache we do several selects on roles table. Some of this can't be avoided because roles are arranged in a tree-like structure where permissions can be inherited. This patch tries to reuse queries which are simply duplicated. It should reduce the load on roles table by up to 50%. Fixes scylladb/scylladb#19299 Closes scylladb/scylladb#19300 * github.com:scylladb/scylladb: auth: reuse roles select query during cache population auth: coroutinize service::get_uncached_permissions auth: coroutinize service::has_superuser	2024-06-19 07:53:47 +02:00
Marcin Maliszkiewicz	56707e2965	cql: remove global_req_id from schema_altering_statement Such field is no longer needed as the information comes directly from group0_batch. Fixes scylladb/scylladb#19365	2024-06-18 20:26:09 +02:00
Lakshmi Narayanan Sreethar	9ad800cfb9	utils/large_bitset: replace reserve_partial with utils::reserve_gently Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-18 23:36:30 +05:30
Lakshmi Narayanan Sreethar	31414f54c6	utils/stall_free: introduce reserve_gently Add reserve_gently() that can reserve memory without stalling by repeatedly calling reserve_partial() method of the passed container. Update the comments of existing reserve_partial() methods to mention this newly introduced reserve_gently() wrapper. Also, add test to verify the functionality. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-18 23:36:30 +05:30
Marcin Maliszkiewicz	685aecde61	cql: switch alter keyspace prepare_schema_mutations to use group0_batch This is needed to simplify the code in the following commit.	2024-06-18 19:54:55 +02:00
Michał Jadwiszczak	e9ace7c203	cql3/select_statement: do not parallelize single-partition aggregations Currently reads with WHERE clause which limits them to be single-partition reads, are unnecessarily parallelized. This commit checks this condition and the query doesn't use forward_service in single-partition reads.	2024-06-18 19:21:32 +02:00
Pavel Emelyanov	f7d5d4877c	Merge '[test.py] Fix several issues in log gathering' from Andrei Chekun Related: https://github.com/scylladb/scylladb/issues/17851 Fix the issue that test logs were not deleted Fix the issue that the URL to the failed test directory was incorrectly shown even when artifacts_dir_url option was not provided Fix the issue that there were no node logs when it failed to join the cluster Closes scylladb/scylladb#19115 * github.com:scylladb/scylladb: [test.py] Fix logs had multiplication of lines [test.py] Fix log not deleted [test.py] Fix log for failed node was nod added to failed directory [test.py] Fix URl for failed logs directory in CI	2024-06-18 15:37:29 +03:00
Aleksandra Martyniuk	50cb797d95	test: add test for abort while a task is being unregistered	2024-06-18 13:41:51 +02:00
Aleksandra Martyniuk	3463f495b1	tasks: fix tasks abort Currently if task_manager::task::impl::abort preempts before children are recursively aborted and then the task gets unregistered, we hit use after free since abort uses children vector which is no longer alive. Modify abort method so that it goes over all tasks in task manager and aborts those with the given parent. Fixes: #19304.	2024-06-18 13:39:29 +02:00
Botond Dénes	2123b22526	Merge 'doc: add 6.x.y to 6.x.z and remove 5.x.y to 5.x.z upgrade guide' from Anna Stuchlik This PR removes the 5.x.y to 5.x.z upgrade guide and adds the 6.x.y to 6.x.z upgrade guide. The previous maintenance upgrade guides, such as from 5.x.y to 5.x.z, consisted of several documents - separate for each platform. The new 6.x.y to 6.x.z upgrade guide is one document - there are tabs to include platform-specific information (we've already done it for other upgrade guides as one generic document is more convenient to use and maintain). I did not modify the procedures. At some point, they have been reviewed for previous upgrade guides. Fixes https://github.com/scylladb/scylladb/issues/19322 - This PR must be backported to branch-6.0, as it adds 6.x specific content. Closes scylladb/scylladb#19340 * github.com:scylladb/scylladb: doc: remove the 5.x.y to 5.x.z upgrade guide doc: add the 6.x.y to 6.x.z upgrade guide-6	2024-06-18 14:24:38 +03:00
Wojciech Mitros	1de5566cfa	mv: fix value of the gossiped view update backlog Currently, when calculating the view update backlog for gossip, we start with `db::view::update_backlog()` and compare it to backlogs from all shards. However, this backlog can't be compared to other backlogs - it has size 0 and we compare the fraction current/size when comparing backlogs, causing us to compare with `NaN`. This patch fixes it by starting the comparisons with an empty backlog.	2024-06-18 13:15:18 +02:00
Kefu Chai	87247c6542	.github: add workflow to build with latest seastar so we can be aware of whether scylla builds with seastar master HEAD, and be prepared if a build failure is found. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19135	2024-06-18 13:34:43 +03:00
Andrei Chekun	6a4b441bf2	[test.py] Fix logs had multiplication of lines Since the test name was not unique across the run and when we were using a --repeat option, there were several handlers for the same file. With this change test name and accordingly, the log name will be different for the same test but different repeat case. Remove mode from the test name since it's already in mode directory.	2024-06-18 11:14:07 +02:00
Andrei Chekun	b01a5f9bd9	[test.py] Fix log not deleted One of the created log files was not deleted at all, because there was no delete command. The unlink is moved to a later stage, explicitly after removing the handler that writes to this file, to avoid the possibility that something will be added after removing the file.	2024-06-18 11:14:01 +02:00
Kefu Chai	0a74d45425	build: cmake: add commitlog_cleanup_test in `94cdfcaa94`, we added commitlog_cleanup_test to `configure.py`, but didn't add it to the CMake building system. in this change, let's add it to the CMake building system. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19314	2024-06-18 12:12:28 +03:00
Kefu Chai	68ef7dda79	config: correct the comment on printable_to_json() seastar::format() does not use operator<< under the hood, it uses {fmt}, so update the comment accordingly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19315	2024-06-18 12:08:59 +03:00
Nadav Har'El	2ec1e0f0d5	test/cql-pytest: tests verifying UUID sort order In issue #15561 some doubts were raised regarding the way ScyllaDB sorts UUID values. This patch adds a heavily-commented cql-pytest test that helps understand - and verify that understanding - of the way Scylla sorts UUIDs, and shows there is some reason in the madness (in particular, Version 1 UUIDs (time uuids) are sorted like timeuuids, and not as byte arrays). The new tests check the different cases (see the comments in the test), and as usual for cql-pytest tests - they also pass on Cassandra, which allows us to confirm that the sort order we used is identical to the one used by Cassandra and not something that Scylla mis-implemented. Having this test in our suite will also ensure that the UUID ordering never changes accidentally in the future. If it ever changes, it can break access to existing tables that use UUID clustering keys, so it shouldn't change. Fixes #15561 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19343	2024-06-18 12:05:30 +03:00
Pavel Emelyanov	147552c34a	Merge 'configurable maintenance (streaming) semaphore count resource limit' from Botond Dénes Making the count resources on the maintenance (streaming) semaphore live update via config. This will allow us to improve repair speed on mixed-shard clusters, where we suspect that reader thrashing -- due to the combination of high number of readers on each shard and very conservative reader count limit (10) -- is the main cause of the slowness. Making this count limit configurable allows us to start experimenting with this fix, without committing to a count limit increase (or removal), addressing the pain in the field. Refs: #18269 No OSS backport needed. Closes scylladb/scylladb#19248 * github.com:scylladb/scylladb: replica/database: wire in maintenance_reader_concurrency_semaphore_count_limit db/config: introduce maintenance_reader_concurrency_semaphore_count_limit reader_concurrency_semaphore: make count parameter live-update	2024-06-18 12:02:24 +03:00
Gleb Natapov	fb764720d3	topology coordinator: add more trace level logging for debugging Add more logging that provides more visibility into what happens during topology loading. Message-ID: <ZnE5OAmUbExVZMWA@scylladb.com>	2024-06-18 10:34:03 +02:00
Botond Dénes	1acc57e19d	Merge 'schema: Make "describe" use extensions to string' from Calle Wilund Fixes #19334 Current impl uses hardcoded printing of a few extensions. Instead, use extension options to string and print all. Note: required to make enterprise CI happy again. Closes scylladb/scylladb#19337 * github.com:scylladb/scylladb: schema: Make "describe" use extensions to string schema_extensions: Add an option to string method	2024-06-18 11:28:11 +03:00
Botond Dénes	495f7160da	Update tools/jmx submodule * tools/jmx 53696b13...3328a229 (1): > scylla-apiclient: add missing license for SBOM report	2024-06-18 11:11:57 +03:00
Kefu Chai	fd0de02b81	types: remove unused operator<< since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-18 15:55:22 +08:00
Kefu Chai	2c1a3e7191	node_ops: remove unused operator<< since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-18 15:55:22 +08:00
Kefu Chai	84f0fd6823	lang: remove unused operator<< since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-18 15:55:22 +08:00
Kefu Chai	ec5f0fccce	gms: remove unused operator<< since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-18 15:55:22 +08:00
Kefu Chai	51d686ea9f	dht: remove unused operator<< since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-18 11:26:20 +08:00
Kefu Chai	ef0f4eaef2	test: do not use operator<< for std::optional we don't provide it anymore, and if any of existing type provides constructor accepting an `optional<>`, and hence can be formatted using operator<< after converting it, neither shall we rely on this behavior, as it is fragile. so, in this change, we switch to `fmt::print()` to use {fmt} to print `optional<>`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-18 10:41:48 +08:00
Andrei Chekun	3c921d5712	Add allure pytest adaptor to the toolchain Add allure-pytest pip dependency to be able to use it for generating the allure report later. Main benefits of the allure report: 1. Group test failures 2. Possibility to attach log files to the test itself 3. Timeline of test run 4. Test description on the report 5. Search by test name or tag [avi: regenerate toolchain] Closes scylladb/scylladb#19335	2024-06-17 23:17:01 +03:00
Nadav Har'El	4faceeaa33	Merge 'treewide: drop thrift support' from Kefu Chai thrift support was deprecated since ScyllaDB 5.2 > Thrift API - legacy ScyllaDB (and Apache Cassandra) API is > deprecated and will be removed in followup release. Thrift has > been disabled by default. so let's drop it. in this change, * thrift protocol support is dropped * all references to thrift support in document are dropped * the "thrift_version" column in system.local table is preserved for backward compatibility, as we could load from an existing system.local table which still contains this column, so we need to write this column as well. * "/storage_service/rpc_server" is only preserved for backward compatibility with java-based nodetool. Fixes #3811 Fixes #18416 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> - [x] not a fix, no need to backport Closes scylladb/scylladb#18453 * github.com:scylladb/scylladb: config: expand on rpc_keepalive's description api: s/rpc/thrift/ db/system_keyspace: drop thrift_version from system.local table transport: do not return client_type from cql_server::connection::make_client_key() treewide: drop thrift support	2024-06-17 22:36:49 +03:00
Andrei Chekun	8845978ec5	[test.py] Unbreak cql-pytest and alternator Provide possibility to run pytest without explicitly providing mode parameter Closes scylladb/scylladb#19342	2024-06-17 21:41:09 +03:00
Piotr Dulikowski	85128c5b10	Merge 'cql3: always return created event in create keyspace statement' from Marcin Maliszkiewicz cql3: always return created event in create ks/table/type/view statement In case multiple clients concurrently issue CREATE KEYSPACE IF NOT EXISTS and later USE KEYSPACE it can happen that schema in driver's session is out of sync because it syncs when it receives special message from CREATE KEYSPACE response. Similar situation occurs with other schema change statements. In this patch we fix only create keyspace/table/type/view statements by always sending created event. Behavior of any other schema altering statements remains unchanged. Fixes https://github.com/scylladb/scylladb/issues/16909 backport: no, it's not a regression Closes scylladb/scylladb#18819 * github.com:scylladb/scylladb: cql3: always return created event in create ks/table/type/view statement cql3: auth: move auto-grant closer to resource creation code cql3: extract create ks/table/type/view event code	2024-06-17 19:58:38 +02:00
Anna Stuchlik	ea35982764	doc: remove the 5.x.y to 5.x.z upgrade guide This commit removes the upgrade guide from 5.x.y to 5.x.z. It is redundant in version 6.x.	2024-06-17 17:28:39 +02:00
Anna Stuchlik	ead201496d	doc: add the 6.x.y to 6.x.z upgrade guide-6 This commit adds the upgrade guide from 6.x.y to 6.x.z.	2024-06-17 17:23:00 +02:00
Marcin Maliszkiewicz	95673907ca	auth: reuse roles select query during cache population With big number of shards in the cluster (e.g. 500+) due to cache periodic refresh we experience high load on role_permissions table (e.g. 1k op/s). The load on roles table is amplified because to populate single entry in the cache we do several selects on roles table. Some of this can't be avoided because roles are arranged in a tree-like structure where permissions can be inherited. This patch tries to reuse queries which are simply duplicated. It should reduce the load on roles table by up to 50%. Fixes scylladb/scylladb#19299	2024-06-17 16:46:33 +02:00
Marcin Maliszkiewicz	547eb6d59b	auth: coroutinize service::get_uncached_permissions	2024-06-17 16:46:28 +02:00
Marcin Maliszkiewicz	00a24507cb	auth: coroutinize service::has_superuser	2024-06-17 16:46:22 +02:00
Kefu Chai	a5a5ca0785	auth: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19312	2024-06-17 17:33:55 +03:00
Yaniv Michael Kaul	9b0eb82175	dist/common/scripts/scylla_coredump_setup: fix typo Does not able -> Unable Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#19328	2024-06-17 17:33:46 +03:00
Kefu Chai	b64126fe1c	db: remove unused operator<< since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19313	2024-06-17 17:33:31 +03:00
Calle Wilund	73abc56d79	schema: Make "describe" use extensions to string Fixes #19334 Current impl uses hardcoded printing of a few extensions. Instead, use extension options to string and print all.	2024-06-17 13:30:24 +00:00
Calle Wilund	d27620e146	schema_extensions: Add an option to string method Allow an extension to describe itself as the CQL property string that created it (and is serialized to schema tables) Only paxos extension requires override.	2024-06-17 13:30:10 +00:00
Gleb Natapov	09556bff0e	gossiper: move gossip verbs to the idl	2024-06-17 12:47:17 +03:00
Kefu Chai	7e9550e9f9	test/py/minio_server.py: do not reference non-existent old_env in `51c53d8db6`, we check `self.old_env[env]` for None, but there are chances `self.old_env` does not contain a value with `env`. in that case, we'd have following failure: ``` Traceback (most recent call last): File "/home/kefu/dev/scylladb/test/pylib/minio_server.py", line 307, in <module> asyncio.run(main()) File "/usr/lib64/python3.12/asyncio/runners.py", line 194, in run return runner.run(main) ^^^^^^^^^^^^^^^^ File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run return self._loop.run_until_complete(task) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib64/python3.12/asyncio/base_events.py", line 687, in run_until_complete return future.result() ^^^^^^^^^^^^^^^ File "/home/kefu/dev/scylladb/test/pylib/minio_server.py", line 304, in main await server.stop() File "/home/kefu/dev/scylladb/test/pylib/minio_server.py", line 274, in stop self._unset_environ() File "/home/kefu/dev/scylladb/test/pylib/minio_server.py", line 211, in _unset_environ if self.old_env[env] is not None: ~~~~~~~~~~~~^^^^^ KeyError: 'S3_CONFFILE_FOR_TEST' ``` this happens if we run `pylib/minio_server.py` as a standalone application. in this change, instead of getting the value with index, we use `dict.get()`, so that it does not throw when the dict does not have the given key. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19291	2024-06-17 12:42:43 +03:00
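
The difference between indexing and `dict.get()` described in the commit above can be sketched as follows. The `restore_environ` helper and its arguments are hypothetical and are not taken from `minio_server.py`; only the lookup behaviour follows the commit message.

```python
def restore_environ(old_env, names, environ):
    """Restore saved values, tolerating names that were never saved."""
    for name in names:
        # old_env[name] would raise KeyError when nothing was saved for
        # this name; dict.get() returns None instead.
        saved = old_env.get(name)
        if saved is not None:
            environ[name] = saved
        else:
            # Nothing to restore, so drop the variable if it is present.
            environ.pop(name, None)


env = {"S3_CONFFILE_FOR_TEST": "/tmp/conf", "OTHER": "new"}
restore_environ({"OTHER": "old"}, ["S3_CONFFILE_FOR_TEST", "OTHER"], env)
print(env)
```

Passing the environment as a plain dict keeps the sketch free of side effects on `os.environ`.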
Andrei Chekun	293cf355df	[test.py] Fix log of failed node not being added to failed directory If something happens while a node is being added to the cluster, it will not be registered as a part of the cluster. As a result, the logs for such a node will be missing during log gathering.	2024-06-17 11:16:55 +02:00
Andrei Chekun	7bbb8d9260	[test.py] Fix URL for failed logs directory in CI Incorrect passing of the artifacts_dir_url parameter from test.py to pytest causes None to be passed as a string, so pytest generates an incorrect URL.	2024-06-17 11:16:48 +02:00
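
The "None passed as a string" failure mode from the commit above can be illustrated with a small sketch. The `build_pytest_args` helper and the spelling of the command-line option are assumptions made for illustration; only the `artifacts_dir_url` parameter name comes from the commit message.

```python
def build_pytest_args(artifacts_dir_url=None):
    """Build a pytest argument list, skipping the URL option when unset."""
    args = ["pytest"]
    # Formatting None into the option would produce the literal text
    # "None", which downstream code would then treat as a real URL.
    if artifacts_dir_url is not None:
        args.append(f"--artifacts_dir_url={artifacts_dir_url}")
    return args


print(build_pytest_args())
print(build_pytest_args("https://ci.example/artifacts"))
```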
Aleksandra Martyniuk	fb3153d253	api: task_manager: delete module from full_task_status Delete module field from full_task_status as it is unused. Closes scylladb/scylladb#18853	2024-06-17 09:03:19 +03:00
Nadav Har'El	9fc70a28ca	test: unflake test test_alternator_ttl_scheduling_group This test in topology_experimental_raft/test_alternator.py wants to check that during Alternator TTL's expiration scans, ALL of the CPU was used in the "streaming" scheduling group and not in the "statement" scheduling group. But to allow for some fluke requests (e.g., from the driver), the test actually allows work in the statement group to be up to 1% of the work. Unfortunately, in one test run - a very slow debug+aarch64 run - we saw the work on the statement group reach 1.4%, failing the test. I don't know exactly where this work comes from, perhaps the driver, but before this bug was fixed we saw more than 58% of the work in the wrong scheduling group, so neither 1% nor 1.4% is a sign that the bug came back. In fact, let's just change the threshold in the test to 10%, which is also much lower than the pre-fix value of 58%, so is still a valid regression test. Fixes #19307 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19323	2024-06-17 08:39:38 +03:00
Yaron Kaikov	996be2e235	dbuild: update toolchain to get latest scylla-api-client a new Scylla-api-client was released to get a proper license information in our SBOM report, Refs: https://github.com/scylladb/scylla-jmx/issues/237 Closes scylladb/scylladb#19324	2024-06-17 08:37:49 +03:00
Dawid Medrek	670830091c	db/hints: Use dedicated functions to lock a shared mutex Seastar has functions implementing locking a `seastar::shared_mutex`. We should use those now instead of reimplementing them in Scylla. Closes scylladb/scylladb#19253	2024-06-14 20:31:37 +02:00
Kamil Braun	bbb424a757	Merge '[test.py] Add uniqueness to the test name' from Andrei Chekun In CI, tests are always executed with the option --repeat=3, which generates 3 test results with the same name. The Junit plugin in CI cannot correctly distinguish between these results. When there are two passes and one failure, the link to the test result will sometimes point to the wrong one because the test name is the same. To fix this, a ReportPlugin is added that modifies the test case name during junit report generation, adding the mode and run id to the test name. Fixes: https://github.com/scylladb/scylladb/issues/17851 Fixes: https://github.com/scylladb/scylladb/issues/15973 Closes scylladb/scylladb#19235 * github.com:scylladb/scylladb: [test.py] Add uniqueness to the test name [test.py] Refactor alternator, nodetool, rest_api	2024-06-14 17:59:07 +02:00
Botond Dénes	5b87fa4cea	Merge 'doc: document `keyspace` and `table` for `nodetool ring`' from Kefu Chai these two arguments are critical when tablets are enabled. Fixes https://github.com/scylladb/scylladb/issues/19296 --- 6.0 is the first release with tablets support. and `nodetool ring` is an important tool to understand the data distribution. so we need to backport this document change to 6.0 Closes scylladb/scylladb#19297 * github.com:scylladb/scylladb: doc: document `keyspace` and `table` for `nodetool ring` doc: replace tab with space	2024-06-14 16:04:23 +03:00
Kefu Chai	ea3b8c5e4f	doc: document `keyspace` and `table` for `nodetool ring` these two arguments are critical when tablets are enabled. Fixes #19296 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-14 21:01:14 +08:00
Botond Dénes	c563acdbe9	Merge 'build: cmake: use path to be compatible with CI' from Kefu Chai this change is created in the same spirit of `1186ddef16`, which updated the rule for generating the stripped dist pkg, but it failed to update the one for generating the unstripped dist pkg. that's why we have build failure when the workflow is looking for the unstripped tar.gz: ``` 08:02:47 ++ ls /jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/dist/tar/scylla-unstripped-6.1.0~dev-0.20240613.d5bdddaeb40b.x86_64.tar.gz 08:02:47 ls: cannot access '/jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/dist/tar/scylla-unstripped-6.1.0~dev-0.20240613.d5bdddaeb40b.x86_64.tar.gz': No such file or directory` ``` so, in this change, we fix the path. Refs #2717 --- * cmake related change, hence no need to backport. Closes scylladb/scylladb#19290 * github.com:scylladb/scylladb: build: cmake: use per-mode path for building unstripped_dist_pkg build: cmake: use path to be compatible with CI	2024-06-14 15:35:26 +03:00
Kefu Chai	d498ca3afa	test: randomized_nemesis_test: use BOOST_REQUIRE_* when appropriate for better debuggability. Refs #17030 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19282	2024-06-14 15:33:07 +03:00
Kefu Chai	d887fd2402	build: use default modes when no modes are selected when `--use-cmake` option is passed to `configure.py`, - before this change, all modes are selected if no `--mode` options are passed to `configure.py`. - after this change, only the modes whose `build_by_default` is `True` are selected, if no `--mode` options are specified. the new behavior matches the existing behavior. otherwise, `ninja -C build mode_list` would list the mode which is not built by default. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19292	2024-06-14 15:31:58 +03:00
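
The selection rule described in the commit above can be sketched as follows. The mode table and the `select_modes` helper are hypothetical; only the `build_by_default` key and the "fall back to default modes when no `--mode` is given" rule come from the commit message.

```python
# Hypothetical mode table; the names and flags are illustrative only.
MODES = {
    "debug": {"build_by_default": True},
    "release": {"build_by_default": True},
    "dev": {"build_by_default": True},
    "sanitize": {"build_by_default": False},
    "coverage": {"build_by_default": False},
}


def select_modes(requested, modes=MODES):
    """Return the requested modes, or the default-built ones if none given."""
    if requested:
        return list(requested)
    # No --mode options: keep only modes flagged as built by default.
    return [name for name, cfg in modes.items() if cfg["build_by_default"]]


print(select_modes([]))
print(select_modes(["sanitize"]))
```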
Botond Dénes	b2ebc172d0	Merge 'Fix usage of utils/chunked_vector::reserve_partial' from Lakshmi Narayanan Sreethar utils/chunked_vector::reserve_partial: fix usage in callers The method reserve_partial(), when used as documented, quits before the intended capacity can be reserved fully. This can lead to overallocation of memory in the last chunk when data is inserted to the chunked vector. The method itself doesn't have any bug but the way it is being used by the callers needs to be updated to get the desired behaviour. Instead of calling it repeatedly with the value returned from the previous call until it returns zero, it should be repeatedly called with the intended size until the vector's capacity reaches that size. This PR updates the method comment and all the callers to use the right way. Fixes #19254 Closes scylladb/scylladb#19279 * github.com:scylladb/scylladb: utils/large_bitset: remove unused includes identified by clangd utils/large_bitset: use thread::maybe_yield() test/boost/chunked_managed_vector_test: fix testcase tests_reserve_partial utils/lsa/chunked_managed_vector: fix reserve_partial() utils/chunked_vector: return void from reserve_partial and make_room test/boost/chunked_vector_test: fix testcase tests_reserve_partial utils/chunked_vector::reserve_partial: fix usage in callers	2024-06-14 15:31:00 +03:00
Kefu Chai	5c41073e00	tools/scylla-sstable: format error message with compile-time check before this change, we use runtime format string to format error messages. but it does not have the compile time format check. if we pass arguments which are not formattable, {fmt} throws at runtime, instead of error out at compile-time. this could be very annoying, because we format error messages at the error handling path. but if user ends up seeing an exception for {fmt} instead of a nice error message, it would be far from helpful. in this change, we - use compile-time format string - fix two caller sites, where we pass `std::exception_ptr` to {fmt}, but `std::exception_ptr` is not formattable by {fmt} at the time of writing. we do have operator<< based formatter for it though. so we delegate to `fmt::streamed` to format it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19294	2024-06-14 15:30:19 +03:00
Kefu Chai	aef1718833	doc: replace tab with space more consistent this way, also easier to format in a regular editor without additional setup. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-14 18:46:09 +08:00
Kamil Braun	982fa31250	Merge 'test: servers_add: fix the expected_error parameter' from Patryk Jędrzejczak This PR fixes two problems with the `expected_error` parameter in `server_add` and `servers_add`. 1. It didn't work in `server_add` if the cluster was empty because of an incorrect attempt to connect the driver. 2. It didn't work in `servers_add` completely because the `seeds` parameter was handled incorrectly. This PR only adds improvements in the testing framework, no need to backport it. Closes scylladb/scylladb#19255 * github.com:scylladb/scylladb: test: manager_client, scylla_cluster: fix type annotations in add_servers test: manager_client: don't connect driver after failed server_{add, start} test: scylla_cluster: pass seeds to add_servers	2024-06-14 11:33:21 +02:00
Wojciech Mitros	d31437b589	mv: replicate the gossiped backlog to all shards On each shard of each node we store the view update backlogs of other nodes to, depending on their size, delay responses to incoming writes, lowering the load on these nodes and helping them get their backlog to normal if it were too high. These backlogs are propagated between nodes in two ways: the first one is adding them to replica write responses. The second one is gossiping any changes to the node's backlog every 1s. The gossip becomes useful when writes stop to some node for some time and we stop getting the backlog using the first method, but we still want to be able to select a proper delay for new writes coming to this node. It will also be needed for the mv admission control. Currently, the backlog is gossiped from shard 0, as expected. However, we also receive the backlog only on shard 0 and only update this shard's backlogs for the other node. Instead, we'd want to have the backlogs updated on all shards, allowing us to use proper delays also when requests are received on shards different than 0. This patch changes the backlog update code, so that the backlogs on all shards are updated instead. This will only be performed up to once per second for each other node, and is done with a lower priority, so it won't severely impact other work. Fixes: scylladb/scylladb#19232 Closes scylladb/scylladb#19268	2024-06-14 11:24:20 +02:00
Andrei Chekun	8d1d206aff	[test.py] Add uniqueness to the test name In CI, tests are always executed with the option --repeat=3, which generates 3 test results with the same name. The Junit plugin in CI cannot correctly distinguish between these results. When there are two passes and one failure, the link to the test result will sometimes point to the wrong one because the test name is the same. To fix this, a ReportPlugin is added that modifies the test case name during junit report generation, adding the mode and run id to the test name. Fixes: https://github.com/scylladb/scylladb/issues/17851 Fixes: https://github.com/scylladb/scylladb/issues/15973	2024-06-14 11:23:04 +02:00
Wojciech Mitros	9bae1814ab	test: add test for failed view building write For various reasons, a view building write may fail. When that happens, the view building should not finish until these writes are successfully retried and they should not interfere with any writes that are performed to the base table while the view is building. The test introduced in this patch confirms that this is the case. Refs scylladb/scylladb#19261 Closes scylladb/scylladb#19263	2024-06-14 10:38:21 +02:00
Lakshmi Narayanan Sreethar	c49f6391ab	utils/large_bitset: remove unused includes identified by clangd Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-14 13:47:10 +05:30
Lakshmi Narayanan Sreethar	83190fa075	utils/large_bitset: use thread::maybe_yield() Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-14 13:47:10 +05:30
Lakshmi Narayanan Sreethar	310c5da4bb	test/boost/chunked_managed_vector_test: fix testcase tests_reserve_partial Update the maximum size tested by the testcase. The test always created only one chunk as the maximum size tested by it (1 << 12 = 4KB) was less than the default max chunk size (12.8 KB). So, use twice the max_chunk_capacity as the test size distribution upper limit to verify that partial_reserve can reserve multiple chunks. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-14 13:47:10 +05:30
Lakshmi Narayanan Sreethar	d4f8b91bd6	utils/lsa/chunked_managed_vector: fix reserve_partial() Fix the method comment and return types of chunked_managed_vector's reserve_partial() similar to chunked_vector's reserve_partial() as it has the same issues mentioned in #19254. Also update the usage in the chunked_managed_vector_test. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-14 13:47:10 +05:30
Lakshmi Narayanan Sreethar	0a22759c2a	utils/chunked_vector: return void from reserve_partial and make_room Since reserve_partial does not depend on the number of remaining capacity to be reserved, there is no need to return anything from it and the make_room method. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-14 13:43:07 +05:30
Lakshmi Narayanan Sreethar	29f036a777	test/boost/chunked_vector_test: fix testcase tests_reserve_partial Fix the usage of reserve_partial in the testcase. Also update the maximum chunk size used by the testcase. The test always created only one chunk as the maximum size tested by it (1 << 12 = 4KB) was less than the default max chunk size (128 KB). So, use smaller chunk size, 512 bytes, to verify that partial_reserve can reserve multiple chunks. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-14 13:43:07 +05:30
Kefu Chai	df094061e3	test: randomized_nemesis_test: define static variable before this change, when linking randomized_nemesis_test with ld.lld: ``` [4/4] Linking CXX executable test/raft/RelWithDebInfo/randomized_nemesis_test FAILED: test/raft/RelWithDebInfo/randomized_nemesis_test : && /home/kefu/.local/bin/clang++ -ffunction-sections -fdata-sections -O3 -g -gz -Xlinker --build-id=sha1 --ld-path=ld.lld -dynamic-linker=/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////lib64/ld-linux-x86-64.so.2 -Xlinker --gc-sections test/raft/CMakeFiles/test-raft-helper.dir/RelWithDebInfo/helpers.cc.o test/raft/CMakeFiles/randomized_nemesis_test.dir/RelWithDebInfo/randomized_nemesis_test.cc.o -o test/raft/RelWithDebInfo/randomized_nemesis_test -L/home/kefu/dev/scylladb/idl/absl::headers -Wl,-rpath,/home/kefu/dev/scylladb/idl/absl::headers test/lib/RelWithDebInfo/libtest-lib.a seastar/RelWithDebInfo/libseastar.a /usr/lib64/libxxhash.so seastar/RelWithDebInfo/libseastar_testing.a test/lib/RelWithDebInfo/libtest-lib.a -Xlinker --push-state -Xlinker --whole-archive auth/RelWithDebInfo/libscylla_auth.a -Xlinker --pop-state /usr/lib64/libcrypt.so cdc/RelWithDebInfo/libcdc.a compaction/RelWithDebInfo/libcompaction.a mutation_writer/RelWithDebInfo/libmutation_writer.a -Xlinker --push-state -Xlinker --whole-archive dht/RelWithDebInfo/libscylla_dht.a -Xlinker --pop-state types/RelWithDebInfo/libtypes.a index/RelWithDebInfo/libindex.a -Xlinker --push-state -Xlinker --whole-archive locator/RelWithDebInfo/libscylla_locator.a -Xlinker --pop-state 
message/RelWithDebInfo/libmessage.a gms/RelWithDebInfo/libgms.a sstables/RelWithDebInfo/libsstables.a readers/RelWithDebInfo/libreaders.a schema/RelWithDebInfo/libschema.a -Xlinker --push-state -Xlinker --whole-archive tracing/RelWithDebInfo/libscylla_tracing.a -Xlinker --pop-state RelWithDebInfo/libscylla-main.a abseil/absl/strings/RelWithDebInfo/libabsl_cord.a abseil/absl/strings/RelWithDebInfo/libabsl_cordz_info.a abseil/absl/strings/RelWithDebInfo/libabsl_cord_internal.a abseil/absl/strings/RelWithDebInfo/libabsl_cordz_functions.a abseil/absl/strings/RelWithDebInfo/libabsl_cordz_handle.a abseil/absl/crc/RelWithDebInfo/libabsl_crc_cord_state.a abseil/absl/crc/RelWithDebInfo/libabsl_crc32c.a abseil/absl/crc/RelWithDebInfo/libabsl_crc_internal.a abseil/absl/crc/RelWithDebInfo/libabsl_crc_cpu_detect.a abseil/absl/strings/RelWithDebInfo/libabsl_str_format_internal.a /usr/lib64/libz.so service/RelWithDebInfo/libservice.a node_ops/RelWithDebInfo/libnode_ops.a service/RelWithDebInfo/libservice.a node_ops/RelWithDebInfo/libnode_ops.a -lsystemd raft/RelWithDebInfo/libraft.a repair/RelWithDebInfo/librepair.a streaming/RelWithDebInfo/libstreaming.a replica/RelWithDebInfo/libreplica.a db/RelWithDebInfo/libdb.a mutation/RelWithDebInfo/libmutation.a data_dictionary/RelWithDebInfo/libdata_dictionary.a cql3/RelWithDebInfo/libcql3.a transport/RelWithDebInfo/libtransport.a cql3/RelWithDebInfo/libcql3.a transport/RelWithDebInfo/libtransport.a lang/RelWithDebInfo/liblang.a /usr/lib64/liblua-5.4.so -lm /usr/lib64/libsnappy.so.1.1.10 abseil/absl/container/RelWithDebInfo/libabsl_raw_hash_set.a abseil/absl/hash/RelWithDebInfo/libabsl_hash.a abseil/absl/hash/RelWithDebInfo/libabsl_city.a abseil/absl/types/RelWithDebInfo/libabsl_bad_variant_access.a abseil/absl/hash/RelWithDebInfo/libabsl_low_level_hash.a abseil/absl/types/RelWithDebInfo/libabsl_bad_optional_access.a abseil/absl/container/RelWithDebInfo/libabsl_hashtablez_sampler.a 
abseil/absl/profiling/RelWithDebInfo/libabsl_exponential_biased.a abseil/absl/synchronization/RelWithDebInfo/libabsl_synchronization.a abseil/absl/debugging/RelWithDebInfo/libabsl_stacktrace.a abseil/absl/synchronization/RelWithDebInfo/libabsl_graphcycles_internal.a abseil/absl/synchronization/RelWithDebInfo/libabsl_kernel_timeout_internal.a abseil/absl/debugging/RelWithDebInfo/libabsl_symbolize.a abseil/absl/debugging/RelWithDebInfo/libabsl_debugging_internal.a abseil/absl/base/RelWithDebInfo/libabsl_malloc_internal.a abseil/absl/debugging/RelWithDebInfo/libabsl_demangle_internal.a abseil/absl/time/RelWithDebInfo/libabsl_time.a abseil/absl/strings/RelWithDebInfo/libabsl_strings.a abseil/absl/strings/RelWithDebInfo/libabsl_strings_internal.a abseil/absl/strings/RelWithDebInfo/libabsl_string_view.a abseil/absl/base/RelWithDebInfo/libabsl_throw_delegate.a abseil/absl/numeric/RelWithDebInfo/libabsl_int128.a abseil/absl/base/RelWithDebInfo/libabsl_base.a abseil/absl/base/RelWithDebInfo/libabsl_raw_logging_internal.a abseil/absl/base/RelWithDebInfo/libabsl_log_severity.a abseil/absl/base/RelWithDebInfo/libabsl_spinlock_wait.a -lrt abseil/absl/time/RelWithDebInfo/libabsl_civil_time.a abseil/absl/time/RelWithDebInfo/libabsl_time_zone.a rust/RelWithDebInfo/libwasmtime_bindings.a rust/librust_combined.a /usr/lib64/libdeflate.so utils/RelWithDebInfo/libutils.a /usr/lib64/libxxhash.so /usr/lib64/libcryptopp.so /usr/lib64/libboost_regex.so.1.83.0 /usr/lib64/libicui18n.so /usr/lib64/libicuuc.so /usr/lib64/libboost_unit_test_framework.so.1.83.0 seastar/RelWithDebInfo/libseastar_testing.a seastar/RelWithDebInfo/libseastar.a /usr/lib64/libboost_program_options.so /usr/lib64/libboost_thread.so /usr/lib64/libboost_chrono.so /usr/lib64/libboost_atomic.so /usr/lib64/libcares.so /usr/lib64/libfmt.so.10.2.1 /usr/lib64/liblz4.so -ldl /usr/lib64/libgnutls.so -latomic /usr/lib64/libsctp.so /usr/lib64/libprotobuf.so /usr/lib64/libyaml-cpp.so /usr/lib64/libhwloc.so //usr/lib64/liburing.so 
/usr/lib64/libnuma.so /usr/lib64/libboost_unit_test_framework.so && : ld.lld: error: undefined symbol: append_seq::magic >>> referenced by impl.hpp:92 (/usr/include/boost/test/tools/old/impl.hpp:92) >>> test/raft/CMakeFiles/randomized_nemesis_test.dir/RelWithDebInfo/randomized_nemesis_test.cc.o:(__cxx_global_var_init.38) >>> referenced by impl.hpp:92 (/usr/include/boost/test/tools/old/impl.hpp:92) >>> test/raft/CMakeFiles/randomized_nemesis_test.dir/RelWithDebInfo/randomized_nemesis_test.cc.o:(__cxx_global_var_init.38) >>> referenced by impl.hpp:92 (/usr/include/boost/test/tools/old/impl.hpp:92) >>> test/raft/CMakeFiles/randomized_nemesis_test.dir/RelWithDebInfo/randomized_nemesis_test.cc.o:(append_seq::append(int) const) >>> referenced 5 more times clang++: error: linker command failed with exit code 1 (use -v to see invocation) ``` it turns out `append_seq::magic` is only declared, but never defined. please note, the non-inline static member variable in its class definition is not considered as a definition, see [class.static.data](https://eel.is/c++draft/class.static.data#3) > The declaration of a non-inline static data member in its class > definition is not a definition and may be of an incomplete type > other than cv void. so, let's declare it as a `constexpr` instead. it implies `inline`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19283	2024-06-14 10:00:21 +03:00
Kefu Chai	4c1006a5bb	dist: s/SafeConfigParser/ConfigParser/ `SafeConfigParser` was renamed to `ConfigParser` in Python 3.2, and Python warns us: > scylla-housekeeping:183: DeprecationWarning: The SafeConfigParser > class has been renamed to ConfigParser in Python 3.2. This alias will > be removed in Python 3.12. Use ConfigParser directly instead. see https://docs.python.org/3.2/library/configparser.html#configparser.ConfigParser and https://docs.python.org/3.1/library/configparser.html#configparser.SafeConfigParser Fixes #13046 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19285	2024-06-14 09:59:22 +03:00
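
A minimal sketch of the rename described in the commit above: `configparser.ConfigParser` is used directly instead of the deprecated `SafeConfigParser` alias. The section and key names are hypothetical and are not taken from `scylla-housekeeping`.

```python
import configparser

# ConfigParser is the supported class name; the SafeConfigParser alias
# was deprecated and, per the warning quoted above, removed in 3.12.
parser = configparser.ConfigParser()

# Hypothetical configuration content, used only for illustration.
parser.read_string("[housekeeping]\ncheck-version = true\n")

check_version = parser.getboolean("housekeeping", "check-version")
print(check_version)
```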
Kefu Chai	3a5898880e	alternator: drop unused friend declaration in `57c408ab`, we dropped operator<< for `parsed::path`, but we forgot to drop the friend declaration for it along with the operator. so in this change, let's drop the friend declaration. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19287	2024-06-14 09:58:09 +03:00
Kefu Chai	83c6ae10c4	sstables/compress: put type constraints into template type param more compact this way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19284	2024-06-14 09:50:55 +03:00
Kefu Chai	6556cd684e	cql3: remove unused operator<< as these operators are not used anymore. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19288	2024-06-14 09:45:35 +03:00
Botond Dénes	d50688efee	Merge 'api: do not include unused headers' from Kefu Chai these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. also, add api to iwyu github workflow's CLEANER_DIR, to prevent future violations. --- it's a cleanup, hence no need to backport. Closes scylladb/scylladb#19269 * github.com:scylladb/scylladb: .github: add api to iwyu's CLEANER_DIR api: do not include unused headers	2024-06-14 09:34:13 +03:00
Kefu Chai	28a4298005	build: cmake: use per-mode path for building unstripped_dist_pkg before this change, we use "scylla" as the dependency of unstripped_dist_pkg, but that implies the scylla built with the default mode. if the build rules are generated using the multi-config generator, the default mode is not necessarily identical to the current `$<CONFIG>`, so let's be more explicit. otherwise, we could run into build failure like ``` FAILED: dist/RelWithDebInfo/scylla-unstripped-6.1.0~dev-0.20240614.5f36888e7fbd.x86_64.tar.gz /jenkins/workspace/scylla-master/scylla-ci/scylla/build/dist/RelWithDebInfo/scylla-unstripped-6.1.0~dev-0.20240614.5f36888e7fbd.x86_64.tar.gz cd /jenkins/workspace/scylla-master/scylla-ci/scylla && scripts/create-relocatable-package.py --build-dir /jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo --node-exporter-dir /jenkins/workspace/scylla-master/scylla-ci/scylla/build/node_exporter --debian-dir /jenkins/workspace/scylla-master/scylla-ci/scylla/build/debian /jenkins/workspace/scylla-master/scylla-ci/scylla/build/dist/RelWithDebInfo/scylla-unstripped-6.1.0~dev-0.20240614.5f36888e7fbd.x86_64.tar.gz ldd: /jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/scylla: No such file or directory Traceback (most recent call last): File "/jenkins/workspace/scylla-master/scylla-ci/scylla/scripts/create-relocatable-package.py", line 109, in <module> libs.update(ldd(exe)) ^^^^^^^^ File "/jenkins/workspace/scylla-master/scylla-ci/scylla/scripts/create-relocatable-package.py", line 37, in ldd for ldd_line in subprocess.check_output( ^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib64/python3.11/subprocess.py", line 466, in check_output return run(*popenargs, stdout=PIPE, timeout=timeout, check=True, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib64/python3.11/subprocess.py", line 571, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['ldd', 
'/jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/scylla']' returned non-zero exit status 1. ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-14 13:27:26 +08:00
Kefu Chai	b94420a9dd	build: cmake: use path to be compatible with CI this change is created in the same spirit of `1186ddef16`, which updated the rule for generating the stripped dist pkg, but it failed to update the one for generating the unstripped dist pkg. that's why we have build failure when the workflow is looking for the unstripped tar.gz: ``` 08:02:47 ++ ls /jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/dist/tar/scylla-unstripped-6.1.0~dev-0.20240613.d5bdddaeb40b.x86_64.tar.gz 08:02:47 ls: cannot access '/jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/dist/tar/scylla-unstripped-6.1.0~dev-0.20240613.d5bdddaeb40b.x86_64.tar.gz': No such file or directory` ``` so, in this change, we fix the path. Refs #2717 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-14 13:27:26 +08:00
Botond Dénes	ea40567bbc	Merge 'Some cleanups for replica table' from Raphael "Raph" Carvalho backport not needed, these are just cleanups. Closes scylladb/scylladb#19260 * github.com:scylladb/scylladb: replica: simplify perform_cleanup_compaction() replica: return storage_group by reference on storage_group_for*() replica: devirtualize storage_group_of()	2024-06-14 08:14:58 +03:00
Botond Dénes	bf429695b6	Merge 'test_tablets: add test_tablet_storage_freeing' from Michał Chojnowski Before work on tablets was completed, it was noticed that — due to some missing pieces of implementation — Scylla doesn't properly close sstables for migrated-away tablets. Because of this, disk space wasn't being reclaimed properly. Since the missing pieces of implementation were added, the problem should be gone now. This patch adds a test which was used to reproduce the problem earlier. It's expected to pass now, validating that the issue was fixed. Should be backported to branch-6.0, because the tested problem was also affecting that branch. Fixes #16946 Closes scylladb/scylladb#18906 * github.com:scylladb/scylladb: test_tablets: add test_tablet_storage_freeing test: pylib: add get_sstables_disk_usage()	2024-06-14 08:08:54 +03:00
Raphael S. Carvalho	f143f5b90d	replica: remove linear search when picking memtable_list for range scan with tablets with tablets, we're expected to have a worst of ~100 tablets in a given table and shard, so let's avoid linear search when looking for the memtable_list in a range scan. we're bounded by ~100 elements, so shouldn't be a big problem, but it's an inefficiency we can easily get rid of. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#19286	2024-06-14 08:00:17 +03:00
Benny Halevy	fb3db7d81f	perf-simple-query: add cpu_cycles / op metric Example output: ``` bhalevy@[] scylla$ build/release/scylla perf-simple-query --default-log-level=error -c 1 --duration 10 random-seed=4058714023 enable-cache=1 Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no} Disabling auto compaction Creating 10000 partitions... 86912.75 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42346 insns/op, 22811 cycles/op, 0 errors) 91348.29 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42306 insns/op, 22362 cycles/op, 0 errors) 87965.84 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42338 insns/op, 22966 cycles/op, 0 errors) 90793.67 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42351 insns/op, 22783 cycles/op, 0 errors) 90104.27 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42358 insns/op, 22875 cycles/op, 0 errors) 90397.13 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42355 insns/op, 22735 cycles/op, 0 errors) 89142.39 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42363 insns/op, 22996 cycles/op, 0 errors) 90410.40 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42363 insns/op, 22725 cycles/op, 0 errors) 88173.10 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42366 insns/op, 23160 cycles/op, 0 errors) 88416.51 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42379 insns/op, 23102 cycles/op, 0 errors) median 90104.26849997675 median absolute deviation: 1244.02 maximum: 91348.29 minimum: 86912.75 ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#18818	2024-06-14 07:42:09 +03:00
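
The summary figures in the example output above can be reproduced from the ten `tps` samples with a short sketch. It assumes that the "median" is the upper-middle element of the sorted samples rather than the average of the two middle ones; this convention is inferred from the printed numbers and is not confirmed from the tool's source.

```python
# The ten tps samples from the example output above.
TPS = [86912.75, 91348.29, 87965.84, 90793.67, 90104.27,
       90397.13, 89142.39, 90410.40, 88173.10, 88416.51]


def upper_median(values):
    """Upper-middle element of the sorted values (assumed convention)."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    center = upper_median(values)
    return upper_median([abs(v - center) for v in values])


print(f"median {upper_median(TPS):.2f}")
print(f"median absolute deviation: {median_absolute_deviation(TPS):.2f}")
print(f"maximum: {max(TPS):.2f}")
print(f"minimum: {min(TPS):.2f}")
```

Under that assumption the sketch yields a median of 90104.27 and a median absolute deviation of 1244.02, matching the summary lines in the commit message up to the rounding of the printed samples.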
Lakshmi Narayanan Sreethar	64768b58e5	utils/chunked_vector::reserve_partial: fix usage in callers The method reserve_partial(), when used as documented, quits before the intended capacity can be reserved fully. This can lead to overallocation of memory in the last chunk when data is inserted to the chunked vector. The method itself doesn't have any bug but the way it is being used by the callers needs to be updated to get the desired behaviour. Instead of calling it repeatedly with the value returned from the previous call until it returns zero, it should be repeatedly called with the intended size until the vector's capacity reaches that size. This commit updates the method comment and all the callers to use the right way. Fixes #19254 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-13 21:42:11 +05:30
Raphael S. Carvalho	ace4e5111e	compaction: Reduce twcs off-strategy space overhead to 10% of free space TWCS off-strategy suffers with 100% space overhead, so a big TWCS table can cause scylla to run out of disk space during node ops. To not penalize TWCS tables, that take a small percentage of disk, with increased write ampl, TWCS off-strategy will be restricted to 10% of free disk space. Then small tables can still compact all disjoint sstables in a single round. Fixes #16514. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-06-13 13:06:51 -03:00
Raphael S. Carvalho	0ce8ee03f1	compaction: wire storage free space into reshape procedure After this, TWCS reshape procedure can be changed to limit job to 10% of available space. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-06-13 12:53:27 -03:00
Raphael S. Carvalho	51c7ee889e	sstables: Allow to get free space from underlying storage That will be used in turn to restrict reshape to 10% of available space in underlying storage. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-06-13 12:43:14 -03:00
Raphael S. Carvalho	b8bd4c51c2	replica: don't expose compaction_group to reshape task compaction_group sits in replica layer and compaction layer is supposed to talk to it through compaction::table_state only. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-06-13 12:43:14 -03:00
Andrei Chekun	93b9b85c12	[test.py] Refactor alternator, nodetool, rest_api Make alternator, nodetool and rest_api test directories as python packages. Move scylla-gdb to scylla_gdb and make it python package.	2024-06-13 13:56:10 +02:00
Avi Kivity	f1819419cc	Merge 'scylla-sstable: add method to load the schema from the sstable itself' from Botond Dénes As it turns out, each sstable carries its own schema in its serialization header (Statistics component). This schema is incomplete -- the names of the key columns are not stored, just their type. Static and regular columns do have names and types stored however. This bare-bones schema is enough to parse and display the content of the sstable. Another thing missing is schema options (the stuff after the `WITH` keyword, except the clustering order). The only options stored are the compression options (in the CompressionInfo component), this is actually needed to read the Data component. This series adds a new method to `tools/schema_loader.cc` to extract the schema stored in the sstable itself. This new schema load method is used as the last fall-back for obtaining the schema, in case scylla-sstable is trying to autodetect the schema of the sstable. Although, right now this bare-bones schema is enough for everything scylla-sstable does, it is more future proof to stick to the "full" schema if possible, so this new method is the last resort for now. Fixes: https://github.com/scylladb/scylladb/issues/17869 Fixes: https://github.com/scylladb/scylladb/issues/18809 New functionality, no backport needed. Closes scylladb/scylladb#19169 * github.com:scylladb/scylladb: tools/scylla-sstable: log loaded schema with trace level tools/scylla-sstable: load schema from the sstable as fallback tools/schema_loader: introduce load_schema_from_sstable() test/lib/random_schema: remove assert on min number of regular columns sstables: introduce load_metadata()	2024-06-13 12:21:09 +03:00
Benny Halevy	34dfa4d3a3	storage_service: join_token_ring: reject replace on different dc or rack Do not allow replacing a node on one dc/rack with a node on a different dc/rack as this violates the assumption of replace node operation that all token ranges previously owned by the dead node would be rebuilt on the new node. Fixes scylladb/scylladb#16858 Refs scylladb/scylla-enterprise#3518 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#16862	2024-06-13 11:19:47 +02:00
Botond Dénes	6868add228	replica/database: wire in maintenance_reader_concurrency_semaphore_count_limit Making the count resources on the maintenance (streaming) semaphore live update via config. This will allow us to improve repair speed on mixed-shard clusters, where we suspect that reader thrashing -- due to the combination of high number of readers on each shard and very conservative reader count limit (10) -- is the main cause of the slowness. Making this count limit configurable allows us to start experimenting with this fix, without committing to a count limit increase (or removal), addressing the pain in the field.	2024-06-13 01:59:21 -04:00
Botond Dénes	665fdd6ce4	db/config: introduce maintenance_reader_concurrency_semaphore_count_limit To control the amount of count resources of the maintenance (streaming) semaphore. Not wired yet.	2024-06-13 01:59:21 -04:00
Botond Dénes	ba0cc29d82	reader_concurrency_semaphore: make count parameter live-update So that the amount of count resources can be changed at run-time, triggered by e.g. a config change. Previous constant-count based constructor is left intact, to avoid patching all clients, as only a small subset will want the new functionality.	2024-06-13 01:59:21 -04:00
Nadav Har'El	44ea1993ba	test/cql-pytest: tests CREATE/DROP INDEX during paged query This patch includes extensive testing for what happens to an ongoing paged query when a secondary index is suddenly added or dropped. Issue #18992 was opened suggesting that this would be broken, and the tests included here show that it is indeed broken. The four tests included in this patch are heavily commented to explain what they are testing and why, but here is a short summary of what is being tested by each of them: 1. A paged query filtering on v=17 continues correctly even if an index is created on v. 2. A paged query filtering on v1 and v2 where v2 is indexed, continues correctly even if an index is created on v1 (remember that Scylla prefers to use the first index mentioned in the query). 3. A paged query using an index on v continues correctly even if that index is deleted. 4. However, if the query doesn't say "ALLOW FILTERING", it cannot be continued after the index is deleted. All these tests pass on Cassandra, but all of them except the fourth fail on Scylla, reproducing issue #18992. Somewhat to my surprise, the failure of the query in all the failed tests is silent (i.e., trying to fetch the next page just fetches nothing and says the iteration is done). I was expecting more dramatic failures ("marshaling error" messages, crashes, etc.) but didn't get them. Refs #18992 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19000	2024-06-13 08:39:22 +03:00
Botond Dénes	145a67f77c	tools/scylla-sstable: log loaded schema with trace level The schema of the sstable can be interesting, so log it with trace level. Unfortunately, this is not the nice CQL statement we are used to (that requires a database object), but the not-nearly-so-nice CFMetadata printout. Still, it is better than nothing.	2024-06-13 01:32:17 -04:00
Botond Dénes	43c44f0af5	tools/scylla-sstable: load schema from the sstable as fallback When auto-detecting the schema of the sstable, if all other methods failed, load the schema from the sstable's serialization header. This schema is incomplete. It is just enough to parse and display the content of the sstable. Although parsing and displaying the content of the sstable is all scylla-sstable does, it is more future-compatible to use the full schema when possible. So the always-available but minimal schema that each sstable has on itself, is used just as a fallback. The test which tested the case when all schema load attempts fail, doesn't work now, because loading the serialization header always succeeds. So convert this test into two positive tests, testing the serialization header schema fallback instead.	2024-06-13 01:32:17 -04:00
Botond Dénes	8f2ba03465	tools/schema_loader: introduce load_schema_from_sstable() Allows loading the schema from an sstable's serialization header. This schema is incomplete, but it is enough to parse and display the content of the sstable.	2024-06-13 01:32:17 -04:00
Botond Dénes	0d7335dd27	test/lib/random_schema: remove assert on min number of regular columns It is legal for a schema to have 0 regular columns, so remove the assert on the schema specification's regular column count.	2024-06-13 01:32:17 -04:00
Piotr Dulikowski	0b5a0c969a	Merge 'hinted handoff: migrate sync point to host ID' from Michael Litvak Change the format of sync points to use host ID instead of IPs, to be consistent with the use of host IDs in hinted handoff module. Introduce sync point v3 format which is the same as v2 except it stores host IDs instead of IPs. The decoding supports both formats with host IDs and IPs, so a sync point contains now a variant of either types, and in the case of new type the translation is avoided. Fixes #18653 Closes scylladb/scylladb#19134 * github.com:scylladb/scylladb: db/hints: migrate sync point to host ID db/hints: rename sync point structures with _v1 suffix to _v1_v2	2024-06-13 06:16:00 +02:00
Kefu Chai	9d8d9168e6	.github: add api to iwyu's CLEANER_DIR to avoid future violations of include-what-you-use. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-13 09:32:51 +08:00
Kefu Chai	c03141b4b2	api: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-13 09:32:51 +08:00
Anna Stuchlik	603c662049	doc: remove an entry about seeds from FAQ This commit removes a useless entry from the FAQ page. It contains a false recommendation to configure multiple seeds. Closes scylladb/scylladb#19259	2024-06-12 19:11:52 +02:00
Dawid Medrek	dc41086c57	db/hints: Add a metric for the size of sent hints In this commit, we add a new metric `sent_total_size` keeping track of how many bytes of hints a node has sent. The metric is supposed to complement its counterpart in storage proxy that counts how many bytes of hints a node has received. That information should prove useful in analyzing statistics of a cluster -- load on given nodes and where it comes from. We also change the name of the metric `sent` to `sent_total` to avoid the conflict of prefixes between the two metrics.	2024-06-12 18:20:08 +02:00
Raphael S. Carvalho	f3a1f5df83	replica: simplify perform_cleanup_compaction() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-06-12 12:44:21 -03:00
Raphael S. Carvalho	6214dda506	replica: return storage_group by reference on storage_group_for*() those functions cannot return nullptr, will throw when group is not found, so better return ref instead. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-06-12 11:53:06 -03:00
Patryk Jędrzejczak	a7ab9a015a	test: manager_client, scylla_cluster: fix type annotations in add_servers	2024-06-12 16:51:20 +02:00
Patryk Jędrzejczak	1eb25d22c6	test: manager_client: don't connect driver after failed server_{add, start} If adding or starting a server fails expectedly, there is no reason to update or connect the driver. Moreover, before this patch, we couldn't use `server_add` and `servers_add` with `expected_error` if the cluster was empty. After expected bootstrap failures, we tried to connect the driver, which rightfully failed on `assert len(hosts) > 0` in `cluster_con`.	2024-06-12 16:51:20 +02:00
Patryk Jędrzejczak	8f486de8d3	test: scylla_cluster: pass seeds to add_servers This parameter was incorrectly missing. For this reason, `expected_error` was passed from `add_servers` to `add_server` as `seeds`, which caused strange crashes.	2024-06-12 16:51:19 +02:00
Botond Dénes	435c01d1e6	sstables: introduce load_metadata() Loads just the metadata components. No validation. Split off from load(), to allow scylla-sstable to partially load an sstable.	2024-06-12 10:46:38 -04:00
Botond Dénes	aa27f8f365	Merge 'Improve handling of outdated --experimental-features' from Pavel Emelyanov Some time ago it turned out that if unrecognized feature name is met in scylla.yaml, the whole experimental features list is ignored, but scylla continues to boot. There's UNUSED feature which is the proper way to deprecate a feature, and this PR improves its handling in several ways. 1. The recently removed "tablets" feature is partially brought back, but marked as UNUSED 2. Any UNUSED features met while parsing are printed into logs 3. The enum_option<> helper is enlightened along the way refs: #18968 Closes scylladb/scylladb#19230 * github.com:scylladb/scylladb: config: Mark tablets feature as unused main: Warn unused features enum_option: Carry optional key on board enum_option: Remove on-board _map member	2024-06-12 17:33:14 +03:00
Botond Dénes	d2a4cd9cae	Merge 'Register API endpoints next to corresponding services' from Pavel Emelyanov The API endpoints are registered for particular services (with rare exceptions), and once the corresponding service is ready, its endpoints section can be registered too. Same but reversed is for shutdown, and it's automatic with deferred actions. refs: #2737 Closes scylladb/scylladb#19208 * github.com:scylladb/scylladb: main: Register task manager API next to task manager itself main: Register messaging API next to messaging service main: Register repair API next to repair service	2024-06-12 17:31:30 +03:00
Kefu Chai	2eca8b54de	auth/role_or_anonymous: drop operator<< for role_or_anonymous its declaration was removed in `84a9d2fa`, which failed to remove the implementation from .cc file. in this change, let's remove operator<< for role_or_anonymous completely. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19243	2024-06-12 17:30:20 +03:00
Raphael S. Carvalho	9c1d3bcc02	replica: devirtualize storage_group_of() can be made private to tablet_storage_group_manager. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-06-12 11:29:49 -03:00
Kamil Braun	a441d06d6c	raft: fsm: add details to on_internal_error_noexcept message If we receive a message in the same term but from a different leader than we expect, we print: ``` Got append request/install snapshot/read_quorum from an unexpected leader ``` For some reason the message did not include the details (who the leader was and who the sender was) which requires almost zero effort and might be useful for debugging. So let's include them. Ref: scylladb/scylla-enterprise#4276 Closes scylladb/scylladb#19238	2024-06-12 17:29:42 +03:00
Pavel Emelyanov	4400f9082e	lang: Return context as future, not via reference argument Commit `882b2f4e9f` (cql3, schema_tables: Generalize function creation) erroneously says that optional<context> is not suitable for future<> type, but in fact it is. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19204	2024-06-12 16:54:46 +03:00
Kefu Chai	8c99d9e721	.github: use libstdc++-13 since gcc-13 is packaged by ppa:ubuntu-toolchain-r, and GCC-13 was released 1 year ago, let's use it instead. less warnings, as the standard library from GCC-13 is more standard compliant. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19162	2024-06-12 16:52:05 +03:00
Botond Dénes	e91f82fd5c	Merge '.github: add workflow to build with clang nightly' from Kefu Chai to be prepared for changes from clang, and enjoy the new warnings/errors from this compiler. * it is an improvement in our CI, no need to backport. Closes scylladb/scylladb#19164 * github.com:scylladb/scylladb: .github: add workflow to build with clang nightly .github: rename clang-tidy-matcher.json to clang-matcher.json	2024-06-12 16:50:21 +03:00
Pavel Emelyanov	24c818453d	main: Start view builder earlier Commit `47dbf23773` (Rework view services and system-distributed-keyspace dependencies) made streaming and repair services depend on view builder, but missed the fact that the builder itself starts much later. Move view builder earlier, that's safe, no activity is started upon that, real building is kicked much later when invoke_on_all(start) happens. Other than that, start system distributed keyspace earlier, which also looks safe, as it's also started "for real" later, by storage service when it joins the ring. fixes: #19133 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19250	2024-06-12 16:46:55 +03:00
Anna Stuchlik	3f9cc0ec3f	doc: reorganize ToC of the Reference section This commit adds a proper ToC to the Reference section to improve how it renders. Closes scylladb/scylladb#18901	2024-06-12 16:16:04 +03:00
Kefu Chai	da59710fb9	doc: remove unused documents upgrade/_common are document fragments included by other documents. but quite a few of the documents previously including these fragments were removed, and we didn't remove these fragments along with them. in this change, we drop them. Fixes #19245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19251	2024-06-12 16:14:57 +03:00
Botond Dénes	cd05de6cfb	Merge 'test: memtable_test: increase unspooled_dirty_soft_limit ' from Kefu Chai before this change, when performing memtable_test, we expect that the memtables of ks.cf are the only memtables being flushed. and we inject 4 failures in the code path of flush, and wait until 4 of them are triggered. but in the background, `dirty_memory_manager` performs flush on all tables when necessary. so, the total number of failures is not necessarily the total number of failures triggered when flushing ks.cf, some of them could be triggered when flushing system tables. that's why we have sporadic test failures from this test, as we might check `t.min_memtable_timestamp()` too soon. after this change, we increase `unspooled_dirty_soft_limit` setting, in order to disable `dirty_memory_manager`, so that the only flush is performed by the test. Fixes https://github.com/scylladb/scylladb/issues/19034 --- the issue applies to both 5.4 and 6.0, and this issue hurts the CI stability, hence we should backport it. Closes scylladb/scylladb#19252 * github.com:scylladb/scylladb: test: memtable_test: increase unspooled_dirty_soft_limit test: memtable_test: replace BOOST_ASSERT with BOOST_REQURE	2024-06-12 16:14:05 +03:00
Dawid Medrek	23bea50de0	service/storage_proxy: Add metrics for received hints In this commit, we add two new metrics to storage proxy: * `received_hints_total`, * `received_hints_bytes_total`. Before these changes, we had to rely solely on other metrics indicating how many hints nodes have written, rejected, sent, etc. Because hints are subject to many more or less controllable factors, e.g. a target node still being a replica for a mutation, it was very difficult to approximate how many hints a given node might have received or what part of its load they were. The newly introduced metrics are supposed to help reason about those.	2024-06-12 14:44:47 +02:00
Kefu Chai	223fba3243	test: memtable_test: increase unspooled_dirty_soft_limit before this change, when performing memtable_test, we expect that the memtables of ks.cf are the only memtables being flushed. and we inject 4 failures in the code path of flush, and wait until 4 of them are triggered. but in the background, `dirty_memory_manager` performs flush on all tables when necessary. so, the total number of failures is not necessarily the total number of failures triggered when flushing ks.cf, some of them could be triggered when flushing system tables. that's why we have sporadic test failures from this test, as we might check `t.min_memtable_timestamp()` too soon. after this change, we increase `unspooled_dirty_soft_limit` setting, in order to disable `dirty_memory_manager`, so that the only flush is performed by the test. Fixes #19034 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-12 19:17:27 +08:00
Kefu Chai	2df4e9cfc2	test: memtable_test: replace BOOST_ASSERT with BOOST_REQURE before this change, we verify the behavior of design under test using `BOOST_ASSERT()`, which is a wrapper around `assert()`, so if a test fails, the test just aborts. this is not very helpful for postmortem debugging. after this change, we use `BOOST_REQUIRE` macro for verifying the behavior, so that Boost.Test prints out the condition if it does not hold when we test it. Refs #19034 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-12 19:17:27 +08:00
Pavel Emelyanov	c752bda0a2	Merge '.github: change severity to error in clang-include-cleaner ' from Kefu Chai in this changeset, we tighten the clang-include-cleaner workflow, and address the warnings in two more subdirectories in the source tree. * it's a cleanup, no need to backport Closes scylladb/scylladb#19155 * github.com:scylladb/scylladb: .github: add alternator to iwyu's CLEANER_DIR alternator: do not include unused headers .github: change severity to error in clang-include-cleaner exceptions: do not include unused headers	2024-06-12 10:16:17 +03:00
Kefu Chai	0c9ea654f5	service/paxos: drop operator<< for proposal since we stopped using the generic container formatters, which in turn use operator<< for formatting the elements, we can drop more operator<< operators. so, in this change, we drop operator<< for proposal. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19156	2024-06-12 10:14:47 +03:00
Dawid Medrek	431ec55f6c	service/storage_proxy: Move a comment to its relevant place In `b92fb35`, we put a comment in the wrong place. These changes move it to the right one. Closes scylladb/scylladb#19215	2024-06-12 10:10:02 +03:00
Avi Kivity	dffd0901b3	dist: scylla_util: sysconfig_parser: replace deprecated ConfigParser.readfp ConfigParser.readfp was deprecated in Python 3.2 and removed in Python 3.12. Under Fedora 40, the container fails to launch because it cannot parse its configuration. Fix by using the newer read_file(). Closes scylladb/scylladb#19236	2024-06-12 10:07:10 +03:00
Benny Halevy	2ed81cbf84	locator/topology: update_node: format also shard_count in debug log message The format string is missing `shard_count={}` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#19242	2024-06-12 10:04:23 +03:00
Kefu Chai	4175e02d9d	clustering_bounds_comparator: drop operator<< for bound_kind turns out operator<< for bound_kind is not used anymore, so let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19159	2024-06-11 18:01:06 +02:00
Avi Kivity	6608f49718	Merge 'make enable_compacting_data_for_streaming_and_repair truly live-update' from Botond Dénes This config item is propagated to the table object via table::config. Although the field in `table::config`, used to propagate the value, was `utils::updateable_value<T>`, it was assigned a constant and so the live-update chain was broken. This series fixes this and adds a test which fails before the patch and passes after. The test needed new test infrastructure, around the failure injection api, namely the ability to exfiltrate the value of internal variable. This infrastructure is also added in this series. Fixes: https://github.com/scylladb/scylladb/issues/18674 - [x] This patch has to be backported because it fixes broken functionality Closes scylladb/scylladb#18705 * github.com:scylladb/scylladb: test/topology_custom: add test for enable_compacting_data_for_streaming_and_repair live-update test/pylib: rest_client: add get_injection() api/error_injection: add getter for error_injection utils/error_injection: add set_parameter() replica/database: fix live-update enable_compacting_data_for_streaming_and_repair	2024-06-11 15:53:19 +03:00
Kefu Chai	d05db52d11	build: remove coverage compiling options from the cxx_flags in `44e85c7d`, we remove coverage compiling options from the cflags when building abseil. but in `535f2b21`, these options were brought back as parts of cxx_flags. so we need to remove them again from cxx_flags. Fixes #19219 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19220	2024-06-11 14:58:27 +03:00
Pavel Emelyanov	b2520b8185	config: Mark tablets feature as unused This feature used to be there for a while, but then it was removed by `83d491af02`. This patch partially takes it back, but maps to UNUSED, so that if met in config, it's warned, but other features are parsed as well. refs: #18968 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-11 12:58:19 +03:00
Pavel Emelyanov	b85a02a3fe	main: Warn unused features When seeing an UNUSED feature -- print it into log. This is where the enum_option::key is in use. The thing is that experimental features map different unused feature names into the single UNUSED feature enum value, so once the feature is parsed its configured name only persists in the option's key member (saved by previous patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-11 12:56:51 +03:00
Pavel Emelyanov	0c0a7d9b9a	enum_option: Carry optional key on board It facilitates option formatting, but the main purpose is to be able to find out the exact keys, not values, later (see next patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-11 12:55:14 +03:00
Pavel Emelyanov	f56cdb1cac	enum_option: Remove on-board _map member The map in question is immutable and can be obtained from the Mapper type at any time, there's no need in keeping its copy on each enum_option Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-11 12:54:39 +03:00
Michael Litvak	afc9a1a8a6	db/hints: migrate sync point to host ID Change the format of sync points to use host ID instead of IPs, to be consistent with the use of host IDs in hinted handoff module. Introduce sync point v3 format which is the same as v2 except it stores host IDs instead of IPs. The encoding of sync points now always uses the new v3 format with host IDs. The decoding supports both formats with host IDs and IPs, so a sync point contains now a variant of either types, and in the case of the new format the translation from IP to host ID is avoided.	2024-06-11 11:07:00 +02:00
Michael Litvak	b824e73418	db/hints: rename sync point structures with _v1 suffix to _v1_v2 rename sync point types and variables to have v1/v2 suffix according to their use.	2024-06-11 11:05:59 +02:00
Avi Kivity	03e776ce3e	Update tools/java submodule * tools/java 88809606c8...01ba3c196f (3): > Revert "build: don't add nonexistent directory 'lib' to relocatable packages" > build: run antlr in a separate process > build: don't add nonexistent directory 'lib' to relocatable packages	2024-06-11 11:58:56 +03:00
Botond Dénes	8ef4fbdb87	test/topology_custom: add test for enable_compacting_data_for_streaming_and_repair live-update Prevent the live-update feature of this config item from breaking silently.	2024-06-11 04:17:48 -04:00
Botond Dénes	0c61b1822c	test/pylib: rest_client: add get_injection() The /v2/error_injection/{injection} endpoint now has a GET method too, expose this.	2024-06-11 04:17:48 -04:00
Botond Dénes	feea609e37	api/error_injection: add getter for error_injection Allow external code to obtain information about an error injection point, including whether it is enabled, and importantly, what its parameters are. Together with the `set_parameter()` added in the previous patch, this allows tests to read out the values of internal parameters, via a set_parameter() injection point.	2024-06-11 04:17:48 -04:00
Botond Dénes	4590026b38	utils/error_injection: add set_parameter() Allow injection points to write values into the parameter map, which external code can then examine. This allows exfiltrating the values of internal variables, to be examined by tests, without exposing these variables via an "official" path.	2024-06-11 04:17:48 -04:00
Pavel Emelyanov	1b9cedb3f3	test: Reduce failure detector timeout for failed tablets migration test Most of the time this test spends waiting for a node to die. Helps 3x times Was real 9m21,950s user 1m11,439s sys 1m26,022s Now real 3m37,780s user 0m58,439s sys 1m13,698s refs: #17764 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19222	2024-06-11 09:55:06 +02:00
Calle Wilund	dfd996e7c1	describe_statement: Filter out "extension internal" keyspaces in DESC SCHEMA Fixes /scylladb/scylla-enterprise#4168 Unless listing all (including system) keyspaces, filter out "extension internal" keyspaces. These are to be considered "system" for the purposes of exposing to end user. Closes scylladb/scylladb#19214	2024-06-11 10:01:42 +03:00
Botond Dénes	dbccb61636	replica/database: fix live-update enable_compacting_data_for_streaming_and_repair This config item is propagated to the table object via table::config. Although the field in table::config, used to propagate the value, was utils::updateable_value<T>, it was assigned a constant and so the live-update chain was broken. This patch fixes this.	2024-06-11 01:15:20 -04:00
Raphael S. Carvalho	7b41630299	replica: Refresh mutation source when allocating tablet replicas Consider the following: 1) table A has N tablets and views 2) migration starts for a tablet of A from node 1 to 2. 3) migration is at write_both_read_old stage 4) coordinator will push writes to both nodes (pending and leaving) 5) A has view, so writes to it will also result in reads (table::push_view_replica_updates()) 6) tablet's update_effective_replication_map() is not refreshing tablet sstable set (for new tablet migrating in) 7) so read on step 5 is not being able to find sstable set for tablet migrating in Causes the following error: "tablets - SSTable set wasn't found for tablet 21 of table mview.users" which means loss of write on pending replica. The fix will refresh the table's sstable set (tablet_sstable_set) and cache's snapshot. It's not a problem to refresh the cache snapshot as long as the logical state of the data hasn't changed, which is true when allocating new tablet replicas. That's also done in the context of compactions for example. Fixes #19052. Fixes #19033. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#19099	2024-06-11 06:59:04 +03:00
Calle Wilund	51c53d8db6	main/minio_server.py: Respect any preexisting AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY vars Fixes scylladb/scylla-pkg#3845 Don't overwrite (or rather change) AWS credentials variables if already set in enclosing environment. Ensures EAR tests for AWS KMS can run properly in CI. v2: * Allow environment variables in reading obj storage config - allows CI to use real credentials in env without risking putting them into less secure files * Don't write credentials info from miniserver into config, instead use said environment vars to propagate creds. v3: * Fix python launch scripts to not clear environment, thus retaining above aws envs. Closes scylladb/scylladb#19086	2024-06-11 06:59:04 +03:00
Nadav Har'El	73dfa4143a	cql-pytest: translate Cassandra's tests for SELECT DISTINCT This is a translation of Cassandra's CQL unit test source file DistinctQueryPagingTest.java into our cql-pytest framework. The 5 tests did not reproduce any previously-unknown bug, but did provide additional reproducers for one already-known issue: Refs #10354: SELECT DISTINCT should allow filter on static columns, not just partition keys Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#18971	2024-06-11 06:59:04 +03:00
Michał Chojnowski	823da140dd	test_tablets: add test_tablet_storage_freeing Tests that tablet storage is freed after it is migrated away. Fixes #16946	2024-06-10 14:25:37 +02:00
Michał Chojnowski	7741491b47	test: pylib: add get_sstables_disk_usage() Adds a util for measuring the disk usage of the given table on the given node. Will be used in a follow-up patch for testing that sstables are freed properly.	2024-06-10 14:25:37 +02:00
Pavel Emelyanov	b10ddcfd18	main: Register task manager API next to task manager itself Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-10 12:49:11 +03:00
Pavel Emelyanov	02c36ebd2e	main: Register messaging API next to messaging service Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-10 12:49:02 +03:00
Pavel Emelyanov	f7e4724770	main: Register repair API next to repair service Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-10 12:48:51 +03:00
Anna Stuchlik	55ed18db07	doc: mark tablets as GA in the CREATE KEYSPACE section This commit removes the information that tablets are an experimental feature from the CREATE KEYSPACE section. In addition, it removes the notes and cautions that are redundant when a feature is GA, especially the information and warnings about the future plans. Fixes https://github.com/scylladb/scylladb/issues/18670 Closes scylladb/scylladb#19063	2024-06-10 12:36:36 +03:00
Kefu Chai	069be01451	lang: remove redundant std::move() C++ standard enforces copy elision in this case. and copy elision is more performant than constructing the return value with a move constructor, so no need to use `std::move()` here. and GCC-14 rightfully points this out: ``` /home/kefu/dev/scylladb/lang/lua.cc: In member function ‘data_value {anonymous}::from_lua_visitor::operator()(const utf8_type_impl&)’: /var/ssd/scylladb/lang/lua.cc:797:25: error: redundant move in return statement [-Werror=redundant-move] 797 \| return std::move(s); \| ~~~~~~~~~^~~ /home/kefu/dev/scylladb/lang/lua.cc:797:25: note: remove ‘std::move’ call ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19187	2024-06-10 07:41:25 +03:00
Botond Dénes	7b2aad56c4	test/boost/sstable_datafile_test: remove unused semaphores The tests use the ones from test_env, the explicitly created ones are unused. Closes scylladb/scylladb#19167	2024-06-09 20:43:59 +03:00
Kefu Chai	535f2b2134	build: populate cxxflags to abseil before this change, when building abseil, we don't pass cxxflags to compiler, and abseil libraries are build with the default optimization level. in the case of clang, its default optimization level is `-O0`, it compiles the fastest, but the performance of the emitted code is not optimized for runtime performance. but we expect good performance for the release build. a typical command line for building abseil looks like ``` clang++ -I/home/kefu/dev/scylladb/master/abseil -ffile-prefix-map=/home/kefu/dev/scylladb/master=. -march=westmere -std=gnu++20 -Wall -Wextra -Wcast-qual -Wconversion -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Winfinite-recursion -Winvalid-constexpr -Wliteral-conversion -Wmissing-declarations -Woverlength-strings -Wpointer-arith -Wself-assign -Wshadow-all -Wshorten-64-to-32 -Wsign-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-zero-compare -Wundef -Wuninitialized -Wunreachable-code -Wunused-comparison -Wunused-local-typedefs -Wunused-result -Wvla -Wwrite-strings -Wno-float-conversion -Wno-implicit-float-conversion -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -DNOMINMAX -MD -MT absl/base/CMakeFiles/scoped_set_env.dir/internal/scoped_set_env.cc.o -MF absl/base/CMakeFiles/scoped_set_env.dir/internal/scoped_set_env.cc.o.d -o absl/base/CMakeFiles/scoped_set_env.dir/internal/scoped_set_env.cc.o -c /home/kefu/dev/scylladb/master/abseil/absl/base/internal/scoped_set_env.cc ``` so, in this change, we populate cxxflags to abseil, so that the per-mode `-O` option can be populated when building abseil. 
after this change, the command line building abseil in release mode looks like ``` clang++ -I/home/kefu/dev/scylladb/master/abseil -ffunction-sections -fdata-sections -O3 -mllvm -inline-threshold=2500 -fno-slp-vectorize -DSCYLLA_BUILD_MODE=release -g -gz -ffile-prefix-map=/home/kefu/dev/scylladb/master=. -march=westmere -std=gnu++20 -Wall -Wextra -Wcast-qual -Wconversion -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Winfinite-recursion -Winvalid-constexpr -Wliteral-conversion -Wmissing-declarations -Woverlength-strings -Wpointer-arith -Wself-assign -Wshadow-all -Wshorten-64-to-32 -Wsign-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-zero-compare -Wundef -Wuninitialized -Wunreachable-code -Wunused-comparison -Wunused-local-typedefs -Wunused-result -Wvla -Wwrite-strings -Wno-float-conversion -Wno-implicit-float-conversion -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -DNOMINMAX -MD -MT absl/flags/CMakeFiles/flags_commandlineflag_internal.dir/internal/commandlineflag.cc.o -MF absl/flags/CMakeFiles/flags_commandlineflag_internal.dir/internal/commandlineflag.cc.o.d -o absl/flags/CMakeFiles/flags_commandlineflag_internal.dir/internal/commandlineflag.cc.o -c /home/kefu/dev/scylladb/master/abseil/absl/flags/internal/commandlineflag.cc ``` Refs `0b0e661a85` Fixes #19161 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19160	2024-06-09 20:01:50 +03:00
Tomasz Grabiec	c8f71f4825	test: tablets: Fix flakiness of test_removenode_with_ignored_node due to read timeout The check query may be executed on a node which doesn't yet see that the downed server is down, as it is not shut down gracefully. The query coordinator can choose the down node as a CL=1 replica for read and time out. To fix, wait for all nodes to notice the node is down before executing the checking query. Fixes #17938 Closes scylladb/scylladb#19137	2024-06-09 19:39:57 +03:00
Kefu Chai	b5dce7e3d0	docs: correct the link pointing to Scylla U before this change it points to https://university.scylladb.com/courses/scylla-operations/lessons/change-data-capture-cdc/ which then redirects the browser to https://university.scylladb.com/courses/scylla-operations/, but it should have pointed to https://university.scylladb.com/courses/data-modeling/lessons/change-data-capture-cdc/. in this change, the hyperlink is corrected. Fixes #19163 Refs `6e97b83b60` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19182	2024-06-09 19:37:21 +03:00
Avi Kivity	7b301f0cb9	Merge 'Encapsulate wasm and lua management in lang::manager service' from Pavel Emelyanov After wasm udf appeared, code in main, create_function_statement and schema_tables got some involvements into details of wasm engine management. Also, even prior to this, there was duplication in how function context is created by statement code and schema_tables code. This PR generalizes function context creation and encapsulates the management in sharded<lang::manager> service. Also it removes the wasm::startup_context thing and makes wasm start/stop be "classical" (see #2737) Closes scylladb/scylladb#19166 * github.com:scylladb/scylladb: code: Enlighten wasm headers usage lang: Unfriend wasm context from manager lang, cql3, schema_tables: Don't mess with db::config lang: Don't use db::config to create lua context lang: Don't use db::config to create wasm context lang: Drop manager::precompile() method cql3, schema_tables: Generalize function creation wasm: Replace startup_context with wasm_config lang: Add manager::start() method lang: Move manager to lang namespace lang: Move wasm::manager to its .cc/.hh files	2024-06-09 19:32:26 +03:00
Kefu Chai	9318d21a22	sstables: change const_iterator::value_type to uint64_t in general, the value_type of a `const_iterator` is `T` instead of `const T`, what has the const specifier is `reference`. because, when dereferencing an iterator, the value type does not matter any more, as it is always a copy. and GCC-14 points this out: ``` /home/kefu/dev/scylladb/sstables/compress.hh:224:13: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers] 224 \| value_type operator*() const { \| ^~~~~~~~~~ /home/kefu/dev/scylladb/sstables/compress.hh:228:13: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers] 228 \| value_type operator[](ssize_t i) const { \| ^~~~~~~~~~ ``` so, in this change, let's change the value_type to `uint64_t`. please note, it's not typical to return `value_type` from `operator*` or `operator[]` of an iterator. but due to the design of segmented_offsets, we cannot return a reference, so let's keep it this way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19186	2024-06-09 19:21:16 +03:00
Avi Kivity	b2a500a9a1	Merge 'alternator: keep TTL work in the maintenance scheduling group' from Botond Dénes Alternator has a custom TTL implementation. This is based on a loop, which scans existing rows in the table, then decides whether each row has reached its end-of-life and deletes it if it did. This work is done in the background, and therefore it uses the maintenance (streaming) scheduling group. However, it was observed that part of this work leaks into the statement scheduling group, competing with user workloads, negatively affecting their latencies. This was found to be caused by the reads and writes done on behalf of the alternator TTL, which loses its maintenance scheduling group when these have to go to a remote node. This is because the messaging service was not configured to recognize the streaming scheduling group when statement verbs like reads or writes are invoked. The messaging service currently recognizes two statement "tenants": the user tenant (statement scheduling group) and system (default scheduling group), as we used to have only user-initiated operations and system (internal) ones. With alternator TTL, there is now a need to distinguish between two kinds of system operation: foreground and background ones. The former should use the system tenant while the latter will use the new maintenance tenant (streaming scheduling group). This series adds a streaming tenant to the messaging service configuration and it adds a test which confirms that with this change, alternator TTL is entirely contained in the maintenance scheduling group. Fixes: #18719 - [x] Scans executed on behalf of alternator TTL are running in the statement group, disturbing user-workloads, this PR has to be backported to fix this. Closes scylladb/scylladb#18729 * github.com:scylladb/scylladb: alternator, scheduler: test reproducing RPC scheduling group bug main: add maintenance tenant to messaging_service's scheduling config	2024-06-09 19:20:18 +03:00
Kefu Chai	58edee8d93	mutation/mutation_rebuilder: remove redundant std::move() GCC-14 rightfully points out: ``` /var/ssd/scylladb/mutation/mutation_rebuilder.hh: In member function ‘const mutation& mutation_rebuilder::consume_new_partition(const dht::decorated_key&)’: /var/ssd/scylladb/mutation/mutation_rebuilder.hh:24:36: error: redundant move in initialization [-Werror=redundant-move] 24 \| _m = mutation(_s, std::move(dk)); \| ~~~~~~~~~^~~~ /var/ssd/scylladb/mutation/mutation_rebuilder.hh:24:36: note: remove ‘std::move’ call ``` as `dk` is passed with a const reference, `std::move()` does not help the callee to consume from it. so drop the `std::move()` here. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19188	2024-06-09 19:19:37 +03:00
Nadav Har'El	13cf6c543d	test/alternator: fix flaky test test_item_latency The Alternator test test_metrics.py::test_item_latency confirms that for several operation types (PutItem, GetItem, DeleteItem, UpdateItem) we did not forget to measure their latencies. The test checked that a latency was updated by checking that two metrics increase: scylla_alternator_op_latency_count scylla_alternator_op_latency_sum However, it turns out that the "sum" is only an approximate sum of all latencies, and when the total sum grows large it sometimes does not increase when a short latency is added to the statistics. When this happens, this test fails on the assertion that the "sum" increases after an operation. We saw this happening sometimes in CI runs. The simple fix is to stop checking _sum at all, and only verify that the _count increases - this is really an integer counter that unconditionally increases when a latency is added to the histogram. Don't worry that the strength of this test is reduced - this test was never meant to check the accuracy or correctness of the histograms - we should have different (and better) tests for that, unrelated to Alternator. The purpose of this test is only to verify that for some specific operation like PutItem, Alternator didn't forget to measure its latency and update the histogram. We want to avoid a bug like we had in counters in the past (#9406). Fixes #18847. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19080	2024-06-09 19:19:09 +03:00
Botond Dénes	37fd568139	sstables/compress.hh: remove unused forward declaration struct compress is forward declared right before its definition. At some point in the past there was probably some code there using it, but now it's gone so remove it. Closes scylladb/scylladb#19168	2024-06-09 17:52:05 +03:00
Guilherme Nogueira	cf157e4423	Remove comma that breaks CQL DML on tablets.rst The current sample reads: ```cql CREATE KEYSPACE my_keyspace WITH replication = { 'class': 'NetworkTopologyStrategy', 'replication_factor': 3, } AND tablets = { 'enabled': false }; ``` The additional comma after `'replication_factor': 3` breaks the query execution. Closes scylladb/scylladb#19177	2024-06-09 14:58:13 +03:00
Botond Dénes	6e3b997e04	docs: nodetool status: document keyspace and table arguments Also fix the example nodetool status invocation. Fixes: #17840 Closes scylladb/scylladb#18037	2024-06-09 00:37:12 +02:00
Kefu Chai	f4706be8a8	test: test_topology_ops: adapt to tablets in `e7d4e080`, we reenabled the background writes in this test, but when running with tablets enabled, background writes are still disabled because of #17025, which was fixed last week. so we can enable background writes with tablets. in this change, * background writes are enabled with tablets. * increase the number of nodes by 1 so that we have enough nodes to fulfill the needs of tablets, which enforces that the number of replicas should always satisfy RF. * pass rf to `start_writes()` explicitly, so we have less magic numbers in the test, and make the data dependencies more obvious. Fixes #17589 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18707	2024-06-08 17:46:37 +02:00
Dawid Medrek	a5528a2093	db/hints: Log when ignoring invalid hint directories In `58784cd`, `aa4b06a` and other commits migrating hinted handoff from IPs to host IDs (scylladb/scylladb#15567), we started ignoring hint directories of invalid names, i.e. those that represent neither an IP address, nor a host ID. They remain on disk and are taken into account while computing e.g. the total size of hints, but they're not used in any way. These changes add logs informing the user when Scylla encounters such a directory. Closes scylladb/scylladb#17566	2024-06-07 19:19:15 +02:00
Michał Chojnowski	fee48f67ef	storage_proxy: avoid infinite growth of _throttled_writes storage_proxy has a throttling mechanism which attempts to limit the number of background writes by forcefully raising CL to ALL (it's not implemented exactly like that, but that's the effect) when the amount of background and queued writes is above some fixed threshold. If this is applied to a write, it becomes "throttled", and its ID is appended to _throttled_writes. Whenever the amount of background and queued writes falls below the threshold, writes are "unthrottled": some IDs are popped from _throttled_writes and the writes represented by these IDs (if their handlers still exist) have their CL lowered back. The problem here is that IDs are only ever removed from _throttled_writes if the number of queued and background writes falls below the threshold. But this doesn't have to happen in any finite time, if there's constant write pressure. And in fact, in one load test, it hasn't happened in 3 hours, eventually causing the buffer to grow into gigabytes and trigger OOM. This patch is intended to be a good-enough-in-practice fix for the problem. Fixes scylladb/scylladb#17476 Fixes scylladb/scylladb#1834 Closes scylladb/scylladb#19136	2024-06-07 15:56:23 +02:00
Gleb Natapov	34cf5c81f6	group0, topology coordinator: run group0 and the topology coordinator in gossiper scheduling group Currently they both run in streaming group and it may become busy during repair/mv building and affect group0 functionality. Move it to the gossiper group where it should have more time to run. Fixes scylladb/scylladb#18863 Closes scylladb/scylladb#19138	2024-06-07 15:31:44 +02:00
Pavel Emelyanov	bebd121936	code: Enlighten wasm headers usage Now when function context creation is encapsulated in lang::manager, some .cc files can stop using wasm-specific headers and just go with the lang/manager.hh one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 13:07:05 +03:00
Pavel Emelyanov	ceebbc5948	lang: Unfriend wasm context from manager The friendship was needed to get engine and instance cache from manager, but there's a shorter way to create context with the info it needs. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 13:07:05 +03:00
Pavel Emelyanov	b0ffc03599	lang, cql3, schema_tables: Don't mess with db::config Now function context creation is encapsulated in lang::manager so it's possible to patch-out a few more places that use database as config provider. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 13:07:05 +03:00
Pavel Emelyanov	b854bf4b83	lang: Don't use db::config to create lua context Similarly to previous patch, lua context needs db::config for creation. It's better to get the configurables via lang::manager::config. One thing to note -- lua config carries updateable_values on board, but respective db::config options are _not_ LiveUpdate-able, so the lua config could just use simple data types. This patch keeps updateable values intact for brevity. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 13:07:05 +03:00
Pavel Emelyanov	783ccc0a74	lang: Don't use db::config to create wasm context The manager needs to get two "fuel" configurables from db::config in order to create context. Instead of carrying db config from callers, keep the options on existing lang::manager::config and use them. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 13:07:05 +03:00
Pavel Emelyanov	f277bd89f5	lang: Drop manager::precompile() method It's not helping much any longer. Manager can call wasm:: stuff directly with less code involved. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 13:07:05 +03:00
Pavel Emelyanov	882b2f4e9f	cql3, schema_tables: Generalize function creation When a function is created with the CREATE FUNCTION statement, the statement handler does all the necessary preparations on its own. The very same code exists in schema_tables, when the function is loaded on boot. This patch generalizes both and keeps function language-specific context creation inside lang/ code. The creation function returns context via argument reference. It would have been nicer if it was returned via future<>, but it's not suitable for future<T> type :( Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 13:07:05 +03:00
Pavel Emelyanov	fe7ff7172d	wasm: Replace startup_context with wasm_config The lang::manager starts with the help of a context because it needs to have std::shared_ptr<> pointing to cross-shard shared wasm engine and runner thread. For that a context is created in advance, that then helps sharing the engine and runner across manager instances. This patch removes the "context" and replaces it with classical manager::config. With it, it's lang::manager who's now responsible for initializing itself. In order to have cross-shard engine and thread pointers, the start() method uses invoke_on_others() facility to share the pointer. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 12:35:57 +03:00
Pavel Emelyanov	0dad72b736	lang: Add manager::start() method Just like any other sharded<> service, the lang::manager now starts and stops in a classical sequence of await sharded<manager>::start() defer([] { await sharded<manager>::stop() }) await sharded<manager>::invoke_on_all(&manager::start) For now the method is no-op, next patches will start using it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 12:35:57 +03:00
Pavel Emelyanov	f950469af5	lang: Move manager to lang namespace And, while at it, rename the local variable that refers to it to "manager" instead of "wasm". Query processor and database also have getters named "wasm()", these are not renamed yet to keep patch smaller (and those getters are going to be reworked further anyway). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 12:35:57 +03:00
Pavel Emelyanov	1dec79e97d	lang: Move wasm::manager to its .cc/.hh files It's going to become a facade in front of both -- wasm and lua, so keep it in files with language independent names. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 12:35:57 +03:00
Marcin Maliszkiewicz	c13fea371c	cql3: always return created event in create ks/table/type/view statement In case multiple clients concurrently issue CREATE KEYSPACE IF NOT EXISTS and later USE KEYSPACE it can happen that schema in driver's session is out of sync because it syncs when it receives a special message from the CREATE KEYSPACE response. Similar situation occurs with other schema change statements. In this patch we fix only create keyspace/table/type/view statements by always sending created event. Behavior of any other schema altering statements remains unchanged.	2024-06-07 10:36:40 +02:00
Marcin Maliszkiewicz	f6108a72d3	cql3: auth: move auto-grant closer to resource creation code This should reduce the risk of re-introducing issue similar to the one fixed in `ab6988c52f` When grant code is closer to actual creation code (announcing mutations) there is lower chance of those two effects being triggered differently, if we ever call grant_permissions_to_creator and not announce mutations that's very likely a security vulnerability. Additionally comment was rewritten to be more accurate.	2024-06-07 10:26:32 +02:00
Piotr Dulikowski	e18aeb2486	Merge 'mv: gossip the same backlog if a different backlog was sent in a response' from Wojciech Mitros Currently, there are 2 ways of sharing a backlog with other nodes: through a gossip mechanism, and with responses to replica writes. In gossip, we check each second if the backlog changed, and if it did we update other nodes with it. However if the backlog for this node changed on another node with a write response, the gossiped backlog is currently not updated, so if after the response the backlog goes back to the value from the previous gossip round, it will not get sent and the other node will stay with an outdated backlog - this can be observed in the following scenario: 1. Cluster starts, all nodes gossip their empty view update backlog to one another 2. On node N, `view_update_backlog_broker` (the backlog gossiper) performs an iteration of its backlog update loop, sees no change (backlog has been empty since the start), schedules the next iteration after 1s 3. Within the next 1s, coordinator (different than N) sends a write to N causing a remote view update (which we do not wait for). As a result, node N replies immediately with an increased view update backlog, which is then noted by the coordinator. 4. Still within the 1s, node N finishes the view update in the background, dropping its view update backlog to 0. 5. 
In the next and following iterations of `view_update_backlog_broker` on N, backlog is empty, as it was in step 2, so no change is seen and no update is sent due to the check ``` auto backlog = _sp.local().get_view_update_backlog(); if (backlog_published && backlog_published == backlog) { sleep_abortable(gms::gossiper::INTERVAL, _as).get(); continue; } ``` After this scenario happens, the coordinator keeps information about an increased view update backlog on N even though it's actually already empty. This patch fixes the issue by notifying the gossip that a different backlog was sent in a response, causing it to send an unchanged backlog to other nodes in the following gossip round. Fixes: https://github.com/scylladb/scylladb/issues/18461 Similarly to https://github.com/scylladb/scylladb/pull/18646, without admission control (https://github.com/scylladb/scylladb/pull/18334), this patch doesn't affect much, so I'm marking it as backport/none Tests: manual. Currently this patch only affects the length of MV flow control delay, which is not reliable to base a test on. A proper test will be added when MV admission control is added, so we'll be able to base the test on rejected requests Closes scylladb/scylladb#18663 github.com:scylladb/scylladb: mv: gossip the same backlog if a different backlog was sent in a response node_update_backlog: divide adding and fetching backlogs	2024-06-07 10:20:21 +02:00
Marcin Maliszkiewicz	281c06ba2e	cql3: extract create ks/table/type/view event code So that the code in subsequent commit is cleaner. Create function/aggregate code was not changed as it would require bigger refactor.	2024-06-07 10:07:50 +02:00
Wojciech Mitros	4aa7ada771	exceptions: make view update timeouts inherit from timed_out_error Currently, when generating and propagating view updates, if we notice that we've already exceeded the time limit, we throw an exception inheriting from `request_timeout_exception`, to later catch and log it when finishing request handling. However, when catching, we only check timeouts by matching the `timed_out_error` exception, so the exception thrown in the view update code is not registered as a timeout exception, but an unknown one. This can cause tests which were based on the log output to start failing, as in the past we were noticing the timeout at the end of the request handling and using the `timed_out_error` to keep processing it and now, even though we do notice the timeout even earlier, due to its type we log an error, instead of treating it as a regular timeout. In this patch we make the error thrown on timeout during view updates inherit from `timed_out_error` instead of the `request_timeout_exception` (it is also moved from the "exceptions" directory, where we define exceptions returned to the user). Aside from helping with the issue described above, we also improve our metrics, as the `request_timeout_exception` is also not checked for in the `is_timeout_exception` method, and because we're using it to check whether we should update write timeout metrics, they will only start getting updated after this patch. Closes scylladb/scylladb#19102	2024-06-07 09:54:48 +02:00
Kefu Chai	01568a36a5	.github: add workflow to build with clang nightly to be prepared for changes from clang, and enjoy the new warnings/errors from this compiler. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 14:23:06 +08:00
Kefu Chai	bbeabe2989	.github: rename clang-tidy-matcher.json to clang-matcher.json as the matcher actually applies to all warnings from clang frontend, and hence can be reused when building the tree with clang, so let's rename it before using it in the clang build workflows. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 14:23:06 +08:00
Anna Stuchlik	582bafabb3	doc: set 6.0 as the latest stable version This commit updates the configuration for ScyllaDB documentation so that: 6.0 is the latest version. 6.0 is removed from the list of unstable versions. It must be merged when ScyllaDB 6.0 is released. No backport is required. Closes scylladb/scylladb#19003	2024-06-07 09:13:56 +03:00
Kefu Chai	571ab9f5f0	config: expand on rpc_keepalive's description before this change, we use "RPC or native". but before thrift support is removed "RPC" implies "thrift", now that we've dropped thrift support, "RPC" could be confusing here, so let's be more specific, and put all connection types in place of "RPC or native". Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 09:23:10 +08:00
Kefu Chai	c75442bc2a	api: s/rpc/thrift/ replace all occurrences of "rpc" in function names and debugging messages to "thrift", as "rpc" is way too general, and since we are removing "thrift" support, let's take this opportunity to use a more specific name. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 09:23:10 +08:00
Kefu Chai	36239ec592	db/system_keyspace: drop thrift_version from system.local table so we don't create new sstables with this unused column, but we can still open old sstables of this table which was created with the old schema. Refs #3811 Refs #18416 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 09:23:10 +08:00
Kefu Chai	f688fa16bc	transport: do not return client_type from cql_server::connection::make_client_key() since we've dropped the thrift support, the `client_type` is always `cql`, there is no need to differentiate different clients anymore. so, we change `make_client_key()` so that it only returns the IP address and port. Refs #3811 Refs #18416 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 09:23:06 +08:00
Kefu Chai	0e04a033af	.github: add alternator to iwyu's CLEANER_DIR to avoid future violations of include-what-you-use. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 07:45:00 +08:00
Kefu Chai	a2f54ded80	alternator: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 07:45:00 +08:00
Kefu Chai	0ff66bf564	.github: change severity to error in clang-include-cleaner since we've addressed all warnings, we are ready to tighten the standards of this workflow, so that contributors are made aware of violations of the include-what-you-use policy. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 07:28:52 +08:00
Kefu Chai	d33ab21ef8	exceptions: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 07:28:52 +08:00
Kefu Chai	ad649be1bf	treewide: drop thrift support thrift support was deprecated since ScyllaDB 5.2 > Thrift API - legacy ScyllaDB (and Apache Cassandra) API is > deprecated and will be removed in followup release. Thrift has > been disabled by default. so let's drop it. in this change, * thrift protocol support is dropped * all references to thrift support in document are dropped * the "thrift_version" column in system.local table is preserved for backward compatibility, as we could load from an existing system.local table which still contains this column, so we need to write this column as well. * "/storage_service/rpc_server" is only preserved for backward compatibility with java-based nodetool. * `rpc_port` and `start_rpc` options are preserved, but they are marked as "Unused". so that the new release of scylladb can consume existing scylla.yaml configurations which might contain these settings. by making them deprecated, users will be warned, and can update their configurations before we actually remove them in the next major release. Fixes #3811 Fixes #18416 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 06:44:59 +08:00
Avi Kivity	cd553848c1	Merge 'auth-v2: use a single transaction in auth related statements ' from Marcin Maliszkiewicz Due to gradual raft introduction into statements code in cases when single statement modified more than one table or mutation producing function was composed out of simpler ones we violated transactional logic and statement execution was not atomic as whole. This patch changes that, so now either all changes resulting from statement execution are applied or none. Affected statements types are: - schema modification - auth modifications - service levels modifications Fixes https://github.com/scylladb/scylladb/issues/17738 Closes scylladb/scylladb#17910 * github.com:scylladb/scylladb: raft: rename mutations_collector to group0_batch raft: rename announce to commit cql3: raft: attach description to each mutations collector group auth: unify mutations_generator type auth: drop redundant 'this' keyword auth: remove no longer used code from standard_role_manager::legacy_modify_membership cql3: auth: use mutation collector for service levels statements cql3: auth: use mutation collector for alter role cql3: auth: use mutation collector for grant role and revoke role cql3: auth: use mutation collector for drop role and auto-revoke auth: add refactored modify_membership func in standard_role_manager auth: implement empty revoke_all in allow_all_authorizer auth: drop request_execution_exception handling from default_authorizer::revoke_all Revert "Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks" cql3: auth: use mutation collector for grant and revoke permissions cql3: extract changes_tablets function in alter_keyspace_statement cql3: auth: use mutation collector for create role statement auth: move create_role code into service auth: add a way to announce mutations having only client_state ref auth: add collect_mutations common helper auth: remove unused header in common.hh auth: add class for gathering mutations 
without immediate announce auth: cql3: use auth facade functions consistently on write path auth: remove unused is_enforcing function	2024-06-06 17:31:26 +03:00
Yaniv Michael Kaul	82875095e9	Raft: improve descriptions of metrics 1. Fixed a single typo (send -> sent) 2. Rephrase 'How many' to 'Number of' and use less passive tense. 3. Be more specific in the description of the different metrics instead of the more generic descriptions. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#19067	2024-06-06 15:18:47 +03:00
Kefu Chai	bac7e1e942	doc: document "enable_tablets" option it sets the cluster feature of tablets, and is a prerequisite for using tablets. Refs #18670 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19090	2024-06-06 15:06:32 +03:00
Marcin Maliszkiewicz	63e6334a64	raft: rename mutations_collector to group0_batch	2024-06-06 13:26:34 +02:00
Kamil Braun	57e810c852	Merge 'Serialize repair with tablet migration' from Tomasz Grabiec We want to exclude repair with tablet migrations to avoid races between repair reads and writes with replica movement. Repair is not prepared to handle topology transitions in the middle. One reason why it's not safe is that repair may successfully write to a leaving replica post streaming phase and consider all replicas to be repaired, but in fact they are not, the new replica would not be repaired. Other kinds of races could result in repair failures. If repair writes to a leaving replica which was already cleaned up, such writes will fail, causing repair to fail. Excluding works by keeping effective_replication_map_ptr in a version which doesn't have table's tablets in transitions. That prevents later transitions from starting because topology coordinator's barrier will wait for that erm before moving to a stage later than allow_write_both_read_old, so before any requests start using the new topology. Also, if transitions are already running, repair waits for them to finish. A blocked tablet migration (e.g. due to down node) will block repair, whereas before it would fail. Once admin resolves the cause of blocked migration, repair will continue. Fixes #17658. Fixes #18561. Closes scylladb/scylladb#18641 * github.com:scylladb/scylladb: test: pylib: Do not block async reactor while removing directories repair: Exclude tablet migrations with tablet repair repair_service: Propagate topology_state_machine to repair_service main, storage_service: Move topology_state_machine outside storage_service storage_srvice, toplogy: Extract topology_state_machine::await_quiesced() tablet_scheduler: Make disabling of balancing interrupt shuffle mode tablet_scheduler: Log whether balancing is considered as enabled	2024-06-06 11:27:03 +02:00
Kamil Braun	256517b570	Merge 'tablets: Filter-out left nodes in get_natural_endpoints()' from Tomasz Grabiec The API already promises this, the comment on effective_replication_map says: "Excludes replicas which are in the left state". Tablet replicas on the replaced node are rebuilt after the node already left. We may no longer have the IP mapping for the left node so we should not include that node in the replica set. Otherwise, storage_proxy may try to use the empty IP and fail: storage_proxy - No mapping for :: in the passed effective replication map It's fine to not include it, because storage proxy uses keyspace RF and not replica list size to determine quorum. The node is not coming up, so no one should need to contact it. Users which need replica list stability should use the host_id-based API. Fixes #18843 Closes scylladb/scylladb#18955 * github.com:scylladb/scylladb: tablets: Filter-out left nodes in get_natural_endpoints() test: pylib: Extract start_writes() load generator utility	2024-06-06 11:23:27 +02:00
Wojciech Mitros	f70f774e40	mv: gossip the same backlog if a different backlog was sent in a response Currently, there are 2 ways of sharing a backlog with other nodes: through a gossip mechanism, and with responses to replica writes. In gossip, we check each second if the backlog changed, and if it did we update other nodes with it. However if the backlog for this node changed on another node with a write response, the gossiped backlog is currently not updated, so if after the response the backlog goes back to the value from the previous gossip round, it will not get sent and the other node will stay with an outdated backlog. This patch changes this by notifying the gossip that the backlog changed since the last gossip round so a different backlog could have been sent through the response piggyback mechanism. With that information, gossip will send an unchanged backlog to other nodes in the following gossip round. Fixes: https://github.com/scylladb/scylladb/issues/18461	2024-06-06 10:45:15 +02:00
Wojciech Mitros	272e80fe0a	node_update_backlog: divide adding and fetching backlogs Currently, we only update the backlogs in node_update_backlog at the same time when we're fetching them. This is done using storage_proxy's method get_view_update_backlog, which is confusing because it's a getter with side-effects. Additionally, we don't always want to update the backlog when we're reading it (as in gossip which is only on shard 0) and we don't always want to read it when we're updating it (when we're not handling any writes but the backlog drops due to background work finishing). This patch divides the node_view_backlog::add_fetch as well as the storage_proxy::get_view_update_backlog both into two methods; one for updating and one for reading the backlog. This patch only replaces the places where we're currently using the view backlog getter, more situations where we should get/update the backlog should be considered in a following patch.	2024-06-06 10:45:13 +02:00
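A minimal sketch of the update/read split described in the commit above. It is not the ScyllaDB code: `backlog_tracker`, `add()` and `fetch()` are hypothetical names, and reporting the largest per-shard value is an assumption made only to keep the example small.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for a per-shard backlog tracker.
class backlog_tracker {
    std::vector<std::size_t> _per_shard;
public:
    explicit backlog_tracker(std::size_t shards) : _per_shard(shards, 0) {}

    // Update only: record the current backlog of one shard.
    void add(std::size_t shard, std::size_t backlog) {
        _per_shard.at(shard) = backlog;
    }

    // Read only: report the largest recorded backlog. Being const, it
    // cannot modify the tracker, unlike a combined "add and fetch" getter.
    std::size_t fetch() const {
        if (_per_shard.empty()) {
            return 0;
        }
        return *std::max_element(_per_shard.begin(), _per_shard.end());
    }
};
```

With the two operations separated, a reader such as a periodic gossip round can call `fetch()` without touching state, and a shard whose backlog drops can call `add()` without needing the result.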
Botond Dénes	8ff1742182	Merge 'Relax production_snitch_base's property file parsing' from Pavel Emelyanov It consists of reading method and parsing one and it uses class fields to carry data between those two. The former is additionally built with curly continuation chains, while it's naturally linear, so turn it into a coroutine while at it Closes scylladb/scylladb#18994 * github.com:scylladb/scylladb: snitch: Remove production_snitch_base::_prop_file_contents snitch: Remove production_snitch_base::_prop_file_size snitch: Coroutinize load_property_file()	2024-06-06 09:14:33 +03:00
Botond Dénes	cd10beb89d	Merge 'Don't use db::config by gossiper' from Pavel Emelyanov All sharded<service>'s are supposed to have their own config and not use the global db::config one. The service config, in turn, is to be created by main/cql_test_env/whatever out of db::config and, maybe, other data. Gossiper is almost there, but it still uses db::config in a few places. Closes scylladb/scylladb#19051 * github.com:scylladb/scylladb: gossiper: Stop using db::config gossiper: Move force_gossip_generation on gossip_config gossiper: Move failure_detector_timeout_ms on gossip_config main: Fix indentation after previous patch main: Make gossiper config a sharded parameter main: Add local variable for set of seeds main: Add local variable for group0 id main: Add local variable for cluster_name	2024-06-06 09:12:51 +03:00
Botond Dénes	44975abe18	Merge 'Sanitize start-stop of protocol servers' from Pavel Emelyanov Protocol servers are started last, and are registered in storage_service, which stops them. Also there are deferred actions scheduled to stop protocol servers on aborted start and a FIXME asking to make even this case rely on storage_service. Also, there's a (rather rare) aborted-start bug in alternator and redis. Yet, thrift can be left started in some weird circumstances. This patch fixes it all. As a side effect, the start-stop code becomes shorter and a bit better structured. refs: #2737 Closes scylladb/scylladb#19042 * github.com:scylladb/scylladb: main: Start alternator expiration service earlier main: Start redis transparently main: Start alternator transparently main: Start thrift transparently main: Start native transport transparently storage_service: Make register_protocol_server() start the server storage_service: Turn register_protocol_server() async method storage_service: Outline register_protocol_server() main: Schedule deferred drain_on_shutdown() prior to protocol servers main: Move some trailing startup earlier	2024-06-06 09:08:05 +03:00
Botond Dénes	db5c23491e	Merge '.github: annotate the report from clang-include-cleaner' from Kefu Chai this series * add annotation to the github pull request when extraneous `#include` processor macros are identified * add `exceptions` subdirectory to `CLEANER_DIRS` to demonstrate the annotation. we will fix the identified issue in a follow-up change. --- * This is a CI workflow improvement. No backporting is required. Closes scylladb/scylladb#19037 * github.com:scylladb/scylladb: .github: add exception to CLEANER_DIRS .github: annotate the report from clang-include-cleaner .github: build headers before running clang-include-cleaner	2024-06-06 09:02:26 +03:00
Pavel Emelyanov	acc438e98b	view-update-generator: Start in provided scheduling group Currently it gets the streaming/maintenance one from database, but it can as well just assume that it's already running in the correct one, and the main code fulfils this assumption. This removes one more place that uses database as sched groups provider. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19078	2024-06-06 08:58:05 +03:00
Tzach Livyatan	c30f81c389	Docs: fix start command in Update replace-dead-node.rst Fix #18920 Closes scylladb/scylladb#18922	2024-06-06 08:56:07 +03:00
Botond Dénes	7aa9bfa661	Merge 'util/result_try: pass template arg list explicitly' from Kefu Chai clang-19 introduced a change which enforces the change proposed by [CWG 96](https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#96), which was accepted by C++20 in [P1787R6](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1787r6.html), as [[temp.names]p5](https://eel.is/c++draft/temp.names#6). so, to be future-proof and to be standard compliant, let's pass the template arguments. otherwise we'd have build failure like ``` error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw] ``` --- no need to backport. as this change only addresses a FTBFS with a recent build of clang-19. but our CI is not a clang built from llvm's main HEAD. Closes scylladb/scylladb#19100 * github.com:scylladb/scylladb: util/result_try: pass template arg list explicitly util/result_try: pass func as `const F&` instead of `F&&`	2024-06-06 08:54:42 +03:00
Nadav Har'El	b5fd854c77	cql-pytest: be more forgiving to ancient versions of Scylla We recently added to cql-pytest tests the ability to check if tablets are enabled or not (for some tablet-specific tests). When running tests against Cassandra or old pre-tablet versions of Scylla, this fact is detected and "False" is returned immediately. However, we still look at a system table which didn't exist on really ancient versions of Scylla, and tests couldn't run against such versions. The fix is trivial: if that system table is missing, just ignore the error and return False (i.e., no tablets). There were no tablets on such ancient versions of Scylla. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19098	2024-06-06 08:53:26 +03:00
Pavel Emelyanov	4606302ead	distributed_loader: Remove base_path from populator It's unused, populator uses it to print debugging messages, but it can as well use table->dir() for it, just as sstable_directory does. One message looks useless and is removed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19113	2024-06-06 08:49:41 +03:00
Pavel Emelyanov	84f0bab27c	hints/manager: Simplify hints dir evaluation Currently the code wraps simple "if" with std::invoke over a lambda. Also, the local variable that gets the result, is declared as const one, which prevents it from being std::move()-d in the very next line. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19106	2024-06-06 08:31:30 +03:00
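The point about `const` blocking `std::move()` in the commit above can be sketched as follows. This is an illustration only, not the hints manager code: `pick_hints_dir` and `consume` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical consumer that takes its argument by value.
std::string consume(std::string dir) { return dir; }

// Pattern being replaced (shown as a comment only):
//
//     const std::string dir = std::invoke([&] {
//         if (use_override) { return override_dir; }
//         return base;
//     });
//     consume(std::move(dir));  // still copies: a const object cannot be moved from
//
// Simplified form: a plain "if" and a non-const local that really moves.
std::string pick_hints_dir(bool use_override,
                           const std::string& base,
                           const std::string& override_dir) {
    std::string dir = base;
    if (use_override) {
        dir = override_dir;
    }
    return consume(std::move(dir));
}
```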
Pavel Emelyanov	ad0e6b79fc	replica: Remove all_datadir from keyspace config This vector of paths is only used to generate the same vector of paths for table config, but the latter already has all the needed info. It's the part of the plan to stop using paths/directories in keyspaces and tables, because with storage-options tables no longer keep their data in "files on disk", so this information goes to sstables storage manager (refs #12707) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19119	2024-06-06 08:30:34 +03:00
Kefu Chai	4a36918989	topology_coordinator: handle/wait futures when stopping topology_coordinator before this change, unlike other services in scylla, topology_coordinator is not properly stopped when it is aborted, because the scylla instance is no longer a leader or is being shut down. its `run()` method just stops the grand loop and bails out before topology_coordinator is destroyed. but we are tracking the migration state of tablets using a bunch of futures, which might not be handled yet, and some of them could carry failures. in that case, when the `future` instances with failure state get destroyed, seastar calls `report_failed_future`. and seastar considers this practice a source of bugs, as one just fails to handle an error. that's why we have the following error: ``` WARN 2024-05-19 23:00:42,895 [shard 0:strm] seastar - Exceptional future ignored: seastar::rpc::unknown_verb_error (unknown verb), backtrace: /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x56c14e /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x56c770 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x56ca58 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x38c6ad 0x29cdd07 0x29b376b 0x29a5b65 0x108105a /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x3ff1df /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x400367 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x3ff838 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x36de58 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x36d092 0x1017cba 0x1055080 0x1016ba7 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x27b89 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x27c4a 0x1015524 ``` and the backtrace looks like: ``` seastar::current_backtrace_tasklocal() at ??:? 
seastar::current_tasktrace() at ??:? seastar::current_backtrace() at ??:? seastar::report_failed_future(seastar::future_state_base::any&&) at ??:? service::topology_coordinator::tablet_migration_state::~tablet_migration_state() at topology_coordinator.cc:? service::topology_coordinator::~topology_coordinator() at topology_coordinator.cc:? service::run_topology_coordinator(seastar::sharded<db::system_distributed_keyspace>&, gms::gossiper&, netw::messaging_service&, locator::shared_token_metadata&, db::system_keyspace&, replica::database&, service::raft_group0&, service::topology_state_machine&, seastar::abort_source&, raft::server&, seastar::noncopyable_function<seastar::future<service::raft_topology_cmd_result> (utils::tagged_tagged_integer<raft::internal::non_final, raft::term_tag, unsigned long>, unsigned long, service::raft_topology_cmd const&)>, service::tablet_allocator&, std::chrono::duration<long, std::ratio<1l, 1000l> >, service::endpoint_lifecycle_notifier&) [clone .resume] at topology_coordinator.cc:? seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose() at main.cc:? seastar::reactor::run_some_tasks() at ??:? seastar::reactor::do_run() at ??:? seastar::reactor::run() at ??:? seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ??:? ``` and even worse, these futures are indirectly owned by `topology_coordinator`. so there are chances that they could be used even after `topology_coordinator` is destroyed. this is a use-after-free issue. because the `run_topology_coordinator` fiber exits when the scylla instance retires from the leader's role, this use-after-free could be fatal to a running instance due to undefined behavior of use after free. so, in this change, we handle the futures in `_tablets`, and note down the failures carried by them if any. Fixes #18745 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18991	2024-06-06 07:55:03 +03:00
Israel Fruchter	1fd600999b	Update tools/cqlsh submodule v6.0.20 * tools/cqlsh c8158555...0d58e5ce (6): > cqlsh.py: fix server side describe after login command > cqlsh: try server-side DESCRIBE, then client-side > Refactor tests to accept both client and server side describe > github actions: support testing with enterprise release > Add the tab-completion support of SERVICE_LEVEL statements > reloc/build_reloc.sh: don't use `--no-build-isolation` Closes scylladb/scylladb#18990	2024-06-06 07:32:05 +03:00
Tomasz Grabiec	2c3f7c996f	test: pylib: Fetch all pages by default in run_async Fetching only the first page is not the intuitive behavior expected by users. This causes flakiness in some tests which generate variable amount of keys depending on execution speed and verify later that all keys were written using a single SELECT statement. When the amount of keys becomes larger than page size, the test fails. Fixes #18774 Closes scylladb/scylladb#19004	2024-06-05 18:07:24 +03:00
Tomasz Grabiec	5ca54a6e88	test: pylib: Do not block async reactor while removing directories This fixes a problem where suite cleanup schedules lots of uninstall() tasks for servers started in the suite, which schedules lots of tasks, which synchronously call rmtree(). These take over a minute to finish, which blocks other tasks for tests which are still executing. In particular, this was observed to cause ManagerClient.server_stop_gracefully() to time out. It has a timeout of 60 seconds. The server was stopped quickly, but the RESTful API response was not processed in time and the call timed out when it got the async reactor.	2024-06-05 16:11:22 +02:00
Tomasz Grabiec	98323be296	repair: Exclude tablet migrations with tablet repair We want to exclude repair with tablet migrations to avoid races between repair reads and writes with replica movement. Repair is not prepared to handle topology transitions in the middle. One reason why it's not safe is that repair may successfully write to a leaving replica post streaming phase and consider all replicas to be repaired, but in fact they are not, the new replica would not be repaired. Other kinds of races could result in repair failures. If repair writes to a leaving replica which was already cleaned up, such writes will fail, causing repair to fail. Excluding works by keeping effective_replication_map_ptr in a version which doesn't have table's tablets in transitions. That prevents later transitions from starting because topology coordinator's barrier will wait for that erm before moving to a stage later than allow_write_both_read_old, so before any requests start using the new topology. Also, if transitions are already running, repair waits for them to finish. Fixes #17658. Fixes #18561.	2024-06-05 16:11:22 +02:00
Tomasz Grabiec	e97acf4e30	repair_service: Propagate topology_state_machine to repair_service	2024-06-05 16:11:22 +02:00
Tomasz Grabiec	c45ce41330	main, storage_service: Move topology_state_machine outside storage_service It will be propagated to repair_service to avoid cyclic dependency: storage_service <-> repair_service	2024-06-05 16:11:22 +02:00
Tomasz Grabiec	476c076a21	storage_srvice, toplogy: Extract topology_state_machine::await_quiesced() Will be used later in a place which doesn't have access to storage_service but has access to topology_state_machine. It's not necessary to start group0 operation around polling because the busy() state can be checked atomically and if it's false it means the topology is no longer busy.	2024-06-05 16:11:22 +02:00
Tomasz Grabiec	1513d6f0b0	tablet_scheduler: Make disabling of balancing interrupt shuffle mode Tests will rely on that, they will run in shuffle mode, and disable balancing around section which otherwise would be infinitely blocked by ongoing shuffling (like repair).	2024-06-05 16:11:22 +02:00
Tomasz Grabiec	6c64cf33df	tablet_scheduler: Log whether balancing is considered as enabled	2024-06-05 16:11:22 +02:00
Benny Halevy	b2fa954d82	gms: endpoint_state: get_dc_rack: do not assign to uninitialized memory Assigning to a member of an uninitialized optional does not initialize the object before assigning to it. This resulted in the AddressSanitizer detecting an attempt to double-free when the uninitialized string apparently contained a bogus pointer. The change emplaces the returned optional when needed without resorting to the copy-assignment operator. So it's not susceptible to assigning to uninitialized memory, and it's more efficient as well... Fixes scylladb/scylladb#19041 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#19043	2024-06-05 13:09:01 +03:00
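A minimal sketch of the emplace-versus-assign distinction described in the commit above. It is not the actual `get_dc_rack` code: `dc_rack` and `make_dc_rack` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the real type; only the shape matters here.
struct dc_rack {
    std::string dc;
    std::string rack;
};

// Undefined behaviour, shown only as a comment: operator-> on an empty
// optional does not construct the contained object, so the assignment
// would operate on a std::string that was never constructed.
//
//     std::optional<dc_rack> ret;
//     ret->dc = "dc1";

// Safe variant: emplace() constructs the contained value before it is used.
std::optional<dc_rack> make_dc_rack(const std::string* dc, const std::string* rack) {
    std::optional<dc_rack> ret;
    if (dc && rack) {
        ret.emplace(dc_rack{*dc, *rack});
    }
    return ret;
}
```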
Kamil Braun	18f5d6fd89	Merge 'Fail bootstrap if ip mapping is missing during double write stage' from Gleb Natapov If a node restarts just before it stores the bootstrapping node's IP, it will not have an ID to IP mapping for the bootstrapping node, which may cause a failure on the write path. Detect this and fail bootstrapping if it happens. Closes scylladb/scylladb#18927 * github.com:scylladb/scylladb: raft topology: fix indentation after previous commit raft topology: do not add bootstrapping node without IP as pending test: add test of bootstrap where the coordinator crashes just before storing IP mapping schema_tables: remove unused code	2024-06-05 11:15:15 +02:00
Raphael S. Carvalho	3983f69b2d	topology_experimental_raft/test_tablets: restore usage of check_with_down `e7246751b6` incorrectly dropped its usage in test_tablet_missing_data_repair. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#19092	2024-06-05 10:11:02 +02:00
Kefu Chai	b7994ee4f6	util/result_try: pass template arg list explicitly clang-19 introduced a change which enforces the change proposed by [CWG 96](https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#96), which was accepted by C++20 in [P1787R6](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1787r6.html), as [[temp.names]p5](https://eel.is/c++draft/temp.names#6). so, to be future-proof and to be standard compliant, let's pass the template arguments. otherwise we'd have build failure like ``` error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw] ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-05 13:19:45 +08:00
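The rule behind the diagnostic quoted in the commit above can be sketched as follows. The names `converter` and `call_through` are hypothetical and are not taken from `utils/result_try.hh`.

```cpp
#include <cassert>

// Hypothetical converter with a static member function template.
struct converter {
    template <typename T>
    static T invoke(T v) { return v; }
};

template <typename Converter>
int call_through(int x) {
    // Converter is a dependent type, so the `template` keyword is needed to
    // name its member template. The form without an argument list,
    //
    //     return Converter::template invoke(x);
    //
    // is what the quoted clang-19 diagnostic complains about. Passing the
    // template argument list explicitly avoids it:
    return Converter::template invoke<int>(x);
}
```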
Kefu Chai	e2158a0c72	util/result_try: pass func as `const F&` instead of `F&&` as the functor passed to `invoke()` is not an rvalue, if we specify the template parameter explicitly, clang errors out like: ``` /home/kefu/.local/bin/clang++ -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -I/home/kefu/dev/scylladb/build -I/home/kefu/dev/scylladb/build/gen -isystem /home/kefu/dev/scylladb/build/rust -isystem /home/kefu/dev/scylladb/abseil -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. 
-march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT transport/CMakeFiles/transport.dir/RelWithDebInfo/server.cc.o -MF transport/CMakeFiles/transport.dir/RelWithDebInfo/server.cc.o.d -o transport/CMakeFiles/transport.dir/RelWithDebInfo/server.cc.o -c /home/kefu/dev/scylladb/transport/server.cc In file included from /home/kefu/dev/scylladb/transport/server.cc:39: /home/kefu/dev/scylladb/utils/result_try.hh:210:28: error: no matching function for call to 'invoke' 210 \| return Converter::template invoke<const Cb, const Ex&>(_cb, ex); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /home/kefu/dev/scylladb/utils/result_try.hh:194:143: note: while substituting into a lambda expression here 194 \| return [this, cont = std::forward<Continuation>(cont)] (bool& already_caught) mutable -> typename Converter::template wrapped_type<R> { \| ^ /home/kefu/dev/scylladb/utils/result_try.hh:327:40: note: in instantiation of function template specialization 'utils::internal::result_catcher<exceptions::unavailable_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:521:68)>::wrap_in_catch<boost::outcome_v2::basic_result<seastar::foreign_ptr<std::unique_ptr<cql_transport::response>>, utils::exception_container<exceptions::mutation_write_timeout_exception, exceptions::read_timeout_exception, exceptions::read_failure_exception, exceptions::rate_limit_exception>, utils::exception_container_throw_policy>, utils::internal::noop_converter, (lambda at /home/kefu/dev/scylladb/utils/result_try.hh:518:76)>' requested here 327 \| first_handler.template wrap_in_catch<R, Converter, Continuation>(std::forward<Continuation>(cont)), \| ^ /home/kefu/dev/scylladb/utils/result_try.hh:518:54: note: in instantiation of function template specialization 'utils::internal::try_catch_chain_impl<boost::outcome_v2::basic_result<seastar::foreign_ptr<std::unique_ptr<cql_transport::response>>, 
utils::exception_container<exceptions::mutation_write_timeout_exception, exceptions::read_timeout_exception, exceptions::read_failure_exception, exceptions::rate_limit_exception>, utils::exception_container_throw_policy>, utils::internal::noop_converter, utils::internal::result_catcher<exceptions::unavailable_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:521:68)>, utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>, utils::internal::result_catcher<exceptions::read_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:531:69)>, utils::internal::result_catcher<exceptions::mutation_write_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:536:79)>, utils::internal::result_catcher<exceptions::mutation_write_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:541:79)>, utils::internal::result_catcher<exceptions::already_exists_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:546:71)>, utils::internal::result_catcher<exceptions::prepared_query_not_found_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:551:81)>, utils::internal::result_catcher<exceptions::function_execution_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:556:75)>, utils::internal::result_catcher<exceptions::rate_limit_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:561:67)>, utils::internal::result_catcher<exceptions::cassandra_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:566:66)>, utils::internal::result_catcher<std::exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:578:49)>, utils::internal::result_catcher_dots<(lambda at /home/kefu/dev/scylladb/transport/server.cc:591:38)>>::invoke_in_try_catch<(lambda at /home/kefu/dev/scylladb/utils/result_try.hh:518:76)>' requested here 518 \| result_type res = try_catch_chain_type::template 
invoke_in_try_catch<>([&fun] (bool&) { return fun(); }, handlers...); \| ^ /home/kefu/dev/scylladb/transport/server.cc:484:83: note: in instantiation of function template specialization 'utils::result_try<(lambda at /home/kefu/dev/scylladb/transport/server.cc:484:94), utils::internal::result_catcher<exceptions::unavailable_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:521:68)>, utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>, utils::internal::result_catcher<exceptions::read_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:531:69)>, utils::internal::result_catcher<exceptions::mutation_write_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:536:79)>, utils::internal::result_catcher<exceptions::mutation_write_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:541:79)>, utils::internal::result_catcher<exceptions::already_exists_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:546:71)>, utils::internal::result_catcher<exceptions::prepared_query_not_found_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:551:81)>, utils::internal::result_catcher<exceptions::function_execution_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:556:75)>, utils::internal::result_catcher<exceptions::rate_limit_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:561:67)>, utils::internal::result_catcher<exceptions::cassandra_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:566:66)>, utils::internal::result_catcher<std::exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:578:49)>, utils::internal::result_catcher_dots<(lambda at /home/kefu/dev/scylladb/transport/server.cc:591:38)>>' requested here 484 \| return utils::result_into_future<result_with_foreign_response_ptr>(utils::result_try([&] () -> result_with_foreign_response_ptr 
{ \| ^ /home/kefu/dev/scylladb/utils/result_try.hh:33:5: note: candidate function template not viable: expects an rvalue for 1st argument 33 \| invoke(F&& f, Args&&... args) { \| ^ ~~~~~ /home/kefu/dev/scylladb/utils/result_try.hh:210:28: error: no matching function for call to 'invoke' 210 \| return Converter::template invoke<const Cb, const Ex&>(_cb, ex); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /home/kefu/dev/scylladb/utils/result_try.hh:194:143: note: while substituting into a lambda expression here 194 \| return [this, cont = std::forward<Continuation>(cont)] (bool& already_caught) mutable -> typename Converter::template wrapped_type<R> { \| ^ /home/kefu/dev/scylladb/utils/result_try.hh:327:40: note: in instantiation of function template specialization 'utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>::wrap_in_catch<boost::outcome_v2::basic_result<seastar::foreign_ptr<std::unique_ptr<cql_transport::response>>, utils::exception_container<exceptions::mutation_write_timeout_exception, exceptions::read_timeout_exception, exceptions::read_failure_exception, exceptions::rate_limit_exception>, utils::exception_container_throw_policy>, utils::internal::noop_converter, (lambda at /home/kefu/dev/scylladb/utils/result_try.hh:194:16)>' requested here 327 \| first_handler.template wrap_in_catch<R, Converter, Continuation>(std::forward<Continuation>(cont)), \| ^ /home/kefu/dev/scylladb/utils/result_try.hh:326:79: note: in instantiation of function template specialization 'utils::internal::try_catch_chain_impl<boost::outcome_v2::basic_result<seastar::foreign_ptr<std::unique_ptr<cql_transport::response>>, utils::exception_container<exceptions::mutation_write_timeout_exception, exceptions::read_timeout_exception, exceptions::read_failure_exception, exceptions::rate_limit_exception>, utils::exception_container_throw_policy>, utils::internal::noop_converter, 
utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>, utils::internal::result_catcher<exceptions::read_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:531:69)>, utils::internal::result_catcher<exceptions::mutation_write_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:536:79)>, utils::internal::result_catcher<exceptions::mutation_write_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:541:79)>, utils::internal::result_catcher<exceptions::already_exists_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:546:71)>, utils::internal::result_catcher<exceptions::prepared_query_not_found_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:551:81)>, utils::internal::result_catcher<exceptions::function_execution_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:556:75)>, utils::internal::result_catcher<exceptions::rate_limit_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:561:67)>, utils::internal::result_catcher<exceptions::cassandra_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:566:66)>, utils::internal::result_catcher<std::exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:578:49)>, utils::internal::result_catcher_dots<(lambda at /home/kefu/dev/scylladb/transport/server.cc:591:38)>>::invoke_in_try_catch<(lambda at /home/kefu/dev/scylladb/utils/result_try.hh:194:16)>' requested here 326 \| return try_catch_chain_impl<R, Converter, CatchHandlers...>::template invoke_in_try_catch<>( \| ^ /home/kefu/dev/scylladb/utils/result_try.hh:518:54: note: in instantiation of function template specialization 'utils::internal::try_catch_chain_impl<boost::outcome_v2::basic_result<seastar::foreign_ptr<std::unique_ptr<cql_transport::response>>, utils::exception_container<exceptions::mutation_write_timeout_exception, 
exceptions::read_timeout_exception, exceptions::read_failure_exception, exceptions::rate_limit_exception>, utils::exception_container_throw_policy>, utils::internal::noop_converter, utils::internal::result_catcher<exceptions::unavailable_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:521:68)>, utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>, utils::internal::result_catcher<exceptions::read_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:531:69)>, utils::internal::result_catcher<exceptions::mutation_write_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:536:79)>, utils::internal::result_catcher<exceptions::mutation_write_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:541:79)>, utils::internal::result_catcher<exceptions::already_exists_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:546:71)>, utils::internal::result_catcher<exceptions::prepared_query_not_found_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:551:81)>, utils::internal::result_catcher<exceptions::function_execution_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:556:75)>, utils::internal::result_catcher<exceptions::rate_limit_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:561:67)>, utils::internal::result_catcher<exceptions::cassandra_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:566:66)>, utils::internal::result_catcher<std::exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:578:49)>, utils::internal::result_catcher_dots<(lambda at /home/kefu/dev/scylladb/transport/server.cc:591:38)>>::invoke_in_try_catch<(lambda at /home/kefu/dev/scylladb/utils/result_try.hh:518:76)>' requested here 518 \| result_type res = try_catch_chain_type::template invoke_in_try_catch<>([&fun] (bool&) { return fun(); }, handlers...); \| ^ 
/home/kefu/dev/scylladb/transport/server.cc:484:83: note: in instantiation of function template specialization 'utils::result_try<(lambda at /home/kefu/dev/scylladb/transport/server.cc:484:94), utils::internal::result_catcher<exceptions::unavailable_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:521:68)>, utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>, utils::internal::result_catcher<exceptions::read_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:531:69)>, utils::internal::result_catcher<exceptions::mutation_write_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:536:79)>, utils::internal::result_catcher<exceptions::mutation_write_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:541:79)>, utils::internal::result_catcher<exceptions::already_exists_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:546:71)>, utils::internal::result_catcher<exceptions::prepared_query_not_found_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:551:81)>, utils::internal::result_catcher<exceptions::function_execution_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:556:75)>, utils::internal::result_catcher<exceptions::rate_limit_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:561:67)>, utils::internal::result_catcher<exceptions::cassandra_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:566:66)>, utils::internal::result_catcher<std::exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:578:49)>, utils::internal::result_catcher_dots<(lambda at /home/kefu/dev/scylladb/transport/server.cc:591:38)>>' requested here 484 \| return utils::result_into_future<result_with_foreign_response_ptr>(utils::result_try([&] () -> result_with_foreign_response_ptr { \| ^ /home/kefu/dev/scylladb/utils/result_try.hh:33:5: note: candidate 
function template not viable: expects an rvalue for 1st argument 33 \| invoke(F&& f, Args&&... args) { \| ^ ~~~~~ ``` so to prepare for the change to pass template parameter explicitly, let's pass `f` as a `const` reference, instead of as an rvalue reference. also, this parameter type matches with our usage case -- we always pass a member variable `_cb` to `invoke`, and we don't expect that `invoke()` would move it away. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-05 13:19:40 +08:00
Kefu Chai	cfd6084edd	Update seastar submodule * seastar 914a4241...9ce62705 (18): > github: do not set --dpdk-machine haswell > io_tester: correct calculation of writes count > io-tester.md: update information about file size > reactor: align used hint for extent size to 128KB for XFS > Fix compilation failure on Ubuntu 22.04 > io_tester: align the used file size to 1MB > circular_buffer_fixed_capacity: arrow operator instead of . operator > posix-file-impl: Do not keep device-id on board > github: s/clang++-18/clang++/ > include: include used headers > include: include used headers > iotune: allow user to set buffer size for random IO > abort_source: add method to get exception pointer > github: cancel a job if it takes longer than 40 minutes > std-compat: remove #include:s which were added for pre C++17 > perf_tests: measure and report also cpu cycles > linux_perf_events: add user_cpu_cycles_retired > linux_perf_event: user_instructions_retired: exclude_idle Closes scylladb/scylladb#19019	2024-06-05 08:13:55 +03:00
Michał Chojnowski	c901139d07	scylla-gdb.py: print coroutine names in `scylla fiber` Enriches the output of `scylla fiber` with resolved names of coroutine resume functions. Before: ``` [shard 2] #0 (task) 0x0000602004c9fbf0 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 [shard 2] #1 (task) 0x0000602000344c90 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 [shard 2] #2 (task) 0x0000602004b30c50 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 ``` After: ``` [shard 2] #0 (task) 0x0000602004c9fbf0 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (.resume is seastar::future<void> sstables::parse<unsigned int, std::pair<sstables::metadata_type, unsigned int> >(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::disk_array<unsigned int, std::pair<sstables::metadata_type, unsigned int> >&) [clone .resume] ) [shard 2] #1 (task) 0x0000602000344c90 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (.resume is sstables::parse(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::statistics&) [clone .resume] ) [shard 2] #2 (task) 0x0000602004b30c50 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (.resume is sstables::sstable::read_simple<(sstables::component_type)8, sstables::statistics>(sstables::statistics&)::{lambda(sstables::sstable_version_types, seastar::file&&, unsigned long)#1}::operator()(sstables::sstable_version_types, seastar::file&&, unsigned long) const [clone .resume] ) ``` Closes scylladb/scylladb#19091	2024-06-04 22:32:17 +03:00
Pavel Emelyanov	dcc083110d	gossiper: Stop using db::config Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-04 20:19:47 +03:00
Pavel Emelyanov	00d8590d7e	gossiper: Move force_gossip_generation on gossip_config Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-04 20:19:47 +03:00
Pavel Emelyanov	e3abc5d2fd	gossiper: Move failure_detector_timeout_ms on gossip_config Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-04 20:19:47 +03:00
Pavel Emelyanov	53906aa431	main: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-04 20:19:47 +03:00
Pavel Emelyanov	fcab847f31	main: Make gossiper config a sharded parameter Next patches will put updateable_value's on it, but plain copy of them across shard doesn't work (see #7316) Indentation is deliberately left broken Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-04 20:19:26 +03:00
Pavel Emelyanov	77361e1661	main: Add local variable for set of seeds Next patch will do seeds assignment to gossiper config on each shard, so it's good to have it once, then copy around Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-04 20:18:47 +03:00
Pavel Emelyanov	9c719a0a02	main: Add local variable for group0 id Next patch will do group0_id assignment to gossiper config on each shard, so it's good to have it once, then copy around Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-04 20:17:58 +03:00
Pavel Emelyanov	b069544d16	main: Add local variable for cluster_name It's modified if its empty, next patch will make this code be called on each shard, so modification must happen only once Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-04 20:17:58 +03:00
Marcin Maliszkiewicz	ac0e164a6b	raft: rename announce to commit Old wording was derived from existing code which originated from schema code. Name commit better describes what we do here.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	370a5b547e	cql3: raft: attach description to each mutations collector group This description is readable from raft log table. Previously single description was provided for the whole announce call but since it can contain mutations from various subsystems now description was moved to add_mutation(s)/add_generator function calls.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	3289fbd71e	auth: unify mutations_generator type mutation_collector supports generators but it was added to /service/raft code so it couldn't depend on /auth/ but once it's added we can remove generator type from /auth/ as it can depend on /service/raft.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	64b635bb58	auth: drop redundant 'this' keyword	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	b639350933	auth: remove no longer used code from standard_role_manager::legacy_modify_membership Since we gradually switched all auth-v2 code paths to use modify_membership it's now safe to delete unused code.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	a88b7fc281	cql3: auth: use mutation collector for service levels statements This is done to achieve single transaction semantics.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	97a5da5965	cql3: auth: use mutation collector for alter role This is done to achieve single transaction semantics.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	a12c8ebfce	cql3: auth: use mutation collector for grant role and revoke role This is done to achieve single transaction semantics.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	5ba7d1b116	cql3: auth: use mutation collector for drop role and auto-revoke The main theme of this commit is executing drop keyspace/table/aggregate/function statements in a single transaction together with auth auto-revoke logic. This is the logic which cleans related permissions after resource is deleted. It contains several parts which couldn't easily be split into separate commits mainly because mutation collector related paths can't be mixed together. It would require holding multiple guards which we don't support. Another reason is that with mutation collector the changes are announced in a single place, at the end of statement execution, if we'd announce something in the middle then it'd lead to raft concurrent modification infinite loop as it'd invalidate our guard taken at the beginning of statement execution. So this commit contains: - moving auto-revoke code to statement execution from migration_listener * only for auth-v2 flow, to not break the old one * it's now executed during statement execution and not merging schemas, which means it produces mutations once as it should and not on each node separately * on_before callback family wasn't used because I consider it much less readable code. Long term we want to remove auth_migration_listener. - adding mutation collector to revoke_all * auto-revoke uses this function so it had to be changed, auth::revoke_all free function wrapper was added as cql3 layer should not use underlying_authorizer() directly. - adding mutation collector to drop_role * because it depends on revoke_all and we can't mix old and new flows * we need to switch all functions auth::drop_role call uses * gradual use of previously introduced modify_membership, otherwise we would need to switch even more code in this commit	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	9ca15a3ada	auth: add refactored modify_membership func in standard_role_manager The new function is simplified and handles only auth-v2 flow with mutation_collector (single transaction logic). It's not used in this commit and we'll switch code paths gradually in subsequent commits.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	f67761f5b6	auth: implement empty revoke_all in allow_all_authorizer There is no need to throw an exception because it was always ignored later with an empty catch block.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	75ccab9693	auth: drop request_execution_exception handling from default_authorizer::revoke_all The change applies only to auth-v2 code path. It seems nothing in the code except cdc and truncate throws this exception so it's probably dead code. I'll keep it for now in other places to not accidentally break things in auth-v1, in auth-v2 even if this exception is used it should likely fail the query because otherwise data consistency is silently violated.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	01fb43e35f	Revert "Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks" This reverts commit `80ed442be2`. This logic was replaced in previous commit by dynamic cast. Hopefully even this cast will be eliminated in the future.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	0573fee2a9	cql3: auth: use mutation collector for grant and revoke permissions This is done to achieve single transaction semantics. The change includes auto-grant feature. In particular for schema related auto-grant we don't use normal mutation collector announce path but follow migration manager, this may be unified in the future.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	9ddfc2ce4b	cql3: extract changes_tablets function in alter_keyspace_statement It will be used outside this class in the following commit	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	2a6cfbfb33	cql3: auth: use mutation collector for create role statement This is done to achieve single transaction semantics. grant_permissions_to_creator is logically part of create role but its change will be included in following commits as it spans multiple usages. Additionally we disabled rollback during create role as it won't work and is not needed with single transaction logic.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	e4a83008b6	auth: move create_role code into service We need this later as we'll add condition based on legacy_mode(qp) and free function doesn't have access to qp. Moreover long term we should get rid of this weird free function pattern bloat.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	6f654675c6	auth: add a way to announce mutations having only client_state ref Statements code have only access to client_state from which it takes auth::service. It doesn't have abort_source nor group0_client so we need to add them to auth::service. Additionally since abort_source can't be const the whole announce_mutations method needs non const auth::service so we need to remove const from the getter function.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	47864b991a	auth: add collect_mutations common helper It will be used in subsequent commits.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	b2cbcb21e8	auth: remove unused header in common.hh	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	7e0a801f53	auth: add class for gathering mutations without immediate announce To achieve write atomicity across different tables we need to announce mutations in a single transaction. So instead of each function doing a separate announce we need to collect mutations and announce them once at the end.	2024-06-04 15:43:04 +02:00
Piotr Dulikowski	01ff8108c1	Merge 'db/hints: Use host ID to IP mappings to choose the ep manager to drain when node is leaving' from Dawid Mędrek In `d0f5873`, we introduced mappings IP–host ID between hint directories and the hint endpoint managers managing them. As a consequence, it may happen that one hint directory stores hints towards multiple nodes at the same time. If any of those nodes leaves the cluster, we should drain the hint directory. However, before these changes that doesn't happen – we only drain it when the node of the same host ID as the hint endpoint manager leaves the cluster. This PR fixes that draining issue in the pre-host-ID-based hinted handoff. Now no matter which of the nodes corresponding to a hint directory leaves the cluster, the directory will be drained. We also introduce error injections to be able to test that it indeed happens. Fixes scylladb/scylladb#18761 Closes scylladb/scylladb#18764 * github.com:scylladb/scylladb: db/hints: Introduce an error injection to test draining db/hints: Ensure that draining happens	2024-06-04 10:17:14 +02:00
Botond Dénes	d120f0d7d3	Merge 'tasks: introduce task manager's task folding' from Aleksandra Martyniuk Task manager's tasks stay in memory after they are finished. Moreover, even if a child task is unregistered from task manager, it is still alive since its parent keeps a foreign pointer to it. Also, when a task has finished successfully there is no point in keeping all of its descendants in memory. The patch introduces folding of task manager's tasks. Whenever a task which has a parent is finished it is unregistered from task manager and foreign_ptr to it (kept in its parent) is replaced with its status. Children's statuses of the task are dropped unless they or one of their descendants failed. So for each operation we keep a tree of tasks which contains: - a root task and its direct children (status if they are finished, a task otherwise); - running tasks and their direct children (same as above); - a statuses path from root to failed tasks. /task_manager/wait_task/ does not unregister tasks anymore. Refs: #16694. - [ ] Backport reason (please explain below if this patch should be backported or not) Requires backport to 6.0 as task number exploded with tablets. Closes scylladb/scylladb#18735 * github.com:scylladb/scylladb: docs: describe task folding test: rest_api: add test for task tree structure test: rest_api: modify new_test_module tasks: test: modify test_task methods api: task_manager: do not unregister task in /task_manager/wait_task/ tasks: unregister tasks with parents when they are finished tasks: fold finished tasks info their parents tasks: make task_manager::task::impl::finish_failed noexcept tasks: change _children type	2024-06-04 08:43:44 +03:00
Pavel Emelyanov	9e65434692	main: Start alternator expiration service earlier Prior to registering drain_on_shutdown and all the protocol servers. To keep the natural sequence - start core - register drain-on-shutdown - start transport(s) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-03 23:01:17 +03:00
Pavel Emelyanov	d7c231ede9	main: Start redis transparently It's now possible to start protocol server when registered. It will also be stopped automatically on shutdown / aborted shutdown. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-03 23:01:17 +03:00
Pavel Emelyanov	4204d7f4f9	main: Start alternator transparently It's now possible to start protocol server when registered. It will also be stopped automatically on shutdown / aborted shutdown. Also move the controller variable lower to keep it all next to each other. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-03 23:01:17 +03:00
Pavel Emelyanov	d3e1121793	main: Start thrift transparently It's now possible to start protocol server when registered. It will also be stopped automatically on shutdown / aborted shutdown. It also fixes a rare bug. If thrift is not asked to be started on boot, its deferred shutdown action isn't created, so if it's later started via the API, it won't be stopped on shutdown. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-03 23:01:17 +03:00
Pavel Emelyanov	830a87e862	main: Start native transport transparently It's now possible to start protocol server when registered. It will also be stopped automatically on shutdown / aborted shutdown. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-03 23:01:17 +03:00
Marcin Maliszkiewicz	09b26208e9	auth: cql3: use auth facade functions consistently on write path Auth interface is quite mixed-up but general rule is that cql statements code calls auth::* free functions from auth/service.hh to execute auth logic. There are many exceptions where underlying_authorizer or underlying_role_manager or auth::service method is used instead. Service should not leak its internal APIs to upper layers so functions like underlying_role_manager should not exist. In this commit we fix tiny fragment related to auth write path.	2024-06-03 14:27:13 +02:00
Marcin Maliszkiewicz	126c82a6f5	auth: remove unused is_enforcing function	2024-06-03 14:27:13 +02:00
Wojciech Mitros	2cafa573df	mv: update the backlogs when view updates finish Currently, the backlog used for MV flow control is only updated after we generate view updates as a result of a write request. However, when the resources are no longer used, we should also notice that to prevent excessive slowdowns caused by the MV flow control calculating the delays based on an outdated, large backlog. This patch makes it so the backlogs are updated every time a view update finishes, and not only when the updates start. Fixes #18783 Closes scylladb/scylladb#18804	2024-06-03 14:10:49 +03:00
Avi Kivity	f133ae945a	Merge 'repair: Introduce new primary replica selection algorithm for tablets' from Benny Halevy Tablet allocation does not guarantee fairness of the first replica in the replicas set across dcs. The lack of this fix causes the following dtest to fail: repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_multi_dc Use the tablet_map get_primary_replica or get_primary_replica_within_dc, respectively to see if this node is the primary replica for each tablet or not. Fixes https://github.com/scylladb/scylladb/issues/17752 No backport is required before 6.0 as tablets (and tablet repair) are introduced in 6.0 Closes scylladb/scylladb#18784 * github.com:scylladb/scylladb: repair: repair_tablets: use get_primary_replica repair: repair_tablets: no need to check ranges_specified per tablet locator: tablet_map: add get_primary_replica_within_dc locator: tablet_map: get_primary_replica: do not copy tablet info locator: tablet_map: get_primary_replica: return tablet_replica	2024-06-03 13:16:49 +03:00
Kefu Chai	0da0461668	build: cmake: do not scan for C++20 modules when creating the build rules using CMake 3.28 and up, it generates the rules to scan for C++20 modules for C++20 projects by default. but this slows down the compilation, and introduces unnecessary dependencies for each of the targets when building .cc files. also, it prevents the static analysis tools from running from a repo which only has its build system generated, but not yet built. as, these tools would need to process the source files just like a compiler does, and if any of the included header files is missing, they just fail. so, before we migrate to C++20 modules, let's disable this feature. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19038	2024-06-03 12:51:40 +03:00
Pavel Emelyanov	9292d326b7	storage_service: Make register_protocol_server() start the server After a protocol server is registered, it can be instantly started by the main code. It makes sense to generalize this sequence by teaching register_protocol_server() start it. For now it's a no-op change, as "start_instantly" is false by default, but next patches will make use of it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-03 12:12:03 +03:00
Pavel Emelyanov	2aab9f6340	storage_service: Turn register_protocol_server() async method To make the next patch shorter Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-03 12:12:03 +03:00
Pavel Emelyanov	eb033e3c5f	storage_service: Outline register_protocol_server() To make next patch shorter Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-03 12:12:03 +03:00
Pavel Emelyanov	315ef4c484	main: Schedule deferred drain_on_shutdown() prior to protocol servers Next patches will remove protocol servers' deferred stops and will rely on drain_on_shutdown -> stop_transport to do it, so the drain deferred action should come before protocol servers' registration. This also fixes a bug. Currently alternator and redis both rely on protocol servers to stop them on shutdown. However, when startup is aborted prior to drain_on_shutdown() registration, protocol servers are not stopped and alternator and redis can remain unstopped. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-03 12:11:04 +03:00
Pavel Emelyanov	2fa89d8696	main: Move some trailing startup earlier The set_abort_on_ebadf() call and some api endpoints registration come after protocol servers. The latter is going to be shuffled, so move the former earlier not to hang around. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-03 12:01:24 +03:00
Kefu Chai	c6691d3217	.github: add exception to CLEANER_DIRS to cover more directories to prevent regressions of violating the "include what you use" policy in this directory. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-03 12:45:04 +08:00
Kefu Chai	21bdda550a	.github: annotate the report from clang-include-cleaner before this change, user has to click into the "Details" link to access the report from clang-include-cleaner. but this is neither convenient nor obvious. after this change, the report is annotated in the github web interface, this helps the reviewers and contributors to use this tool in a more efficient way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-03 12:45:04 +08:00
Kefu Chai	3d056a0cf2	.github: build headers before running clang-include-cleaner clang-include-cleaner actually interprets the preprocessor macros, and looks at the symbols. so we have to prepare the included headers before using it. but in ScyllaDB, we don't have a single target for building all the used headers, so we have to build them either in batch or separately. in this change, we build the included headers before running clang-include-cleaner. this allows us to run clang-include-cleaner on more source files. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-03 11:30:31 +08:00
Nadav Har'El	95db1c60d6	test/alternator: fix a test failing on Amazon DynamoDB The test test_table.py::test_concurrent_create_and_delete_table failed on Amazon DynamoDB because of a silly typo - "false" instead of "False". A function detecting Scylla tried to return false when noticing this isn't Scylla - but had a typo, trying to return "false" instead of "False". This patch fixes this typo, and the test now works on DynamoDB: test/alternator/run --aws test_table.py::test_concurrent_create_and_delete_table Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#17799	2024-06-02 22:25:56 +03:00
Avi Kivity	79d0711c7e	Merge 'tablets: load balancer: Use random selection of candidates when moving tablets' from Tomasz Grabiec In order to avoid per-table tablet load imbalance from forming in the cluster after adding nodes, the load balancer now picks the candidate tablet at random. This should keep the per-table distribution on the target node similar to the distribution on the source nodes. Currently, candidate selection picks the first tablet in the unordered_set, so the distribution depends on hashing in the unordered set. Due to the way hash is calculated, table id dominates the hash and a single table can be chosen more often for migration away. This can result in imbalance of tablets for any given table after bootstrapping a new node. For example, consider the following results of a simulation which starts with a 6-node cluster and does a sequence of node bootstraps and decommissions. One table has 4096 tablets and RF=1, and the other has 256 tablets and RF=2. Before the patch, the smaller table has node overcommit of 2.34 in the worst topology state, while after the patch it has overcommit of 1.65. 
overcommit is calculated as max load (tablet count per node) divided by perfect average load (all tablets / nodes): Run #861, params: {iterations=6, nodes=6, tablets1=4096 (10.7/sh), tablets2=256 (1.3/sh), rf1=1, rf2=2, shards=64} Overcommit : init : {table1={shard=1.03, node=1.00}, table2={shard=1.51, node=1.01}} Overcommit : worst: {table1={shard=1.23, node=1.10}, table2={shard=9.85, node=1.65}} Overcommit (old) : init : {table1={shard=1.03, node=1.00}, table2={shard=1.51, node=1.01}} Overcommit (old) : worst: {table1={shard=1.31, node=1.12}, table2={shard=64.00, node=2.34}} The worst state before the patch had the following distribution of tablets for the smaller table: Load on host ba7f866d...: total=171, min=1, max=7, spread=6, avg=2.67, overcommit=2.62 Load on host 4049ae8d...: total=102, min=0, max=6, spread=6, avg=1.59, overcommit=3.76 Load on host 3b499995...: total=89, min=0, max=4, spread=4, avg=1.39, overcommit=2.88 Load on host ad33bede...: total=63, min=0, max=3, spread=3, avg=0.98, overcommit=3.05 Load on host 0c2e65dc...: total=57, min=0, max=3, spread=3, avg=0.89, overcommit=3.37 Load on host 3f2d32d4...: total=27, min=0, max=2, spread=2, avg=0.42, overcommit=4.74 Load on host 9de9f71b...: total=3, min=0, max=1, spread=1, avg=0.05, overcommit=21.33 One node has as many as 171 tablets of that table and another one has as few as 3. 
After the patch, the worst distribution looks like this: Load on host 94a02049...: total=121, min=1, max=6, spread=5, avg=1.89, overcommit=3.17 Load on host 65ac6145...: total=87, min=0, max=5, spread=5, avg=1.36, overcommit=3.68 Load on host 856a66d1...: total=80, min=0, max=5, spread=5, avg=1.25, overcommit=4.00 Load on host e3ac4a41...: total=77, min=0, max=4, spread=4, avg=1.20, overcommit=3.32 Load on host 81af623f...: total=66, min=0, max=4, spread=4, avg=1.03, overcommit=3.88 Load on host 4a038569...: total=47, min=0, max=2, spread=2, avg=0.73, overcommit=2.72 Load on host c6ab3fe9...: total=34, min=0, max=3, spread=3, avg=0.53, overcommit=5.65 Most-loaded node has 121 tablets and least loaded node has 34 tablets. It's still not good, a better distribution is possible, but it's an improvement. Refs #16824 Closes scylladb/scylladb#18885 * github.com:scylladb/scylladb: tablets: load balancer: Use random selection of candidates when moving tablets test: perf: Add test for tablet load balancer effectiveness load_sketch: Extract get_shard_minmax() load_sketch: Allow populating only for a given table	2024-06-02 22:03:37 +03:00
Benny Halevy	18df36d920	repair: repair_tablets: use get_primary_replica Tablet allocation does not guarantee fairness of the first replica in the replicas set across dcs. The lack of this fix causes the following dtest to fail: repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_multi_dc Use the tablet_map get_primary_replica* functions to get the primary replica for each tablet, possibly within a dc. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-02 20:28:39 +03:00
Benny Halevy	009767455d	repair: repair_tablets: no need to check ranges_specified per tablet The code already turns off `primary_replica_only` if `!ranges_specified.empty()`, so there's no need to check it again inside the per-tablet loop. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-02 20:26:09 +03:00
Benny Halevy	84761acc31	locator: tablet_map: add get_primary_replica_within_dc Will be needed by repair in a following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-02 20:26:09 +03:00
Benny Halevy	2de79c39dc	locator: tablet_map: get_primary_replica: do not copy tablet info Currently, the function needlessly copies the tablet_info (all tablet replicas in particular) to a local variable. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-02 20:26:09 +03:00
Benny Halevy	c52f70f92c	locator: tablet_map: get_primary_replica: return tablet_replica This is required by repair when it will start using get_primary_replica in a following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-02 20:26:09 +03:00
Tomasz Grabiec	603abddca9	tablets: load balancer: Use random selection of candidates when moving tablets In order to avoid per-table tablet load imbalance from forming in the cluster after adding nodes, the load balancer now picks the candidate tablet at random. This should keep the per-table distribution on the target node similar to the distribution on the source nodes. Currently, candidate selection picks the first tablet in the unordered_set, so the distribution depends on hashing in the unordered set. Due to the way hash is calculated, table id dominates the hash and a single table can be chosen more often for migration away. This can result in imbalance of tablets for any given table after bootstrapping a new node. For example, consider the following results of a simulation which starts with a 6-node cluster and does a sequence of node bootstraps and decommissions. One table has 4096 tablets and RF=1, and the other has 256 tablets and RF=2. Before the patch, the smaller table has node overcommit of 2.34 in the worst topology state, while after the patch it has overcommit of 1.65. 
overcommit is calculated as max load (tablet count per node) divided by perfect average load (all tablets / nodes): Run #861, params: {iterations=6, nodes=6, tablets1=4096 (10.7/sh), tablets2=256 (1.3/sh), rf1=1, rf2=2, shards=64} Overcommit : init : {table1={shard=1.03, node=1.00}, table2={shard=1.51, node=1.01}} Overcommit : worst: {table1={shard=1.23, node=1.10}, table2={shard=9.85, node=1.65}} Overcommit (old) : init : {table1={shard=1.03, node=1.00}, table2={shard=1.51, node=1.01}} Overcommit (old) : worst: {table1={shard=1.31, node=1.12}, table2={shard=64.00, node=2.34}} The worst state before the patch had the following distribution of tablets for the smaller table: Load on host ba7f866d...: total=171, min=1, max=7, spread=6, avg=2.67, overcommit=2.62 Load on host 4049ae8d...: total=102, min=0, max=6, spread=6, avg=1.59, overcommit=3.76 Load on host 3b499995...: total=89, min=0, max=4, spread=4, avg=1.39, overcommit=2.88 Load on host ad33bede...: total=63, min=0, max=3, spread=3, avg=0.98, overcommit=3.05 Load on host 0c2e65dc...: total=57, min=0, max=3, spread=3, avg=0.89, overcommit=3.37 Load on host 3f2d32d4...: total=27, min=0, max=2, spread=2, avg=0.42, overcommit=4.74 Load on host 9de9f71b...: total=3, min=0, max=1, spread=1, avg=0.05, overcommit=21.33 One node has as many as 171 tablets of that table and another one has as few as 3. 
After the patch, the worst distribution looks like this: Load on host 94a02049...: total=121, min=1, max=6, spread=5, avg=1.89, overcommit=3.17 Load on host 65ac6145...: total=87, min=0, max=5, spread=5, avg=1.36, overcommit=3.68 Load on host 856a66d1...: total=80, min=0, max=5, spread=5, avg=1.25, overcommit=4.00 Load on host e3ac4a41...: total=77, min=0, max=4, spread=4, avg=1.20, overcommit=3.32 Load on host 81af623f...: total=66, min=0, max=4, spread=4, avg=1.03, overcommit=3.88 Load on host 4a038569...: total=47, min=0, max=2, spread=2, avg=0.73, overcommit=2.72 Load on host c6ab3fe9...: total=34, min=0, max=3, spread=3, avg=0.53, overcommit=5.65 Most-loaded node has 121 tablets and least loaded node has 34 tablets. It's still not good, a better distribution is possible, but it's an improvement. Refs #16824	2024-06-02 14:23:00 +02:00
Tomasz Grabiec	7b1eea794b	test: perf: Add test for tablet load balancer effectiveness	2024-06-02 14:23:00 +02:00
Tomasz Grabiec	c9bcb5e400	load_sketch: Extract get_shard_minmax()	2024-06-02 14:23:00 +02:00
Tomasz Grabiec	3be6120e3b	load_sketch: Allow populating only for a given table	2024-06-02 14:23:00 +02:00
Avi Kivity	db4e4df762	alternator: yield while converting large responses to json text We have two paths for generating the json text representation, one for large items and one for small items, but the large item path is lacking: - it doesn't yield, so a response with many items will stall - it doesn't wait for network sends to be accepted by the network stack, so it will allocate a lot of memory Fix by moving the generation to a thread. This allows us to wait for the network stack, which incidentally also fixes stalls. The cost of the thread is amortized by the fact we're emitting a large response. Fixes #18806 Closes scylladb/scylladb#18807	2024-06-02 13:07:13 +03:00
Michał Jadwiszczak	5b4e688668	docs/procedures/backup-restore: use `DESC SCHEMA WITH INTERNALS` Update docs for backup procedure to use `DESC SCHEMA WITH INTERNALS` instead of plain `DESC SCHEMA`. Add a note to use cqlsh in a proper version (at least 6.0.19). Closes scylladb/scylladb#18953	2024-05-31 15:26:36 +02:00
Aleksandra Martyniuk	beef77a778	docs: describe task folding	2024-05-31 10:40:04 +02:00
Aleksandra Martyniuk	d7e80a6520	test: rest_api: add test for task tree structure Add test which checks whether the tasks are folded into their parent as expected.	2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk	fc0796f684	test: rest_api: modify new_test_module Remove remaining test tasks when a test module is removed, so that a node could shutdown even if a test fails.	2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk	30f97ea133	tasks: test: modify test_task methods Wait until the task is done in test_task::finish_failed and test_task::finish to ensure that it is folded into its parent.	2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk	c1b2b8cb2c	api: task_manager: do not unregister task in /task_manager/wait_task/ If /task_manager/wait_task/ unregisters the task, then there is no way to examine children failures, since their statuses can be checked only through their parent.	2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk	a82a2f0624	tasks: unregister tasks with parents when they are finished Unregister children that are finished from task manager. They can be examined through their parents.	2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk	e6c50ad2d0	tasks: fold finished tasks into their parents Currently, when a child task is unregistered, it is still kept by its parent. This leads to excessive memory usage, especially when the tasks are configured to be kept in task manager after they are finished (task_ttl_in_seconds). Introduce task_essentials struct which keeps only data necessary for task manager API. When a task which has a parent is finished, a foreign pointer to it in its parent is replaced with respective task_essentials. Once a parent task is finished it is also folded into its parent (if it has one). Children details of a folded task are lost, unless they (or some of their subtrees) failed. That is, when a task is finished, we keep: - a root task (until it is unregistered); - task_essentials of root's direct children; - a path (of task_essentials) from root to each failed task (so that the reason of a failure could be examined).	2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk	319e799089	tasks: make task_manager::task::impl::finish_failed noexcept	2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk	6add9edf8a	tasks: change _children type Keep task children in a map. It's a preparation for further changes.	2024-05-31 10:27:09 +02:00
Pavel Emelyanov	273dca6f27	query_processor: Coroutinize stop() This effectively removes "finally" block so if authorized_prepared_cache.stop() resolves with exception, the prepared_cache.stop() is skipped. But that's not a problem -- even if .stop() throws the whole scylla stop aborts so we don't really care if it was clean or not. Also, authorized_prepared_cache.stop() closes the gate and cancels the timer. None of those can resolve with exception. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19001	2024-05-31 10:22:08 +03:00
Benny Halevy	427acb393e	data_dictionary: keyspace_metadata: format: print also initial_tablets Currently, there is no indication of tablets in the logged KSMetaData. Print the tablets configuration of either the `initial` number of tablets, if enabled, or {'enabled':false} otherwise. For example: ``` migration_manager - Create new Keyspace: KSMetaData{name=tablets_ks, strategyClass=org.apache.cassandra.locator.NetworkTopologyStrategy, strategyOptions={"datacenter1": "1"}, cfMetaData={}, durable_writes=true, tablets={"initial":0}, userTypes=org.apache.cassandra.config.UTMetaData@0x600004d446a8} migration_manager - Create new Keyspace: KSMetaData{name=vnodes_ks, strategyClass=org.apache.cassandra.locator.NetworkTopologyStrategy, strategyOptions={"datacenter1": "1"}, cfMetaData={}, durable_writes=true, tablets={"enabled":false}, userTypes=org.apache.cassandra.config.UTMetaData@0x600004c33ea8} ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#18998	2024-05-31 10:09:58 +03:00
Nadav Har'El	c786621b4c	test/cql-pytest: reproduce bug of secondary index used before built This patch adds a test reproducing the known issue #7963, where after adding a secondary-index to a table, queries might immediately start to use this index - even before it is built - and produce wrong results. The issue is still open and unfixed, so the new test is marked "xfail". Interestingly, even though Cassandra claims to have found and fixed a similar bug in 2015 (CASSANDRA-8505), this test also fails on Cassandra - trying a query right after CREATE INDEX and before it was fully built may cause the query to fail. Refs #7963 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#18993	2024-05-31 10:05:00 +03:00
Raphael S. Carvalho	b396b05e20	replica: Fix race of tablet snapshot with compaction tablet snapshot, used by migration, can race with compaction and can find files deleted. That won't cause data loss because the error is propagated back into the coordinator that decides to retry streaming stage. So the consequence is delayed migration, which might in turn reduce node operation throughput (e.g. when decommissioning a node). It should be rare though, so shouldn't have drastic consequences. Fixes #18977. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18979	2024-05-31 09:58:49 +03:00
Lakshmi Narayanan Sreethar	3d7d1fa72a	db/config.cc: increment components_memory_reclaim_threshold config default Incremented the components_memory_reclaim_threshold config's default value to 0.2 as the previous value was too strict and caused unnecessary eviction in otherwise healthy clusters. Fixes #18607 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#18964	2024-05-30 18:03:51 +03:00
Botond Dénes	0ead3570b4	Merge 'Run sstables loader in scheduling group' from Pavel Emelyanov Currently the loader is called via API, which inherits the maintenance scheduling group from API http server. The loader then can either do load_and_stream() or call (legacy) distributed_loader::upload_new_sstables(). The latter first switches into streaming scheduling group, but the former doesn't and continues running in the maintenance one. All this is not really a problem, because streaming sched group and maintenance sched group is one group under two different variable names. However, it's messy and worth delegating the sched group switch (even if it's a no-op) to the sstables-loader. As a nice side effect, this patch removes one place that uses database as proxy object to get configuration parameters. Closes scylladb/scylladb#18928 * github.com:scylladb/scylladb: sstables-loader: Run loading in its scheduling group sstables-loader: Add scheduling group to constructor	2024-05-30 18:03:51 +03:00
Pavel Emelyanov	83d491af02	config: Remove experimental TABLETS feature ... and replace it with boolean enable_tablets option. All the places in the code are patched to check the latter option instead of the former feature. The option is OFF by default, but the default scylla.yaml file sets this to true, so that newly installed clusters turn tablets ON. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18898	2024-05-30 18:03:51 +03:00
Pavel Emelyanov	dc588d1eef	replication_strategy: Remove unused factory_key::to_sstring() declaration Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18908	2024-05-30 18:03:51 +03:00
Anna Stuchlik	8f5c15b78f	doc: add support for Ubuntu 24.04 Closes scylladb/scylladb#18954	2024-05-30 18:03:51 +03:00
Pavel Emelyanov	91f74989ba	snitch: Remove production_snitch_base::_prop_file_contents This field was used to carry string with property file contents into the parse_property_file(), but it can go with an argument just as well Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-30 13:55:14 +03:00
Pavel Emelyanov	1cdeabdc50	snitch: Remove production_snitch_base::_prop_file_size This field was used to carry property file size across then-lambdas, now the code is coroutinized and can live with on-stack variable Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-30 13:54:30 +03:00
Pavel Emelyanov	b62aa276d1	snitch: Coroutinize load_property_file() Cleaner and easier to read this way Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-30 13:54:15 +03:00
Kefu Chai	fb87ab1c75	compress, auth: include used headers before this change, we rely on `seastar/util/std-compat.hh` to include the used headers provided by standard library. this was necessary before we moved to a C++20 compliant standard library implementation. but since Seastar has dropped C++17 support, its `seastar/util/std-compat.hh` is not responsible for providing these headers anymore. so, in this change, we include the used header directly instead of relying on `seastar/util/std-compat.hh`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18986	2024-05-30 09:16:23 +03:00
Kefu Chai	810da830ef	build: add sanitizer compiling options directly before this change, in order to avoid repeating/hardwiring the compiling options set by Seastar, we just inherit the compiling options of Seastar for building Abseil, as the former exposes the options to enable sanitizers. this works fine, despite that, strictly speaking, not all options are necessary for building abseil, as abseil is not a Seastar application -- it is just a C++ library. but when we introduce dependencies which are only generated at build time, and these dependencies are passed to the compiler at build time, this breaks the build of Abseil. because these dependencies are exposed by the Seastar's .pc file, and consumed by Abseil. when building Abseil, apparently, the building process driven by ninja is not started yet, so we are not able to build Abseil with these settings due to missing dependencies. so instead of inheriting the compiling options from Seastar, just set the sanitizer related compiling options directly, to avoid referencing these missing dependencies. the upside is that we pass a much smaller set of compiling options to compiler when building Abseil, the downside is that we hardwire these options related to sanitizer manually, they are also detected by Seastar's building system. but fortunately, these options are relatively stable across the building environments we support. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18987	2024-05-30 09:14:03 +03:00
Aleksandra Martyniuk	8a72324ff1	docs: add docs to task manager Closes scylladb/scylladb#18967	2024-05-30 09:05:02 +03:00
Raphael S. Carvalho	a56664b8e9	readers: combined: Avoid reallocation in prepare_forwardable_readers() reserve() is missing conditional addition of single and galloping readers. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18980	2024-05-30 08:57:27 +03:00
Dawid Medrek	e855794327	db/hints: Introduce an error injection to test draining We want to verify that a hint directory is drained when any of the nodes corresponding to it leaves the cluster. The test scenario should happen before the whole cluster has been migrated to the host-ID-based hinted handoff, so when we still rely on the mappings between hint endpoint managers and the hint directories managed by them. To make such a test possible, in these changes we introduce an error injection rejecting incoming hints. We want to test a scenario when: 1. hints are saved towards a given node -- node N1, 2. N1 changes its IP to a different one, 3. some other node -- node N2 -- changes its IP to the original IP of N1, 4. hints are saved towards N2 and they are stored in the same directory as the hints saved towards N1 before, 5. we start draining N2. Because at some point N2 needs to be stopped, it may happen that some mutations towards a distributed system table generate a hint to N2 BEFORE it has finished changing its IP, effectively creating another hint directory where ALL of the hints towards the node will be stored from there on. That would disturb the test scenario. Hence, this error injection is necessary to ensure that all of the steps in the test proceed as expected.	2024-05-29 19:32:41 +02:00
Dawid Medrek	745a9c6ab8	db/hints: Ensure that draining happens Before hinted handoff is migrated to using host IDs to identify nodes in the cluster, we keep track of mappings between hint endpoint managers identified by host IDs and the hint directories managed by them and represented by IP addresses. As a consequence, it may happen that one hint directory corresponds to multiple nodes -- it's intended. See `64ba620` for more details. Before these changes, we only started the draining process of a hint directory if the node leaving the cluster corresponded to that hint directory AND was identified by the same host ID as the hint endpoint manager managing that directory. As a result, the draining did not always happen when it was supposed to. Draining should start no matter which of the nodes corresponding to a hint directory is leaving the cluster. This commit ensures that it happens.	2024-05-29 19:32:38 +02:00
Wojciech Mitros	0de3a5f3ff	test mv: remove injection delaying shutdown of a node In the test_mv_topology_change case, we use an injection to delay the view updates application, so that the ERMs have a chance to change in the process. This injection was also enabled on a new node in the test, which was later decommissioned. During the shutdown, writes were still being performed, causing view update generation and delays due to the injection which in turn delayed the node shutdown, causing the test to timeout. This patch removes the injection for the node being shut down. At the same time, the force_gossip_topology_changes=True option is also removed from its config, but for that option it's enough to enable on the first node in the cluster and all nodes use it. Fixes: https://github.com/scylladb/scylladb/issues/18941 Closes scylladb/scylladb#18958	2024-05-29 15:29:55 +02:00
Kefu Chai	a415bb07ab	sl_controller: fix a typo in comment s/necessairy/necessary/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18950	2024-05-29 16:23:31 +03:00
Nadav Har'El	4b04ed1360	test/alternator: be more forgiving on authorizer configuration The Alternator test suite usually runs on a specific configuration of Scylla set up by test.py or test/alternator/run. However, we do consider it an important design goal of this test suite that developers should be able to run these tests against any DynamoDB-API implementation, including any version of Scylla manually run by the developer in any way he or she pleases. The recent commit `dc80b5dafe` changed the way we retrieve the configured authentication key, which is needed if Scylla is run with --alternator-enforce-authorization. However, the new code assumed that Scylla was also run with --authenticator PasswordAuthenticator --authorizer CassandraAuthorizer so that the default role of "cassandra" has a valid, non-null, password (namely, "cassandra"). If the developer ran Scylla manually without these options, the test initialization code broke, and all tests in the suite failed. This patch fixes this breakage. You can now run the Alternator test suite against Scylla run manually without any of the aforementioned options, and everything will work except some tests in test_authorization.py will fail as expected. This patch has no effect on the usual test.py or test/alternator/run runs, as they already run Scylla with all the aforementioned options and weren't exposed to the problem fixed here. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#18957	2024-05-29 16:22:45 +03:00
Raphael S. Carvalho	578a6c1e07	replica: Only consume memtable of the tablet intersecting with range read storage_proxy is responsible for intersecting the range of the read with tablets, and calling replica with a single tablet range, therefore it makes sense to avoid touching memtables of tablets that don't intersect with a particular range. Note this is a performance issue, not correctness one, as memtable readers that don't intersect with current range won't produce any data, but cpu is wasted until that's realized (they're added to list of readers in mutation_reader_merger, more allocations, more data sources to peek into, etc). That's also important for streaming e.g. after decommission, that will consume one tablet at a time through a reader, so we don't want memtables of streamed tablets (that weren't cleaned up yet) to be consumed. Refs #18904. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18907	2024-05-29 15:58:33 +03:00
Tomasz Grabiec	0d596a425c	tablets: Filter-out left nodes in get_natural_endpoints() The API already promises this, the comment on effective_replication_map says: "Excludes replicas which are in the left state". Tablet replicas on the replaced node are rebuilt after the node already left. We may no longer have the IP mapping for the left node so we should not include that node in the replica set. Otherwise, storage_proxy may try to use the empty IP and fail: storage_proxy - No mapping for :: in the passed effective replication map It's fine to not include it, because storage proxy uses keyspace RF and not replica list size to determine quorum. The node is not coming up, so no one should need to contact it. Users which need replica list stability should use the host_id-based API. Fixes #18843	2024-05-29 14:49:49 +02:00
Anna Stuchlik	888d7601a2	doc: add the tablets information to the nodetool describering command This commit adds an explanation of how the `nodetool describering` command works if tablets are enabled. Closes scylladb/scylladb#18940	2024-05-29 15:31:46 +03:00
Pavel Emelyanov	e74a4b038f	Merge 'tablets: alter keyspace' from Piotr Smaron This change supports changing replication factor in tablets-enabled keyspaces. This covers both increasing and decreasing the number of tablets replicas through first building topology mutations (`alter_keyspace_statement.cc`) and then tablets/topology/schema mutations (`topology_coordinator.cc`). For the limitations of the current solution, please see the docs changes attached to this PR. Fixes: #16129 Closes scylladb/scylladb#16723 * github.com:scylladb/scylladb: test: Do not check tablets mutations on nodes that don't have them test: Fix the way tablets RF-change test parses mutation_fragments test/tablets: Unmark RF-changing test with xfail docs: document ALTER KEYSPACE with tablets Return response only when tablets are reallocated cql-pytest: Verify RF is changes by at most 1 when tablets on cql3/alter_keyspace_statement: Do not allow for change of RF by more than 1 Reject ALTER with 'replication_factor' tag Implement ALTER tablets KEYSPACE statement support Parameterize migration_manager::announce by type to allow executing different raft commands Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks Extend system.topology with 3 new columns to store data required to process alter ks global topo req Allow query_processor to check if global topo queue is empty Introduce new global topo `keyspace_rf_change` req New raft cmd for both schema & topo changes Add storage service to query processor tablets: tests for adding/removing replicas tablet_allocator: make load_balancer_stats_manager configurable by name	2024-05-29 14:17:51 +03:00
Gleb Natapov	f91db0c1e4	raft topology: fix indentation after previous commit	2024-05-29 12:11:28 +03:00
Gleb Natapov	6853b02c00	raft topology: do not add bootstrapping node without IP as pending If there is no mapping from host id to ip while a node is in bootstrap state there is no point adding it to pending endpoint since write handler will not be able to map it back to host id anyway. If the transition state requires double writes though we still want to fail. In case the state is write_both_read_old we fail the barrier that will cause topology operation to rollback and in case of write_both_read_new we assert but this should not happen since the mapping is persisted by this point (or we failed in write_both_read_old state). Fixes: scylladb/scylladb#18676	2024-05-29 12:11:18 +03:00
Gleb Natapov	27445f5291	test: add test of bootstrap where the coordinator crashes just before storing IP mapping On the next boot there is no host ID to IP mapping which causes node to crash again with "No mapping for :: in the passed effective replication map" assertion.	2024-05-29 11:46:23 +03:00
Marcin Maliszkiewicz	1b1bc6f9bb	docs: document if not exists option for create index Closes scylladb/scylladb#18956	2024-05-29 11:35:01 +03:00
Gleb Natapov	1faef47952	schema_tables: remove unused code	2024-05-29 11:30:24 +03:00
Tomasz Grabiec	3e1ba4c859	test: pylib: Extract start_writes() load generator utility	2024-05-29 10:02:56 +02:00
Piotr Smaron	8a77a74d0e	cql: fix a crash lurking in `ks_prop_defs::get_initial_tablets` `tablets_options->erase(it);` invalidates `it`, but it's still referred to later in the code in the last `else`, and when that code is invoked, we get a `heap-use-after-free` crash. Fixes: #18926 Closes scylladb/scylladb#18936	2024-05-28 23:46:43 +03:00
Botond Dénes	aae3cfaff4	readers: compacting_reader: remove unused _ignore_partition_end This member is read-only since `ac44efea11` so remove it. Closes scylladb/scylladb#18726	2024-05-28 20:53:00 +03:00
Kefu Chai	719d53a565	service/storage_proxy: coroutinize handle_paxos_accept() for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18765	2024-05-28 20:51:10 +03:00
Nadav Har'El	00d10aa84a	alternator: clean up target string splitting This patch cleans up a bit the code in Alternator which splits up the operation's X-Amz-Target header (the second part of it is the name of the operation, e.g., CreateTable). The patch doesn't change any functionality or change performance in any meaningful way. I was just reviewing this code and was annoyed by the unnecessary variable and unnecessary creation of strings and vectors for such a simple operation - and wanted to clean it up. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#18830	2024-05-28 20:42:47 +03:00
Botond Dénes	d37eca0593	test/boost/mutation_reader_test: compacting_reader_next_partition: fix partition order The test creates two partitions and passes them through the reader, but the partitions are out-of-order. This is benign but best to fix it anyway. Found after bumping validation level inside the compactor. Closes scylladb/scylladb#18848	2024-05-28 20:41:54 +03:00
Aleksandra Martyniuk	b7ae7e0b0e	test: fix test_tombstone_gc.py Tests in test_tombstone_gc.py are parametrized with string instead of bool values. Fix that. Use the value to create a keyspace with or without tablets. Fixes: #18888. Closes scylladb/scylladb#18893	2024-05-28 20:40:15 +03:00
Kefu Chai	f58f6dfe20	data_dictionary: include <variant> otherwise when compiling with the new seastar, which removed `#include <variant>` from `std-compat.hh`, the {mode}-headers target would fail to build, like: ``` ./data_dictionary/storage_options.hh:34:29: error: no template named 'variant' in namespace 'std' 10:45:15 using value_type = std::variant<local, s3>; 10:45:15 ~~~~~^ 10:45:15 ./data_dictionary/storage_options.hh:35:5: error: unknown type name 'value_type'; did you mean 'std::_Bit_const_iterator::value_type'? 10:45:15 value_type value = local{}; 10:45:15 ^~~~~~~~~~ 10:45:15 std::_Bit_const_iterator::value_type ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18921	2024-05-28 20:38:55 +03:00
Anna Stuchlik	cfa3cd4c94	doc: add the tablet limitation to the manual recovery procedure This commit adds the information that the manual recovery procedure is not supported if tablets are enabled. In addition, the content in the Manual Recovery Procedure is reorganized by adding the Prerequisites and Procedure subsections - in this way, we can limit the number of Note and Warning boxes that made the page hard to follow. Fixes https://github.com/scylladb/scylladb/issues/18895 Closes scylladb/scylladb#18935	2024-05-28 18:19:22 +02:00
Nadav Har'El	1fe8f22d89	alternator, scheduler: test reproducing RPC scheduling group bug This patch adds a test for issue #18719: Although the Alternator TTL work is supposedly done in the "streaming" scheduling group, it turned out we had a bug where work sent on behalf of that code to other nodes failed to inherit the correct scheduling group, and was done in the normal ("statement") group. Because this problem only happens when more than one node is involved, the test is in the multi-node test framework test/topology_experimental_raft. The test uses the Alternator API. We already had in that framework a test using the Alternator API (a test for alternator+tablets), so in this patch we move the common Alternator utility functions to a common file, test_alternator.py, where I also put the new test. The test is based on metrics: We write expiring data, wait for it to expire, and then check the metrics on how much CPU work was done in the wrong scheduling group ("statement"). Before #18719 was fixed, a lot of work was done there (more than half of the work done in the right group). After the issue was fixed in the previous patch, the work on the wrong scheduling group went down to zero. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-05-28 10:58:08 -04:00
Anna Stuchlik	2bfdb1b583	doc: document RF limitation This commit adds the information that the Replication Factor must be the same or higher than the number of nodes. Closes scylladb/scylladb#18760	2024-05-28 17:14:40 +03:00
Botond Dénes	5d3f7c13f9	main: add maintenance tenant to messaging_service's scheduling config Currently only the user tenant (statement scheduling group) and system (default scheduling group) tenants exist, as we used to have only user-initiated operations and system (internal) ones. Now there is a need to distinguish between two kinds of system operation: foreground and background ones. The former should use the system tenant while the latter will use the new maintenance tenant (streaming scheduling group).	2024-05-28 10:08:46 -04:00
Wojciech Mitros	519317dc58	mv: handle different ERMs for base and view table When calculating the base-view mapping while the topology is changing, we may encounter a situation where the base table noticed the change in its effective replication map while the view table hasn't, or vice-versa. This can happen because the ERM update may be performed during the preemption between taking the base ERM and view ERM, or, due to `f2ff701`, the update may have just been performed partially when we are taking the ERMs. Until now, we assumed that the ERMs are synchronized when finding the base-view endpoint mapping, so in particular, we were using the topology from the base's ERM to check the datacenters of all endpoints. Now that the ERMs are more likely to not be the same, we may try to get the datacenter of a view endpoint that doesn't exist in the base's topology, causing us to crash. This is fixed in this patch by using the view table's topology for endpoints coming from the view ERM. The mapping resulting from the call might now be a temporary mapping between endpoints in different topologies, but it still maps base and view replicas 1-to-1. Fixes: #17786 Fixes: #18709 Closes scylladb/scylladb#18816	2024-05-28 16:01:39 +02:00
Botond Dénes	aae263ef0a	Merge 'Harden the repair_service shutdown path' from Benny Halevy This series ignores errors in `load_history()` to prevent `abort_requested_exception` coming from `get_repair_module().check_in_shutdown()` from escaping during `repair_service::stop()`, causing ``` repair_service::~repair_service(): Assertion `_stopped' failed. ``` Fixes https://github.com/scylladb/scylladb/issues/18889 Backport to 6.0 required due to `523895145d` Closes scylladb/scylladb#18890 * github.com:scylladb/scylladb: repair: load_history: warn and ignore all errors repair_service: debug stop	2024-05-28 15:30:39 +03:00
Pavel Emelyanov	66f6001c77	test: Do not check tablets mutations on nodes that don't have them The check is performed by selecting from mutation_fragments(table), but it's known that this query crashes Scylla when there's no tablet replica on that node. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-28 13:56:46 +02:00
Pavel Emelyanov	6e0e2674f0	test: Fix the way tablets RF-change test parses mutation_fragments When the test changes RF from 2 to 3, the extra node executes "rebuild" transition which means that it streams tablets replicas from two other peers. When doing it, the node receives two sets of sstables with mutations from the given tablet. The test part that checks if the extra node received the mutations notices two mutation fragments on the new replica and erroneously fails, seeing that RF=3 is not equal to the number of mutations found, which is 4. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-28 13:56:46 +02:00
Pavel Emelyanov	2567e300d1	test/tablets: Unmark RF-changing test with xfail Now the scaling works and the test must check that it does Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-28 13:56:46 +02:00
Piotr Smaron	1b913dd880	docs: document ALTER KEYSPACE with tablets	2024-05-28 13:56:46 +02:00
Piotr Smaron	39181c4bf2	Return response only when tablets are reallocated Up until now we waited until mutations are in place and then returned directly to the caller of the ALTER statement, but that doesn't imply that tablets were deleted/created, so we must wait until the whole processing is done and return only then.	2024-05-28 13:56:46 +02:00
Dawid Medrek	ec5708bdee	cql-pytest: Verify RF changes by at most 1 when tablets are on This commit adds a test verifying that we can only change the RF of a keyspace for any DC by at most 1 when using tablets. Fixes #18029	2024-05-28 13:56:46 +02:00
Dawid Medrek	951915ed84	cql3/alter_keyspace_statement: Do not allow for change of RF by more than 1 We want to ensure that when the replication factor of a keyspace changes, it changes by at most 1 per DC if it uses tablets. The rationale for that is to make sure that the old and new quorums overlap by at least one node. After these changes, attempts to change the RF of a keyspace in any DC by more than 1 will fail.	2024-05-28 13:56:46 +02:00
Piotr Smaron	b875151405	Reject ALTER with 'replication_factor' tag This patch removes the support for the "wildcard" replication_factor option for ALTER KEYSPACE when the keyspace supports tablets. It will still be supported for CREATE KEYSPACE so that a user doesn't have to know all datacenter names when creating the keyspace, but ALTER KEYSPACE will require that and the user will have to specify the exact change in replication factors they wish to make by explicitly specifying the datacenter names. Expanding the replication_factor option in the ALTER case is unintuitive and it's a trap many users fell into. See #8881, #15391, #16115	2024-05-28 13:56:46 +02:00
Piotr Smaron	fbd75c5c06	Implement ALTER tablets KEYSPACE statement support This commit adds support for executing ALTER KS for keyspaces with tablets and utilizes all the previous commits. The ALTER KS is handled in alter_keyspace_statement, where a global topology request is generated with data attached to system.topology table. Then, once topology state machine is ready, it starts to handle this global topology event, which results in producing mutations required to change the schema of the keyspace, delete the system.topology's global req, produce tablets mutations and additional mutations for a table tracking the lifetime of the whole req. Tracking the lifetime is necessary to not return the control to the user too early, so the query processor only returns the response while the mutations are sent.	2024-05-28 13:56:42 +02:00
Piotr Smaron	7081215552	Parameterize migration_manager::announce by type to allow executing different raft commands Since ALTER KS requires creating topology_change raft command, some functions need to be extended to handle it. RAFT commands are recognized by types, so some functions are just going to be parameterized by type, i.e. made into templates. These templates are instantiated already, so that only 1 instance of each template exists across the whole code base, to avoid compiling it in each translation unit.	2024-05-28 13:55:11 +02:00
Piotr Smaron	80ed442be2	Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks	2024-05-28 13:55:11 +02:00
Piotr Smaron	59d3fd615f	Extend system.topology with 3 new columns to store data required to process alter ks global topo req Because ALTER KS will result in creating a global topo req, we'll have to pass the req data to topology coordinator's state machine, and the easiest way to do it is through system.topology table, which is going to be extended with 3 extra columns carrying all the data required to execute ALTER KS from within topology coordinator.	2024-05-28 13:55:11 +02:00
Piotr Smaron	6fd0a49b63	Allow query_processor to check if global topo queue is empty With current implementation only 1 global topo req can be executed at a time, so when ALTER KS is executed, we'll have to check if any other global topo req is ongoing and fail the req if that's the case.	2024-05-28 13:55:11 +02:00
Piotr Smaron	c174eee386	Introduce new global topo `keyspace_rf_change` req It will be used when processing ALTER KS statement, but also to create a separate processing path for a KS with tablets (as opposed to a vnode KS).	2024-05-28 13:54:48 +02:00
Kamil Braun	247eb9020b	Merge 'cdc, raft topology: fix and test cdc in the recovery mode' from Patryk Jędrzejczak This PR ensures that CDC keeps working correctly in the recovery mode after leaving the raft-based topology. We update `system.cdc_local` in `topology_state_load` to ensure a node restarting in the recovery mode sees the last CDC generation created by the topology coordinator. Additionally, we extend the topology recovery test to verify that the CDC keeps working correctly during the whole recovery process. In particular, we test that after restarting nodes in the recovery mode, they correctly use the active CDC generation created by the topology coordinator. Fixes scylladb/scylladb#17409 Fixes scylladb/scylladb#17819 Closes scylladb/scylladb#18820 * github.com:scylladb/scylladb: test: test_topology_recovery_basic: test CDC during recovery test: util: start_writes_to_cdc_table: add FIXME to increase CL test: util: start_writes_to_cdc_table: allow restarting with new cql storage_service: update system.cdc_local in topology_state_load	2024-05-28 11:53:28 +02:00
Patryk Jędrzejczak	c44d8eca15	test: test_topology_ops: run correctly without tablets This patch fixes two bugs in `test_topology_ops`: 1. The values of `tablets_enabled` were nonempty strings, so they always evaluated to `True` in the if statement responsible for enabling writing workers only if tablets are disabled. Hence, the writing workers were always disabled. 2. The `topology_experimental_raft suite` uses tablets by default, so we need a config with empty `experimental_features` to disable them. Ensuring this test works with and without tablets is considered a part of 6.0, so we should backport this patch. Closes scylladb/scylladb#18900	2024-05-28 10:08:41 +02:00
Pavel Emelyanov	ae622d711e	sstables-loader: Run loading in its scheduling group Now the loading code has two different paths, and only one of them switches sched group. It's cleaner and more natural to switch the sched group in the loader itself, so that all code paths run in it and don't need to care about switching. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-28 11:07:58 +03:00
Pavel Emelyanov	7fefd57b74	sstables-loader: Add scheduling group to constructor So that it knows in which group to run its code in the future. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-28 11:07:22 +03:00
Nadav Har'El	b7fa5261c8	Merge 'Fix parsing of initial tablets by ALTER' from Pavel Emelyanov If the user wants to change the default initial tablets value, it uses ALTER KEYSPACE statement. However, specifying `WITH tablets = { initial: $value }` will take no effect, because statement analyzer only applies `tablets` parameters together with the `replication` ones, so the working statement should be `WITH replication = $old_parameters AND tablets = ...` which is not very convenient. This PR changes the analyzer so that altering `tablets` happens independently from `replication`. Test included. fixes: #18801 Closes scylladb/scylladb#18899 * github.com:scylladb/scylladb: cql-pytest: Add validation of ALTER KEYSPACE WITH TABLETS cql3: Fix parsing of ALTER KEYSPACE's tablets parameters cql3: Remove unused ks_prop_defs/prepare_options() argument	2024-05-27 23:10:39 +03:00
Kefu Chai	e42d83dc46	treewide: include used headers before this change, we rely on `seastar/util/std-compat.hh` to include the used headers provided by standard library. this was necessary before we moved to a C++20 compliant standard library implementation. but since Seastar has dropped C++17 support, its `seastar/util/std-compat.hh` is not responsible for providing these headers anymore. so, in this change, we include the used headers directly instead of relying on `seastar/util/std-compat.hh`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18883	2024-05-27 17:34:38 +03:00
Anna Stuchlik	806dd5a68a	doc: describe Tablets in ScyllaDB This commit adds the main description of tablets and their benefits. The article can be used as a reference in other places across the docs where we mention tablets. Closes scylladb/scylladb#18619	2024-05-27 15:41:37 +02:00
Botond Dénes	2d79b0106c	Merge 'storage_service: Fix race between tablet split and stats retrieval' from Raphael "Raph" Carvalho Retrieval of tablet stats must be serialized with mutation to token metadata, as the former requires tablet id stability. If tablet split is finalized while retrieving stats, the saved erm, used by all shards, can have a lower tablet count than the one in a particular shard, causing an abort as tablet map requires that any id fed into it is lower than its current tablet count. Fixes #18085. Closes scylladb/scylladb#18287 * github.com:scylladb/scylladb: test: Fix flakiness in topology_experimental_raft/test_tablets service: Use tablet read selector to determine which replica to account table stats storage_service: Fix race between tablet split and stats retrieval	2024-05-27 16:32:54 +03:00
Pavel Emelyanov	1003391ed6	cql-pytest: Add validation of ALTER KEYSPACE WITH TABLETS There's a test that checks how ALTER changes the initial tablets value, but it equips the statement with `replication` parameters because of limitations that parser used to impose. Now the `tablets` parameters can come on their own, so add a new test. The old one is kept for compatibility considerations. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-27 16:27:45 +03:00
Pavel Emelyanov	a172ef1bdf	cql3: Fix parsing of ALTER KEYSPACE's tablets parameters When the `WITH` doesn't include the `replication` parameters, the `tablets` one is ignored, even if it's present in the statement. That's not great, those two parameter sets are pretty much independent and should be parsed individually. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-27 16:25:38 +03:00
Pavel Emelyanov	8a612da155	cql3: Remove unused ks_prop_defs/prepare_options() argument Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-27 16:25:22 +03:00
Benny Halevy	c32c418cd5	repair: load_history: warn and ignore all errors Currently, the call to `get_repair_module().check_in_shutdown()` may throw `abort_requested_exception` that causes `repair_service::stop()` to fail, and trigger assertion failure in `~repair_service`. We already ignore failure from `update_repair_time`, so expand the logic to cover the whole function body. Fixes scylladb/scylladb#18889 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-27 15:57:54 +03:00
Patryk Jędrzejczak	7c1e6ba8b3	test: test_topology_ops: stop a write worker after the first error `test_topology_ops` is flaky, which has been uncovered by gating in scylladb/scylladb#18707. However, debugging it is harder than it should be because write workers can flood the logs. They may send a lot of failed writes before the test fails. Then, the log file can become huge, even up to 20 GB. Fix this issue by stopping a write worker after the first error. This test is important for 6.0, so we can backport this change. Closes scylladb/scylladb#18851	2024-05-27 13:49:30 +02:00
Piotr Dulikowski	fa142a9ce7	Merge 'qos/raft_service_level_distributed_data_accessor: print correct error message when trying to modify a service level in recovery mode' from Michał Jadwiszczak Raft service levels are read-only in recovery mode. This patch adds check and proper error message when a user tries to modify service levels in recovery mode. Fixes https://github.com/scylladb/scylladb/issues/18827 Closes scylladb/scylladb#18841 * github.com:scylladb/scylladb: test/auth_cluster/test_raft_service_levels: try to create sl in recovery service/qos/raft_sl_dda: reject changes to service levels in recovery mode service/qos/raft_sl_dda: extract raft_sl_dda steps to common function	2024-05-27 13:26:06 +02:00
Kefu Chai	cbc83f92d3	.github: add iwyu workflow iwyu is short for "include what you use". this workflow is added to identify missing "#include" and extraneous "#include" in C++ source files. This workflow is triggered when a pull request is created targeting the "master" branch. It uses the clang-include-cleaner tool provided by clang-tools package to analyze all the ".cc" and ".hh" source files. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18122	2024-05-27 14:19:11 +03:00
Kefu Chai	e70b116333	api/api-doc/utils: fix a typo in description s/mintues/minutes/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18869	2024-05-27 14:15:23 +03:00
Kefu Chai	2d7545ade6	test/lib: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18884	2024-05-27 14:13:51 +03:00
Piotr Smaron	06008970fb	New raft cmd for both schema & topo changes Allows executing combined topology & schema mutations under a single RAFT command	2024-05-27 12:48:44 +02:00
Piotr Smaron	cb40f13831	Add storage service to query processor Query processor needs to access storage service to check if global topology request is still ongoing and to be able to wait until it completes.	2024-05-27 12:48:44 +02:00
Paweł Zakrzewski	c888945354	tablets: tests for adding/removing replicas Note we're suppressing a UBSanitizer overflow error in UTs. That's because our linter complains about a possible overflow, which never happens, but tests are still failing because of it.	2024-05-27 12:48:44 +02:00
Paweł Zakrzewski	65deddd967	tablet_allocator: make load_balancer_stats_manager configurable by name This is needed, because the same name cannot be used for 2 separate entities, because we're getting double-metrics-registration error, thus the names have to be configurable, not hardcoded.	2024-05-27 12:48:44 +02:00
Benny Halevy	38845754c4	repair_service: debug stop Seen the following unexplained assertion failure with pytest -s -v --scylla-version=local_tarball --tablets repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_multi_dc ``` INFO 2024-05-27 11:18:05,081 [shard 0:main] init - Shutting down repair service INFO 2024-05-27 11:18:05,081 [shard 0:main] task_manager - Stopping module repair INFO 2024-05-27 11:18:05,081 [shard 0:main] task_manager - Unregistered module repair INFO 2024-05-27 11:18:05,081 [shard 1:main] task_manager - Stopping module repair INFO 2024-05-27 11:18:05,081 [shard 1:main] task_manager - Unregistered module repair scylla: repair/row_level.cc:3230: repair_service::~repair_service(): Assertion `_stopped' failed. Aborting on shard 0. Backtrace: /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x3f040c /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x41c7a1 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x3dbaf /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x8e883 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x3dafd /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x2687e /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x2679a /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x36186 0x26f2428 0x10fb373 0x10fc8b8 0x10fc809 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x456c6d /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x456bcf 0x10fc65b 0x10fc5bc 0x10808d0 0x1080800 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x3ff22f /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x4003b7 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x3ff888 
/home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x36dea8 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x36d0e2 0x101cefa 0x105a390 0x101bde7 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x27b89 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x27c4a 0x101a764 ``` Decoded: ``` ~repair_service at ./repair/row_level.cc:3230 ~shared_ptr_count_for at ././seastar/include/seastar/core/shared_ptr.hh:491 (inlined by) ~shared_ptr_count_for at ././seastar/include/seastar/core/shared_ptr.hh:491 ~shared_ptr at ././seastar/include/seastar/core/shared_ptr.hh:569 (inlined by) seastar::shared_ptr<repair_service>::operator=(seastar::shared_ptr<repair_service>&&) at ././seastar/include/seastar/core/shared_ptr.hh:582 (inlined by) seastar::shared_ptr<repair_service>::operator=(decltype(nullptr)) at ././seastar/include/seastar/core/shared_ptr.hh:588 (inlined by) operator() at ././seastar/include/seastar/core/sharded.hh:727 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}&>(seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}&) at ././seastar/include/seastar/core/future.hh:2035 (inlined by) seastar::futurize<std::invoke_result<seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}>::type>::type seastar::smp::submit_to<seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) 
const::{lambda()#1}>(unsigned int, seastar::smp_submit_to_options, seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}&&) at ././seastar/include/seastar/core/smp.hh:367 seastar::futurize<std::invoke_result<seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}>::type>::type seastar::smp::submit_to<seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}>(unsigned int, seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}&&) at ././seastar/include/seastar/core/smp.hh:394 (inlined by) operator() at ././seastar/include/seastar/core/sharded.hh:725 (inlined by) seastar::future<void> std::__invoke_impl<seastar::future<void>, seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}&, unsigned int>(std::__invoke_other, seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}&, unsigned int&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:61 (inlined by) std::enable_if<is_invocable_r_v<seastar::future<void>, seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}&, unsigned int>, seastar::future<void> >::type std::__invoke_r<seastar::future<void>, 
seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}&, unsigned int>(seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}&, unsigned int&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:114 (inlined by) std::_Function_handler<seastar::future<void> (unsigned int), seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}>::_M_invoke(std::_Any_data const&, unsigned int&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:290 ``` FWIW, gdb crashed when opening the coredump. This commit will help catch the issue earlier when repair_service::stop() fails (and it must never fail) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-27 13:02:10 +03:00
Kefu Chai	61b5bfae6d	docs: fix typos in dev documents these typos were identified by codespell. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18871	2024-05-27 12:28:34 +03:00
Botond Dénes	c137f84535	Merge 'Mark prepare_statement as immutable' from Pavel Emelyanov Users of prepared statement reference it with the help of "smart" pointers. None of the users are supposed to modify the object they point to, so mark the respective pointer type as `pointer<const prepared_statement>`. Also mark the fields of prepared statement itself with const's (some of them already are) Closes scylladb/scylladb#18872 * github.com:scylladb/scylladb: cql3: Mark prepared_statement's fields const cql3: Define prepared_statement weak pointer as const	2024-05-27 12:27:54 +03:00
Kefu Chai	f1f3f009e7	docs: fix typos in upgrade document s/Montioring/Monitoring/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18870	2024-05-27 12:26:59 +03:00
Patryk Jędrzejczak	2111cb01df	test: test_topology_recovery_basic: test CDC during recovery In topology on raft, management of CDC generations is moved to the topology coordinator. We extend the topology recovery test to verify that the CDC keeps working correctly during the whole recovery process. In particular, we test that after restarting nodes in the recovery mode, they correctly use the active CDC generation created by the topology coordinator. A node restarting in the recovery mode should learn about the active generation from `system.cdc_local` (or from gossip, but we don't want to rely on it). Then, it should load its data from `system.cdc_generations_v3`. Fixes scylladb/scylladb#17409	2024-05-27 10:39:04 +02:00
Patryk Jędrzejczak	388db33dec	test: util: start_writes_to_cdc_table: add FIXME to increase CL	2024-05-27 10:39:04 +02:00
Patryk Jędrzejczak	68b6e8e13e	test: util: start_writes_to_cdc_table: allow restarting with new cql This patch allows us to restart writing (to the same table with CDC enabled) with a new CQL session. It is useful when we want to continue writing after closing the first CQL session, which happens during the `reconnect_driver` call. We must stop writing before calling `reconnect_driver`. If a write started just before the first CQL session was closed, it would time out on the client. We rename `finish_and_verify` - `stop_and_verify` is a better name after introducing `restart`.	2024-05-27 10:39:04 +02:00
Patryk Jędrzejczak	4351eee1f6	storage_service: update system.cdc_local in topology_state_load When the node with CDC enabled and with the topology on raft disabled bootstraps, it reads system.cdc_local for the last generation. Nodes with both enabled use group0 to get the last generation. In the following scenario with a cluster of one node: 1. the node is created with CDC and the topology on raft enabled 2. the user creates table T 3. the node is restarted in the recovery mode 4. the CDC log of T is extended with new entries 5. the node restarts in normal mode The generation created in the step 3 is seen in system_distributed.cdc_generation_timestamps but not in system.cdc_generations_v3, thus there are used streams that the CDC based on raft doesn't know about. Instead of creating a new generation, the node should use the generation already committed to group0. Save the last CDC generation in the system.cdc_local during loading the topology state so that it is visible for CDC not based on raft. Fixes scylladb/scylladb#17819	2024-05-27 10:39:04 +02:00
Kefu Chai	f70e888ed5	build: cmake: pass -fprofile-list to compiler to mirror the behavior of the build.ninja generated by configure.py Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18734	2024-05-27 11:22:55 +03:00
Botond Dénes	47dbf23773	Merge 'Rework view services and system-distributed-keyspace dependencies' from Pavel Emelyanov The system-distributed-keyspace and view-update-generator often go in pair, because streaming, repair and sstables-loader (via distributed-loader) need them both to check if sstable is staging and register it if it's such. The check is performed by messing directly with system_distributed.view_build_status table, and the registration happens via view-update-generator. That's not nice, other services shouldn't know that view status is kept in system table. Also view-update-generator is a service to generate and push view updates, the fact that it keeps staging sstables list is the implementation detail. This PR replaces dependencies on the mentioned pair of services with the single dependency on view-builder (repair, sstables-loader and stream-manager are enlightened) and hides the view building-vs-staging details inside the view_builder. Along the way, some simplification of repair_writer_impl class is done. 
Closes scylladb/scylladb#18706 * github.com:scylladb/scylladb: stream_manager: Remove system_distributed_keyspace and view_update_generator repair: Remove system_distributed_keyspace and view_update_generator streaming: Remove system_distributed_keyspace and view_update_generator sstables_loader: Remove system_distributed_keyspace and view_update_generator distributed_loader: Remove system_distributed_keyspace and view_update_generator view: Make register_staging_sstable() a method of view_builder view: Make check_view_build_ongoing() helper a method of view_builder streaming: Proparage view_builder& down to make_streaming_consumer() repair: Keep view_builder& on repair_writer_impl distributed_loader: Propagate view_builder& via process_upload_dir() stream_manager: Add view builder dependency repair_service: Add view builder dependency sstables_loader: Add view_bulder dependency main: Start sstables loader later repair: Remove unwanted local references from repair_meta	2024-05-27 10:51:11 +03:00
Botond Dénes	e0f4d79f3b	Merge 'Do not export statement scheduling group from database' from Pavel Emelyanov Database used to be (and still is in many ways) an object used to get configuration from. Part of the configuration is the set of pre-configured scheduling groups. That's not nice, services should use each other for some real need, not as proxies to configuration. This patch patches the places that explicitly switch to statement group _not_ to use database to get the group itself. fixes: #17643 Closes scylladb/scylladb#18799 * github.com:scylladb/scylladb: database: Don't export statement scheduling group test: Use async attrs and cql-test-env scheduling groups test: Use get_scheduling_groups() to get scheduling groups api: Don't switch sched group to start/stop protocol servers main: Don't switch sched group to start protocol servers code: Switch to sched group in request_stop_server() code: Switch to server sched group in start() protocol_server: Keep scheduling group on board code: Add scheduling group to controllers redis: Coroutinize start() method	2024-05-27 10:48:33 +03:00
Kefu Chai	46d993a283	test: revert `4c1b6f04` in `4c1b6f04`, we added a concept for fmt::is_formattable<>. but it was not necessary. the fmt::is_formattable<> trait was enough. `4c1b6f04` was actually a leftover of a bigger change which tried to add a trait for the cases where fmt::is_formattable<> was not able to cover. but that was based on the wrong impression that fmt::is_formattable<> should be able to work with container types without including, for instance `fmt/ranges.h`. but in `222dbf2c`, we include `fmt/ranges.h` in tests, where the range-alike formatter is used, that enables `fmt::is_formattable<>` to tell that container types are formattable. in short, `4c1b6f04` was created based on a misunderstanding, and it was a reduced type trait, which proved to be unnecessary. so, in this change, it is dropped. but the type constraints are preserved to make the build failure more explicit, if the fallback formatter does not match with the type to be formatted by Boost.test. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18879	2024-05-27 10:14:59 +03:00
Marcin Maliszkiewicz	2ab143fb40	db: auth: move auth tables to system keyspace Separate keyspace which also behaves as system brings little benefit while creating some compatibility problems like schema digest mismatch during rollback. So we decided to move auth tables into system keyspace. Fixes https://github.com/scylladb/scylladb/issues/18098 Closes scylladb/scylladb#18769	2024-05-26 22:30:42 +03:00
Avi Kivity	56d523b071	Merge 'build, test: disable operator<< for vector and unordered_map' from Kefu Chai this series disables operator<<:s for vector and unordered_map, and drop operator<< for mutation, because we don't have to keep it to work with these operator:s anymore. this change is a follow up of https://github.com/scylladb/seastar/issues/1544 this change is a cleanup. so no need to backport Closes scylladb/scylladb#18866 * github.com:scylladb/scylladb: mutation,db: drop operator<< for mutation and seed_provider_type& build: disable operator<< for vector and unordered_map db/heat_load_balance: include used header test: define a more generic boost_test_print_type test/boost: define fmt::formatter for service_level_controller_test.cc test/boost: include test/lib/test_utils.hh	2024-05-26 19:19:20 +03:00
Kefu Chai	4e9596a5a9	treewide: replace std::result_of_t with std::invoke_result_t in theory, std::result_of_t should have been removed in C++20. and std::invoke_result_t is available since C++17. thanks to libstdc++, the tree is compiling. but we should not rely on this. so, in this change, we replace all `std::result_of_t` with `std::invoke_result_t`. actually, clang + libstdc++ is already warning us like: ``` In file included from /home/runner/work/scylladb/scylladb/multishard_mutation_query.cc:9: In file included from /home/runner/work/scylladb/scylladb/schema/schema_registry.hh:11: In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/unordered_map:38: Warning: /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/type_traits:2624:5: warning: 'result_of<void (noop_compacted_fragments_consumer::*(noop_compacted_fragments_consumer &))()>' is deprecated: use 'std::invoke_result' instead [-Wdeprecated-declarations] 2624 \| using result_of_t = typename result_of<_Tp>::type; \| ^ /home/runner/work/scylladb/scylladb/mutation/mutation_compactor.hh:518:43: note: in instantiation of template type alias 'result_of_t' requested here 518 \| if constexpr (std::is_same_v<std::result_of_t<decltype(&GCConsumer::consume_end_of_stream)(GCConsumer&)>, void>) { \| ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18835	2024-05-26 16:45:42 +03:00
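A minimal sketch of the replacement described in the entry above, using only the standard library. The two consumer types are hypothetical stand-ins for `GCConsumer`, not ScyllaDB code. The point is the different spelling: `std::result_of_t` takes a single function-type argument `F(Args...)`, while `std::invoke_result_t` takes the callable and its argument types as separate template arguments.

```cpp
#include <type_traits>

// Hypothetical consumers (assumptions for illustration only).
struct noop_consumer {
    void consume_end_of_stream() {}
};

struct counting_consumer {
    int consume_end_of_stream() { return 42; }
};

// Old spelling (deprecated in C++17, removed from the standard in C++20):
//   std::result_of_t<decltype(&Consumer::consume_end_of_stream)(Consumer&)>
// New spelling: callable type first, then each argument type.
template <typename Consumer>
constexpr bool returns_void = std::is_void_v<
    std::invoke_result_t<decltype(&Consumer::consume_end_of_stream), Consumer&>>;

static_assert(returns_void<noop_consumer>);
static_assert(!returns_void<counting_consumer>);
```

The `static_assert` lines check at compile time that the trait distinguishes a `void`-returning member function from one that returns a value.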
Pavel Emelyanov	9108952a52	test/cql-pytest: Add test for token() filter against mutation_fragments() When selecting from mutation_fragments(table) one may want to apply token() filtering against partition key. This doesn't work currently, but used to crash. This patch adds a regression test for that refs: #18637 refs: #18768 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18759	2024-05-26 15:31:20 +03:00
Kefu Chai	125464f2d9	migration_manager: do not reference moved-away smart pointer this change is inspired by clang-tidy. it warns like: ``` [752/852] Building CXX object service/CMakeFiles/service.dir/migration_manager.cc.o Warning: /home/runner/work/scylladb/scylladb/service/migration_manager.cc:891:71: warning: 'view' used after it was moved [bugprone-use-after-move] 891 \| db.get_notifier().before_create_column_family(keyspace, view, mutations, ts); \| ^ /home/runner/work/scylladb/scylladb/service/migration_manager.cc:886:86: note: move occurred here 886 \| auto mutations = db::schema_tables::make_create_view_mutations(keyspace, std::move(view), ts); \| ^ ``` in which, `view` is an instance of view_ptr which is a type with the semantics of shared pointer, it's backed by a member variable of `seastar::lw_shared_ptr<const schema>`, whose move-ctor actually resets the original instance. so we are actually accessing the moved-away pointer in ```c++ db.get_notifier().before_create_column_family(keyspace, view, mutations, ts) ``` so, in this change, instead of moving away from `view`, we create a copy, and pass the copy to `db::schema_tables::make_create_view_mutations()`. this should be fine, as the behavior of `db::schema_tables::make_create_view_mutations()` does not depend on whether the `view` passed to it was moved or copied. the change which introduced this use-after-move was `88a5ddabce` Refs `88a5ddabce` Fixes #18837 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18838	2024-05-26 12:04:00 +03:00
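The use-after-move pattern from the entry above can be sketched with `std::shared_ptr`, whose move constructor also leaves the source empty. This is an illustration under assumptions, not the ScyllaDB code: `schema_ptr`, `make_mutations` and the two `view_valid_*` helpers are hypothetical names, and `std::shared_ptr` stands in for `seastar::lw_shared_ptr`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a view schema held through a shared pointer.
using schema_ptr = std::shared_ptr<const std::string>;

// Takes the pointer by value, like a function that consumes its argument.
inline std::size_t make_mutations(schema_ptr s) {
    return s ? s->size() : 0;
}

// Buggy pattern: `view` is moved into the call and is null afterwards.
inline bool view_valid_after_move(schema_ptr view) {
    make_mutations(std::move(view));
    return static_cast<bool>(view);  // false: a moved-from shared_ptr is empty
}

// Fixed pattern: pass a copy, so `view` still refers to the schema.
inline bool view_valid_after_copy(schema_ptr view) {
    make_mutations(view);
    return static_cast<bool>(view);  // true: the copy only bumped the refcount
}

inline void demo() {
    auto view = std::make_shared<const std::string>("my_view");
    assert(!view_valid_after_move(view));
    assert(view_valid_after_copy(view));
}
```

Copying a shared pointer costs one reference-count increment, which is why passing a copy is a cheap way to keep the original usable for the later call.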
Kefu Chai	dbfdc71d2d	treewide: fix typos in comment and error messages these typos were identified by codespell Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18868	2024-05-26 11:54:36 +03:00
Kefu Chai	35e1fcde1f	mutation,db: drop operator<< for mutation and seed_provider_type& since we've migrated away from the generic homebrew formatters for range-alike containers, there is no need to keep these operator<< around -- they were preserved in order to work with the container formatters which expect operator<< of the elements. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-05-26 13:44:55 +08:00
Kefu Chai	9bd9f283f4	build: disable operator<< for vector and unordered_map seastar provides an option named `Seastar_DEPRECATED_OSTREAM_FORMATTERS` to enable the operator<< for `std::vector` and `std::unordered_map`, and this option is enabled by default. but we intend to avoid using them, so that we can use the fmt::formatter specializations when Boost.test prints variables. if we keep these two operator<< enabled, Boost.test would use them when printing variables to be compared when the check fails, but if elements in the vector or unordered_map to be compared do not provide operator<<, compiling would fail. so, in this change, let's disable these operator<< implementations. this allows us to ditch the operator<< implementations which are preserved only for testing. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-05-26 13:44:55 +08:00
Kefu Chai	8e0a6ea021	db/heat_load_balance: include used header in this header, we use `hr_logger.trace("returned _pp={}", p)` to print a `vector<float>`, so we need to include `fmt/ranges.h`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-05-26 13:44:55 +08:00
Kefu Chai	4c1b6f0476	test: define a more generic boost_test_print_type fmt::is_formattable<T>::value is false, even if * T is a container of U, and * fmt::is_formattable<U>, and * U can be formatted using fmt::formatter so, we have to define a more generic boost_test_print_type() for all the types supported by {fmt}. it will help us to ditch the operator<< for vector and unordered_map in Seastar, and allow us to use the fmt::formatter specialization of the element types. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-05-26 12:32:43 +08:00
Kefu Chai	bfe918ac9e	test/boost: define fmt::formatter for service_level_controller_test.cc since we are moving away from operator<< based formatter, more and more types now only have {fmt} based formatters. the same will apply to the STL container types after ditching the generic homebrew formatter in to_string.hh, so to be prepared for the change, let's add the fmt::formatter for tests as well. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-05-26 12:32:43 +08:00
Kefu Chai	222dbf2ce4	test/boost: include test/lib/test_utils.hh this change was created in the same spirit of 505900f18f. because we are deprecating the operator<< for vector and unordered_map in Seastar, some tests do not compile anymore if we disable these operators. so to be prepared for the change disabling them, let's include test/lib/test_utils.hh for accessing the printer dedicated for Boost.test. and also '#include <fmt/ranges.h>' when necessary, because, in order to format the ranges using {fmt}, we need to use fmt/ranges.h. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-05-26 12:32:43 +08:00
Pavel Emelyanov	cf564d7a54	cql3: Mark prepared_statement's fields const Not only users of prepared_statement point to immutable object, but the class itself doesn't assume modifications of its fields, so mark them const too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-25 16:41:30 +03:00
Pavel Emelyanov	828862bdff	cql3: Define prepared_statement weak pointer as const The pointer points to immutable prepared_statement, so tune up the type respectively. Tracing has its own alias for it, fix that one too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-25 16:40:35 +03:00
Michał Chojnowski	de798775fd	test: test_coordinator_queue_management: wait for logs properly The modified lines of code intend to await the first appearance of a log on one of the nodes. But due to misplaced parentheses, instead of creating a list of log-awaiting tasks with a list comprehension, they pass a generator expression to asyncio.create_task(). This is nonsense, and it fails immediately with a type error. But since they don't actually check the result of the await, the test just assumes that the search completed successfully. This was uncovered by an upgrade to Python 3.12, because its typing is stronger and asyncio.create_task() screams when it's passed a regular generator. This patch fixes the bad list comprehension, and also adds an error check on the completed awaitables (by calling `await` on them). Fixes #18740 Closes scylladb/scylladb#18754	2024-05-25 10:54:44 +03:00
Pavel Emelyanov	31edab277a	database: Don't export statement scheduling group Now all the code gets this group from elsewhere and the method can be removed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 18:00:01 +03:00
Pavel Emelyanov	ddc511872e	test: Use async attrs and cql-test-env scheduling groups Continuation of the previous patch, but with its own flavor. There's a manual test that wants to run seastar thread in statement scheduling group and gets one from database. This patch makes it get the group from cql-test-env and, while at it, makes it switch to that group using thread attributes passed to async() method. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 18:00:01 +03:00
Pavel Emelyanov	2e3a057db1	test: Use get_scheduling_groups() to get scheduling groups There's such a helper in cql-test-env that other tests use to get sched groups from. Few other tests (ab)use database for that, this patch fixes those remnants. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 18:00:01 +03:00
Pavel Emelyanov	d86a8252d4	api: Don't switch sched group to start/stop protocol servers All the protocol servers implementations now maintain scheduling group on their own, so the API handler can stop caring Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 18:00:01 +03:00
Pavel Emelyanov	ee0239b2ef	main: Don't switch sched group to start protocol servers Now each of them does this switch on its own Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 18:00:01 +03:00
Pavel Emelyanov	7c76a35e0b	code: Switch to sched group in request_stop_server() This method is used to stop protocol server in the runtime (via the API). Since it's not just "kick it and wait to wrap up", it's needed to perform this in the inherited sched group too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 18:00:01 +03:00
Pavel Emelyanov	fe349a73c8	code: Switch to server sched group in start() This patch makes all protocol servers implementations use the inherited sched group in their start methods. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 17:56:02 +03:00
Pavel Emelyanov	bf5894cc69	protocol_server: Keep scheduling group on board The group is now mandatory for the real protocol server implementation to initialize. The previous patch made all of them get the sched group as constructor argument, so that's where to take it from. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 17:54:29 +03:00
Pavel Emelyanov	fc3c3e1099	code: Add scheduling group to controllers There are four of them currently -- transport, thrift, alternator and redis. This patch makes main pass to all the statement scheduling group as constructor argument. Next patches will make use of it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 17:53:16 +03:00
Pavel Emelyanov	82511f3c25	redis: Coroutinize start() method Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 17:52:48 +03:00
Michał Jadwiszczak	af0b6bcc56	test/auth_cluster/test_raft_service_levels: try to create sl in recovery	2024-05-23 17:49:59 +02:00
Pavel Emelyanov	8906126a2c	stream_manager: Remove system_distributed_keyspace and view_update_generator Now all the code is happy with view_builder and can be shortened Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:41:56 +03:00
Pavel Emelyanov	84ef6a8179	repair: Remove system_distributed_keyspace and view_update_generator Now all the code is happy with view_builder and can be shortened Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:41:56 +03:00
Pavel Emelyanov	ae2dcdc7c2	streaming: Remove system_distributed_keyspace and view_update_generator Now all the code is happy with view_builder and can be shortened Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:41:55 +03:00
Pavel Emelyanov	afa94d2837	sstables_loader: Remove system_distributed_keyspace and view_update_generator Now all the code is happy with view_builder and can be shortened Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:41:47 +03:00
Pavel Emelyanov	b728857954	distributed_loader: Remove system_distributed_keyspace and view_update_generator Now all the code is happy with view_builder and can be shortened Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:41:47 +03:00
Pavel Emelyanov	66a8035b64	view: Make register_staging_sstable() a method of view_builder Callers of it had just checked if an sstable still has some views building, so they should talk to view-builder to register the sstable that's now considered to be staging. Effectively, this is to hide the view-update-generator from other services and make them communicate with the builder only. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:41:47 +03:00
Pavel Emelyanov	92ff0d3fc3	view: Make check_view_build_ongoing() helper a method of view_builder This helper checks if there's an ongoing build of a view, and it's in fact internal to view-builder, who keeps its status in one of its system tables. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:41:47 +03:00
Pavel Emelyanov	57517d5987	streaming: Propagate view_builder& down to make_streaming_consumer() Continuation of the previous patch. Repair itself doesn't need it, but streaming consumer does. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:41:46 +03:00
Pavel Emelyanov	5e6893075d	repair: Keep view_builder& on repair_writer_impl Preparation patch, next patches will make use of this new member Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:32:29 +03:00
Pavel Emelyanov	0d946a5fdf	distributed_loader: Propagate view_builder& via process_upload_dir() Preparation to next patches, they'll make use of this new argument Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:32:28 +03:00
Pavel Emelyanov	d917b06857	stream_manager: Add view builder dependency Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:32:28 +03:00
Pavel Emelyanov	f0f1097d0c	repair_service: Add view builder dependency Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:32:28 +03:00
Pavel Emelyanov	f269a37541	sstables_loader: Add view_builder dependency Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:32:28 +03:00
Pavel Emelyanov	ff63f8b1a5	main: Start sstables loader later This service is on its own, nothing depends on it. Nor can it work before system distributed keyspace is started, so move it lower. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:32:28 +03:00
Pavel Emelyanov	f4341ea088	repair: Remove unwanted local references from repair_meta When constructed, the class copies local references to services just to push them into make_repair_writer() later in the same initializers list. There's no need in keeping those references. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:32:28 +03:00
Marcin Maliszkiewicz	9adf74ae6c	docs: remove note about performance degradation with default superuser This doesn't apply for auth-v2 as we improved data placement and removed cassandra quirk which was setting different CL for some default superuser involved operations. Fixes #18773 Closes scylladb/scylladb#18785	2024-05-23 13:16:11 +03:00
Kefu Chai	dfeef4e4e8	build: use f-string when appropriate for better readability Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18808	2024-05-23 11:19:39 +03:00
Anna Stuchlik	2da25cca1a	doc: enable publishing docs for branch-6.0 This commit enables publishing documentation from branch-6.0. The docs will be published as UNSTABLE (the warning about version 6.0 being unstable will be displayed). Closes scylladb/scylladb#18832	2024-05-23 10:37:55 +03:00
Michał Jadwiszczak	ee08d7fdad	service/qos/raft_sl_dda: reject changes to service levels in recovery mode When a cluster goes into recovery mode and service levels were migrated to raft, service levels become temporarily read-only. This commit adds a proper error message in case a user tries to do any changes.	2024-05-23 08:18:03 +02:00
Michał Jadwiszczak	2b56158d13	service/qos/raft_sl_dda: extract raft_sl_dda steps to common function When setting/dropping a service level using raft data accessor, the same validation steps are executed (this_shard_id = 0 and guard is present). To not duplicate the calls in both functions, they can be extracted to a helper function.	2024-05-23 08:16:00 +02:00
Raphael S. Carvalho	e7246751b6	test: Fix flakiness in topology_experimental_raft/test_tablets One source of flakiness is in test_tablet_metadata_propagates_with_schema_changes_in_snapshot_mode due to gossiper being aborted prematurely, and causing reconnection storm. Another is test_tablet_missing_data_repair which is flaky due to an issue in python driver that session might not reconnect on rolling restart (tracked by https://github.com/scylladb/python-driver/issues/230) Refs #15356. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-05-22 17:02:29 -03:00
Raphael S. Carvalho	eb8ef38543	replica: Fix tablet's compaction_groups_for_token_range() with unowned range File-based tablet streaming calls every shard to return data of every group that intersects with a given range. After dynamic group allocation, that breaks as the tablet range will only be present in a single shard, so an exception is thrown causing migration to halt during streaming phase. Ideally, only one shard is invoked, but that's out of the scope of this fix and compaction_groups_for_token_range() should return empty result if none of the local groups intersect with the range. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18798	2024-05-22 20:15:33 +03:00
Anna Stuchlik	6626d72520	doc: replace Raft-disabled with Raft-enabled procedure This commit fixes the incorrect Raft-related information on the Handling Cluster Membership Change Failures page introduced with https://github.com/scylladb/scylladb/pull/17500. The page describes the procedure for when Raft is disabled. Since 6.0, Raft for consistent schema management is enabled and mandatory (cannot be disabled), this commit adds the procedure for Raft-enabled setups. Closes scylladb/scylladb#18803	2024-05-22 17:45:20 +02:00
David Garcia	de2b30fafd	docs: docs: autogenerate metrics Autogenerates metrics documentation using the scripts/get_description.py script introduced in #17479 docs: add beta Closes scylladb/scylladb#18767	2024-05-22 15:49:41 +03:00
Raphael S. Carvalho	551bf9dd58	service: Use tablet read selector to determine which replica to account table stats Since we introduced the ability to revert migrations, we can no longer rely on ordering of transition stages to determine whether to account pending or leaving replica. Let's use read selector instead, which correctly has info which replica type has correct stats info. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-05-22 09:25:29 -03:00
Raphael S. Carvalho	abcc68dbe7	storage_service: Fix race between tablet split and stats retrieval If tablet split is finalized while retrieving stats, the saved erm, used by all shards, will be invalidated. It can either cause incorrect behavior or crash if id is not available. It's worked around by feeding local tablet map into the "coordinator" collecting stats from all shards. We will also no longer have a snapshot of erm shared between shards to help intra-node migration. This is simplified by serializing token metadata changes and the retrieval of the stats (latter should complete pretty fast, so it shouldn't block the former for any significant time). Fixes #18085. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-05-22 09:25:29 -03:00
Yaron Kaikov	9cc42c98f5	[Mergify] update configuration for 6.0 Updating mergify conf to support 6.0 release Closes scylladb/scylladb#18823	2024-05-22 14:28:43 +03:00
Yaron Kaikov	219daf3489	Update ScyllaDB version to: 6.1.0-dev	2024-05-22 14:08:56 +03:00
Botond Dénes	2f87bfd634	Update tools/java submodule * tools/java 4ee15fd9...88809606 (2): > Update Scylla Java driver to 3.11.5.3. > install-dependencies.sh: s/python/python3/ [botond: regenerate toolchain image] Closes scylladb/scylladb#18790	2024-05-22 11:39:02 +03:00
Asias He	1a03e3d5ae	repair: Add missing db/config.hh Since commit `952dfc6157` "repair: Introduce repair_partition_count_estimation_ratio config option", get_config() is used. We need to include db/config.hh for that. Spotted when backporting to 5.4 branch. Refs #18615 Closes scylladb/scylladb#18780	2024-05-22 11:00:16 +03:00
Nadav Har'El	dc80b5dafe	test/alternator: do not write to auth tables As part of the Alternator test suite, we check Alternator's support for authentication. Alternator maps Scylla's existing CQL roles to AWS's authentication: * AWS's access_key_id <- the name of the CQL role * AWS's secret_access_key <- the salted hash of the password of the CQL role Before this patch, the Alternator test suite created a new role with a preset salted hash (role "alternator", salted hash "secret_pass") and then used that in the tests. However, with the advent of Raft-based metadata it is wrong to write directly to the roles table, and starting with #17952 such writes will be outright forbidden. But we don't actually need to create a new CQL role! We already have a perfectly good CQL role called "cassandra", and our tests already use it. So what this patch does is to have the Alternator tests (conftest.py) read from the roles system-table the salted hash of the "cassandra" role, and then use that - instead of the hard-coded pair alternator/secret_pass - in the tests. A couple more tests assumed that the role name that was used was "alternator", but now it was changed to "cassandra" so those tests needed minor fixes as well. After this patch, the Alternator tests no longer write to the roles system table. Moreover, after this patch, test/alternator/run and test/alternator/suite.yaml (used when testing with test.py) no longer need to do extra ugly CQL setup before starting the Alternator tests. Fixes #18744 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#18771	2024-05-22 11:00:15 +03:00
Avi Kivity	c37f2c2984	version: bump version to 6.0.0-dev The next release will be called 6.0, not 5.5, so bump the version to reflect that. Closes scylladb/scylladb#18789	2024-05-22 11:00:15 +03:00
Kefu Chai	0610eda1b5	Update seastar submodule * seastar 42f15a5f...914a4241 (33): > sstring: deprecate formatters for vector and unordered_map > github: use fedora:40 image for testing > github: add 2 testing combinations back to the matrix > github: extract test.yaml into a resusable workflow > build: use initial-exec TLS when building seastar as shared library > coroutine: preserve this->container before calling dtor > smp: allocate hugepages eagerly when kernel support is available > shared_mutex: Add tests for std::shared_lock and std::unique_lock > shared_mutex: Add RAII locks > README.md: replace C++17 with C++23 > treewide: do not check for SEASTAR_COROUTINES_ENABLED > build: support enabled options when building seastar-module > treewide: include required header files > build: move add_subdirectory(src) down > README.md: replace CircleCI badge with GitHub badge > weak_ptr: Make it possible to convert to "compatible" pointers > circleci: remove circleci CI tests > build: use DPDK_MACHINE=haswell when testing dpdk build on github-hosted runner > build: add --dpdk-machine option to configure.py > build: stop translating -march option to names recognized by DPDK > github: encode matrix.enables in cache key > doc/prometheus.md: add metrics? in URL exporter URI > tests/unit/metrics_tester: use deferred_stop() when appropriate > httpd: mark http_server_control::stop() noexcept > reactor: print scheduling group along with backtrace > reactor: update lowres_clock when max_task_backlog is exceeded > tests: add test for prometheus exporter > tests: move apps/metrics_tester to tests/unit > apps/metrics_tester: keep metrics with "private" labels > apps/metrics_tester: support "labels" in conf.yaml > apps/metrics_tester: stop server properly > apps/metrics_tester: always start exporter > apps/metrics_tester: fix typo in conf-example.yaml Closes scylladb/scylladb#18800	2024-05-22 11:00:15 +03:00
Pavel Emelyanov	26eda88401	test/tablets: Check that after RF change data is replicated properly There's a test that checks system.tablets contents to see that after changing ks replication factor via ALTER KEYSPACE the tablet map is updated properly. This patch extends this test that also validates that mutations themselves are replicated according to the desired replication factor. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18644	2024-05-22 11:00:15 +03:00
Anna Stuchlik	92bc8053e2	doc: remove outdated MV error from Troubleshooting This commit removes the MV error message, which only affects older versions of ScyllaDB, from the Troubleshooting section. Fixes https://github.com/scylladb/scylladb/issues/17205 Closes scylladb/scylladb#17229	2024-05-21 19:02:31 +03:00
Avi Kivity	2bf2e24fcd	Merge 'Coroutinize some auth and service levels related functions' from Marcin Maliszkiewicz Coroutinization will help improve readability and allow easier changes planned for this code. This work was separated from https://github.com/scylladb/scylladb/pull/17910 to make it smoother to review and merge. Closes scylladb/scylladb#18788 * github.com:scylladb/scylladb: cql3: coroutinize create/alter/drop service levels auth: coroutinize alter_role and drop_role auth: coroutinize grant_permissions and revoke_permissions auth: coroutinize create_role cql3: statements: co-routinize auth related statements cql3: statements: release unused guard explicitly in auth related statements	2024-05-21 17:45:19 +03:00
Botond Dénes	5e41dd28c7	Merge 'Sanitize sl controller draining' from Pavel Emelyanov The sl-controller is stopped in three steps. The first (and instantly the second) is unsubscribing from lifecycle notification and draining. The third is stop itself. First two steps are "out of order" as compared to the desired start-stop sequence of any service, this patch fixes these steps. After this PR the drain_on_shutdown() (the call that drains the node upon stop) finally becomes clean and tidy and is no longer accompanied by ad-hoc fellow drains/stops/aborts/whatever. refs: #2737 Closes scylladb/scylladb#18731 * github.com:scylladb/scylladb: sl_controller: Remove drain() method sl_controller: Move abort kicking into do_abort() main,sl_controller: Subscribe for early abort main: Unsubscribe sl controller next to subscribing	2024-05-21 17:16:23 +03:00
Anna Stuchlik	a86fb293fe	doc: update Raft information in 6.0 This commit updates the documentation about Raft in version 6.0. - "Introduction": The outdated information about consistent topology updates not being supported is removed and replaced with the correct information. - "Enabling Raft": The relevant information is moved to other sections. The irrelevant information is removed. The section no longer exists. - "Verifying that the Raft upgrade procedure finished successfully" - moved under Schema (in the same document). I additionally removed the include saying that after you verify that schema on Raft is enabled, you MUST enable topology changes on Raft (it is not mandatory; also, it should be part of the upgrade guide, not the Raft document). - Unnecessary or incorrect references to versions are removed. Refs https://github.com/scylladb/scylladb/issues/18580 Closes scylladb/scylladb#18689	2024-05-21 11:45:36 +02:00
Anna Stuchlik	eefa4a7333	doc: replace 5.4-to-5.5 with 5.4-to-6.0 upgrade guide This commit replaces the 5.4-to-5.5 upgrade guide with the 5.4-to-6.0 upgrade guide, including the metrics update information. The guide references the "Enable Consistent Topology Updates" document, as enabling consistent topology updates is a new step when upgrading to version 6.0. Also, a procedure for image upgrades has been added (as verified by @yaronkaikov). Fixes scylladb/scylladb#18254 Fixes scylladb/scylladb#17896 Refs scylladb/scylladb#18580 Closes scylladb/scylladb#18728	2024-05-21 11:31:04 +02:00
Piotr Dulikowski	9820472277	main: introduce schema commitlog scheduling group Currently, we do not explicitly set a scheduling group for the schema commitlog which causes it to run in the default scheduling group (called "main"). However: - It is important and significant enough that it should run in a scheduling group that is separate from the main one, - It should not run in the existing "commitlog" group as user writes may sometimes need to wait for schema commitlog writes (e.g. read barrier done to learn the schema necessary to interpret the user write) and we want to avoid priority inversion issues. Therefore, introduce a new scheduling group dedicated to the schema commitlog. Fixes: scylladb/scylladb#15566 Closes scylladb/scylladb#18715	2024-05-21 11:29:57 +02:00
Kefu Chai	5db315930e	sstables: fix a typo in comment: s/Mimicks/Mimics/ this typo was identified by the codespell workflow Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18781	2024-05-21 12:14:10 +03:00
Nadav Har'El	dcd26d8a16	Merge 'docs: update isolation.md' from Botond Dénes Update `docs/dev/isolation.md`: * Update the list of scheduling groups * Remove IO priority groups (they were folded into scheduling groups) * Add section on RPC isolation Closes scylladb/scylladb#18749 * github.com:scylladb/scylladb: docs: isolation.md: add section on RPC call isolation docs: isolation.md: remove mention of IO priority groups docs: isolation.md: update scheduling group list, add aliases	2024-05-21 11:46:57 +03:00
Kefu Chai	44e85c7d79	build: "undo" the coverage compiling options added to abseil we are not interested in the code coverage of abseil library, so no need to apply the compiling options enabling the coverage instrumentation when building the abseil library. moreover, since the path of the file passed to `-fprofile-list` is a relative path, the build fails when building abseil with coverage enabled, like: ``` /usr/lib64/ccache/clang++ -I/jenkins/workspace/scylla-master/scylla-ci/scylla/abseil -std=c++20 -I/jenkins/workspace/scylla-master/scylla-ci/scylla/seastar/include -I/jenkins/workspace/scylla-master/scylla-ci/scylla/build/debug/seastar/gen/include -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -DSEASTAR_API_LEVEL=7 -DSEASTAR_BUILD_SHARED_LIBS -DSEASTAR_SSTRING -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_DEBUG -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEBUG_PROMISE -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_TYPE_ERASE_MORE -DBOOST_NO_CXX98_FUNCTION_BASE -DFMT_SHARED -I/usr/include/p11-kit-1 -fprofile-instr-generate -fcoverage-mapping -fprofile-list=./coverage_sources.list -std=gnu++20 -Wall -Wextra -Wcast-qual -Wconversion -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Winfinite-recursion -Winvalid-constexpr -Wliteral-conversion -Wmissing-declarations -Woverlength-strings -Wpointer-arith -Wself-assign -Wshadow-all -Wshorten-64-to-32 -Wsign-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-zero-compare -Wundef -Wuninitialized -Wunreachable-code -Wunused-comparison -Wunused-local-typedefs -Wunused-result -Wvla -Wwrite-strings -Wno-float-conversion -Wno-implicit-float-conversion -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -DNOMINMAX -MD -MT 
absl/strings/CMakeFiles/strings.dir/str_cat.cc.o -MF absl/strings/CMakeFiles/strings.dir/str_cat.cc.o.d -o absl/strings/CMakeFiles/strings.dir/str_cat.cc.o -c /jenkins/workspace/scylla-master/scylla-ci/scylla/abseil/absl/strings/str_cat.cc clang-16: error: no such file or directory: './coverage_sources.list' ``` in this change, we just remove the compiling options enabling the coverage instrumentation from the cflags when building abseil. Fixes #18686 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18748	2024-05-21 11:43:16 +03:00
Marcin Maliszkiewicz	570b766e8b	cql3: coroutinize create/alter/drop service levels	2024-05-21 10:37:26 +02:00
Marcin Maliszkiewicz	f98cb6e309	auth: coroutinize alter_role and drop_role	2024-05-21 10:37:26 +02:00
Marcin Maliszkiewicz	21556c39d3	auth: coroutinize grant_permissions and revoke_permissions	2024-05-21 10:37:26 +02:00
Marcin Maliszkiewicz	6709947ccf	auth: coroutinize create_role	2024-05-21 10:37:26 +02:00
Marcin Maliszkiewicz	7f5d259b54	cql3: statements: co-routinize auth related statements	2024-05-21 10:37:26 +02:00
Marcin Maliszkiewicz	dee17e5ab6	cql3: statements: release unused guard explicitly in auth related statements Currently guard is released immediately because those functions are based on continuations and guard lifetime is not extended. In the following commit we rewrite those functions to coroutines and lifetime will be automatically extended. This would deadlock the client because we'd try to take second guard inside auth code without releasing this unused one. In the future commits auth guard will be removed and the one from statement will be used but this needs some more code re-arrangements.	2024-05-21 10:37:26 +02:00
Botond Dénes	11fa79a537	docs: isolation.md: add section on RPC call isolation	2024-05-21 03:12:22 -04:00
Kefu Chai	86b988a70b	test/lib: do not use variable which could be moved away C++ standard does not define the order in which the parameters passed to a function are evaluated. so in theory, in ```c++ reusable_sst(sst->get_schema(), std::move(sst)); ``` `std::move(sst)` could be evaluated before `sst->get_schema`. but please note, `std::move(sst)` does not move `sst` away, it merely casts `sst` to an rvalue reference; it is `reusable_sst()` which could move `sst` away by consuming it. so the following call is much more dangerous than the one above: ```c++ reusable_sst(sst->get_schema(), modify_sst(std::move(sst))) ``` nevertheless, this usage is still confusing, so this change passes a copy of `sst` to `reusable_sst` instead. this change is inspired by clang-tidy, it warns like: ``` Warning: /home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:25: warning: 'sst' used after it was moved [bugprone-use-after-move] 397 \| return reusable_sst(sst->get_schema(), std::move(sst)); \| ^ /home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:44: note: move occurred here 397 \| return reusable_sst(sst->get_schema(), std::move(sst)); \| ^ /home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:25: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated 397 \| return reusable_sst(sst->get_schema(), std::move(sst)); \| ``` per the analysis above, this is a false alarm. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18775	2024-05-21 10:02:10 +03:00
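One way to sidestep the evaluation-order question raised in the commit above is to read what is needed into a named local before the call. The sketch below is only an illustration under assumptions: `sstable_stub`, `reusable` and `make_reusable_safely` are hypothetical stand-ins, not the ScyllaDB types or the code changed by this commit.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an sstable that knows its schema name.
struct sstable_stub {
    std::string schema;
    const std::string& get_schema() const { return schema; }
};

using sst_ptr = std::shared_ptr<sstable_stub>;

// Hypothetical consumer: takes ownership of the pointer and reports
// whether it actually received a live object.
std::string reusable(std::string schema, sst_ptr sst) {
    return schema + ":" + (sst ? "owned" : "empty");
}

// Read everything needed from `sst` into a local first, and only then
// hand the pointer over. The result no longer depends on the order in
// which the call's arguments are evaluated.
std::string make_reusable_safely(sst_ptr sst) {
    std::string schema = sst->get_schema();
    return reusable(std::move(schema), std::move(sst));
}
```

Calling `make_reusable_safely(std::make_shared<sstable_stub>(sstable_stub{"ks.tbl"}))` yields `"ks.tbl:owned"` regardless of how the compiler orders argument evaluation.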
Pavel Emelyanov	428e0bd7d4	locator: Remove unused lshift-operator for topology Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18714	2024-05-21 09:46:30 +03:00
Pavel Emelyanov	b24fb8dc87	inet_address: Remove to_sstring() in favor of fmt::to_string The existing inet_address::to_string() calls fmt::format("{}", *this) anyway. However, the to_string() method is declared in .cc file, while the formatter is in the header and is equipped with constexprs so that converting an address to string is done at compile time as much as possible. Also, though minor, fmt::to_string(foo) is believed to be even faster than fmt::format("{}", foo). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18712	2024-05-21 09:43:08 +03:00
Pavel Emelyanov	fed457eb06	sl_controller: Remove drain() method The draining now only consists of waiting for the data update future to resolve. It can be safely moved to .stop() (i.e. -- later) because its stopping had already been initiated by abort-source, and no other services depend on sl-controller to be stopped and drained. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-21 09:42:16 +03:00
Pavel Emelyanov	535e5f4ae7	sl_controller: Move abort kicking into do_abort() Draining sl controller consists of two parts. The first kicks off the wrap-up process by aborting operations, breaking semaphores, etc. It's the no-waiting part. The second is the co_await of the completion future. This patch moves the no-waiting part into the recently introduced abort subscription, so that wrap-up starts a bit earlier. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-21 09:42:16 +03:00
Kefu Chai	b6e2d6868b	build: add dependencies from binaries to abseil libraries in `0b0e661a`, we brought abseil submodule back. but we didn't update the build.ninja rules properly -- we should have added the abseil libraries to the dependencies of the binaries so that the abseil libraries are always generated before a certain binary is built. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18753	2024-05-21 08:50:48 +03:00
Avi Kivity	33ec6ccea9	test: boost: chunked_vector_test: include <optional> std::optional is used but not imported. This fails on libstdc++-14. Closes scylladb/scylladb#18739	2024-05-21 07:37:11 +03:00
Pavel Emelyanov	8d4c8711fa	main,sl_controller: Subscribe for early abort There's stop-signal in main that fires an abort source on stop. Lots of other services are subscribed in it, add the sl-controller too. For now it's a no-op, but next patches will make use of it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-20 21:26:31 +03:00
Pavel Emelyanov	5105ee3284	main: Unsubscribe sl controller next to subscribing The subscription only handles on_leave_cluster() and only for local node, so even if controller gets subscribed for longer, it won't do any harm. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-20 21:26:31 +03:00
Yaron Kaikov	bc596a3e76	pull_request_template: clarify the template and remove checkbox verification It seems that having the checkbox in the PR template and failing the action is confusing and not very clear. Let's remove it completely and just add to the template a note asking for the backport reason Closes scylladb/scylladb#18708	2024-05-20 18:24:28 +03:00
Botond Dénes	f239339a29	Merge 'Improve modularity of some per-table API endpoints' from Pavel Emelyanov There's a set of API endpoints that toggle per-table auto-compaction and tombstone-gc booleans. They all live in two different .cc files under api/ directory and duplicate each other's code. This PR generalizes those handlers, places them next to each other, fixes a leak on stop and, as a nice side effect, lightens the database.hh header. Closes scylladb/scylladb#18703 * github.com:scylladb/scylladb: api,database: Move auto-compaction toggle guard api: Move some table manipulation helpers from storage_service api: Move table-related calls from storage_service domain api: Reimplement some endpoints using existing helpers api: Lost unset of tombstone-gc endpoints	2024-05-20 18:01:54 +03:00
Avi Kivity	61505d057e	Merge 'Sort user-defined types in describe statements' from Michał Jadwiszczak User-defined types can depend on each other, forming a directed acyclic graph. In order to support restoring schema from `DESC SCHEMA`, UDTs should be ordered topologically, not alphabetically as they were until now. This patch changes the way UDTs are ordered in `DESC SCHEMA`/`DESC KEYSPACE <ks>` statements, so the output can be safely copy-pasted to restore the schema. Fixes #18539 Closes scylladb/scylladb#18302 * github.com:scylladb/scylladb: test/cql-pytest/test_describe: add test for UDTs ordering cql3/statements/describe_statement: UDTs topological sorting cql3/statements/describe_statement: allow to skip alphabetical sorting types: add a method to get all referenced user types db/cql_type_parser: use generic topological sorting db/cql_type_parses: futurize raw_builder::build() test/boost: add test for topological sorting utils: introduce generic topological sorting algorithm	2024-05-20 16:58:17 +03:00
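The ordering requirement described in the merge above (every type emitted after the types it references) can be sketched with a plain depth-first topological sort. This is an assumption-laden illustration, not the algorithm added under utils in that series: `dependency_map`, `visit_type` and `topological_order` are hypothetical names, and the input is assumed to be acyclic.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each type name to the names of the types it references.
// Assumed to describe a directed acyclic graph.
using dependency_map = std::map<std::string, std::vector<std::string>>;

// Depth-first visit: emit all dependencies of `name` before `name` itself.
void visit_type(const std::string& name, const dependency_map& deps,
                std::set<std::string>& visited, std::vector<std::string>& out) {
    if (!visited.insert(name).second) {
        return; // already visited
    }
    auto it = deps.find(name);
    if (it != deps.end()) {
        for (const auto& dep : it->second) {
            visit_type(dep, deps, visited, out);
        }
    }
    out.push_back(name);
}

// Returns the type names so that each one appears after everything it
// references, which is the order needed to replay CREATE TYPE statements.
std::vector<std::string> topological_order(const dependency_map& deps) {
    std::set<std::string> visited;
    std::vector<std::string> out;
    for (const auto& entry : deps) {
        visit_type(entry.first, deps, visited, out);
    }
    return out;
}
```

With `person` referencing `address`, and `address` referencing `street` and `city`, an alphabetical listing would put `address` first, while this ordering emits `street` and `city` before `address`, and `address` before `person`.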
Pavel Emelyanov	159e44d08a	test.py: Make it possible to avoid wildcard test names matching There's a nasty scenario when this searching plays a bad joke. When CI picks up a new branch and notices that a test has changed, it spawns a custom job with test.py --repeat 100 $changed_test_name in it. Next, when test.py does opt-in test name matching, it uses the wildcard search and can pick up extra unwanted tests into the run. To solve this, the case-selection syntax is extended. Now if the caller specifies `suite/test::` as test, the test file is selected by exact name match, but the specific test-case is not selected, the `` makes it run all cases. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18704	2024-05-20 15:50:47 +02:00
Botond Dénes	e1c4e6c151	Merge 'sstables_manager: use maintenance scheduling group to run components reload fiber' from Lakshmi Narayanan Sreethar PR https://github.com/scylladb/scylladb/pull/18186 introduced a fiber that reloads reclaimed bloom filters when memory becomes available. Use maintenance scheduling group to run that fiber instead of running it in the main scheduling group. Fixes #18675 Closes scylladb/scylladb#18721 * github.com:scylladb/scylladb: sstables_manager: use maintenance scheduling group to run components reload fiber sstables_manager: add member to store maintenance scheduling group	2024-05-20 16:38:42 +03:00
Takuya ASADA	33af97ca5a	dist/docker: revert dropping systemd package In `7ce6962141` we dropped openssh-server, which also dropped the systemd package and caused an error on Scylla Operator (#17787). This reverts dropping the systemd package and fixes the issue. Fix #17787 Closes scylladb/scylladb#18643	2024-05-20 16:38:15 +03:00
Andrei Chekun	bce53efd36	Enrich test results produced by test.py This PR resolves issue with double count of the test result for topology tests. It will not appear in the consolidated report anymore. Another fix is to provide a better view which test failed by modifying the test case name in the report enriching it with mode and run id, so making them unique across the run. The scope of this change is: 1. Modify the test name to have run id in name 2. Add handlers to get logs of test.py and pytest in one file that are related to test, rather than to the full suite 3. Remove topology tests from aggregating them on a suite level in Junit results 4. Add a link to the logs related to the failed tests in Junit results, so it will be easier to navigate to all logs related to test 5. Gather logs related to the failed test to one directory for better logs investigation Ref: scylladb/scylladb#17851 Closes scylladb/scylladb#18277	2024-05-20 15:33:57 +02:00
Avi Kivity	52fe351c31	Merge 'Balance tablets within nodes (intra-node migration)' from Tomasz Grabiec This is needed to avoid severe imbalance between shards which can happen when some table grows and is split. The inter-node balance can be equal, so inter-node migration cannot fix the imbalance. Also, if RF=N then there is not even a possibility of moving tablets around to fix the imbalance. The only way to bring the system to balance is to move tablets within the nodes. The system is not prepared for intra-node migration currently. Request coordination is host-based, while for intra-node migration it should be (also) shard-based. The solution employed here is to keep the coordination between nodes as-is, and for intra-node migration storage_proxy-level coordinator is not aware of the migration (no pending host). The replica-side request handler will be a second-level coordinator which routes requests to shards, similar to how the first-level coordinator routes them to hosts. Tablet sharder is adjusted to handle intra-migration where a tablet can have two replicas on the same host. For reads, sharder uses the read selector to resolve the conflict. For writes, the write selector is used. The old shard_of() API is kept to represent shard for reads, and new method is introduced to query the shards for writing: shard_for_writes(). All writers should be switched to that API, which is not done in this patch yet. The request handler on replica side acts as a second-level coordinator, using sharder to determine routing to shards. A given sharder has a scope of a single topology version, a single effective_replication_map_ptr, which should be kept alive during writes. 
perf-simple-query test results show no signs of regression: Command: perf-simple-query -c1 -m1G --write --tablets --duration=10 Before: > 83294.81 tps ( 59.5 allocs/op, 14.3 tasks/op, 53725 insns/op, 0 errors) > 87756.72 tps ( 59.5 allocs/op, 14.3 tasks/op, 54049 insns/op, 0 errors) > 86428.47 tps ( 59.6 allocs/op, 14.3 tasks/op, 54208 insns/op, 0 errors) > 86211.38 tps ( 59.7 allocs/op, 14.3 tasks/op, 54219 insns/op, 0 errors) > 86559.89 tps ( 59.6 allocs/op, 14.3 tasks/op, 54188 insns/op, 0 errors) > 86609.39 tps ( 59.6 allocs/op, 14.3 tasks/op, 54117 insns/op, 0 errors) > 87464.06 tps ( 59.5 allocs/op, 14.3 tasks/op, 54039 insns/op, 0 errors) > 86185.43 tps ( 59.6 allocs/op, 14.3 tasks/op, 54169 insns/op, 0 errors) > 86254.71 tps ( 59.6 allocs/op, 14.3 tasks/op, 54139 insns/op, 0 errors) > 83395.35 tps ( 60.2 allocs/op, 14.4 tasks/op, 54693 insns/op, 0 errors) > > median 86428.47 tps ( 59.6 allocs/op, 14.3 tasks/op, 54208 insns/op, 0 errors) > median absolute deviation: 243.04 > maximum: 87756.72 > minimum: 83294.81 > After: > 85523.06 tps ( 59.5 allocs/op, 14.3 tasks/op, 53872 insns/op, 0 errors) > 89362.47 tps ( 59.6 allocs/op, 14.3 tasks/op, 54226 insns/op, 0 errors) > 88167.55 tps ( 59.7 allocs/op, 14.3 tasks/op, 54400 insns/op, 0 errors) > 87044.40 tps ( 59.7 allocs/op, 14.3 tasks/op, 54310 insns/op, 0 errors) > 88344.50 tps ( 59.6 allocs/op, 14.3 tasks/op, 54289 insns/op, 0 errors) > 88355.06 tps ( 59.6 allocs/op, 14.3 tasks/op, 54242 insns/op, 0 errors) > 88725.46 tps ( 59.6 allocs/op, 14.3 tasks/op, 54230 insns/op, 0 errors) > 88640.08 tps ( 59.6 allocs/op, 14.3 tasks/op, 54210 insns/op, 0 errors) > 90306.31 tps ( 59.4 allocs/op, 14.3 tasks/op, 54043 insns/op, 0 errors) > 87343.62 tps ( 59.8 allocs/op, 14.3 tasks/op, 54496 insns/op, 0 errors) > > median 88355.06 tps ( 59.6 allocs/op, 14.3 tasks/op, 54242 insns/op, 0 errors) > median absolute deviation: 1007.41 > maximum: 90306.31 > minimum: 85523.06 Command (reads): perf-simple-query -c1 -m1G --tablets 
--duration=10 Before: > 95860.18 tps ( 63.1 allocs/op, 14.1 tasks/op, 42476 insns/op, 0 errors) > 97537.69 tps ( 63.1 allocs/op, 14.1 tasks/op, 42454 insns/op, 0 errors) > 97549.23 tps ( 63.1 allocs/op, 14.1 tasks/op, 42470 insns/op, 0 errors) > 97511.29 tps ( 63.1 allocs/op, 14.1 tasks/op, 42470 insns/op, 0 errors) > 97227.32 tps ( 63.1 allocs/op, 14.1 tasks/op, 42471 insns/op, 0 errors) > 94031.94 tps ( 63.1 allocs/op, 14.1 tasks/op, 42441 insns/op, 0 errors) > 96978.04 tps ( 63.1 allocs/op, 14.1 tasks/op, 42462 insns/op, 0 errors) > 96401.70 tps ( 63.1 allocs/op, 14.1 tasks/op, 42473 insns/op, 0 errors) > 96573.77 tps ( 63.1 allocs/op, 14.1 tasks/op, 42440 insns/op, 0 errors) > 96340.54 tps ( 63.1 allocs/op, 14.1 tasks/op, 42468 insns/op, 0 errors) > > median 96978.04 tps ( 63.1 allocs/op, 14.1 tasks/op, 42462 insns/op, 0 errors) > median absolute deviation: 571.20 > maximum: 97549.23 > minimum: 94031.94 > After: > 99794.67 tps ( 63.1 allocs/op, 14.1 tasks/op, 42471 insns/op, 0 errors) > 101244.99 tps ( 63.1 allocs/op, 14.1 tasks/op, 42472 insns/op, 0 errors) > 101128.37 tps ( 63.1 allocs/op, 14.1 tasks/op, 42485 insns/op, 0 errors) > 101065.27 tps ( 63.1 allocs/op, 14.1 tasks/op, 42465 insns/op, 0 errors) > 101212.98 tps ( 63.1 allocs/op, 14.1 tasks/op, 42456 insns/op, 0 errors) > 101413.31 tps ( 63.1 allocs/op, 14.1 tasks/op, 42463 insns/op, 0 errors) > 101464.92 tps ( 63.1 allocs/op, 14.1 tasks/op, 42466 insns/op, 0 errors) > 101086.74 tps ( 63.1 allocs/op, 14.1 tasks/op, 42488 insns/op, 0 errors) > 101559.09 tps ( 63.1 allocs/op, 14.1 tasks/op, 42468 insns/op, 0 errors) > 100742.58 tps ( 63.1 allocs/op, 14.1 tasks/op, 42491 insns/op, 0 errors) > > median 101212.98 tps ( 63.1 allocs/op, 14.1 tasks/op, 42456 insns/op, 0 errors) > median absolute deviation: 200.33 > maximum: 101559.09 > minimum: 99794.67 > Fixes #16594 Closes scylladb/scylladb#18026 * github.com:scylladb/scylladb: Implement fast streaming for intra-node migration test: tablets_test: Test 
sharding during intra-node migration test: tablets_test: Check sharding also on the pending host test: py: tablets: Test writes concurrent with migration test: py: tablets: Test crash during intra-node migration api, storage_service: Introduce API to wait for topology to quiesce dht, replica: Remove deprecated sharder APIs test: Avoid using deprecated sharded API db: do_apply_many() avoid deprecated sharded API replica: mutation_dump: Avoid deprecated sharder API repair: Avoid deprecated sharder API table: Remove optimization which returns empty reader when key is not owned by the shard dht: is_single_shard: Avoid deprecated sharder API dht: split_range_to_single_shard: Work with static_sharder only dht: ring_position_range_sharder: Avoid deprecated sharder APIs dht: token: Avoid use of deprecated sharder API by switching to static_sharder selective_token_sharder: Avoid use of deprecated sharder API docs: Document tablet sharding vs tablet replica placement readers/multishard.cc: use shard_for_reads() instead of shard_of() multishard_mutation_query.cc: use shard_for_reads() instead of shard_of() storage_proxy: Extract common code to apply mutations on many shards according to sharder storage_proxy: Prepare per-partition rate-limiting for intra-node migration storage_proxy: Avoid shard_of() use in mutate_counter_on_leader_and_replicate() storage_proxy: Prepare mutate_hint() for intra-node tablet migration commitlog_replayer: Avoid deprecated sharder::shard_of() lwt: Avoid deprecated sharder::shard_of() compaction: Avoid deprecated sharder::shard_of() dht: Extract dht::static_sharder replica: Deprecate table::shard_of() locator: Deprecate effective_replication_map::shard_of() dht: Deprecate old sharder API: shard_of/next_shard/token_for_next_shard tests: tablets: py: Add intra-node migration test tests: tablets: Test that drained nodes are not balanced internally tests: tablets: Add checks of replica set validity to test_load_balancing_with_random_load tests: 
tablets: Verify that disabling balancing results in no intra-node migrations tests: tablets: Check that nodes are internally balanced tests: tablets: Improve debuggability by showing which rows are missing tablets, storage_service: Support intra-node migration in move_tablet() API tablet_allocator: Generate intra-node migration plan tablet_allocator: Extract make_internode_plan() tablet_allocator: Maintain candidate list and shard tablet count for target nodes tablet_allocator: Lift apply_load/can_accept_load lambdas to member functions tablets, streaming: Implement tablet streaming for intra-node migration dht, auto_refreshing_sharder: Allow overriding write selector multishard_writer: Handle intra-node migration storage_proxy: Handle intra-node tablet migration for writes tablets: Get rid of tablet_map::get_shard() tablets: Avoid tablet_map::get_shard in cleanup tablets: test: Use sharder instead of tablet_map::get_shard() tablets: tablet_sharder: Allow working with non-local host sharding: Prepare for intra-node-migration docs: Document sharder use for tablets tablets: Introduce tablet transition kind for intra-node migration tests: tablets: Fix use-after-move of skiplist in rebalance_tablets() sstables, gdb: Track readers in a linked list raft topology: Fix global token metadata barrier to not fence ahead of what is drained	2024-05-20 16:13:01 +03:00
Kefu Chai	a517fcf970	service/storage_proxy: capture `tr_state` by copy in handle_paxos_accept() this change is inspired by following warning from clang-tidy ``` Warning: /home/runner/work/scylladb/scylladb/service/storage_proxy.cc:884:13: warning: 'tr_state' used after it was moved [bugprone-use-after-move] 884 \| if (tr_state) { \| ^ /home/runner/work/scylladb/scylladb/service/storage_proxy.cc:872:139: note: move occurred here 872 \| auto f = get_schema_for_read(proposal.update.schema_version(), src_addr, *timeout).then([&sp = _sp, &sys_ks = _sys_ks, tr_state = std::move(tr_state), \| ^ ``` this is not a false positive. `tr_state` is captured by move to construct a variable in the capture list of a lambda, which is in turn passed to the expression evaluated to `f`. even though the lambda itself has not run yet when we reference `tr_state` to check whether it is empty, `tr_state` has already been moved into the captured variable. so the statement `f = f.finally(...)` is never evaluated, because `tr_state` is always empty by then. so before this change, the trace message is never recorded. in this change, we address this issue by capturing `tr_state` by copying it. as `tr_state` is backed by a `lw_shared_ptr`, the overhead is negligible. after this change, the tracing message is recorded. the change that introduced this issue was `548767f91e`. please note, we could coroutinize this function to improve its readability, but since this is a fix and should be backported, let's start with a minimal fix, and worry about the readability in a follow-up change. Refs `548767f91e` Fixes #18725 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18702	2024-05-20 12:58:49 +03:00
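The effect described in the commit above can be reproduced in isolation. The following is a minimal sketch, assuming a `std::shared_ptr<int>` as a stand-in for the `lw_shared_ptr`-backed trace state; the alias `trace_state` and both function names are hypothetical. It shows that an init-capture by move takes effect when the lambda object is created, not when it runs.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a trace-state handle backed by a shared pointer.
using trace_state = std::shared_ptr<int>;

// The init-capture `tr_state = std::move(tr_state)` is evaluated when the
// lambda object is constructed, so the outer handle is already empty on
// the next line even though the lambda is never invoked.
bool outer_still_set_after_move_capture(trace_state tr_state) {
    auto task = [tr_state = std::move(tr_state)] { return tr_state != nullptr; };
    (void)task;
    return tr_state != nullptr;
}

// Capturing a copy keeps the outer handle usable; the cost is one extra
// reference-count increment.
bool outer_still_set_after_copy_capture(trace_state tr_state) {
    auto task = [tr_state] { return tr_state != nullptr; };
    (void)task;
    return tr_state != nullptr;
}
```

For a non-empty handle the first function returns `false` and the second returns `true`, which mirrors why the `if (tr_state)` check in the original code never fired.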
Kefu Chai	40ce52c3cc	test: use generic boost_test_print_type() in this change, we trade the `boost_test_print_type()` overloads for the generic template of `boost_test_print_type()`, except for those in the very small tests, which presumably want to keep themselves relatively self-contained. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18727	2024-05-20 12:56:20 +03:00
Botond Dénes	0e23cd45ad	Merge 'feature: grandfather some old cluster features' from Avi Kivity This series grandfathers the following features: MD_SSTABLE_FORMAT ME_SSTABLE feature VIEW_VIRTUAL_COLUMNS DIGEST_INSENSITIVE_TO_EXPIRY CDC NONFROZEN_UDTS PER_TABLE_PARTITIONERS PER_TABLE_CACHING DIGEST_FOR_NULL_VALUES CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX Note that for the last (CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX) some code remains to support indexes created before the new feature was adopted. Each patch names the version where the feature was introduced. Closes scylladb/scylladb#18428 * github.com:scylladb/scylladb: feature, index: grandfather CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX feature: grandfather DIGEST_FOR_NULL_VALUES storage_proxy: drop use of MD5 as a digest algorithm feature: grandfather PER_TABLE_CACHING feature: grandfather LWT feature: grandfather HINTED_HANDOFF_SEPARATE_CONNECTION feature: grandfather PER_TABLE_PARTITIONERS test: schema_change_test: regenerate digest for PER_TABLE_PARTITIONERS test: test_schema_change_digest: drop unneeded reference digests feature: grandfather NONFROZEN_UDTS feature: grandfather CDC feature: grandfather DIGEST_INSENSITIVE_TO_EXPIRY feature: grandfather VIEW_VIRTUAL_COLUMNS feature: grandfather ME_SSTABLE feature feature: grandfather MD_SSTABLE_FORMAT	2024-05-20 11:48:07 +03:00
Botond Dénes	936a7e282b	docs: isolation.md: remove mention of IO priority groups They were folded into CPU scheduling groups, which now apply to both CPU and IO.	2024-05-20 03:33:24 -04:00
Botond Dénes	8f61468322	docs: isolation.md: update scheduling group list, add aliases	2024-05-20 03:30:04 -04:00
Lakshmi Narayanan Sreethar	6f58768c46	sstables_manager: use maintenance scheduling group to run components reload fiber Fixes #18675 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-19 15:23:45 +05:30
Lakshmi Narayanan Sreethar	79f6746298	sstables_manager: add member to store maintenance scheduling group Store the maintenance scheduling group inside the sstables_manager. The next patch will use this to run the components reloader fiber. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-19 15:23:45 +05:30
Avi Kivity	54a82fed6b	feature, index: grandfather CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX This feature corrected how we store the token in secondary indexes. It was introduced in `7ff72b0ba5` (2020; 4.4) and can now be assumed present everywhere. Note that we still support indexes created with the old format.	2024-05-18 00:24:11 +03:00
Avi Kivity	2fbd78c769	feature: grandfather DIGEST_FOR_NULL_VALUES The DIGEST_FOR_NULL_VALUES feature was added in `21a77612b3` (2020; 4.4) and can now be assumed to be always present. The hasher which it invoked is removed.	2024-05-18 00:24:00 +03:00
Avi Kivity	879583c489	storage_proxy: drop use of MD5 as a digest algorithm The XXHASH feature was introduced in `0bab3e59c2` (2017; 2.2) and made mandatory in `defe6f49df` (2020; 4.4), but some vestiges remain. Remove them now. Note that md5_hasher itself is still in use by other components, so it cannot be removed.	2024-05-18 00:23:47 +03:00
Avi Kivity	7c264e8a71	feature: grandfather PER_TABLE_CACHING The PER_TABLE_CACHING feature was added in `0475dab359` (2020; 4.2) and can now be assumed to be always present.	2024-05-18 00:23:30 +03:00
Avi Kivity	d52c424a5f	feature: grandfather LWT LWT was made non-experimental in `9948f548a5` (2020; 4.1) and can now be assumed to be always present.	2024-05-18 00:20:53 +03:00
Avi Kivity	93088d0921	feature: grandfather HINTED_HANDOFF_SEPARATE_CONNECTION The HINTED_HANDOFF_SEPARATE_CONNECTION feature was introduced in `3a46b1bb2b` (2019; 3.3) and can be assumed always present.	2024-05-18 00:18:27 +03:00
Avi Kivity	3bead8cea0	feature: grandfather PER_TABLE_PARTITIONERS The PER_TABLE_PARTITIONERS feature was added in `90df9a44ce` (2020; 4.0) and can now be assumed to be always present. We also remove the associated schema_feature.	2024-05-18 00:15:07 +03:00
Avi Kivity	6b532fd40b	test: schema_change_test: regenerate digest for PER_TABLE_PARTITIONERS The first digest tested was generated without the PER_TABLE_PARTITIONERS schema feature. We're about to make that feature mandatory, so we won't be able (and won't need) to generate a digest without it. Update the digest to include the feature. Note it wasn't untested before, we have a test with schema_features::full().	2024-05-18 00:14:43 +03:00
Avi Kivity	c4d8b17f4c	test: test_schema_change_digest: drop unneeded reference digests digests[0] was used by the VIEW_VIRTUAL_COLUMNS feature, which no longer exists. digests[1] is the same as digests[2], so drop it.	2024-05-17 20:41:20 +03:00
Avi Kivity	93113da01b	feature: grandfather NONFROZEN_UDTS The NONFROZEN_UDTS feature was added in `e74b5deb5d` (2019; 3.2) and can now be assumed to be always present.	2024-05-17 20:41:20 +03:00
Avi Kivity	c7d7ca2c23	feature: grandfather CDC The CDC feature was made non-experimental in `e9072542c1` (2020; 4.4) and can now be assumed to be always present. We also remove the corresponding schema_feature.	2024-05-17 20:41:20 +03:00
Avi Kivity	82ad2913ca	feature: grandfather DIGEST_INSENSITIVE_TO_EXPIRY The DIGEST_INSENSITIVE_TO_EXPIRY feature was added in `9de071d214` (2019; 3.2) and can now be assumed to be always present. We enable the corresponding schema_feature unconditionally. We do not remove the corresponding schema feature, because it can be disabled when the related TABLE_DIGEST_INSENSITIVE_TO_EXPIRY is present.	2024-05-17 20:41:19 +03:00
Avi Kivity	b5f6021a6b	feature: grandfather VIEW_VIRTUAL_COLUMNS The VIEW_VIRTUAL_COLUMNS feature was added in `a108df09f9` (2019; 3.1) and can now be assumed to be always present. The corresponding schema_feature is removed. Note schema_features are not sent over the wire. A digest calculation without VIEW_VIRTUAL_COLUMNS is no longer tested.	2024-05-17 20:41:19 +03:00
Avi Kivity	7952200c8c	feature: grandfather ME_SSTABLE feature "me" format sstables were introduced in `d370558279` (Jan 2022; 5.1) and so can be assumed always present. The listener that checks when the cluster understands ME_SSTABLE was removed and in its place we default to sstable_version_types::me (and call on_enabled() immediately).	2024-05-17 20:41:19 +03:00
Avi Kivity	6d0c0b542c	feature: grandfather MD_SSTABLE_FORMAT "md" sstable support was introduced in `e8d7744040` (2020; 4.4) and so can be assumed to be present on all versions we upgrade from. Nothing appears to depend on it.	2024-05-17 20:41:19 +03:00
Anna Stuchlik	c93a7d2664	doc: replace 5.5 with 6.0 in SStable docs (me) This commit replaces the version number 5.5 with 6.0, because 5.5 has never been released. This is a follow-up to https://github.com/scylladb/scylladb/pull/16716. Refs https://github.com/scylladb/scylladb/issues/16551 Refs https://github.com/scylladb/scylladb/issues/18580 Closes scylladb/scylladb#18730	2024-05-17 16:34:18 +03:00
Botond Dénes	db70e8dd5f	test/cql-pytest: test_tombstone_limit.py: enable xfailing tests These tests were marked as xfail because they used to fail with tablets. They don't anymore, so remove the xfail. Fixes: #16486 Closes scylladb/scylladb#18671	2024-05-16 20:14:47 +03:00
Nadav Har'El	c7aa47354a	Merge 'mutation_fragment_stream_validating_filter: respect validating_level::none' from Botond Dénes Even when configured to not do any validation at all, the validator still did some. This small series fixes this, and adds a test to check that validation levels in general are respected, and the validator doesn't validate more than it is asked to. Fixes: #18662 Closes scylladb/scylladb#18667 * github.com:scylladb/scylladb: test/boost/mutation_fragment_test.cc: add test for validator validation levels mutation: mutation_fragment_stream_validating_filter: fix validation_level::none mutation: mutation_fragment_stream_validating_filter: add raises_error ctor parameter	2024-05-16 19:57:49 +03:00
Kamil Braun	734c5de314	Merge 'fix test teardown race with ongoing test operation' from Artsiom Mishuta This commit brings several new features in scylla_cluster.py to fix runaway asyncio task problems in topology tests - Start-Stop Lock and Stop Event in ScyllaServer - Tasks History, Wait for tasks from Tasks History and Manager broken state in ScyllaClusterManager - make ManagerClient object function scope - test_finished_event in ManagerClient Fixes: scylladb/scylladb#16472 Fixes: scylladb/scylladb#16651 Closes scylladb/scylladb#18236 * github.com:scylladb/scylladb: test/pylib: Introduce ManagerClient.test_finished_event test/topology: make ManagerClient object function scope test/pylib: Introduce Manager broken state: test/pylib: Wait for tasks from Tasks History: test/pylib: Introduce Tasks History: test/pylib: Introduce Stop Event test/pylib: Introduce Start-Stop Lock:	2024-05-16 17:42:00 +02:00
Kefu Chai	759156b56d	test: perf: alternator: mark format string as `constexpr` before this change, we use `update_item_suffix` as a format string fed to `format(...)`, which is resolved to `seastar::format()`. but with a patch which migrates the `seastar::format()` to the backend with compile-time format check, the caller sites using `format()` would fail to build, because `update_item_suffix` is not a `constexpr`: ``` /home/kefu/.local/bin/clang++ -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -isystem /home/kefu/dev/scylladb/build/rust -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. 
-march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT test/perf/CMakeFiles/test-perf.dir/RelWithDebInfo/perf_alternator.cc.o -MF test/perf/CMakeFiles/test-perf.dir/RelWithDebInfo/perf_alternator.cc.o.d -o test/perf/CMakeFiles/test-perf.dir/RelWithDebInfo/perf_alternator.cc.o -c /home/kefu/dev/scylladb/test/perf/perf_alternator.cc /home/kefu/dev/scylladb/test/perf/perf_alternator.cc:249:69: error: call to consteval function 'fmt::basic_format_string<char, const char (&)[1]>::basic_format_string<const char , 0>' is not a constant expression 249 \| return make_request(cli, "UpdateItem", prefix + seastar::format(update_item_suffix, "")); \| ^ /usr/include/fmt/core.h:2776:67: note: read of non-constexpr variable 'update_item_suffix' is not allowed in a constant expression 2776 \| FMT_CONSTEVAL FMT_INLINE basic_format_string(const S& s) : str_(s) { \| ^ /home/kefu/dev/scylladb/test/perf/perf_alternator.cc:249:69: note: in call to 'basic_format_string<const char , 0>(update_item_suffix)' 249 \| return make_request(cli, "UpdateItem", prefix + seastar::format(update_item_suffix, "")); \| ^~~~~~~~~~~~~~~~~~ /home/kefu/dev/scylladb/test/perf/perf_alternator.cc:198:6: note: declared here 198 \| auto update_item_suffix = R"( \| ^ ``` so, to prepare the change switching to compile-time format checking, let's mark this variable `static constexpr`. this is also more correct, as this variable is * a compile time constant, and * is not shared across different compilation units. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18685	2024-05-16 15:18:42 +03:00
Avi Kivity	6982de6dde	Merge 'Fix stalls in forward_service::dispatch() with large tablet count' from Raphael "Raph" Carvalho With a large tablet count, e.g. 128k, forward_service::dispatch() can potentially stall when grouping ranges per endpoint. ` Reactor stalled for 4 ms on shard 1. Backtrace: 0x5eb15ea 0x5eb09f5 0x5eb1daf 0x3dbaf 0x2d01e57 0x33f7d1e 0x348255f 0x2d005d4 0x2d3d017 0x2d3d58c 0x2d3d225 0x5e59622 0x5ec328f 0x5ec4577 0x5ee84e0 0x5e8394a 0x8c946 0x11296f ` Also there are inefficient copies that are being removed. partition_range_vector for a single endpoint can grow beyond 1M. Closes scylladb/scylladb#18695 * github.com:scylladb/scylladb: service: fix indentation in dispatch() service: fix reactor stall with large tablet count service: avoid potential expensive copies in forward_service::dispatch() service: coroutinize forward_service::dispatch()	2024-05-16 15:17:43 +03:00
Kefu Chai	617e532859	db: config: drop operator<<() for error_injection_at_startup it is not used anymore, so let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18701	2024-05-16 15:10:57 +03:00
Pavel Emelyanov	dffd985401	data_dictionary: Resurrect formatter for keyspace_metadata It was commented out by `a439ebcfce` (treewide: include fmt/ranges.h and/or fmt/std.h), probably by mistake. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18665	2024-05-16 15:09:45 +03:00
Pavel Emelyanov	31d05925cc	api,database: Move auto-compaction toggle guard Toggling per-table auto-compaction enabling bit is guarded with on-database boolean and raii guard. It's only used by a single api/column_family.cc file, so it can live there. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-16 14:42:51 +03:00
Pavel Emelyanov	a43b178f72	api: Move some table manipulation helpers from storage_service Continuation of the previous patch -- helpers toggling tombstone_gc and auto_compaction on tables should live in the same file that uses them. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-16 14:42:50 +03:00
Pavel Emelyanov	862fcd7bc7	api: Move table-related calls from storage_service domain The storage_service/(enable\|disable)_(tombstone_gc\|auto_compaction) endpoints are not handled by storage_service _service_ and should rather live in the column_family/ domain which is handled by replica::database. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-16 14:42:50 +03:00
Pavel Emelyanov	ba53283d21	api: Reimplement some endpoints using existing helpers The (enable\|disable)_(tombstone_gc\|auto_compaction) endpoints living in column_family domain can benefit from the helpers that do the same in the storage_service domain. The "difference" is that c.f. endpoints do it per-table, while s.s. ones operate on a vector of tables, so the former is a corner case of the latter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-16 14:42:50 +03:00
Pavel Emelyanov	231ffa623c	api: Lost unset of tombstone-gc endpoints On stop all endpoints must be unregistered, these three are lost Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-16 14:42:50 +03:00
Michał Jadwiszczak	b3e6a39604	test/cql-pytest/test_describe: add test for UDTs ordering	2024-05-16 13:30:03 +02:00
Michał Jadwiszczak	f29820fb27	cql3/statements/describe_statement: UDTs topological sorting User-defined types can depend on each other, creating a directed acyclic graph. In order to support restoring schema from `DESC SCHEMA`, UDTs should be ordered topologically, not alphabetically as they were until now.	2024-05-16 13:30:03 +02:00
Michał Jadwiszczak	7be938192b	cql3/statements/describe_statement: allow to skip alphabetical sorting In the next commit, we are going to introduce topological sorting of user-defined types, so alphabetical sorting must be skipped so as not to interfere.	2024-05-16 13:30:03 +02:00
Michał Jadwiszczak	8157d260f2	types: add a method to get all referenced user types The method allows to collect all UDTs used to create a type. This is required to sort UDTs in a topological order.	2024-05-16 13:30:03 +02:00
Michał Jadwiszczak	573e13e3f1	db/cql_type_parser: use generic topological sorting	2024-05-16 13:30:03 +02:00
Michał Jadwiszczak	3830f3bd23	db/cql_type_parser: futurize raw_builder::build() In order to use generic topological sort, the build() method needs to return a future.	2024-05-16 13:30:03 +02:00
Michał Jadwiszczak	7f04c88395	test/boost: add test for topological sorting	2024-05-16 13:30:03 +02:00
Michał Jadwiszczak	aa08e586fd	utils: introduce generic topological sorting algorithm Until now, we have implemented topological sorting in db/cql_type_parser.cc but it is specific to its usage. Now we want to use topological sorting in another place, so a generic sorting algorithm provides one implementation that can be reused in several places.	2024-05-16 13:30:03 +02:00
Nadav Har'El	27ab560abd	cql: fix hang during certain SELECT statements The function intersection(r1,r2) in statement_restrictions.cc is used when several WHERE restrictions were applied to the same column. For example, for "WHERE b<1 AND b<2" the intersection of the two ranges is calculated to be b<1. As noted in issue #18690, Scylla is inconsistent in where it allows or doesn't allow these intersecting restrictions. But where they are allowed they must be implemented correctly. And it turns out the function intersection() had a bug that caused it to sometimes enter an infinite loop - when the intent was only to call itself once with swapped parameters. This patch includes a test reproducing this bug, and a fix for the bug. The test hangs before the fix, and passes after the fix. While at it, I carefully reviewed the entire code used to implement the intersection() function to try to make sure that the bug we found was the only one. I also added a few more comments where I thought they were needed to understand complicated logic of the code. The bug, the fix and the test were originally discovered by Michał Chojnowski. Fixes #18688 Refs #18690 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#18694	2024-05-16 11:25:44 +03:00
Piotr Dulikowski	68eca3778c	Merge 'mv: throttle view update generation for large queries' from Wojciech Mitros This series is a reupload of #13792 with a few modifications, namely a test is added and the conflicts with recent tablet related changes are fixed. See https://github.com/scylladb/scylladb/issues/12379 and https://github.com/scylladb/scylladb/pull/13583 for a detailed description of the problem and discussions. This PR aims to extend the existing throttling mechanism to work with requests that internally generate a large amount of view updates, as suggested by @nyh. The existing mechanism works in the following way: * Client sends a request, we generate the view updates corresponding to the request and spawn background tasks which will send these updates to remote nodes * Each background task consumes some units from the `view_update_concurrency_semaphore`, but doesn't wait for these units, it's just for tracking * We keep track of the percent of consumed units on each node, this is called `view update backlog`. * Before sending a response to the client we sleep for a short amount of time. The amount of time to sleep for is based on the fullness of this `view update backlog`. For a well behaved client with limited concurrency this will limit the amount of incoming requests to a manageable level. This mechanism doesn't handle large DELETE queries. Deleting a partition is fast for the base table, but it requires us to generate a view update for every single deleted row. The number of deleted rows per single client request can be in the millions. Delaying response to the request doesn't help when a single request can generate millions of updates. To deal with this we could treat the view update generator just like any other client and force it to wait a bit of time before sending the next batch of updates. The amount of time to wait for is calculated just like in the existing throttling code, it's based on the fullness of `view update backlogs`. 
The new algorithm of view update generation looks something like this: ```c++ for(;;) { auto updates = generate_updates_batch_with_max_100_rows(); co_await seastar::sleep(calculate_sleep_time_from_backlogs()); spawn_background_tasks_for_updates(updates); } ``` Fixes: https://github.com/scylladb/scylladb/issues/12379 Closes scylladb/scylladb#16819 * github.com:scylladb/scylladb: test: add test for bad_allocs during large mv queries mv: throttle view update generation for large queries exceptions: add read_write_timeout_exception, a subclass of request_timeout_exception db/view: extract view throttling delay calculation to a global function view_update_generator: add get_storage_proxy() storage_proxy: make view backlog getters public	2024-05-16 08:22:54 +02:00
Botond Dénes	af9e173c99	Merge 'repair: Don't get topology via database' from Pavel Emelyanov Database has token-metadata onboard and other services use it to get topology from. Repair code has simpler and cleaner ways to get access to topology. Closes scylladb/scylladb#18677 * github.com:scylladb/scylladb: repair: Get topology via replication map repair: Use repair_service::my_address() in handlers repair: Remove repair_meta::_myip repair: Use repair_meta::myip() everywhere repair: Add repair_service::my_address() method	2024-05-16 08:28:14 +03:00
Raphael S. Carvalho	715ae689c0	Implement fast streaming for intra-node migration With intra-node migration, all the movement is local, so we can make streaming faster by just cloning the sstable set of the leaving replica and loading it into the pending one. This cloning is underlying storage specific, but s3 doesn't support snapshot() yet (the sstables::storage procedure which clone is built upon). It's only supported by file system, with help of hard links. A new generation is picked for the new cloned sstable, and it will live in the same directory as the original. A challenge I bumped into was to understand why the table refused to load the sstables at the pending replica, as it considered them foreign. Later I realized that sharder (for reads) at this stage of migration will point only to the leaving replica. It didn't fail with mutation based streaming, because the sstable writer considers the shard -- that the sstable was written into -- as its owner, regardless of what sharder says. That was fixed by mimicking this behavior during loading at pending. test: ./test.py --mode=dev intranode --repeat=100 passes. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	a179f37780	test: tablets_test: Test sharding during intra-node migration	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	5f32d2ddb6	test: tablets_test: Check sharding also on the pending host	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	6d809c75fb	test: py: tablets: Test writes concurrent with migration	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	ad02d85c16	test: py: tablets: Test crash during intra-node migration	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	7956a2991e	api, storage_service: Introduce API to wait for topology to quiesce	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	679baff25a	dht, replica: Remove deprecated sharder APIs	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	32a191384a	test: Avoid using deprecated sharder API There is no tablet migration in unit tests, so shard_of() can be safely replaced with shard_for_reads(). Even if it's used for writes.	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	539460dd71	db: do_apply_many() avoid deprecated sharder API	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	0f50504c39	replica: mutation_dump: Avoid deprecated sharder API	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	7bf5733fa5	repair: Avoid deprecated sharder API	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	7c03646f99	table: Remove optimization which returns empty reader when key is not owned by the shard This check would lead to correctness issues with intra-node migration because the shard may switch during read, from "read old" to "read new". If the coordinator used "read old" for shard routing, but table on the old shard is already using "read new" erm, such a read would observe empty result, which is wrong. Drop the optimization. In the scenario above, read will observe all past writes because: 1) writes are still using "write both" 2) writes are switched to "write new" only after all requests which might be using "read old" are done Replica-side coordinators should already route single-key requests to the correct shard, so it's not important as an optimization. This issue shows how assumptions about static sharding are embedded in the current code base and how intra-node migration, by violating those assumptions, can lead to correctness issues.	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	26f2e6aa8e	dht: is_single_shard: Avoid deprecated sharder API All current uses are used in the read path.	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	c9e6b4dca7	dht: split_range_to_single_shard: Work with static_sharder only In preparation for intra-node tablet migration, to avoid using deprecated sharder APIs. This function is used for generating sstable sharding metadata. For tablets, it is not invoked, so we can safely work with the static sharder. The call site already passes static_sharder only.	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	c380aecf64	dht: ring_position_range_sharder: Avoid deprecated sharder APIs In preparation for tablet intra-node migration. Existing uses are for reads, so it's safe to use shard_for_reads(): - in multishard reader - in forward_service The ring_position_range_vector_sharder is used when computing sstable shards, which for intra-node migration should use the view for reads. If we haven't completed streaming, sstables should be attached to the old shard (used by reads). When in write-both-read-new stage, streaming is complete, reads are using the new shard, and we should attach sstables to the new shard. When not in intra-node migration, the view for reads on the pending node will return the pending shard even if read selector is "read old". So if pending node restarts during streaming, we will attach sstables to the shard which is used by writes even though we're using the selector for reads.	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	a1aac409bf	dht: token: Avoid use of deprecated sharder API by switching to static_sharder The touched APIs are used only with static_sharder.	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	dd4a086b87	selective_token_sharder: Avoid use of deprecated sharder API I analyzed all the uses and all except the alternator/ttl.cc seem to be interested in the result for the purpose of reading. Alternator is not supported with tablets yet, so the use was annotated with a relevant issue.	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	eb3a22d5a8	docs: Document tablet sharding vs tablet replica placement	2024-05-16 00:28:47 +02:00
Botond Dénes	635aba435b	readers/multishard.cc: use shard_for_reads() instead of shard_of() The latter is deprecated.	2024-05-16 00:28:47 +02:00
Botond Dénes	bc779ed00c	multishard_mutation_query.cc: use shard_for_reads() instead of shard_of() The latter is deprecated.	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	3b7d7088d1	storage_proxy: Extract common code to apply mutations on many shards according to sharder	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	660b3d1765	storage_proxy: Prepare per-partition rate-limiting for intra-node migration Note: there is a potential problem with rate-limit count going out of sync during intra-node migration between old and the new shard. Before this patch, when coordinator accounted and admitted the request, so the rate_limit_info passed to apply_locally() is account_only, it was converted to std::monostate for requests to the local replica. This makes sense because the request was already accounted by the coordinator. However, during intra-node migration when we do double writes to two shards locally, that means that the new shard will not account the write, it will have lower count than the limiter on the old shard. This means that the new shard may accept writes which will end up being rejected. This is not desirable, but not the end of the world since it's temporary, and the new shard will still protect itself from overload based on its own rate limiter.	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	7c3291b5ea	storage_proxy: Avoid shard_of() use in mutate_counter_on_leader_and_replicate() Counters are not supported with tablets, so we should not reach this path.	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	db2809317d	storage_proxy: Prepare mutate_hint() for intra-node tablet migration	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	feafe0f6a7	commitlog_replayer: Avoid deprecated sharder::shard_of() shard_for_writes() is appropriate, because we're writing. It can happen that the tablet was migrated away and no shard is the owner. In that case the mutation is dropped, as it should be, because "shards" is empty.	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	c9294b1642	lwt: Avoid deprecated sharder::shard_of() Instead, use shard_for_reads(). The justification is that: 1) In cas_shard(), we need to pick a single request coordinator. shard_for_reads() gives that, which is equivalent to shard_of() if there is no intra-node migration. 2) In paxos handler for prepare(), the shard we execute it on is the shard from which we read, so shard_for_reads() is the one. 3) Updates of paxos state are separate CQL requests, and use their own sharding. 4) Handler for learn is executing updates using calls to storage_proxy::mutate_locally() which will use the right sharder for writes However, the code is still not prepared for intra-node migration, and possibly regular migration too in case of abandoned requests, because the locking of paxos state assumes that the shard is static. That would have to be fixed separately, e.g. by locking both shards (shard_for_writes()) during migration, so that the set of locked shards always intersects during migration and local serialization of paxos state updates is achieved. I left FIXMEs for that.	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	1631bab658	compaction: Avoid deprecated sharder::shard_of()	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	9da3bd84c7	dht: Extract dht::static_sharder Before the patch, dht::sharder could be instantiated and it would behave like a static sharder. This is not safe with regards to extensions of the API because if a derived implementation forgets to override some method, it would incorrectly default to the implementation from static sharder. Better to fail the compilation in this case, so extract static sharder logic to dht::static_sharder class and make all methods in dht::sharder pure virtual. This also allows us to have algorithms indicate that they only work with static sharder by accepting the type, and have compile-time safety for this requirement. schema::get_sharder() is changed to return the static_sharder&.	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	dbca598e99	replica: Deprecate table::shard_of()	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	a1bee16ee9	locator: Deprecate effective_replication_map::shard_of()	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	10a4903d0c	dht: Deprecate old sharder API: shard_of/next_shard/token_for_next_shard Require users to specify whether we want shard for reads or for writes by switching to appropriate non-deprecated variant. For example, shard_of() can be replaced with shard_for_reads() or shard_for_writes(). The next_shard/token_for_next_shard APIs have only for-reads variant, and the act of switching will be a testimony to the fact that the code is valid for intra-node migration.	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	b3cdf9a379	tests: tablets: py: Add intra-node migration test	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	d26cd97633	tests: tablets: Test that drained nodes are not balanced internally It would be a waste of effort to do so, since we migrate tablets away anyway.	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	04f0088679	tests: tablets: Add checks of replica set validity to test_load_balancing_with_random_load	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	c76ba52c70	tests: tablets: Verify that disabling balancing results in no intra-node migrations	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	0addca88b9	tests: tablets: Check that nodes are internally balanced Existing tests are augmented with a check which verifies that all nodes are internally balanced.	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	0e2617336a	tests: tablets: Improve debuggability by showing which rows are missing	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	329342bfb2	tablets, storage_service: Support intra-node migration in move_tablet() API	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	db9d3f0128	tablet_allocator: Generate intra-node migration plan Intra-node migrations are scheduled for each node independently with the aim to equalize per-shard tablet count on each node. This is needed to avoid severe imbalance between shards which can happen when some table grows and is split. The inter-node balance can be equal, so inter-node migration cannot fix the imbalance. Also, if RF=N then there is not even a possibility of moving tablets around to fix the imbalance. The only way to bring the system to balance is to move tablets within the nodes. After scheduling inter-node migrations, the algorithm schedules intra-node migrations. This means that across-node migrations can proceed in parallel with intra-node migrations if there is free capacity to carry them out, but across-node migrations have higher priority. Fixes #16594	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	793af3d6e1	tablet_allocator: Extract make_internode_plan() Currently the load balancer is only generating an inter-node plan, and the algorithm is embedded in make_plan(). The method will become even harder to follow once we add more kinds of plan generating steps, e.g. intra-node plan. Extract the inter-node plan to make it easier to add other plans and see the grand flow.	2024-05-16 00:28:47 +02:00
Tomasz Grabiec	f95a0f0182	tablet_allocator: Maintain candidate list and shard tablet count for target nodes The node_load data structure was not updated to reflect migration decisions on the target node. This is not needed for inter-node migration because target nodes are not considered as sources. But we want it to reflect migration decisions so that later intra-node migration sees an accurate picture with earlier migrations reflected in node_load.	2024-05-16 00:28:46 +02:00
Tomasz Grabiec	c86f659421	tablet_allocator: Lift apply_load/can_accept_load lambdas to member functions Will be needed by member methods which generate migration plans.	2024-05-16 00:28:46 +02:00
Tomasz Grabiec	fdcaaea91a	tablets, streaming: Implement tablet streaming for intra-node migration	2024-05-16 00:28:46 +02:00
Tomasz Grabiec	aafeacc8d9	dht, auto_refreshing_sharder: Allow overriding write selector During streaming for intra-node migration we want to write only to the new shard. To achieve that, allow altering write selector in sharder::shard_for_writes() and per-instance of auto_refreshing_sharder.	2024-05-16 00:28:46 +02:00
Tomasz Grabiec	dfed4efcc5	multishard_writer: Handle intra-node migration This writer is used by streaming, on tablet migration and load-and-stream. The caller of distribute_reader_and_consume_on_shards(), which provides a sharder, is supposed to ensure that effective_replication_map is kept alive around it, in order for topology coordinator to wait for any writes which may be in flight to reach their shards before tablet replica starts another migration. This is already the case: 1) repair and load-and-stream keep the erm around writing. 2) tablet migration uses auto_refreshing_sharder, so it does not, but it keeps the topology_guard around the operation in the consumer, which serves the same purpose.	2024-05-16 00:28:46 +02:00
Tomasz Grabiec	4df818db98	storage_proxy: Handle intra-node tablet migration for writes When sharder says that the write should go to multiple shards, we need to consider the write as applied only if it was applied to all those shards. This can happen during intra-node tablet migration. During such migration, the request coordinator on storage_proxy side is coordinating to hosts as if no migration was in progress. The replica-side coordinator coordinates to shards based on sharder response. One way to think about it is that effective_replication_map::get_natural_endpoints()/get_pending_endpoints() tells how to coordinate between nodes, and sharder tells how to coordinate between shards. Both work with some snapshot of tablet metadata, which should be kept alive around the operation. Sharder is associated with its own effective_replication_map, which marks the topology version as used and allows barriers to synchronize with replica-side operations.	2024-05-16 00:28:46 +02:00
Tomasz Grabiec	6c6ce2d928	tablets: Get rid of tablet_map::get_shard() Its semantics do not fit well with intra-node migration which allows two owning shards. Replace uses with the new has_replica() API.	2024-05-16 00:28:46 +02:00
Tomasz Grabiec	d000ad0325	tablets: Avoid tablet_map::get_shard in cleanup In preparation for intra-node migration for which get_shard() is not prepared.	2024-05-16 00:28:46 +02:00
Tomasz Grabiec	daaceda963	tablets: test: Use sharder instead of tablet_map::get_shard() tablet_map::get_shard() will go away as it is not prepared for intra-node migration.	2024-05-16 00:28:46 +02:00
Tomasz Grabiec	d47dfceb34	tablets: tablet_sharder: Allow working with non-local host Will be used in tests.	2024-05-16 00:28:46 +02:00
Tomasz Grabiec	6946ad2a45	sharding: Prepare for intra-node-migration Tablet sharder is adjusted to handle intra-node migration where a tablet can have two replicas on the same host. For reads, sharder uses the read selector to resolve the conflict. For writes, the write selector is used. The old shard_of() API is kept to represent shard for reads, and new method is introduced to query the shards for writing: shard_for_writes(). All writers should be switched to that API, which is not done in this patch yet. The request handler on replica side acts as a second-level coordinator, using sharder to determine routing to shards. A given sharder has a scope of a single topology version, a single effective_replication_map_ptr, which should be kept alive during writes.	2024-05-16 00:28:46 +02:00
Tomasz Grabiec	b5bb46357b	docs: Document sharder use for tablets	2024-05-16 00:28:46 +02:00
Tomasz Grabiec	82b34d34d8	tablets: Introduce tablet transition kind for intra-node migration We need a separate transition kind for intra node migration so that we don't have to recover this information from replica set in an expensive way. This information is needed in the hot path - in effective_replication_map, to not return the pending tablet replica to the coordinator. From its perspective, replica set is not transitional. The transition will also be used to alter the behavior of the sharder. When not in intra-node migration, the sharder should advertise the shard which is either in the previous or next replica set. During intra-node migration, that's not possible as there may be two such shards. So it will return the shard according to the current read selector.	2024-05-16 00:28:46 +02:00
Tomasz Grabiec	942ea39bf0	tests: tablets: Fix use-after-move of skiplist in rebalance_tablets() balance_tablets() is invoked in a loop, so only the first call will see non-empty skiplist. This bug starts to manifest after adding intra-node migration plan, causing failures of the test_load_balancing_with_skiplist test case. The reason is that rebalancing will now require multiple passes before convergence is reached, due to intra-node migrations, and later calls will not see the skiplist and try to balance skipped nodes, violating test's assertions.	2024-05-16 00:28:46 +02:00
Tomasz Grabiec	4d84451cf1	sstables, gdb: Track readers in a linked list For the purpose of scylla-gdb.py command "scylla active-sstables". Before the patch, readers were located by scanning the heap for live objects with vtable pointers corresponding to readers. It was observed that the test scylla_gdb/test_misc.py::test_active_sstables started failing like this: gdb.error: Error occurred in Python: Cannot access memory at address 0x300000000000000 This could be explained by there being a live object on the heap which used to be a reader but now is a different object, and the _sst field contains some other data which is not a pointer. To fix, track readers explicitly in a linked list so that the gdb script can reliably walk readers. Fixes #18618.	2024-05-16 00:28:46 +02:00
Tomasz Grabiec	fad6c41cee	raft topology: Fix global token metadata barrier to not fence ahead of what is drained Topology version may be updated, for example, by executing a RESTful API call to move a tablet. If that is done concurrently with an ongoing token metadata barrier executed by topology coordinator (because there is active tablet migration, for example), then some requests may fail due to being fenced out unnecessarily. The problem is that barrier function assumes no concurrent topology updates so it sets the fence version to the one which is current after other nodes are drained. This patch changes it to set the fence to the version which was current before other nodes were drained. Semantics of the barrier are preserved because it only guarantees that topology state from before the invocation of barrier is propagated. Fixes #18699	2024-05-16 00:28:46 +02:00
Benny Halevy	3c4c81c2d9	utils: chunked_vector: optimize for trivially_copyable types Use std::uninitialized_{copy,move} and std::destroy that have optimizations for trivially copyable and trivially moveable types. In those cases, memory can be copied onto the uninitialized memory, rather than invoking the respective copy/move constructors, one item at a time. perf-simple-query results: ``` base: median 95954.90 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42312 insns/op, 0 errors) post: median 97530.65 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42331 insns/op, 0 errors) ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#18609	2024-05-15 22:32:45 +03:00
Raphael S. Carvalho	012ba25b5b	service: fix indentation in dispatch() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-05-15 16:30:06 -03:00
Raphael S. Carvalho	0a9e073154	service: fix reactor stall with large tablet count with a large tablet count, e.g. 128k, forward_service::dispatch() can potentially stall when grouping ranges per endpoint. Reactor stalled for 4 ms on shard 1. Backtrace: 0x5eb15ea 0x5eb09f5 0x5eb1daf 0x3dbaf 0x2d01e57 0x33f7d1e 0x348255f 0x2d005d4 0x2d3d017 0x2d3d58c 0x2d3d225 0x5e59622 0x5ec328f 0x5ec4577 0x5ee84e0 0x5e8394a 0x8c946 0x11296f Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-05-15 16:30:06 -03:00
Raphael S. Carvalho	f7659b357c	service: avoid potential expensive copies in forward_service::dispatch() each partition_range_vector might grow to ~9600 elements, assuming 96-shard nodes, each with 100 tablets. ~9600 elements, where each is 120 bytes (sizeof(partition_range)) can result in vector with capacity of ~2M due to growth factor of 2. we're copying each range 3x in dispatch(), and we can easily avoid it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-05-15 16:30:06 -03:00
Raphael S. Carvalho	f9d2b9a83b	service: coroutinize forward_service::dispatch() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-05-15 16:30:06 -03:00
Pavel Emelyanov	16db2f650e	functions: Do not crash when schema is missing Getting token() function first tries to find a schema for underlying table and continues with nullptr if there's no one. Later, when creating token_fct, the schema is passed as is and referenced. If it's null crash happens. It used to throw before `5983e9e7b2` (cql3: test_assignment: pass optional schema everywhere) on missing schema, but this commit changed the way schema is looked up, so nullptr is now possible. fixes: #18637 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18639	2024-05-15 17:20:40 +03:00
Pavel Emelyanov	d267fbd894	repair: Get topology via replication map When row_level_repair is constructed it sorts provided list of endpoints. For that it needs to get topology from somewhere and it goes the database->token_metadata->topology chain. Patch this place to get topology from erm instead. It's consistent with how other code from row_level_repair gets it and removes one more place that uses database to token metadata "provider". Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-15 17:07:45 +03:00
Pavel Emelyanov	2706f27cd9	repair: Use repair_service::my_address() in handlers Some handlers want to print local node address in logs. Now the repair_service has a method to get one, so those places can stop getting it via database->token_metadata dependency chain. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-15 17:07:45 +03:00
Pavel Emelyanov	7fb405ba65	repair: Remove repair_meta::_myip In favor of recently introduced my_address() one. One nice side effect of this change is minus one place that gets token metadata from database. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-15 17:07:45 +03:00
Pavel Emelyanov	017f650955	repair: Use repair_meta::myip() everywhere The method returns _myip and some places in this class use _myip directly. Next patch is going to remove _myip, so prepare for that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-15 17:07:45 +03:00
Pavel Emelyanov	6899bf83ec	repair: Add repair_service::my_address() method To be used in next patches Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-15 17:07:45 +03:00
Avi Kivity	a5fea84d82	Merge 'scylla-nodetool: add tablet support for ring command' from Botond Dénes Currently, invoking `nodetool ring` on a tablet keyspace fails with an error, because it doesn't pass the required table parameter to `/storage_service/ownership/{keyspace}`. Further to this, the command will currently always output the vnode ring, regardless of the keyspace and table parameter. This series fixes this, adding tablet support to `/storage_service/tokens_endpoint`, which will now return the tablet ring (tablet token -> tablet primary replica mapping) if the new keyspace and table parameters are provided. `nodetool status` also gets a touch-up, to provide the tablet ring's token count (the tablet count) when invoked with a tablet keyspace and table. Fixes: #17889 Fixes: #18474 - [x] native-nodetool is new functionality, no backport is needed Closes scylladb/scylladb#18608 * github.com:scylladb/scylladb: test/nodetool: make test pass with cassandra nodetool tools/scylla-nodetool: status: fix token count for tablets tools/scylla-nodetool: add tablet support to ring command api/storage_service: add tablet support for /storage_service/tokens_endpoint service/storage_service: introduce get_tablet_to_endpoint_map() locator/tablets: introduce the primary replica concept	2024-05-15 16:05:10 +03:00
Artsiom Mishuta	d659d9338b	test/pylib: Introduce ManagerClient.test_finished_event introduce ManagerClient.test_finished_event to block access to REST client object from the test if ManagerClient.after_test method was called (test teardown started)	2024-05-15 11:33:45 +02:00
Botond Dénes	7b41bb601c	Merge 'Simplify access to topology::my_address()' from Pavel Emelyanov Recent commit `12f160045b` (Get rid of fb_utilities) replaced the usage of global fb_utilities and made all services use topology::my_address() in order to get local node broadcast address. Some places resulted in long dependency chain dereferences to get to topology. This PR fixes some of them. Closes scylladb/scylladb#18672 * github.com:scylladb/scylladb: service_level_controller_test: Use topology::is_me() helper service_level_controller: Add dependency on shared_token_metadata tracing: Get my_address() via proxy storage_proxy: Get token metadata via local member, not database	2024-05-15 11:23:16 +03:00
Wojciech Mitros	5154429713	mv gossip: check errno instead of value returned by strtoull Currently, when a view update backlog is changed and sent using gossip, we check whether the strtoll/strtoull function used for reading the backlog returned LLONG_MAX/ULLONG_MAX, signaling an error of a value exceeding the type's limit, and if so, we do not store it as the new value for the node. However, the ULLONG_MAX value can also be used as the max backlog size when sending empty backlogs that were never updated. In theory, we could avoid sending the default backlog because each node has its real backlog (based on the node's memory, different than the ULLONG_MAX used in the default backlog). In practice, if the node's backlog changed to 0, the backlog sent by it will be likely the default backlog, because when selecting the biggest backlog across node's shards, we use the operator<=>(), which treats the default backlog as equal to an empty backlog and we may get the default backlog during comparison if the backlog of some shard was never changed (also it's the initial max value we compare shard's backlogs against). This patch removes the (U)LLONG_MAX check and replaces it with the errno check, which is also set to ERANGE during the strtoll error, and which won't prevent empty backlogs from being read Fixes: #18462 Closes scylladb/scylladb#18560	2024-05-15 07:14:36 +02:00
Pavel Emelyanov	59aec1f300	database: Don't break namespace with external alias The namespace replica is broken in the middle with sstable_list alias, while the latter can be declared earlier Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18664	2024-05-14 16:45:20 +03:00
Piotr Dulikowski	9ab57b12bb	Merge 'cql/describe: hide cdc log tables' from Michał Jadwiszczak Currently all tables are printed in statements like `DESC TABLES`, `DESC KEYSPACE ks` or `DESC SCHEMA`. But when we create a table with cdc enabled, additional table with `_scylla_cdc_log` suffix is created. Those tables shouldn't be recreated manually but created automatically when the base table is created. This patch hides tables with `_scylla_cdc_log` suffix in all describe statements. To preserve properties values of those tables, `ALTER TABLE` statement with all properties and their current values for log cdc table is added to description of the base table. Fixes #18459 Closes scylladb/scylladb#18467 * github.com:scylladb/scylladb: test/cql-pytest/test_describe: add test for hiding cdc tables cql3/statements/describe_statement: hide cdc tables schema: add a method to generate ALTER statement with all properties schema: extract schema's properties generation	2024-05-14 15:02:29 +02:00
Pavel Emelyanov	a30337e719	service_level_controller_test: Use topology::is_me() helper The on_leave_cluster() callback needs to check if the leaving node is the local one. It currently compares endpoint with the my_address() obtained via pretty long dependency chain of auth_service->query_processor->storage_proxy->database->token_metadata This patch makes the whole thing _much_ shorter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-14 15:47:12 +03:00
Pavel Emelyanov	634c066c43	service_level_controller: Add dependency on shared_token_metadata The controller needs to access topology, so it needs the token metadata at hand. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-14 15:43:01 +03:00
Pavel Emelyanov	f9c34f7bd5	tracing: Get my_address() via proxy The my_address() helper method gets the address via a long qp->proxy->database->token_metadata->topology chain. That's quite an overkill, storage_proxy has public my_address() method. The latter also accesses topology, but without the help of the database. Also this change makes tracing code a bit shorter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-14 15:41:04 +03:00
Pavel Emelyanov	75d5eb96f2	storage_proxy: Get token metadata via local member, not database The my_address() method eventually needs to access topology and goes long way via sharded<database>. No need in that, shared token metadata is available on proxy itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-14 15:40:10 +03:00
Artsiom Mishuta	fb6b572b9e	test/topology: make ManagerClient object function scope move ManagerClient object creation/clear to function scope instead of session scope to prevent test cases from affecting each other, by no longer sharing connections to the cluster between tests	2024-05-14 14:31:10 +02:00
Artsiom Mishuta	efb079ec15	test/pylib: Introduce Manager broken state: Waiting for all tasks does not guarantee that the test will not spawn new tasks while we wait. Manager broken state prevents all future put requests in case of 1) a failure during task waiting 2) the test continuing to create tasks in the test_after stage	2024-05-14 14:24:03 +02:00
Artsiom Mishuta	a8bab03c15	test/pylib: Wait for tasks from Tasks History: To ensure the atomicity of tests and recycle clusters without any issues, it is crucial that all active requests in ScyllaClusterManager are completed before proceeding further.	2024-05-14 14:24:03 +02:00
Artsiom Mishuta	2ee063c90c	test/pylib: Introduce Tasks History: Topology tests might spawn asynchronous tasks in parallel in ScyllaClusterManager. Tasks history is introduced to be able to log and analyze all actions against the cluster in case of failures	2024-05-14 14:24:03 +02:00
Artsiom Mishuta	38125a0049	test/pylib: Introduce Stop Event introduce a stop event that interrupts starting a node in the state "wait for node started" if someone wants to stop it	2024-05-14 14:24:03 +02:00
Artsiom Mishuta	4c2527efce	test/pylib: Introduce Start-Stop Lock: The methods stop, stop_gracefully, and start in ScyllaServer are not designed for parallel execution. To circumvent issues arising from concurrent calls, a start_stop_lock has been introduced. This lock ensures that these methods are executed sequentially.	2024-05-14 14:24:03 +02:00
Botond Dénes	a15a9c3e8d	Merge 'utils: chunked_vector: fill ctor: make exception safe' from Benny Halevy Currently, if the fill ctor throws an exception, the destructor won't be called, as the object is not fully constructed yet. Call the default ctor first (which doesn't throw) to make sure the destructor will be called on exception. Fixes scylladb/scylladb#18635 - [x] Although the fix is for a rare bug, it has very low risk and so it's worth backporting to all live versions Closes scylladb/scylladb#18636 * github.com:scylladb/scylladb: chunked_vector_test: add more exception safety tests chunked_vector_test: exception_safe_class: count also moved objects utils: chunked_vector: fill ctor: make exception safe	2024-05-14 13:35:02 +03:00
Botond Dénes	78afb3644c	test/boost/mutation_fragment_test.cc: add test for validator validation levels To make sure that the validator doesn't validate what the validation level doesn't include.	2024-05-14 06:03:20 -04:00
Botond Dénes	e7b07692b6	mutation: mutation_fragment_stream_validating_filter: fix validation_level::none Despite its name, this validation level still did some validation. Fix this, by short-circuiting the catch-all operator(), preventing any validation when the user asked for none.	2024-05-14 06:02:10 -04:00
Botond Dénes	f6511ca1b0	mutation: mutation_fragment_stream_validating_filter: add raises_error ctor parameter When set to false, no exceptions will be raised from the validator on validation error. Instead, it will just return false from the respective validator methods. This makes testing simpler, asserting exceptions is clunky. When true (default), the previous behaviour will remain: any validation error will invoke on_internal_error(), resulting in either std::abort() or an exception.	2024-05-14 05:59:40 -04:00
Piotr Dulikowski	448f651049	Merge 'hinted handoff: Prevent segmentation fault when initializing endpoint managers ' from Dawid Mędrek We don't attempt to create an endpoint manager for a hint directory if there is no mapping host ID–IP corresponding to the directory's name, an IP address. That prevents a segmentation fault. Fixes scylladb/scylladb#18649 Closes scylladb/scylladb#18650 * github.com:scylladb/scylladb: db/hints: Remove an unused header db/hints: Remove migrating flag before initializing endpoint managers db/hints: Prevent segmentation fault when initializing endpoint managers	2024-05-14 07:34:16 +02:00
Amnon Heiman	0c84692c97	replica/table.cc: Add metrics per-table-per-node This patch adds metrics that will be reported per-table per-node. The added metrics (that are part of the per-table per-shard metrics) are: scylla_column_family_cache_hit_rate scylla_column_family_read_latency scylla_column_family_write_latency scylla_column_family_live_disk_space Fixes #18642 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Closes scylladb/scylladb#18645	2024-05-14 07:54:34 +03:00
Raphael S. Carvalho	0b2ec3063c	sstables: Fix incremental_reader_selector (for range reads) with tablets incremental_reader_selector is the mechanism for incremental consumption of disjoint sstables on range reads. tablet_sstable_set was implemented, such that selector is efficient with tablets. The problem is selector is vnode addicted and will only consider a given set exhausted when maximum token is reached. With tablets, that means a range read on first tablet of a given shard will also consume other tablets living in the same shard. That results in combined reader having to work with empty sstable readers of tablets that don't intersect with the range of the read. It won't cause extra I/O because the underlying sstables don't intersect with the range of the read. It's only unnecessary CPU work, as it involves creating readers (= allocation), feeding them into combined reader, which will in turn invoke the sstable readers only to realize they don't have any data for that range. With 100k tablets (ranges), and 100 tablets per shard, and ~5 sstables per tablet, there will be this amount of readers (empty or not): (100k * ((100^2 + 100) / 2) * avg_sstable_per_tablet=5) = ~2.5 billion. ~5000 times more readers, it can be quite significant additional cpu work, even though I/O dominates the most in scans. It's an inefficiency that we rather get rid of. The behavior can be observed from logs (there's 1 sstable for each of 4 tablets, but note how readers are created for every single one of them when reading only 1 tablet range): ``` table - make_reader_v2 - range=(-inf, {-4611686018427387905, end}] incremental_reader_selector - create_new_readers(null): selecting on pos {minimum token, w=-1} sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._34qn42... 
that has range [{-9151620220812943033, start},{-4813568684827439727, end}] incremental_reader_selector - create_new_readers(null): selecting on pos {-4611686018427387904, w=-1} sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._368nk2... that has range [{-4599560452460784857, start},{-78043747517466964, end}] incremental_reader_selector - create_new_readers(null): selecting on pos {0, w=-1} sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._38lj42... that has range [{851021166589397842, start},{3516631334339266977, end}] incremental_reader_selector - create_new_readers(null): selecting on pos {4611686018427387904, w=-1} sstable - make_reader - reader on (-inf, {-4611686018427387905, end}] for sst 3gfx_..._3dba82... that has range [{5065088566032249228, start},{9215673076482556375, end}] ``` Fix is about making sure the tablet set won't select past the supplied range of the read. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18556	2024-05-14 07:43:22 +03:00
Wojciech Mitros	485eb7a64c	test: add test for bad_allocs during large mv queries This patch adds a test for reproducing issue #12379, which is being fixed in #16819. The test case works by creating a table with a materialized view, and then performing a partition delete query on it. At the same time, it uses injections to limit the memory to a level lower than usual, in order to increase the consistency of the test, and to limit its runtime. Before #16819, the test would exceed the limit and fail, and now the next allocation is throttled using a sleep.	2024-05-13 18:16:39 +02:00
Jan Ciolek	e0442d7bfa	mv: throttle view update generation for large queries For every mutation applied to the base table we have to generate the corresponding materialized view table updates. In case of simple requests, like INSERT or UPDATE, the number of view updates generated per base table mutation is limited to at most a few view table updates per base table update. The situation is different for DELETE queries, which can delete the whole partitions or clustering ranges. Range deletions are fast on the base table, but for the view table the situation is different. Deleting a single partition in the base table will generate as many singular view updates as there are rows in the deleted partition, which could potentially be in the millions. To prevent OOM view updates are generated in batches of at most 100 rows. There is a loop which generates the next batch of updates, spawns tasks to send them to remote nodes, generates another batch and so on. The problem is that there is no concurrency control - each batch is scheduled to be sent in the background, but the following batch is generated without waiting for the previously generated updates to be sent. This can lead to unbounded concurrency and OOM. To protect against this view update generation should be limited somehow. There is an existing mechanism for limiting view updates - throttling. We keep track of how many pending view updates there are, in the view backlog, and delay responses to the client based on this backlog's fullness. For a well behaved client with limited concurrency this will slow down the amount of incoming requests until it reaches an optimal point. This works for simple queries (INSERT, UPDATE, ...), but it doesn't do anything for range DELETEs. A DELETE is a single request that generates millions of view updates, delaying client response doesn't help. 
The throttling mechanism could be extended to cover this case - we could treat the DELETE request like any other client and force it to wait before sending more updates. This commit implements this approach - before sending the next batch of updates the generator is forced to sleep for a bit of time, calculated using the existing throttling equation. The more full the backlog gets the more the generator will have to sleep for, and hopefully this will prevent overloading the system with view updates. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2024-05-13 18:16:23 +02:00
Jan Ciolek	cd62697605	exceptions: add read_write_timeout_exception, a subclass of request_timeout_exception The `request_timeout_exception` is thrown when a client request can't be completed in time. Previously this class included some fields specific to read/write timeouts: ``` db::consistency_level consistency; int32_t received; int32_t block_for; ``` The problem is that a request can timeout for reasons other than read/write timeout, for example the request might timeout due to materialized view update generation taking too long. In such cases of non read/write timeouts we would like to be able to use request_timeout_exception, but it contains fields that aren't relevant in these cases. To deal with this let's create read_write_timeout_exception, which inherits from request_timeout_exception. read_write_timeout_exception will contain all of these fields that are specific to read/write timeouts. request_timeout_exception will become the base class that doesn't have any fields, the other case-specific exceptions will derive from it and add the desired fields. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2024-05-13 18:16:09 +02:00
Jan Ciolek	ae28b8bdb7	db/view: extract view throttling delay calculation to a global function In order to prevent overload caused by too many view updates, their number is limited by delaying client responses. The amount of time to delay for is calculated based on the fullness of the view update backlog. Currently this is done in the function calculate_delay, used by abstract_write_response_handler. In the following commits I will introduce another throttling mechanism that uses the same equation to calculate wait time, so it would be good to reuse the existing function. Let's make the function globally accessible. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2024-05-13 18:14:56 +02:00
Pavel Emelyanov	bb1696910c	Merge 'scylla-nodetool: make documentation links product and version dependant' from Botond Dénes Currently, all documentation links that feature anywhere in the help output of scylla-nodetool, are hard-coded to point to the documentation of the latest stable release. As our documentation is version and product (open-source or enterprise) specific, this is not correct. This PR addresses this, by generating documentation links such that they point to the documentation appropriate for the product and version of the scylladb release. Fixes: https://github.com/scylladb/scylladb/issues/18276 - [x] the native nodetool is a new feature, no backport needed Closes scylladb/scylladb#18476 * github.com:scylladb/scylladb: tools/scylla-nodetool: make doc link version-specific release: introduce doc_link() build: pass scylla product to release.cc	2024-05-13 18:03:45 +03:00
Botond Dénes	d82a31f15f	service/storage_proxy: add useful version of base write throttle metrics There are two metrics to help observe base-write throttling: * current_throttled_base_writes * last_mv_flow_control_delay Both show a snapshot of what is happening right at the time of querying these metrics. This doesn't work well when one wants to investigate the role throttling is playing in occasional write timeouts. Prometheus scrapes metrics in multi-second intervals, and the probability of that instant catching the throttling at play is very small (almost zero). Add two new metrics: * throttled_base_writes_total * mv_flow_control_delay_total These accumulate all values, allowing Grafana to derive the values and extract information about throttle events that happened in the past (but not necessarily at the instant of the scrape). Note that dividing the two values, will yield the average delay for a throttle, which is also useful. Closes scylladb/scylladb#18435	2024-05-13 18:02:06 +03:00
Dawid Medrek	ef8f14d44b	db/hints: Remove an unused header	2024-05-13 16:40:47 +02:00
Dawid Medrek	c9bbb92b1a	db/hints: Remove migrating flag before initializing endpoint managers Before these changes, if initializing endpoint managers after the migration of hinted handoff to host ID is done throws an exception, we don't remove the flag indicating the migration is still in progress. However, the migration has, in practice, finished -- all of the hint directories have been mapped to host IDs and all of the nodes in the cluster are host-ID-based. Because of that, it makes sense to remove the flag early on.	2024-05-13 16:40:47 +02:00
Dawid Medrek	bdcde0c210	db/hints: Prevent segmentation fault when initializing endpoint managers If hinted handoff is still IP-based and there is a hint directory representing an IP without a corresponding mapping to a host ID in `locator::token_metadata`, an attempt to initialize its endpoint manager will result in a segmentation fault. This commit prevents that.	2024-05-13 16:40:47 +02:00
Benny Halevy	4bbb66f805	chunked_vector_test: add more exception safety tests For insertion, with and without reservation, and for fill and copy constructors. Reproduces https://github.com/scylladb/scylladb/issues/18635 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-13 17:18:38 +03:00
Benny Halevy	88b3173d03	chunked_vector_test: exception_safe_class: count also moved objects We have to account for moved objects as well as copied objects so they will be balanced with the respective `del_live_object` calls called by the destructor. However, since chunked_vector requires the value_type to be nothrow_move_constructible, just count the additional live object, but do not modify _countdown or, respectively, throw an exception, as this should be considered only for the default and copy constructors. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-13 17:18:38 +03:00
Benny Halevy	64c51cf32c	utils: chunked_vector: fill ctor: make exception safe Currently, if the fill ctor throws an exception, the destructor won't be called, as the object is not fully constructed yet. Call the default ctor first (which doesn't throw) to make sure the destructor will be called on exception. Fixes scylladb/scylladb#18635 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-13 17:18:38 +03:00
Michał Jadwiszczak	3e5c34831c	test/cql-pytest/test_describe: add test for hiding cdc tables	2024-05-13 16:14:11 +02:00
Michał Jadwiszczak	f12edbdd95	cql3/statements/describe_statement: hide cdc tables Tables with `_scylla_cdc_log` suffix are internal tables used by cdc. We want to hide those tables in all describe statements, as they shouldn't be created by user but created by Scylla when user creates a table with cdc enabled. Instead, we include `ALTER TABLE <cdc log table> WITH <all table properties>` to the description of cdc base table, so all changes to cdc log table's properties are preserved in backup.	2024-05-13 16:11:13 +02:00
Michał Jadwiszczak	05a51c9286	schema: add a method to generate ALTER statement with all properties In the describe statement, we need to generate `ALTER TABLE` statement with all schema's properties for some tables (cdc log tables). The method prints valid CQL statement with current values of the properties.	2024-05-13 16:11:06 +02:00
Michał Jadwiszczak	b62f7a1dd3	schema: extract schema's properties generation In a later commit, we want to add a method to create `ALTER TABLE ... WITH` statement including all schema's properties with current values.	2024-05-13 14:52:32 +02:00
Asias He	952dfc6157	repair: Introduce repair_partition_count_estimation_ratio config option In commit `642f9a1966` (repair: Improve estimated_partitions to reduce memory usage), a 10% hard coded estimation ratio is used. This patch introduces a new config option to specify the estimation ratio of partitions written by repair out of the total partitions. It is set to 0.1 by default. Fixes #18615 Closes scylladb/scylladb#18634	2024-05-13 15:16:55 +03:00
Botond Dénes	afa870a387	Merge 'Some sstable set related improvements' from Raphael "Raph" Carvalho Closes scylladb/scylladb#18616 * github.com:scylladb/scylladb: replica: Make it explicit table's sstable set is immutable replica: avoid reallocations in tablet_sstable_set replica: Avoid compound set if only one sstable set is filled	2024-05-13 14:17:24 +03:00
Botond Dénes	a77796f484	test/nodetool: make test pass with cassandra nodetool After the recent fixes 4 tests started failing with the java nodetool implementation. We are about to ditch the java implementation, but until we actually do, it is valuable to keep the tests passing with both the native and java implementation. So in this patch, these tests are fixed to pass with the java implementation too. There is one test, test_help.py, which fails only if run together with all the tests. I couldn't confirm this 100%, but it seems like this is due to JMX sending a rogue request on some timer, which happens to hit this test. I don't think this is worth trying to fix.	2024-05-13 07:09:20 -04:00
Botond Dénes	bec4c17db4	tools/scylla-nodetool: status: fix token count for tablets Currently, the token count column is always based on the vnodes, which makes no sense for tablet keyspaces. If a tablet keyspace is provided as the keyspace argument, don't print the vnode token count. If the user provided a table argument as well, print the tablet count, otherwise print "?".	2024-05-13 07:09:20 -04:00
Botond Dénes	e82455beab	tools/scylla-nodetool: add tablet support to ring command Add a table parameter. Pass both keyspace and table (when provided) to the /storage_service/tokens_endpoint API endpoint, so that the returned (and printed) token ring is that of the table's tablets, not the vnode ring. Also pass the table param to the ownership API, which will complain if this param is missing for a tablet keyspace.	2024-05-13 07:09:20 -04:00
Botond Dénes	fd25bb6f9f	api/storage_service: add tablet support for /storage_service/tokens_endpoint Add a keyspace and cf parameter. When specified, the endpoint will return token -> primary replica mapping for the table's tablet tokens, not the vnodes.	2024-05-13 07:09:20 -04:00
Botond Dénes	8690dbf8ad	service/storage_service: introduce get_tablet_to_endpoint_map() The tablet variant of the existing get_token_to_endpoint_map(), which returns a list of tablet tokens and the primary replica for each.	2024-05-13 06:57:13 -04:00
Pavel Emelyanov	2ce643d06b	table: Directly compare std::optional<shard_id> with shard_id There's a loop that calculates the number of shard matches over a tablet map. The check of the given shard against optional<shard> can be made shorter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18592	2024-05-13 13:25:05 +03:00
Andrei Chekun	76a766cab0	Migrate alternator tests to PythonTestSuite As part of the unification process, alternator tests are migrated to the PythonTestSuite instead of using the RunTestSuite. The main idea is to have one suite, so that it is easier to maintain and to introduce new features. Introduce the prepare_sql option for suite.yaml to make it possible to run cql statements as a precondition for the test suite. Related: https://github.com/scylladb/scylladb/issues/18188 Closes scylladb/scylladb#18442	2024-05-13 13:23:29 +03:00
Avi Kivity	51d09e6a2a	cql3: castas_fcts: do not rely on boost casting large multiprecision integers to floats behavior In [1] a bug casting large multiprecision integers to floats is documented (note that it received two fixes, the most recent and relevant is [2]). Even with the fix, boost now returns NaN instead of ±∞ as it did before [3]. Since we cannot rely on boost, detect the conditions that trigger the bug and return the expected result. The unit test is extended to cover large negative numbers. Boost version behavior: - 1.78 - returns ±∞ - 1.79 - terminates - 1.79 + fix - returns NaN Fixes https://github.com/scylladb/scylladb/issues/18508 [1] https://github.com/boostorg/multiprecision/issues/553 [2] `ea786494db` [3] https://github.com/boostorg/math/issues/1132 Closes scylladb/scylladb#18532	2024-05-13 13:18:28 +03:00
Yaniv Michael Kaul	4639ca1bf5	compaction_strategy.cc: typo -> "performanceimproves" -> "performance improves" Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#18629	2024-05-13 08:43:38 +03:00
Patryk Wrobel	ec820e214c	scylla_io_setup: ensure correct RLIMIT_NOFILE for iotune The default limit of open file descriptors per process may be too small for iotune on certain machines with a large number of cores. In such a case iotune reports failure due to an inability to create files or to set up the seastar framework. This change configures the limit of open file descriptors before running iotune to ensure that the failure does not occur. The limit is set via 'resource.setrlimit()' in the parent process. The limit is then inherited by the child process. Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> Closes scylladb/scylladb#18546	2024-05-13 08:35:52 +03:00
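As a rough illustration of the approach described in ec820e214c, the following Python sketch raises the soft open-file limit in the parent process with `resource.setrlimit()`, so that a child process started afterwards inherits it. The helper name `raise_nofile_limit` and any target value passed to it are hypothetical; they are not taken from `scylla_io_setup`.

```python
import resource


def raise_nofile_limit(wanted):
    """Raise the soft RLIMIT_NOFILE toward `wanted` (hypothetical helper).

    The soft limit is never raised above the hard limit, and it is never
    lowered. Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process cannot exceed the hard limit, so clamp to it.
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    # Nothing to do if the current soft limit is already sufficient.
    if soft == resource.RLIM_INFINITY or soft >= wanted:
        return soft, hard
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    # Children spawned after this point inherit the new limit.
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

A caller would invoke the helper once, before spawning the child (for example with `subprocess.run`), since resource limits are inherited across process creation.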
Botond Dénes	32a0867b38	locator/tablets: introduce the primary replica concept The primary replica is an arbitrary replica of the tablet, which is considered to be the "main" owner of the tablet, similar to how replicas own tokens in the vnode world. To avoid aliasing the primary replicas with a certain DC or rack, primary replicas are rotated among the tablet's replicas, selecting tablet_id % replica_count as the primary replica.	2024-05-13 01:35:05 -04:00
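The rotation rule from 32a0867b38 (`tablet_id % replica_count`) can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `primary_replica` and the use of plain strings for replicas are assumptions, not the types used in `locator/tablets`.

```python
def primary_replica(tablet_id, replicas):
    """Pick the primary replica of a tablet by rotating over its replica list.

    `replicas` is the ordered replica list of the tablet; the element at
    index tablet_id % len(replicas) is treated as the primary.
    """
    if not replicas:
        raise ValueError("a tablet needs at least one replica")
    return replicas[tablet_id % len(replicas)]


# With three replicas, consecutive tablet ids cycle through all of them,
# so no single replica position is always the primary.
replicas = ["node-a", "node-b", "node-c"]
primaries = [primary_replica(tid, replicas) for tid in range(4)]
# primaries == ["node-a", "node-b", "node-c", "node-a"]
```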
Avi Kivity	cc8b4e0630	batchlog_manager, test: initialize delay configuration In `b4e66ddf1d` (4.0) we added a new batchlog_manager configuration named delay, but forgot to initialize it in cql_test_env. This somehow worked, but doesn't with clang 18. Fix it by initializing to 0 (there isn't a good reason to delay it). Also provide a default to make it safer. Closes scylladb/scylladb#18572	2024-05-13 07:57:35 +03:00
Israel Fruchter	a1a6bd6798	Update tools/cqlsh submodule to v6.0.18 * tools/cqlsh e5f5eafd...c8158555 (11): > cqlshlib/sslhandling: fix logic of `ssl_check_hostname` > cqlshlib/sslhandling.py: don't use empty userkey/usercert > Dockerfile: noninteractive isn't enough for answering yet on apt-get > fix cqlsh version print > cqlshlib/sslhandling: change `check_hostname` deafult to False > Introduce new ssl configuration for disableing check_hostname > set the hostname in ssl_options.server_hostname when SSL is used > issue-73 Fixed a bug where username and password from the credentials file were ignored. > issue-73 Fixed a bug where username and password from the credentials file were ignored. > issue-73 > github actions: update `cibuildwheel==v2.16.5` Fixes: scylladb/scylladb#18590 Closes scylladb/scylladb#18591	2024-05-13 07:25:10 +03:00
Yaron Kaikov	3eb81915c1	docker: drop jmx and tools-java from installation Following the work done in `dd0779675f`, removing the scylla-jmx and scylla-tools-java from our docker image Closes scylladb/scylladb#18566	2024-05-13 07:24:23 +03:00
Takuya ASADA	9538af0d95	scylla_kernel_check: fix block device size error on latest mkfs.xfs The latest mkfs.xfs does not allow formatting a block device that is smaller than 300MB. There are options to ignore this validation, but that is an unsupported feature, so it is better to increase the loopback image size to "supported size" == 300MB. reference: https://lore.kernel.org/all/164738662491.3191861.15611882856331908607.stgit@magnolia/ Fixes #18568 Closes scylladb/scylladb#18620	2024-05-13 07:23:29 +03:00
Avi Kivity	c8cc47df2d	Merge 'replica: allocate storage groups dynamically' from Aleksandra Martyniuk Allocate storage groups dynamically, i.e.: - on table creation allocate only storage groups that are on this shard; - allocate a storage group for tablet that is moved to this shard; - deallocate storage group for tablet that is moved out of this shard. Output of `./build/release/scylla perf-simple-query -c 1 --random-seed=2248493992` before change: ``` random-seed=2248493992 enable-cache=1 Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no} Disabling auto compaction Creating 10000 partitions... 64933.90 tps ( 63.2 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42163 insns/op, 0 errors) 65865.36 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42155 insns/op, 0 errors) 66649.36 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42176 insns/op, 0 errors) 67029.60 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42176 insns/op, 0 errors) 68361.21 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42166 insns/op, 0 errors) median 66649.36 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42176 insns/op, 0 errors) median absolute deviation: 784.00 maximum: 68361.21 minimum: 64933.90 ``` Output of `./build/release/scylla perf-simple-query -c 1 --random-seed=2248493992` after change: ``` random-seed=2248493992 enable-cache=1 Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no} Disabling auto compaction Creating 10000 partitions... 
63744.12 tps ( 63.2 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42153 insns/op, 0 errors) 66613.16 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42153 insns/op, 0 errors) 69667.39 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42184 insns/op, 0 errors) 67824.78 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42180 insns/op, 0 errors) 67244.21 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42174 insns/op, 0 errors) median 67244.21 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42174 insns/op, 0 errors) median absolute deviation: 631.05 maximum: 69667.39 minimum: 63744.12 ``` Fixes: #16877. Closes scylladb/scylladb#17664 * github.com:scylladb/scylladb: test: add test for back and forth tablets migration replica: allocate storage groups dynamically replica: refresh snapshot in compaction_group::cleanup replica: add rwlock to storage_group_manager replica: handle reads of non-existing tablets gracefully service: move to cleanup stage if allow_write_both_read_old fails replica: replace table::as_table_state compaction: pass compaction group id to reshape_compaction_group replica: open code get_compaction_group in perform_cleanup_compaction replica: drop single_compaction_group_if_available	2024-05-12 21:22:02 +03:00
Nadav Har'El	9813ec9446	Merge 'test: perf: add end-to-end benchmark for alternator' from Marcin Maliszkiewicz The code is based on similar idea as perf_simple_query. The main differences are: - it starts full scylla process - communicates with alternator via http (localhost) - uses richer table schema with all dynamoDB types instead of only strings Testing code runs in the same process as scylla so we can easily get various perf counters (tps, instr, allocation, etc). Results on my machine (with 1 vCPU): > ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload read --duration 10 2> /dev/null ... median 23402.59616090321 median absolute deviation: 598.77 maximum: 24014.41 minimum: 19990.34 > ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write --duration 10 2> /dev/null ... median 16089.34211320635 median absolute deviation: 552.65 maximum: 16915.95 minimum: 14781.97 The above seem more realistic than results from perf_simple_query which are 96k and 49k tps (per core). Related: https://github.com/scylladb/scylladb/issues/12518 Closes scylladb/scylladb#13121 * github.com:scylladb/scylladb: test: perf: alternator: add option to skip data pre-population perf-alternator-workloads: add operations-per-shard option test: perf: add global secondary indexes write workload for alternator test: perf: add option to continue after failed request test: perf: add read modify write workload for alternator (lwt) test: perf: add scan workload for alternator test: perf: add end-to-end benchmark for alternator test: perf: extract result aggregation logic to a separate struct	2024-05-12 18:15:29 +03:00
Kefu Chai	fd14b6f26b	test/nodetool: do not accept 1 return code when passing --help to nodetool in `906700d5`, we accepted 0 as well as 1 as the return code of "nodetool <command> --help", because we needed to be prepared for the newer seastar submodule while remaining compatible with the older seastar versions. now that in `305f1bd3`, we bumped up the seastar module, and this commit picked up the change to return 0 when handling "--help" command line option in seastar, we are able to drop the workaround. so, in this change, we only use "0" as the expected return code. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18627	2024-05-12 14:30:31 +03:00
Avi Kivity	be76527781	Merge 'build: cmake build dist-unified by default and put tarballs under per-config paths' from Kefu Chai in the same spirit of `d57a82c156`, this change adds `dist-unified` as one of the default targets. so that it is built by default. the unified package is required when redistributing the precompiled packages -- we publish the rpm, deb and tar balls to S3. - [x] cmake related change, no need to backport Closes scylladb/scylladb#18621 * github.com:scylladb/scylladb: build: cmake: use paths to be compatible with CI build: cmake build dist-unified by default	2024-05-12 11:16:03 +03:00
Benny Halevy	796ca367d1	gossiper: rename topo_sm member to _topo_sm Follow scylla convention for class member naming. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#18528	2024-05-12 11:02:35 +03:00
Avi Kivity	2ad13e5d76	auth: complete coroutinization of password_authenticator::create_default_if_missing password_authenticator::create_default_if_missing() is a confusing mix of coroutines and continuations, simplify it to a normal coroutine. Closes scylladb/scylladb#18571	2024-05-11 17:04:20 +03:00
Kefu Chai	1186ddef16	build: cmake: use paths to be compatible with CI our CI workflow for publishing the packages expects the tar balls to be located under `build/$buildMode/dist/tar`, where `$buildMode` is "release" or "debug". before this change, the CMake building system puts the tar balls under "build/dist" when the multi-config generator is used. and `configure.py` uses multi-config generator. in this change, we put the tar balls for redistribution under `build/$<CONFIG>/dist/tar`, where `$<CONFIG>` is "RelWithDebInfo" or "Debug", this works better with the CI workflow -- we just need to map "release" and "debug" to "RelWithDebInfo" and "Debug" respectively. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-05-11 21:56:50 +08:00
Kefu Chai	0f85255c74	build: cmake build dist-unified by default in the same spirit of `d57a82c156`, this change adds `dist-unified` as one of the default targets. so that it is built by default. the unified package is required when redistributing the precompiled packages -- we publish the rpm, deb and tar balls to S3. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-05-11 18:44:11 +08:00
Raphael S. Carvalho	7faba69f28	replica: Make it explicit table's sstable set is immutable Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-05-10 11:58:08 -03:00
Raphael S. Carvalho	55c0272b68	replica: avoid reallocations in tablet_sstable_set reserve upfront wherever possible to avoid reallocations. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-05-10 10:44:39 -03:00
Raphael S. Carvalho	35a0d47408	replica: Avoid compound set if only one sstable set is filled Most of the time only main set is filled, so we can avoid one layer of indirection (= compound set) when maintenance set is empty. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-05-10 10:44:34 -03:00
Aleksandra Martyniuk	51fdda4199	test: add test for back and forth tablets migration	2024-05-10 15:08:56 +02:00
Aleksandra Martyniuk	b4371a0ea0	replica: allocate storage groups dynamically Currently empty storage_groups are allocated for tablets that are not on this shard. Allocate storage groups dynamically, i.e.: - on table creation allocate only storage groups that are on this shard; - allocate a storage group for tablet that is moved to this shard; - deallocate storage group for tablet that is cleaned up. Stop compaction group before it's deallocated. Add a flag to table::cleanup_tablet deciding whether to deallocate sgs and use it in commitlog tests.	2024-05-10 15:08:21 +02:00
Aleksandra Martyniuk	6e1e082e8c	replica: refresh snapshot in compaction_group::cleanup During compaction_group::cleanup the sstables set is updated, but row_cache::_underlying still keeps a shared ptr to the old set. Because of that, descriptors to deleted sstables aren't closed. Refresh snapshot in order to store new sstables set in _underlying mutation source.	2024-05-10 14:56:38 +02:00
Aleksandra Martyniuk	c283746b32	replica: add rwlock to storage_group_manager Add rwlock which prevents storage groups from being added/deleted while other layers iterate over them (or their compaction groups). Add methods to iterate over storage groups with the lock held.	2024-05-10 14:56:38 +02:00
Aleksandra Martyniuk	54fcb7be53	replica: handle reads of non-existing tablets gracefully In the following patches, storage groups (and so also sstables sets) will be allocated only for tablets that are located on this shard. Some layers may try to read non-existing sstable sets. Handle this case as if the sstables set was empty instead of calling on_internal_error.	2024-05-10 14:56:38 +02:00
Aleksandra Martyniuk	561fb1dd09	service: move to cleanup stage if allow_write_both_read_old fails If allow_write_both_read_old tablet transition stage fails, move to cleanup_target stage before reverting migration. It's a preparation for further patches which deallocate storage group of a tablet during cleanup.	2024-05-10 14:56:38 +02:00
Aleksandra Martyniuk	532653f118	replica: replace table::as_table_state Replace table::as_table_state with table::try_get_table_state_with_static_sharding which throws if a table does not use static sharding.	2024-05-10 14:56:38 +02:00
Aleksandra Martyniuk	cf9913b0b7	compaction: pass compaction group id to reshape_compaction_group Pass compaction group id to shard_reshaping_compaction_task_impl::reshape_compaction_group. Modify table::as_table_state to return table_state of the given compaction group.	2024-05-10 14:56:38 +02:00
Aleksandra Martyniuk	90d618d8c9	replica: open code get_compaction_group in perform_cleanup_compaction Open code get_compaction_group in table::perform_cleanup_compaction as its definition won't be relevant once storage groups are allocated dynamically.	2024-05-10 14:56:38 +02:00
Aleksandra Martyniuk	8505389963	replica: drop single_compaction_group_if_available Drop single_compaction_group_if_available as it's unused.	2024-05-10 14:56:38 +02:00
Lakshmi Narayanan Sreethar	d39adf6438	compaction: improve partition estimates for garbage collected sstables When a compaction strategy uses garbage collected sstables to track expired tombstones, do not use complete partition estimates for them, instead, use a fraction of it based on the droppable tombstone ratio estimate. Fixes #18283 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#18465	2024-05-10 13:02:34 +03:00
Botond Dénes	3286a6fa14	Merge 'Reload reclaimed bloom filters when memory is available' from Lakshmi Narayanan Sreethar PR #17771 introduced a threshold for the total memory used by all bloom filters across SSTables. When the total usage surpasses the threshold, the largest bloom filter will be removed from memory, bringing the total usage back under the threshold. This PR adds support for reloading such reclaimed bloom filters back into memory when memory becomes available (i.e., within the 10% of available memory earmarked for the reclaimable components). The SSTables manager now maintains a list of all SSTables whose bloom filter was removed from memory and attempts to reload them when an SSTable, whose bloom filter is still in memory, gets deleted. The manager reloads from the smallest to the largest bloom filter to maximize the number of filters being reloaded into memory. Closes scylladb/scylladb#18186 * github.com:scylladb/scylladb: sstable_datafile_test: add testcase to test reclaim during reload sstable_datafile_test: add test to verify auto reload of reclaimed components sstables_manager: reload previously reclaimed components when memory is available sstables_manager: start a fiber to reload components sstable_directory_test: fix generation in sstable_directory_test_table_scan_incomplete_sstables sstable_datafile_test: add test to verify reclaimed components reload sstables: support reloading reclaimed components sstables_manager: add new intrusive set to track the reclaimed sstables sstable: add link and comparator class to support new intrusive set sstable: renamed intrusive list link type sstable: track memory reclaimed from components per sstable sstable: rename local variable in sstable::total_reclaimable_memory_size	2024-05-10 13:01:01 +03:00
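The smallest-to-largest reload order mentioned in 3286a6fa14 can be illustrated with a short Python sketch. It assumes a simple model in which each reclaimed filter is represented only by its size and there is a single budget of free memory; the name `pick_filters_to_reload` is hypothetical and the real sstables manager logic may differ in detail.

```python
def pick_filters_to_reload(reclaimed_sizes, available):
    """Choose which reclaimed bloom filters to reload into memory.

    Filters are visited from the smallest to the largest, and reloading
    stops at the first one that no longer fits into `available`. Taking
    the smallest first maximizes how many filters fit into the budget.
    """
    reloaded = []
    for size in sorted(reclaimed_sizes):
        if size > available:
            break  # every remaining filter is at least this large
        available -= size
        reloaded.append(size)
    return reloaded


# Three reclaimed filters and 45 units of free memory: the two smallest
# fit (10 + 30 = 40), while the 50-unit filter stays reclaimed.
chosen = pick_filters_to_reload([50, 10, 30], 45)
# chosen == [10, 30]
```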
Kefu Chai	305f1bd382	Update seastar submodule * seastar b73e5e7d...42f15a5f (27): > prometheus: revert the condition for enabling aggregation > tests/unit: add a unit test for json2code > seastar-json2code: fix the path param handling > github/workflow: do not override <clang++,23,release> > github/workflow: add a github workflow for running tests > prometheus: support disabling aggregation at query time > apps/httpd: free allocated http_server_control > rpc: cast rpc::tuple to std::tuple when passing it to std::apply > stall-analyser: move `args` into main() > stall-analyser: move print_command_line_options() out of Graph > stall-analyser: pass branch_threshold via parameter > stall-analyser: move process_graph() into Graph class > scripts: addr2line: cache the results of resolve_address() > stall-analyser: document the parser of log lines > stall-analyser: move resolver into main() > stall-analyser: extract get_command_line_parser() out > stall-analyser: move graph into main() > stall-analyser: extract main() out > stall-analyser: extract print_command_line_options() out > stall-analyser: add more typing annotatins > stall-analyser: surround top-level function with two empty lines > core/app_template: return status code 0 for --help > iotune: Print file alignments too > seastar-json2code: extract Parameter class > seastar-json2code: use f-string when appropriate > seastar-json2code: use nickname in place of oper['nickname'] > seastar-json2code: use dict.get() when checking allowMultiple Closes scylladb/scylladb#18598	2024-05-10 12:50:16 +03:00
Patryk Jędrzejczak	a04ea7b997	topology_coordinator: send barrier to a decommissioning node The code in `global_token_metadata_barrier` allows drain to fail. Then, it relies on fencing. However, we don't send the barrier command to a decommissioning node, which may still receive requests. The node may accept a write with a stale topology version. It makes fencing ineffective. Fix this issue by sending the barrier command to a decommissioning node. The raft-based topology is moved out of experimental in 6.0, no need to backport the patch. Fixes scylladb/scylladb#17108 Closes scylladb/scylladb#18599	2024-05-10 10:53:16 +02:00
Botond Dénes	c35031dda5	Merge 'repair: tablet_repair: make best effort in spite of errors' from Benny Halevy Currently if any shard repair task fails, `tablet_repair_task_impl` per-shard loop breaks, since it doesn't handle the exception. Although repair does return an error, which is as expected, we change vnode-based repair to make a best effort and try to repair as much as it can, even if any of the ranges failed. This causes the `test_repair_with_down_nodes_2b` dtest to fail with tablets, as seen in, e.g. https://jenkins.scylladb.com/view/master/job/scylla-master/job/tablets/job/gating-dtest-release-with-tablets/52/testReport/repair_additional_test/TestRepairAdditional/FullDtest___full_split002___test_repair_with_down_nodes_2b/ ``` AssertionError: assert 1765 == 2000 ``` - [x] Backport reason (please explain below if this patch should be backported or not) Tablet repair code will be introduced in 6.0, no need to backport to earlier versions. Closes scylladb/scylladb#18518 * github.com:scylladb/scylladb: repair: tablet_repair_task_impl: modernize table lookup repair: tablet_repair: make best effort in spite of errors	2024-05-10 10:51:09 +03:00
Piotr Dulikowski	a3070089de	main: initialize scheduling group keys before service levels Due to scylladb/seastar#2231, creating a scheduling group and a scheduling group key is not safe to do in parallel. The service level code may attempt to create scheduling groups while the cql_transport::cql_sg_stats scheduling group key is being created. Until the seastar issue is fixed, move initialization of the cql sg states before service level initialization. Refs: scylladb/seastar#2231 Closes scylladb/scylladb#18581	2024-05-10 10:35:05 +03:00
Kefu Chai	28791aa2c1	build: cmake: link thrift against absl::header this change is a leftover of `0b0e661a85`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18596	2024-05-09 18:43:23 +03:00
Avi Kivity	37d32a5f8b	Merge 'Cleanup inactive reads on tablet migration' from Botond Dénes When a tablet is migrated away, any inactive read which might be reading from said tablet, has to be dropped. Otherwise these inactive reads can prevent sstables from being removed and these sstables can potentially survive until the tablet is migrated back and resurrect data. This series introduces the fix as well as a reproducer test. Fixes: https://github.com/scylladb/scylladb/issues/18110 Closes scylladb/scylladb#18179 * github.com:scylladb/scylladb: test: add test for cleaning up cached querier on tablet migration querier: allow injecting cache entry ttl by error injector replica/table: cleanup_tablet(): clear inactive reads for the tablet replica/database: introduce clear_inactive_reads_for_tablet() replica/database: introduce foreach_reader_concurrency_semaphore reader_concurrency_semaphore: add range param to evict_inactive_reads_for_table() reader_concurrency_semaphore: allow storing a range with the inactive reader reader_concurrency_semaphore: avoid detach() in inactive_read_handle::abandon()	2024-05-09 17:34:49 +03:00
Lakshmi Narayanan Sreethar	4d22c4b68b	sstable_datafile_test: add testcase to test reclaim during reload Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 19:57:40 +05:30
Pavel Emelyanov	5497bb5a3d	loading_shared_values: Replace static-assert with concept The templatized get_or_load() accepts Loader template parameter and static-asserts on its signature. Concept is more suitable here. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18582	2024-05-09 16:29:49 +03:00
Patryk Jędrzejczak	332bd8ea98	raft: raft_group_registry: start_server_for_group: catch and rethrow abort_requested_exception If we initiate the shutdown while starting the group 0 server, we could catch `abort_requested_exception` in `start_server_for_group` and call `on_internal_error`. Then, Scylla aborts with a coredump. It causes problems in tests that shut down bootstrapping nodes. The `abort_requested_exception` can be thrown from `gossiper::lock_endpoint` called in `storage_service::topology_state_load`. So, the issue is new and applies only to the raft-based topology. Hence, there is no need to backport the patch. Fixes scylladb/scylladb#17794 Fixes scylladb/scylladb#18197 Closes scylladb/scylladb#18569	2024-05-09 14:55:11 +02:00
Benny Halevy	073680768f	repair: tablet_repair_task_impl: modernize table lookup Currently, the loop that goes over all repair metas checks for the table's existence using `find_column_family()`. Although this is correct, it might cause an exception storm if a table or keyspace is dropped during repair. This can be avoided by using the more modern interface, `get_table_if_exists` in the database `tables_metadata` that returns a `lw_shared_ptr<replica::table>`, exactly as we need, that has value iff the table still exists without throwing any exception. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-09 15:43:00 +03:00
Benny Halevy	c55aa4b121	repair: tablet_repair: make best effort in spite of errors Currently if any shard repair task fails, `tablet_repair_task_impl` per-shard loop breaks, since it doesn't handle the exception. Although repair does return an error, which is as expected, we change vnode-based repair to make a best effort and try to repair as much as it can, even if any of the ranges failed. This causes the `test_repair_with_down_nodes_2b` dtest to fail with tablets, as seen in, e.g. https://jenkins.scylladb.com/view/master/job/scylla-master/job/tablets/job/gating-dtest-release-with-tablets/52/testReport/repair_additional_test/TestRepairAdditional/FullDtest___full_split002___test_repair_with_down_nodes_2b/ ``` AssertionError: assert 1765 == 2000 ``` This change adds a check for the keyspace and table presence whenever an individual repair task fails, instead of the global check at the end, so that failures due to dropping of the keyspace or the table are logged as warnings, but ignored for the purpose of failing the overall repair status. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-09 15:42:59 +03:00
Lakshmi Narayanan Sreethar	a080daaa94	sstable_datafile_test: add test to verify auto reload of reclaimed components Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 17:49:22 +05:30
Lakshmi Narayanan Sreethar	0b061194a7	sstables_manager: reload previously reclaimed components when memory is available When an SSTable is dropped, the associated bloom filter gets discarded from memory, bringing down the total memory consumption of bloom filters. Any bloom filter that was previously reclaimed from memory due to the total usage crossing the threshold, can now be reloaded back into memory if the total usage can still stay below the threshold. Added support to reload such reclaimed filters back into memory when memory becomes available. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 17:49:22 +05:30
Lakshmi Narayanan Sreethar	f758d7b114	sstables_manager: start a fiber to reload components Start a fiber that gets notified whenever an sstable gets deleted. The fiber doesn't do anything yet but the following patch will add support to reload reclaimed components if there is sufficient memory. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 17:49:22 +05:30
Lakshmi Narayanan Sreethar	24064064e9	sstable_directory_test: fix generation in sstable_directory_test_table_scan_incomplete_sstables The testcase uses an sstable whose mutation key and the generation are owned by different shards. Due to this, when process_sstable_dir is called, the sstable gets loaded into a different shard than the one that was intended. This also means that the sstable and the sstable manager end up in different shards. The following patch will introduce a condition variable in sstables manager which will be signalled from the sstables. If the sstable and the sstable manager are in different shards, the signalling will cause the testcase to fail in debug mode with this error : "Promise task was set on shard x but made ready on shard y". So, fix it by supplying appropriate generation number owned by the same shard which owns the mutation key as well. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar	69b2a127b0	sstable_datafile_test: add test to verify reclaimed components reload Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar	54bb03cff8	sstables: support reloading reclaimed components Added support to reload components from which memory was previously reclaimed as the total memory of reclaimable components crossed a threshold. The implementation is kept simple as only the bloom filters are considered reclaimable for now. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar	2340ab63c6	sstables_manager: add new intrusive set to track the reclaimed sstables The new set holds the sstables from where the memory has been reclaimed and is sorted in ascending order of the total memory reclaimed. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar	140d8871e1	sstable: add link and comparator class to support new intrusive set Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar	3ef2f79d14	sstable: renamed intrusive list link type Renamed the intrusive list link type to differentiate it from the set link type that will be added in an upcoming patch. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar	02d272fdb3	sstable: track memory reclaimed from components per sstable Added a member variable _total_memory_reclaimed to the sstable class that tracks the total memory reclaimed from a sstable. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 17:48:58 +05:30
Lakshmi Narayanan Sreethar	a53af1f878	sstable: rename local variable in sstable::total_reclaimable_memory_size Renamed local variable in sstable::total_reclaimable_memory_size in preparation for the next patch which adds a new member variable _total_memory_reclaimed to the sstable class. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-05-09 17:48:58 +05:30
Marcin Maliszkiewicz	a1099791c4	test: perf: alternator: add option to skip data pre-population	2024-05-09 13:59:17 +02:00
Marcin Maliszkiewicz	fd416fac3b	perf-alternator-workloads: add operations-per-shard option	2024-05-09 13:59:13 +02:00
Marcin Maliszkiewicz	5b8acf182a	test: perf: add global secondary indexes write workload for alternator	2024-05-09 13:59:08 +02:00
Marcin Maliszkiewicz	43a64ac558	test: perf: add option to continue after failed request	2024-05-09 13:59:03 +02:00
Marcin Maliszkiewicz	70b5b5024b	test: perf: add read modify write workload for alternator (lwt)	2024-05-09 13:58:58 +02:00
Marcin Maliszkiewicz	5b8e554431	test: perf: add scan workload for alternator	2024-05-09 13:58:54 +02:00
Marcin Maliszkiewicz	55030b1550	test: perf: add end-to-end benchmark for alternator The code is based on similar idea as perf_simple_query. The main differences are: - it starts full scylla process - communicates with alternator via http (localhost) - uses richer table schema with all dynamoDB types instead of only strings Testing code runs in the same process as scylla so we can easily get various perf counters (tps, instr, allocation, etc). Results on my machine (with 1 vCPU): > ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload read --duration 10 2> /dev/null ... median 23402.59616090321 median absolute deviation: 598.77 maximum: 24014.41 minimum: 19990.34 > ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write --duration 10 2> /dev/null ... median 16089.34211320635 median absolute deviation: 552.65 maximum: 16915.95 minimum: 14781.97 The above seem more realistic than results from perf_simple_query which are 96k and 49k tps (per core).	2024-05-09 13:58:40 +02:00
Marcin Maliszkiewicz	6152223890	test: perf: extract result aggregation logic to a separate struct It will be reused later by a new tool.	2024-05-09 13:58:29 +02:00
Gleb Natapov	3b40d450e5	gossiper: try to locate an endpoint by the host id when applying state if search by IP fails Even if there is no endpoint for the given IP, the state can still belong to an existing endpoint that was restarted with a different IP, so let's try to locate the endpoint by host id as well. Do it in raft topology mode only to not have impact on gossiper mode. Also make the test more robust in detecting a wrong number of entries in the peers table. Today it may miss that there is a wrong entry there because the map will squash two entries for the same host id into one. Fixes: scylladb/scylladb#18419 Fixes: scylladb/scylladb#18457	2024-05-09 13:14:54 +02:00
Patrik	b0fbe71eaf	Update launch-on-gcp.rst Closes scylladb/scylladb#18512	2024-05-09 10:12:31 +03:00
Avi Kivity	b7055b5f2f	storage_service: don't rely on optional<> formatting for removed node error std::optional formatting changed while moving from the home-grown formatter to the fmt provided formatter; don't rely on it for user visible messages. Here, the optional formatted is known to be engaged, so just print it. Closes scylladb/scylladb#18534	2024-05-09 10:03:23 +03:00
Kefu Chai	906700d523	test/nodetool: accept -1 returncode also when --help is invoked in newer seastar, 0 is returned as the returncode of the application when handling `--help`. to prepare for this behavior, let's accept it before updating the seastar submodule. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18574	2024-05-09 08:26:44 +03:00
Kefu Chai	6047b3b6aa	build: cmake: build async_utils.cc async_utils.cc was introduced in `e1411f39`, so let's update the cmake building system to build it. without which, we'd run into link failure like: ``` ld.lld: error: undefined symbol: to_mutation_gently(canonical_mutation const&, seastar::lw_shared_ptr<schema const>) >>> referenced by storage_service.cc >>> storage_service.cc.o:(service::storage_service::merge_topology_snapshot(service::raft_snapshot)) in archive service/Dev/libservice.a >>> referenced by group0_state_machine.cc >>> group0_state_machine.cc.o:(service::write_mutations_to_database(service::storage_proxy&, gms::inet_address, std::vector<canonical_mutation, std::allocator<canonical_mutation>>)) inarchive service/Dev/libservice.a >>> referenced by group0_state_machine.cc >>> group0_state_machine.cc.o:(service::write_mutations_to_database(service::storage_proxy&, gms::inet_address, std::vector<canonical_mutation, std::allocator<canonical_mutation>>) (.resume)) in archive service/Dev/libservice.a >>> referenced 1 more times ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18524	2024-05-09 08:26:44 +03:00
Kefu Chai	c336904722	build: cmake: mark abseil include SYSTEM this change is a followup of `0b0e661a`. it helps to ensure that the header files in abseil submodule have higher priority when the compiler includes abseil headers when building with CMake. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18523	2024-05-09 08:26:44 +03:00
Kefu Chai	2a9a874e19	db,service: fix typos in comments Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18567	2024-05-09 08:26:44 +03:00
Anna Stuchlik	65c8b81051	doc: add OS support in version 6.0 This commit adds OS support in version 6.0. In addition, it removes the information about version 5.2, as this version is no longer supported, according to our policy. Closes scylladb/scylladb#18562	2024-05-09 08:26:44 +03:00
Anna Stuchlik	74fb9808ed	doc: update Consistent Topology with Raft This PR: - Removes the `.. only:: opensource` directive from Consistent Topology with Raft. This feature is no longer an Open Source-only experimental feature. - Removes redundant version-specific information. - Moves the necessary version-specific information to a separate file. This is a follow-up to `55b011902e`. Refs https://github.com/scylladb/scylladb/pull/18285/ Closes scylladb/scylladb#18553	2024-05-09 08:26:44 +03:00
Calle Wilund	79d56ccaad	commitlog: Fix request_controller semaphore accounting. Fixes #18488 Due to the discrepancy between bytes added to CL and bytes written to disk (due to CRC sector overhead), we fail to account for the proper byte count when issuing account_memory_usage in allocate (using bytes added) and in cycle:s notify_memory_written (disk bytes written). This leads us to slowly, but surely, add to the semaphore all the time. Eventually rendering it useless. Also, terminate call would _not_ take any of this into account, and the chunk overhead there would cause a (smaller) discrepancy as well. Fix by simply ensuring that buffer alloc handles its byte usage, then accounting based on buffer position, not input byte size. Closes scylladb/scylladb#18489	2024-05-09 08:26:44 +03:00
Botond Dénes	155332ebf8	Merge 'Drain view_builder in generic drain (again)' from Pavel Emelyanov Some time ago #16558 was merged that moved view builder drain into generic drain. After this merge dtests started to fail from time to time, so the PR was reverted (see #18278). In #18295 the hang was found. View builder drain was moved from "before" stopping messaging service to "after" it, and view update write handlers in proxy hung for the hard-coded timeout of 5 minutes without being aborted. Tests don't wait for 5 minutes and kill scylla, then complain about it and fail. This PR brings back the original PR as well as the necessary fix that cancels view update write handlers on stop. Closes scylladb/scylladb#18408 * github.com:scylladb/scylladb: Reapply "Merge 'Drain view_builder in generic drain' from ScyllaDB" view: Abort pending view updates when draining	2024-05-09 08:26:44 +03:00
Aleksandra Martyniuk	67bbaad62e	tasks: use default task_ttl in scylla.yaml Currently default task_ttl_in_seconds is 0, but scylla.yaml changes the value to 10. Change task_ttl_in_seconds in scylla.yaml to 0, so that there are consistent defaults. Comment it out. Fixes: #16714. Closes scylladb/scylladb#18495	2024-05-09 08:26:44 +03:00
Botond Dénes	0438febdc9	Merge 'alternator: fix REST API access to an Alternator LSI' from Nadav Har'El The name of the Scylla table backing an Alternator LSI looks like `basename:!lsiname`. Some REST API clients (including Scylla Manager) when they send a "!" character in the REST API request path may decide to "URL encode" it - convert it to `%21`. Because of a Seastar bug (https://github.com/scylladb/seastar/issues/725) Scylla's REST API server forgets to do the URL decoding on the path part of the request, which leads to the REST API request failing to address the LSI table. The first patch in this PR fixes the bug by using a new Seastar API introduced in https://github.com/scylladb/seastar/pull/2125 that does the URL decoding as appropriate. The second patch in the PR is a new test for this bug, which fails without the fix, and passes afterwards. Fixes #5883. Closes scylladb/scylladb#18286 * github.com:scylladb/scylladb: test/alternator: test addressing LSI using REST API REST API: stop using deprecated, buggy, path parameter	2024-05-09 08:26:43 +03:00
Yaniv Michael Kaul	124064844f	docs/dev/object_stroage.md: convert example AWS keys to be more innocent Someone thought that they actually represent real keys (the 'EXAMPLE' in their name was not enough). Converted them to be as clear as can be, example data. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#18565	2024-05-09 08:26:43 +03:00
Asias He	46269a99d8	repair: Add ranges_parallelism option support for tablet The ranges_parallelism option is introduced in commit `9b3fd9407b`. Currently, this option works for vnode table repair only. This patch enables it for tablet repair, since it is useful for tablet repair too. Fixes #18383 Closes scylladb/scylladb#18385	2024-05-09 08:26:43 +03:00
Benny Halevy	0156e97560	storage_proxy: cas: reject for tablets-enabled tables Currently, LWT is not supported with tablets. In particular the interaction between paxos and tablet migration is not handled yet. Therefore, it is better to outright reject LWT queries for tablets-enabled tables rather than support them in a flaky way. This commit also marks tests that depend on LWT as expected to fail. Fixes scylladb/scylladb#18066 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#18103	2024-05-09 08:26:43 +03:00
Patryk Jędrzejczak	053a2893cf	raft topology: join_token_ring: prevent shutdown hangs Shutdown of a bootstrapping node could hang on `_topology_state_machine.event.when()` in `wait_for_topology_request_completion`. It caused scylladb/scylladb#17246 and scylladb/scylladb#17608. On a normal node, `wait_for_group0_stop` would prevent it, but this function won't be called before we join group 0. Solve it by adding a new subscriber to `_abort_source`. Additionally, trigger `_group0_as` to prevent other hang scenarios. Note that if both the new subscriber and `wait_for_group0_stop` are called, nothing will break. `abort_source::request_abort` and `conditional_variable::broken` can be called multiple times. The raft-based topology is moved out of experimental in 6.0, no need to backport the patch. Fixes scylladb/scylladb#17246 Fixes scylladb/scylladb#17608 Closes scylladb/scylladb#18549	2024-05-09 08:26:43 +03:00
Botond Dénes	96a7ed7efb	Merge 'sstables: add dead row count when issuing warning to system.large_partitions' from Ferenc Szili This is the second half of the fix for issue #13968. The first half is already merged with PR #18346 Scylla issues warnings for partitions containing more rows than a configured threshold. The warning is issued by inserting a row into the `system.large_partitions` table. This row contains the information about the partition for which the warning is issued: keyspace, table, sstable, partition key and size, compaction time and the number of rows in the partition. A previous PR #18346 also added range tombstone count to this row. This change adds a new counter for dead rows to the large_partitions table. This change also adds cluster feature protection for writing into these new counters. This is needed in case a cluster is in the process of being upgraded to this new version, after which an upgraded node writes data with the new schema into `system.large_partitions`, and finally a node is then rolled back to an old version. This node will then revert the schema to the old version, but the written sstables will still contain data with the new counters, causing any readers of this table to throw errors when they encounter these cells. This is an enhancement, and backporting is not needed. Fixes #13968 Closes scylladb/scylladb#18458 * github.com:scylladb/scylladb: sstable: added test for counting dead rows sstable: added docs for system.large_partitions.dead_rows sstable: added cluster feature for dead rows and range tombstones sstable: write dead_rows count to system.large_partitions sstable: added counter for dead rows	2024-05-09 08:26:43 +03:00
David Garcia	d63d418ae3	docs: change "create an issue" github label to "type/documentation" Closes scylladb/scylladb#18550	2024-05-09 08:26:43 +03:00
Kefu Chai	02be1e9309	.github: add clang-tidy workflow clang-tidy is a tool provided by Clang to perform static analysis on C++ source files. here, we are mostly interested in using its https://clang.llvm.org/extra/clang-tidy/checks/bugprone/use-after-move.html check to reveal the potential issues. this workflow is added to run clang-tidy when building the tree, so that the warnings from clang-tidy can be noticed by developers. a dedicated action is added so other github workflows can reuse it to set up the building environment in an ubuntu:jammy runner. clang-tidy-matcher.json is added to annotate the change, so that the warnings are more visible with github webpage. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18342	2024-05-09 08:26:43 +03:00
David Garcia	4a1b109641	docs: add swagger ui extension Renders the API Reference from api/api-doc using Swagger UI 2.2.10. address comments Closes scylladb/scylladb#18253	2024-05-09 08:26:43 +03:00
Botond Dénes	c7c4964b1c	tools/scylla-nodetool: make doc link version-specific Generate documentation links such that they point to the documentation page which is appropriate to the current product (open-source or enterprise) and version. The documentation links are generated by a new function and are injected into the description of each nodetool command via fmt::format().	2024-05-08 09:41:18 -04:00
Botond Dénes	2d1e938849	release: introduce doc_link() Allows generating documentation links that are appropriate for the current product (open-source or enterprise) and version. To be used in the next patch to make scylla-nodetool's documentation links product and version appropriate.	2024-05-08 09:41:17 -04:00
Botond Dénes	9d2156bd8a	build: pass scylla product to release.cc In the form of -DSCYLLA_PRODUCT. To be used in the next patch.	2024-05-08 09:40:24 -04:00
Kamil Braun	4dcae66380	Merge 'test: {auth,topology}: use manager.rolling_restart' from Piotr Dulikowski Instead of performing a rolling restart by calling `restart` in a loop over every node in the cluster, use the dedicated `manager.rolling_restart` function. This method waits until all other nodes see the currently processed node as up or down before proceeding to the next step. Not doing so may lead to surprising behavior. In particular, in scylladb/scylladb#18369, a test failed shortly after restarting three nodes. Because nodes were restarted one after another too fast, when the third node was restarted it didn't send a notification to the second node because it still didn't know that the second node was alive. This led the second node to notice that the third node restarted by observing that it incremented its generation in gossip (it restarted too fast to be marked as down by the failure detector). In turn, this caused the second node to send "third node down" and "third node up" notifications to the driver in a quick succession, causing it to drop and reestablish all connections to that node. However, this happened _after_ rolling upgrade finished and _after_ the test logic confirmed that all nodes were alive. When the notifications were sent to the driver, the test was executing some statements necessary for the test to pass - as they broke, the test failed. Fixes: scylladb/scylladb#18369 Closes scylladb/scylladb#18379 * github.com:scylladb/scylladb: test: get rid of server-side server_restart test: util: get rid of the `restart` helper test: {auth,topology}: use manager.rolling_restart	2024-05-08 09:45:08 +02:00
Piotr Dulikowski	180cb7a2b9	storage_service: notify lifecycle subs only after token metadata update Currently, in raft mode, when raft topology is reloaded from disk or a notification is received from gossip about an endpoint change, token metadata is updated accordingly. While updating token metadata we detect whether some nodes are joining or are leaving and we notify endpoint lifecycle subscribers if such an event occurs. These notifications are fired _before_ we finish updating token metadata and before the updated version is globally available. This behavior, for "node leaving" notifications specifically, was not present in legacy topology mode. Hinted handoff depends on token metadata being updated before it is notified about a leaving node (we had a similar issue before: scylladb/scylladb#5087, and we fixed it by enforcing this property). Because this is not true right now for raft mode, this causes the hint draining logic not to work properly - when a node leaves the cluster, there should be an attempt to send out hints for that node, but instead hints are not sent out and are kept on disk. In order to fix the issue with hints, postpone notifying endpoint lifecycle subscribers about joined and left nodes only after the final token metadata is computed and replicated to all shards. Fixes: scylladb/scylladb#17023 Closes scylladb/scylladb#18377	2024-05-08 09:40:44 +02:00
Kamil Braun	03818c4aa9	direct_failure_detector: increase ping timeout and make it tunable The direct failure detector design is simplistic. It sends pings sequentially and times out listeners that reached the threshold (i.e. didn't hear from a given endpoint for too long) in-between pings. Given the sequential nature, the previous ping must finish so the next ping can start. We time out pings that take too long. The timeout was hardcoded and set to 300ms. This is too low for wide-area setups -- latencies across the Earth can indeed go up to 300ms. 3 subsequent timed out pings to a given node were sufficient for the Raft listener to "mark server as down" (the listener used a threshold of 1s). Increase the ping timeout to 600ms which should be enough even for pinging the opposite side of Earth, and make it tunable. Increase the Raft listener threshold from 1s to 2s. Without the increased threshold, one timed out ping would be enough to mark the server as down. Increasing it to 2s requires 3 timed out pings which makes it more robust in presence of transient network hiccups. In the future we'll most likely want to decrease the Raft listener threshold again, if we use Raft for data path -- so leader elections start quickly after leader failures. (Faster than 2s). To do that we'll have to improve the design of the direct failure detector. Ref: scylladb/scylladb#16410 Fixes: scylladb/scylladb#16607 --- I tested the change manually using `tc qdisc ... netem delay`, setting network delay on local setup to ~300ms with jitter. Without the change, the result is as observed in scylladb/scylladb#16410: interleaving ``` raft_group_registry - marking Raft server ... as dead for Raft groups raft_group_registry - marking Raft server ... as alive for Raft groups ``` happening once every few seconds. The "marking as dead" happens whenever we get 3 subsequent failed pings, which happens with certain (high) probability depending on the latency jitter. Then as soon as we get a successful ping, we mark server back as alive. With the change, the phenomenon no longer appears. Closes scylladb/scylladb#18443	2024-05-07 23:40:23 +02:00
Anna Stuchlik	98367cb6a1	doc: Snitch switch is not supported with tablets This commit adds the tablets-related limitation: if you use tablets, then changing snitch is not supported. Refs: https://github.com/scylladb/scylladb/issues/17513 See: https://github.com/scylladb/scylladb/issues/17513#issuecomment-2022552677 Closes scylladb/scylladb#18548	2024-05-07 17:26:05 +02:00
Pavel Emelyanov	677e80a4d5	table: Coroutinize table::delete_sstables_atomically() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18499	2024-05-07 17:10:28 +02:00
Kamil Braun	53443f566a	Merge 'Coroutinize generic_server's listen() method' from Pavel Emelyanov It needs some local naming cleanup, but otherwise it's pretty simple Closes scylladb/scylladb#18510 * github.com:scylladb/scylladb: generic_server: Fix indentation after previous patch generic_server: Coroutinize listen() method generic_server: Rename creds argument to builder	2024-05-07 17:08:59 +02:00
Ferenc Szili	60bf846f68	sstable: added test for counting dead rows	2024-05-07 15:44:33 +02:00
Ferenc Szili	8e9771d010	sstable: added docs for system.large_partitions.dead_rows	2024-05-07 15:44:33 +02:00
Avi Kivity	9b8dfb2b19	compaction: compaction_strategy validation: don't rely on optional<> formatting std::optional formatting changed while moving from the home-grown formatter to the fmt provided formatter; don't rely on it for user visible messages. Here, the optional formatted is known to be engaged, so just print it. Closes scylladb/scylladb#18533	2024-05-07 12:02:33 +03:00
Kefu Chai	7e578ae964	message: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18527	2024-05-07 11:59:36 +03:00
Raphael S. Carvalho	570e3f8df0	compaction: exclude expired sstables from calculation of base timestamps base timestamps are fed into the sstable writer for calculating delta, used by varints. given that expired ssts are bypassed, we don't have to account for them. so if we are compacting fully expired and new sstables together, we can save a bit by having a base ts closer to the data actually written into output. also I wanted to move the calculation into the loop in setup(), to avoid two iterations over input set that can have even more than 1k elements. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18504	2024-05-07 08:43:50 +03:00
Raphael S. Carvalho	2d9142250e	Fix flakiness in test_tablet_load_and_stream due to premature gossiper abort on shutdown Until https://github.com/scylladb/scylladb/issues/15356 is fixed, this will be handled by explicitly closing the connection, so if scylla fails to update gossiper state due to premature abort on shutdown, then we won't be stuck in an endless reconnection attempt (later through heartbeats (30s interval)), causing the test to timeout. Manifests in scylla logs like this: gossip - failure_detector_loop: Got error in the loop, live_nodes={127.147.5.10, 127.147.5.16}: seastar::sleep_aborted (Sleep is aborted) gossip - failure_detector_loop: Finished main loop migration_manager - stopping migration service storage_service - Shutting down native transport server gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested) cql_server_controller - CQL server stopped ... gossip - My status = NORMAL gossip - Announcing shutdown gossip - Fail to apply application_state: seastar::abort_requested_exception (abort requested) gossip - Sending a GossipShutdown to 127.147.5.10 with generation 1714449924 gossip - Sending a GossipShutdown to 127.147.5.16 with generation 1714449924 gossip - === Gossip round FAIL: seastar::abort_requested_exception (abort requested) Refs #14746. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18484	2024-05-07 02:31:02 +02:00
Piotr Dulikowski	5459cfed6a	Merge 'auth: don't run legacy migrations in auth-v2 mode' from Marcin Maliszkiewicz We won't run: - old pre auth-v1 migration code - code creating auth-v1 tables We will keep running: - code creating default rows - code creating auth-v1 keyspace (needed due to cqlsh legacy hack, it errors when executing `list roles` or `list users` if there is no system_auth keyspace, it does support case when there is no expected tables) Fixes https://github.com/scylladb/scylladb/issues/17737 Closes scylladb/scylladb#17939 * github.com:scylladb/scylladb: auth: don't run legacy migrations on auth-v2 startup auth: fix indent in password_authenticator::start auth: remove unused service::has_existing_legacy_users func	2024-05-06 19:53:35 +02:00
Wojciech Mitros	8472c46c8a	service_level_controller: coroutinize notify_service_level_removed To avoid conflicts arising from the discrepancy between different versions of the repository, use coroutines instead of continuations in service_level_controller::notify_service_level_removed(). Closes scylladb/scylladb#18525	2024-05-06 14:20:49 +03:00
Piotr Dulikowski	92e5018ddb	test: get rid of server-side server_restart Restarting a node amounts to just shutting it down and then starting again. There is no good reason to have a dedicated endpoint in the ScyllaClusterManager for restarting when it can be implemented by calling two endpoints in a sequence: stop and start - it's just code duplication. Remove the server_restart endpoint in ScyllaClusterManager and reimplement it as two endpoint calls in the ManagerClient.	2024-05-06 12:54:53 +02:00
Piotr Dulikowski	8de2bda7ae	test: util: get rid of the `restart` helper We already have `ManagerClient.server_restart`, which can be used in its place.	2024-05-06 12:24:40 +02:00
Piotr Dulikowski	897e603bf0	test: {auth,topology}: use manager.rolling_restart Instead of performing a rolling restart by calling `restart` in a loop over every node in the cluster, use the dedicated `manager.rolling_restart` function. This method waits until all other nodes see the currently processed node as up or down before proceeding to the next step. Not doing so may lead to surprising behavior. In particular, in scylladb/scylladb#18369, a test failed shortly after restarting three nodes. Because nodes were restarted one after another too fast, when the third node was restarted it didn't send a notification to the second node because it still didn't know that the second node was alive. This led the second node to notice that the third node restarted by observing that it incremented its generation in gossip (it restarted too fast to be marked as down by the failure detector). In turn, this caused the second node to send "third node down" and "third node up" notifications to the driver in a quick succession, causing it to drop and reestablish all connections to that node. However, this happened _after_ rolling upgrade finished and _after_ the test logic confirmed that all nodes were alive. When the notifications were sent to the driver, the test was executing some statements necessary for the test to pass - as they broke, the test failed. Fixes: scylladb/scylladb#18369	2024-05-06 12:24:40 +02:00
Kamil Braun	ccbb9f5343	Merge 'topology_coordinator: clear obsolete generations earlier' from Patryk Jędrzejczak We want to clear CDC generations that are no longer needed (because all writes are already using a new generation) so they don't take space and are not sent during snapshot transfers (see e.g. https://github.com/scylladb/scylladb/issues/17545). The condition used previously was that we clear generations which were closed (i.e., a new generation started at this time) more than 24h ago. This is a safe choice, but too conservative: we could easily end up with a large number of obsolete generations if we boot multiple nodes during 24h (which is especially easy to do with tablets.) Change this bound from 24h to `5s + ring_delay`. The choice is explained in a comment in the code. Additionally, improve `test_raft_snapshot_request` that would become flaky after the change so it's not sensitive to changes anymore. The raft-based topology was experimental before 6.0, no need to backport. Ref: scylladb/scylladb#17545 Closes scylladb/scylladb#18497 * github.com:scylladb/scylladb: topology_coordinator: clear obsolete generations earlier test: test_raft_snapshot_request: improve the last assertion test: test_raft_snapshot_request: find raft leader after restart test: test_raft_shanpshot_request: simplify appended_command	2024-05-06 12:03:33 +02:00
Kamil Braun	1a50a524e7	Merge 'topology_coordinator: compute cluster size correctly during upgrade' from Piotr Dulikowski During upgrade to raft topology, information about service levels is copied from the legacy tables in system_distributed to the raft-managed tables of group 0. system_distributed has RF=3, so if the cluster has only one or two nodes we should use a lower consistency level than ALL - and the current procedure does exactly that, it selects QUORUM in case of two nodes and ONE in case of only one node. The cluster size is determined based on the call to _gossiper.num_endpoints(). Despite its name, gossiper::num_endpoints() does not necessarily return the number of nodes in the cluster but rather the number of endpoint states in gossiper (this behavior is documented in a comment near the declaration of this function). In some cases, e.g. after gossiper-based nodetool remove, the state might be kept for some time after removal (3 days in this case). The consequence of this is that gossiper::num_endpoints() might return more than the current number of nodes during upgrade, and that in turn might cause migration of data from one table to another to fail - causing the upgrade procedure to get stuck if there are only one or two nodes in the cluster. In order to fix this, use token_metadata::get_all_endpoints() as a measure of the cluster size. Fixes: scylladb/scylladb#18198 Closes scylladb/scylladb#18261 * github.com:scylladb/scylladb: test: topology: test that upgrade succeeds after recent removal topology_coordinator: compute cluster size correctly during upgrade	2024-05-06 11:06:09 +02:00
Piotr Dulikowski	64ba620dc2	Merge 'hinted handoff: Use host IDs instead of IPs in the module' from Dawid Mędrek This pull request introduces host ID in the Hinted Handoff module. Nodes are now identified by their host IDs instead of their IPs. The conversion occurs on the boundary between the module and `storage_proxy.hh`, but aside from that, IPs have been erased. The changes take into consideration that there might still be old hints, still identified by IPs, on disk – at start-up, we map them to host IDs if it's possible so that they're not lost. Refs scylladb/scylladb#6403 Fixes scylladb/scylladb#12278 Closes scylladb/scylladb#15567 * github.com:scylladb/scylladb: docs: Update Hinted Handoff documentation db/hints: Add endpoint_downtime_not_bigger_than() db/hints: Migrate hinted handoff when cluster feature is enabled db/hints: Handle arbitrary directories in resource manager db/hints: Start using hint_directory_manager db/hints: Enforce providing IP in get_ep_manager() db/hints: Introduce hint_directory_manager db/hints/resource_manager: Update function description db/hints: Coroutinize space_watchdog::scan_one_ep_dir() db/hints: Expose update lock of space watchdog db/hints: Add function for migrating hint directories to host ID db/hints: Take both IP and host ID when storing hints db/hints: Prepare initializing endpoint managers for migrating from IP to host ID db/hints: Migrate to locator::host_id db/hints: Remove noexcept in do_send_one_mutation() service: Add locator::host_id to on_leave_cluster service: Fix indentation db/hints: Fix indentation	2024-05-06 09:58:18 +02:00
Patryk Jędrzejczak	628d7e709e	cdc: generation: fix retrieve_generation_data_v2 `system_keyspace::read_cdc_generation_opt` queries `system.cdc_generations_v3`, which stores ids of CDC generations as timeuuids. This function shouldn't be called with a normal uuid (used by `system.cdc_generations_v2` to store generation ids). Such a call would end with a marshaling error. Before this patch, `retrieve_generation_data_v2` could call `system_keyspace::read_cdc_generation_opt` with a normal uuid if the generation wasn't present in `system.cdc_generations_v2`. This logic caused a marshaling error while handling the `check_and_repair_cdc_streams` request in the `cdc_test.TestCdc.test_check_and_repair_cdc_streams_liveness` dtest. This patch fixes the code being added in 6.0, no need to backport it. Fixes scylladb/scylladb#18473 Closes scylladb/scylladb#18483	2024-05-06 09:12:47 +02:00
Kamil Braun	16846bf5ce	Merge 'Do not serialize removenode operation with api lock if topology over raft is enabled' from Gleb With topology over raft, all operations are already serialized by the coordinator anyway, so there is no need to synchronize removenode using the API lock. All other operations are still synchronized since they cannot be executed in parallel for the same node anyway. * 'gleb/17681-fix' of github.com:scylladb/scylla-dev: storage_service: do not take API lock for removenode operation if topology coordinator is enabled test: return file mark from wait_for that points after the found string	2024-05-06 09:03:03 +02:00
Benny Halevy	ebff5f5d70	everywhere: include seastar headers using angle brackets seastar is an external library therefore it should use the system-include syntax. Closes scylladb/scylladb#18513	2024-05-06 10:00:31 +03:00
Kefu Chai	5ca9a46a91	test/lib: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18515	2024-05-05 23:31:48 +03:00
Kefu Chai	0b0e661a85	build: bring abseil submodule back because of https://bugzilla.redhat.com/show_bug.cgi?id=2278689, the rebuilt abseil package provided by fedora has different settings than the ones used when the tree is built with the sanitizer enabled. this inconsistency leads to a crash. to address this problem, we have to reinstate the abseil submodule, so we can build it with the same compiler options with which we build the tree. in this change * Revert "build: drop abseil submodule, replace with distribution abseil" * update CMake building system with abseil header include settings * bump up the abseil submodule to the latest LTS branch of abseil: lts_2024_01_16 * update scylla-gdb.py to adapt to the new structure of flat_hash_map This reverts commit `8635d24424`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18511	2024-05-05 23:31:09 +03:00
Kefu Chai	ea791919cf	service/storage_proxy: drop unused operator<< operator<<(ostream, paxos_response_handler) is not used anymore, so let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18520	2024-05-05 16:33:29 +03:00
Nadav Har'El	21557cfaa6	cql3: Fix invalid JSON parsing for JSON object with different key types More than three years ago, in issue #7949, we noticed that trying to set a `map<ascii, int>` from JSON input (i.e., using INSERT JSON or the fromJson() function) fails - the ascii key is incorrectly parsed. We fixed that issue in commit `75109e9519` but unfortunately, did not do our due diligence: We did not write enough tests inspired by this bug, and failed to discover that actually we have the same bug for many other key types, not just for "ascii". Specifically, the following key types have exactly the same bug: * blob * date * inet * time * timestamp * timeuuid * uuid Other types, like numbers or boolean worked "by accident" - instead of parsing them as a normal string, we asked the JSON parser to parse them again after removing the quotes, and because unquoted numbers and unquoted true/false happen to work in JSON, this didn't fail. The fix here is very simple - for all native types (i.e., not collections or tuples), the encoding of the key in JSON is simply a quoted string - and removing the quotes is all we need to do and there's no need to run the JSON parser a second time. Only for more elaborate types - collections and tuples - we need to run the JSON parser a second time on the key string to build the more elaborate object. This patch also includes tests for fromJson() reading a map with all native key types, confirming that all the aforementioned key types were broken before this patch, and all key types (including the numbers and booleans which worked even before this patch) work with this patch. Fixes #18477. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#18482	2024-05-05 15:42:43 +03:00
Kefu Chai	f2b1c47dfc	test/boost: s/boost::range::random_shuffle/std::ranges::shuffle/ `boost::range::random_shuffle()` uses the deprecated `std::random_shuffle()` under the hood, so let's use `std::ranges::shuffle()` which is available since C++20. this change should address warnings like: ``` [312/753] CXX build/debug/test/boost/counter_test.o In file included from test/boost/counter_test.cc:17: /usr/include/boost/range/algorithm/random_shuffle.hpp:106:13: warning: 'random_shuffle<__gnu_cxx::__normal_iterator<counter_shard *, std::vector<counter_shard>>>' is deprecated: use 'std::shuffle' instead [-Wdeprecated-declarations] 106 \| detail::random_shuffle(boost::begin(rng), boost::end(rng)); \| ^ test/boost/counter_test.cc:507:27: note: in instantiation of function template specialization 'boost::range::random_shuffle<std::vector<counter_shard>>' requested here 507 \| boost::range::random_shuffle(shards); \| ^ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_algo.h:4489:5: note: 'random_shuffle<__gnu_cxx::__normal_iterator<counter_shard *, std::vector<counter_shard>>>' has been explicitly marked deprecated here 4489 \| _GLIBCXX14_DEPRECATED_SUGGEST("std::shuffle") \| ^ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/x86_64-redhat-linux/bits/c++config.h:1957:45: note: expanded from macro '_GLIBCXX14_DEPRECATED_SUGGEST' 1957 \| # define _GLIBCXX14_DEPRECATED_SUGGEST(ALT) _GLIBCXX_DEPRECATED_SUGGEST(ALT) \| ^ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/x86_64-redhat-linux/bits/c++config.h:1941:19: note: expanded from macro '_GLIBCXX_DEPRECATED_SUGGEST' 1941 \| __attribute__ ((__deprecated__ ("use '" ALT "' instead"))) \| ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18517	2024-05-05 15:39:57 +03:00
Pavel Emelyanov	99f9807f15	sstables: Remove operator<<(std::ostream&, const deletion_time&) It's completely unused, likely in favor of recently added formatter for the type in question. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18502	2024-05-05 14:43:27 +03:00
Pavel Emelyanov	ddd2623418	generic_server: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-03 12:29:08 +03:00
Pavel Emelyanov	a1daa7093e	generic_server: Coroutinize listen() method Straightforward. Indentation is deliberately left broken. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-03 12:28:42 +03:00
Pavel Emelyanov	030f1ef81c	generic_server: Rename creds argument to builder So that it doesn't clash with local creds variable that will appear in this method after its coroutinization. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-03 12:27:37 +03:00
Kefu Chai	53b98a8610	test: string_format_test: disable test if {fmt} >= 10.0.0 {fmt} v10.0.0 introduces formatter for `std::optional`, so there is no need to test it. furthermore the behavior of this formatter is different from our homebrew one. so let's skip this test if {fmt} v10.0.0 or up is used. Refs #18508 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18509	2024-05-03 11:34:23 +03:00
Kefu Chai	3421e6dcc1	tools/scylla-nodetool: add formatter for char* {fmt} version 10.0.0 has a regression, which dropped the formatter for `char*`, even though it does format `const char*`, as the latter is convertible to `fmt::string_view`. this issue was addressed in 10.1.0 using 616a4937, which adds the formatter for `Char*` back, where `Char` is a template parameter. but we do need to print `vector<char>`, so, to address the build failure with {fmt} version 10.0.0, which is shipped along with fedora 39, let's backport this formatter. Fixes #18503 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18505	2024-05-02 23:25:24 +03:00
Avi Kivity	8de81f8f91	Merge 'Unstall merge topology snapshot' from Benny Halevy This series adds facilities to gently convert canonical mutations back to mutations and to gently make canonical mutations or freeze mutations in a seastar thread. Those are used in storage_service::merge_topology_snapshot to prevent reactor stalls due to large mutation, as seen in the test_add_many_nodes_under_load dtest. Also, migration_manager migration_request was converted to use a seastar thread to use the above facilities to prevent reactor stalls with large schema mutations, e.g., with a large number of tables, and/or when reading tablets mutations with a large number of tablets in a table. perf-simple-query --write results: Before: ``` median 79151.53 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53289 insns/op, 0 errors) ``` After: ``` median 79716.73 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53314 insns/op, 0 errors) ``` Closes scylladb/scylladb#18290 * github.com:scylladb/scylladb: storage_proxy: add mutate_locally(vector<frozen_mutation_and_schema>) method raft: group0_state_machine: write_mutations_to_database: freeze mutations gently database: apply_in_memory: unfreeze_gently large mutations storage_service: get_system_mutations: make_canonical_mutation_gently tablets: read_tablet_mutations: make_canonical_mutation_gently schema_tables: convert_schema_to_mutations: make_canonical_mutation_gently schema_tables: redact_columns_for_missing_features: get input mutation using rvalue reference storage_service: merge_topology_snapshot: freeze_gently canonical_mutation: add make_canonical_mutation_gently frozen_mutation: move unfreeze_gently to async_utils mutation: add freeze_gently idl-compiler: generate async serialization functions for stub members raft: group0_state_machine: write_mutations_to_database: use to_mutation_gently storage_service: merge_topology_snapshot: co_await to_mutation_gently canonical_mutation: add to_mutation_gently idl-compiler: 
emit include directive in generated impl header file mutation_partition: add apply_gently collection_mutation: improve collection_mutation_view formatting mutation_partition: apply_monotonically: do not support schema upgrade test/perf: report also log_allocations/op	2024-05-02 23:24:38 +03:00
Nadav Har'El	f604269f0a	cql3, secondary index: consistently choose index to use in a query When a table has secondary indexes on multiple columns, and several such columns are used for filtering in a query, Scylla chooses one of these indexes as the main driver of the query, and the second column's restriction is implemented as filtering. Before this patch, the index to use was chosen fairly randomly, based on the order of the indexes in the schema. This order may be different in different coordinators, and may even change across restarts on the same coordinator. This is not only inconsistent, it can cause outright wrong results when using paging and switching (or restarting) coordinators in the middle of a paged scan... One coordinator saves one index's key in the paging state, and then the other coordinator gets this paging state and wrongly believes it is supposed to be a key of a different index. The fix in this patch is to pick the index suitable for the first indexed column mentioned in the query. This has two benefits over the situation before the patch: 1. The decision of which index to use no longer changes between coordinators or across restarts - it just depends on the schema and the specific query. 2. Different indexes can have different "specificity" so using one or the other can change the query's performance. After this patch, the user is in control over which index is used by changing the order of terms in the query. A curious user can use tracing to check which index was used to implement a particular query. An xfailing test we had for this issue no longer fails, so the "xfail" marker is removed. Fixes #7969 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#14450	2024-05-02 19:52:42 +02:00
Benny Halevy	890b890e36	storage_proxy: add mutate_locally(vector<frozen_mutation_and_schema>) method Generalizing the ad-hoc implementation out of group0_state_machine.write_mutations_to_database. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:42:58 +03:00
Benny Halevy	4ae5bbb058	raft: group0_state_machine: write_mutations_to_database: freeze mutations gently write_mutations_to_database might need to handle large mutations from system tables, so to prevent reactor stalls, freeze the mutations gently and call proxy.mutate_locally in parallel on the individual frozen mutations, rather than calling the vector<mutation> based entry point that eventually freezes each mutation synchronously. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:37:06 +03:00
Benny Halevy	a9f157b648	database: apply_in_memory: unfreeze_gently large mutations Prevent stalls coming from applying large mutations in memory synchronously, like the ones seen with the test_add_many_nodes_under_load dtest: ``` \| \| \| ++[5#2/2 44%] addr=0x1498efb total=256 count=3 avg=85: \| \| \| \| replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}::operator() at ./replica/memtable.cc:804 \| \| \| \| (inlined by) logalloc::allocating_section::with_reclaiming_disabled<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}&> at ././utils/logalloc.hh:500 \| \| \| \| (inlined by) logalloc::allocating_section::operator()<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}>(logalloc::region&, replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}&&)::{lambda()#1}::operator() at ././utils/logalloc.hh:527 \| \| \| \| (inlined by) logalloc::allocating_section::with_reserve<logalloc::allocating_section::operator()<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}>(logalloc::region&, replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}&&)::{lambda()#1}> at ././utils/logalloc.hh:471 \| \| \| \| (inlined by) logalloc::allocating_section::operator()<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator()() const::{lambda()#1}> at ././utils/logalloc.hh:526 \| \| \| \| (inlined by) replica::memtable::apply(frozen_mutation const&, 
seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0::operator() at ./replica/memtable.cc:800 \| \| \| \| (inlined by) with_allocator<replica::memtable::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const> const&, db::rp_handle&&)::$_0> at ././utils/allocation_strategy.hh:318 \| \| \| \| (inlined by) replica::memtable::apply at ./replica/memtable.cc:799 \| \| \| ++[6#1/1 100%] addr=0x145047b total=1731 count=21 avg=82: \| \| \| \| replica::table::do_apply<frozen_mutation const&, seastar::lw_shared_ptr<schema const>&> at ./replica/table.cc:2896 \| \| \| ++[7#1/1 100%] addr=0x13ddccb total=2852 count=32 avg=89: \| \| \| \| replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0::operator() at ./replica/table.cc:2924 \| \| \| \| (inlined by) seastar::futurize<void>::invoke<replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0&> at ././seastar/include/seastar/core/future.hh:2032 \| \| \| \| (inlined by) seastar::futurize_invoke<replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0&> at ././seastar/include/seastar/core/future.hh:2066 \| \| \| \| (inlined by) replica::dirty_memory_manager_logalloc::region_group::run_when_memory_available<replica::table::apply(frozen_mutation const&, seastar::lw_shared_ptr<schema const>, db::rp_handle&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >)::$_0> at ./replica/dirty_memory_manager.hh:572 \| \| \| \| (inlined by) replica::table::apply at ./replica/table.cc:2923 \| \| \| ++ - 
addr=0x1330ba1: \| \| \| \| replica::database::apply_in_memory at ./replica/database.cc:1812 \| \| \| ++ - addr=0x1360054: \| \| \| \| replica::database::do_apply at ./replica/database.cc:2032 ``` This change has virtually no effect on small mutations (up to 128KB in size). build/release/scylla perf-simple-query --write --default-log-level=error --random-seed=1 -c 1 Before: median 80092.06 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53291 insns/op, 0 errors) After: median 78780.86 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53311 insns/op, 0 errors) To estimate the performance ramifications on large mutations, I measured perf-simple-query --write calling unfreeze_gently in all cases: median 77411.26 tps ( 71.3 allocs/op, 8.0 logallocs/op, 14.3 tasks/op, 53280 insns/op, 0 errors) Showing the allocations that moved out of logalloc (in memtable::apply of frozen_mutation) into seastar allocations (in unfreeze_gently) and <1% cpu overhead. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:37:06 +03:00
Benny Halevy	7dd6a81026	storage_service: get_system_mutations: make_canonical_mutation_gently and also unfreeze_gently the result frozen_mutation:s to prevent the following stalls that were seen with the test_add_many_nodes_under_load dtest: ``` ++[1#1/58 5%] addr=0x16330e9 total=321 count=4 avg=80: \| utils::uleb64_express_encode_impl at ././utils/vle.hh:73 \| (inlined by) utils::uleb64_express_encode<void (&)(char const, unsigned long), void (&)(char const, unsigned long)> at ././utils/vle.hh:82 \| (inlined by) logalloc::region_impl::object_descriptor::encode at ./utils/logalloc.cc:1658 \| (inlined by) logalloc::region_impl::alloc_small at ./utils/logalloc.cc:1743 ++ - addr=0x1634cff: \| logalloc::region_impl::alloc at ./utils/logalloc.cc:2104 \| ++[2#1/2 83%] addr=0x116e22c total=321 count=4 avg=80: \| \| managed_bytes::managed_bytes at ././utils/managed_bytes.hh:552 \| \| ++[3#1/3 51%] addr=0x1551288 total=198 count=3 avg=66: \| \| \| compound_wrapper<clustering_key_prefix, clustering_key_prefix_view>::compound_wrapper at ././keys.hh:149 \| \| \| (inlined by) prefix_compound_wrapper<clustering_key_prefix, clustering_key_prefix_view, clustering_key_prefix>::prefix_compound_wrapper at ././keys.hh:574 \| \| \| (inlined by) clustering_key_prefix::clustering_key_prefix at ././keys.hh:865 \| \| \| (inlined by) rows_entry::rows_entry at ./mutation/mutation_partition.hh:957 \| \| ++ - addr=0x153f09f: \| \| \| allocation_strategy::construct<rows_entry, schema const&, position_in_partition_view&, seastar::bool_class<dummy_tag>&, seastar::bool_class<continuous_tag>&> at ././utils/allocation_strategy.hh:160 \| \| ++ - addr=0x151409a: \| \| \| mutation_partition::append_clustered_row at ./mutation/mutation_partition.cc:719 \| \| ++ - addr=0x14ab38f: \| \| \| partition_builder::accept_row at ././partition_builder.hh:57 \| \| \| ++[4#1/1 100%] addr=0x1579766 total=577 count=7 avg=82: \| \| \| \| mutation_partition_view::do_accept<partition_builder> at 
./mutation/mutation_partition_view.cc:212 \| \| \| ++[5#1/2 56%] addr=0x14e737c total=321 count=4 avg=80: \| \| \| \| frozen_mutation::unfreeze at ./mutation/frozen_mutation.cc:116 \| \| \| \| ++[6#1/1 100%] addr=0x24fb47e total=1476 count=18 avg=82: \| \| \| \| \| service::storage_service::get_system_mutations at ./service/storage_service.cc:6401 ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:37:06 +03:00
Benny Halevy	3143f575e5	tablets: read_tablet_mutations: make_canonical_mutation_gently To prevent reactor stalls due to large tablets mutations (that can contain over 100,000 rows). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:37:06 +03:00
Benny Halevy	7f372dd9ae	schema_tables: convert_schema_to_mutations: make_canonical_mutation_gently To prevent stalls due to large schema mutations. While at it, reserve the result canonical_mutation vector. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:37:05 +03:00
Benny Halevy	61dea98185	schema_tables: redact_columns_for_missing_features: get input mutation using rvalue reference The function upgrades the input mutation only in certain cases. Currently it accepts the input mutation by value, which may cause an extraneous copy if the caller doesn't move the mutation, as done in `adjust_schema_for_schema_features`. Getting an rvalue reference instead makes the interface clearer. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:37:05 +03:00
Benny Halevy	bc1985b8ce	storage_service: merge_topology_snapshot: freeze_gently Freezing large mutations synchronously may cause reactor stalls, as seen in the test_add_many_nodes_under_load dtest: ``` ++[1#1/37 5%] addr=0x15b0bf total=99 count=2 avg=50: ?? ??:0 \| ++[2#1/2 67%] addr=0x15a331f total=66 count=1 avg=66: \| \| bytes_ostream::write at ././bytes_ostream.hh:248 \| \| (inlined by) bytes_ostream::write at ././bytes_ostream.hh:263 \| \| (inlined by) ser::serialize_integral<unsigned int, bytes_ostream> at ././serializer.hh:203 \| \| (inlined by) ser::integral_serializer<unsigned int>::write<bytes_ostream> at ././serializer.hh:217 \| \| (inlined by) ser::serialize<unsigned int, bytes_ostream> at ././serializer.hh:254 \| \| (inlined by) ser::writer_of_column<bytes_ostream>::write_id at ./build/dev/gen/idl/mutation.dist.impl.hh:4680 \| \| ++[3#1/1 100%] addr=0x159df71 total=132 count=2 avg=66: \| \| \| (anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}::operator() at ./mutation/mutation_partition_serializer.cc:99 \| \| \| (inlined by) row::maybe_invoke_with_hash<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1} const, cell_and_hash const> at ./mutation/mutation_partition.hh:133 \| \| \| (inlined by) row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}::operator() at ./mutation/mutation_partition.hh:152 \| \| \| (inlined by) 
compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}, true>::operator() at ././utils/compact-radix-tree.hh:1888 \| \| \| (inlined by) compact_radix_tree::tree<cell_and_hash, unsigned int>::visit_slot<compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}, true>&> at ././utils/compact-radix-tree.hh:1560 \| \| ++ - addr=0x159d84d: \| \| \| compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>::visit<compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}, true>&> at ././utils/compact-radix-tree.hh:1364 \| \| \| (inlined by) compact_radix_tree::tree<cell_and_hash, unsigned int>::node_base<cell_and_hash, compact_radix_tree::tree<cell_and_hash, unsigned 
int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> >::visit<compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}, true>&, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, 
unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> > at ././utils/compact-radix-tree.hh:799 \| \| \| (inlined by) compact_radix_tree::tree<cell_and_hash, unsigned int>::node_base<cell_and_hash, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, 
unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> >::visit<compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}, true>&> at ././utils/compact-radix-tree.hh:807 \| \| ++[4#1/1 100%] addr=0x1596f4a total=329 count=5 avg=66: \| \| \| compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head::visit<compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, 
cell_and_hash const&)#1}, true> > at ././utils/compact-radix-tree.hh:473 \| \| \| (inlined by) compact_radix_tree::tree<cell_and_hash, unsigned int>::visit<compact_radix_tree::tree<cell_and_hash, unsigned int>::walking_visitor<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}, true> > at ././utils/compact-radix-tree.hh:1626 \| \| \| (inlined by) compact_radix_tree::tree<cell_and_hash, unsigned int>::walk<row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}>(ser::deletable_row__cells<bytes_ostream>&&) const::{lambda(unsigned int, cell_and_hash const&)#1}> at ././utils/compact-radix-tree.hh:1909 \| \| \| (inlined by) row::for_each_cell<(anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> >(ser::deletable_row__cells<bytes_ostream>&&, row const&, schema const&, column_kind)::{lambda(unsigned int, atomic_cell_or_collection const&)#1}> at ./mutation/mutation_partition.hh:151 \| \| \| (inlined by) (anonymous namespace)::write_row_cells<ser::deletable_row__cells<bytes_ostream> > at ./mutation/mutation_partition_serializer.cc:97 \| \| \| (inlined by) write_row<ser::writer_of_deletable_row<bytes_ostream> > at ./mutation/mutation_partition_serializer.cc:168 \| \| ++[5#1/2 80%] addr=0x15a310c total=263 count=4 avg=66: \| \| \| mutation_partition_serializer::write_serialized<ser::writer_of_mutation_partition<bytes_ostream> > at ./mutation/mutation_partition_serializer.cc:180 \| \| \| ++[6#1/2 62%] addr=0x14eb60a total=428 count=7 avg=61: \| \| \| \| 
frozen_mutation::frozen_mutation(mutation const&)::$_0::operator()<ser::writer_of_mutation_partition<bytes_ostream> > at ./mutation/frozen_mutation.cc:85 \| \| \| \| (inlined by) ser::after_mutation__key<bytes_ostream>::partition<frozen_mutation::frozen_mutation(mutation const&)::$_0> at ./build/dev/gen/idl/mutation.dist.impl.hh:7058 \| \| \| \| (inlined by) frozen_mutation::frozen_mutation at ./mutation/frozen_mutation.cc:84 \| \| \| \| ++[7#1/1 100%] addr=0x14ed388 total=532 count=9 avg=59: \| \| \| \| \| freeze at ./mutation/frozen_mutation.cc:143 \| \| \| \| ++[8#1/2 74%] addr=0x252cf55 total=394 count=6 avg=66: \| \| \| \| \| service::storage_service::merge_topology_snapshot at ./service/storage_service.cc:763 ``` This change uses freeze_gently to freeze the cdc_generations_v2 mutations one at a time to prevent the stalls reported above. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:37:05 +03:00
Benny Halevy	a016e1d05d	canonical_mutation: add make_canonical_mutation_gently Make a canonical mutation gently using an async serialization function. Similar to freeze_gently, yielding is considered only in-between range tombstones and rows. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:37:04 +03:00
Benny Halevy	a126160d7e	frozen_mutation: move unfreeze_gently to async_utils Unfreeze_gently doesn't have to be a method of frozen_mutation. It might as well be implemented as a free function reading from a frozen_mutation and preparing a mutation gently. The logic will be used in a later patch to make a canonical mutation directly from a frozen_mutation instead of unfreezing it and then converting it to a canonical_mutation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:27:56 +03:00
Benny Halevy	aa27ef8811	mutation: add freeze_gently Allow yielding in between serializing of range tombstones and rows to prevent reactor stalls due to large mutations with many rows or range tombstones. Mutations that have many cells might still stall but those are considered infrequent enough to ignore for now. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:27:56 +03:00
Benny Halevy	0da2940c72	idl-compiler: generate async serialization functions for stub members To be used in a following patch for e.g. mutation::freeze_gently. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:27:56 +03:00
Benny Halevy	504a9ab897	raft: group0_state_machine: write_mutations_to_database: use to_mutation_gently Prevent stalls coming from writing large mutations like the ones seen with the test_add_many_nodes_under_load dtest: ``` ++[1#11/11 6%] addr=0x15408f6 total=33 count=1 avg=33: \| managed_bytes::managed_bytes at ././utils/managed_bytes.hh:284 \| (inlined by) atomic_cell_or_collection::atomic_cell_or_collection at ./mutation/atomic_cell_or_collection.hh:25 \| (inlined by) cell_and_hash::cell_and_hash at ./mutation/mutation_partition.hh:73 \| (inlined by) compact_radix_tree::tree<cell_and_hash, unsigned int>::emplace<atomic_cell_or_collection, seastar::optimized_optional<cell_hash> > at ././utils/compact-radix-tree.hh:1809 ++ - addr=0x1518bae: \| row::append_cell at ./mutation/mutation_partition.cc:1344 ++ - addr=0x14acb23: \| partition_builder::accept_row_cell at ././partition_builder.hh:70 ++ - addr=0x157a6a6: \| mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor::accept_atomic_cell at ./mutation/mutation_partition_view.cc:218 \| (inlined by) (anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor::operator() at ./mutation/mutation_partition_view.cc:138 \| (inlined by) boost::detail::variant::invoke_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const, false>::internal_visit<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>&> at /usr/include/boost/variant/variant.hpp:1028 \| (inlined by) 
boost::detail::variant::visitation_impl_invoke_impl<boost::detail::variant::invoke_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const, false>, void, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type> > at /usr/include/boost/variant/detail/visitation_impl.hpp:117 \| (inlined by) boost::detail::variant::visitation_impl_invoke<boost::detail::variant::invoke_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const, false>, void, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, boost::variant<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, ser::collection_cell_view, ser::unknown_variant_type>::has_fallback_type_> at /usr/include/boost/variant/detail/visitation_impl.hpp:157 \| (inlined by) boost::detail::variant::visitation_impl<mpl_::int_<0>, boost::detail::variant::visitation_impl_step<boost::mpl::l_iter<boost::mpl::l_item<mpl_::long_<3l>, boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, boost::mpl::l_item<mpl_::long_<2l>, ser::collection_cell_view, boost::mpl::l_item<mpl_::long_<1l>, ser::unknown_variant_type, boost::mpl::l_end> > > >, boost::mpl::l_iter<boost::mpl::l_end> >, boost::detail::variant::invoke_visitor<(anonymous 
namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const, false>, void, boost::variant<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, ser::collection_cell_view, ser::unknown_variant_type>::has_fallback_type_> at /usr/include/boost/variant/detail/visitation_impl.hpp:238 \| (inlined by) boost::variant<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, ser::collection_cell_view, ser::unknown_variant_type>::internal_apply_visitor_impl<boost::detail::variant::invoke_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const, false>, void> at /usr/include/boost/variant/variant.hpp:2337 \| (inlined by) boost::variant<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, ser::collection_cell_view, ser::unknown_variant_type>::internal_apply_visitor<boost::detail::variant::invoke_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const, false> > at /usr/include/boost/variant/variant.hpp:2349 \| (inlined by) boost::variant<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, ser::collection_cell_view, 
ser::unknown_variant_type>::apply_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor const> at /usr/include/boost/variant/variant.hpp:2393 \| (inlined by) boost::apply_visitor<(anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor>(ser::row_view, column_mapping const&, column_kind, partition_builder&&)::atomic_cell_or_collection_visitor, boost::variant<boost::variant<ser::live_cell_view, ser::expiring_cell_view, ser::dead_cell_view, ser::counter_cell_view, ser::unknown_variant_type>, ser::collection_cell_view, ser::unknown_variant_type>&> at /usr/include/boost/variant/detail/apply_visitor_unary.hpp:68 \| (inlined by) (anonymous namespace)::read_and_visit_row<mutation_partition_view::do_accept<partition_builder>(column_mapping const&, partition_builder&) const::cell_visitor> at ./mutation/mutation_partition_view.cc:158 \| (inlined by) mutation_partition_view::do_accept<partition_builder> at ./mutation/mutation_partition_view.cc:224 ++ - addr=0x151234a: \| mutation_partition::apply at ./mutation/mutation_partition.cc:476 ++ - addr=0x14e1103: \| canonical_mutation::to_mutation at ./mutation/canonical_mutation.cc:76 ++ - addr=0x283f9ee: \| service::write_mutations_to_database at ./service/raft/group0_state_machine.cc:124 ++ - addr=0x283f36c: \| service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_2::operator() at ./service/raft/group0_state_machine.cc:165 ++ - addr=0x28395e3: \| std::__invoke_impl<seastar::future<void>, seastar::internal::variant_visitor<service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_0, 
service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_1, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_2, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_3>, service::topology_change&> at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:61 \| (inlined by) std::__invoke<seastar::internal::variant_visitor<service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_0, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_1, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_2, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_3>, service::topology_change&> at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:96 \| (inlined by) std::__detail::__variant::__gen_vtable_impl<std::__detail::__variant::_Multi_array<std::__detail::__variant::__deduce_visit_result<seastar::future<void> > (*)(seastar::internal::variant_visitor<service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_0, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_1, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_2, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_3>&&, std::variant<service::schema_change, service::broadcast_table_query, service::topology_change, service::write_mutations>&)>, std::integer_sequence<unsigned long, 2ul> >::__visit_invoke at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/variant:1032 \| (inlined by) std::__do_visit<std::__detail::__variant::__deduce_visit_result<seastar::future<void> >, 
seastar::internal::variant_visitor<service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_0, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_1, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_2, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_3>, std::variant<service::schema_change, service::broadcast_table_query, service::topology_change, service::write_mutations>&> at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/variant:1793 \| (inlined by) std::visit<seastar::internal::variant_visitor<service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_0, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_1, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_2, service::group0_state_machine::merge_and_apply(service::group0_state_machine_merger&)::$_3>, std::variant<service::schema_change, service::broadcast_table_query, service::topology_change, service::write_mutations>&> at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/variant:1854 \| (inlined by) service::group0_state_machine::merge_and_apply at ./service/raft/group0_state_machine.cc:156 ++ - addr=0x284781e: \| service::group0_state_machine::apply at ./service/raft/group0_state_machine.cc:220 ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:27:56 +03:00
Benny Halevy	574cb7d977	storage_service: merge_topology_snapshot: co_await to_mutation_gently Prevent stalls from "unpacking" of large canonical mutations seen with test_add_many_nodes_under_load when called from `group0_state_machine::transfer_snapshot`: ``` ++[1#1/44 14%] addr=0x395b2f total=569 count=6 avg=95: ?? ??:0 \| ++[2#1/2 56%] addr=0x3991e3 total=321 count=4 avg=80: ?? ??:0 \| ++ - addr=0x1587159: \| \| std::__new_allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> >::allocate at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/new_allocator.h:147 \| \| (inlined by) std::allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> >::allocate at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/allocator.h:198 \| \| (inlined by) std::allocator_traits<std::allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> > >::allocate at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/alloc_traits.h:482 \| \| (inlined by) std::_Vector_base<seastar::basic_sstring<signed char, unsigned int, 31u, false>, std::allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> > >::_M_allocate at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/stl_vector.h:378 \| \| (inlined by) std::vector<seastar::basic_sstring<signed char, unsigned int, 31u, false>, std::allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> > >::reserve at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/vector.tcc:79 \| \| (inlined by) ser::idl::serializers::internal::vector_serializer<std::vector<seastar::basic_sstring<signed char, unsigned int, 31u, false>, std::allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> > > >::read<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ././serializer_impl.hh:226 \| \| (inlined by) 
ser::deserialize<std::vector<seastar::basic_sstring<signed char, unsigned int, 31u, false>, std::allocator<seastar::basic_sstring<signed char, unsigned int, 31u, false> > >, seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ././serializer.hh:264 \| \| (inlined by) ser::serializer<clustering_key_prefix>::read<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&)::{lambda(auto:1&)#1}::operator()<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ./build/dev/gen/idl/keys.dist.impl.hh:31 \| ++ - addr=0x1587085: \| \| seastar::with_serialized_stream<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>, ser::serializer<clustering_key_prefix>::read<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> >(seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator>&)::{lambda(auto:1&)#1}, void, void> at ././seastar/include/seastar/core/simple-stream.hh:646 \| \| (inlined by) ser::serializer<clustering_key_prefix>::read<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ./build/dev/gen/idl/keys.dist.impl.hh:28 \| \| (inlined by) ser::deserialize<clustering_key_prefix, seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> > at ././serializer.hh:264 \| \| (inlined by) ser::deletable_row_view::key() const::{lambda(auto:1&)#1}::operator()<seastar::fragmented_memory_input_stream<bytes_ostream::fragment_iterator> const> at ./build/dev/gen/idl/mutation.dist.impl.hh:1268 \| \| ++[3#1/1 100%] addr=0x15865a3 total=577 count=7 avg=82: \| \| \| seastar::memory_input_stream<bytes_ostream::fragment_iterator>::with_stream<ser::deletable_row_view::key() const::{lambda(auto:1&)#1}> at ././seastar/include/seastar/core/simple-stream.hh:491 \| \| \| (inlined by) 
seastar::with_serialized_stream<seastar::memory_input_stream<bytes_ostream::fragment_iterator> const, ser::deletable_row_view::key() const::{lambda(auto:1&)#1}, void> at ././seastar/include/seastar/core/simple-stream.hh:639 \| \| \| (inlined by) ser::deletable_row_view::key at ./build/dev/gen/idl/mutation.dist.impl.hh:1264 \| \| ++[4#1/1 100%] addr=0x157cf27 total=643 count=8 avg=80: \| \| \| mutation_partition_view::do_accept<partition_builder> at ./mutation/mutation_partition_view.cc:212 \| \| ++ - addr=0x1516cac: \| \| \| mutation_partition::apply at ./mutation/mutation_partition.cc:497 \| \| ++[5#1/1 100%] addr=0x14e4433 total=1765 count=22 avg=80: \| \| \| canonical_mutation::to_mutation at ./mutation/canonical_mutation.cc:60 \| \| ++[6#1/2 98%] addr=0x2452a60 total=1732 count=21 avg=82: \| \| \| service::storage_service::merge_topology_snapshot at ./service/storage_service.cc:761 \| \| ++ - addr=0x2858782: \| \| \| service::group0_state_machine::transfer_snapshot at ./service/raft/group0_state_machine.cc:303 ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:27:56 +03:00
Benny Halevy	c485ed6287	canonical_mutation: add to_mutation_gently to_mutation_gently generates mutation from canonical_mutation asynchronously using the newly introduced mutation_partition accept_gently method. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:27:54 +03:00
Benny Halevy	7f7e4616ab	idl-compiler: emit include directive in generated impl header file The generated implementation header file depends on the generated header file for the types it uses. Generate a respective #include directive to make it self-sufficient. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 18:50:16 +03:00
Benny Halevy	e1411f3911	mutation_partition: add apply_gently To be used for freezing mutations or making canonical mutations gently. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 18:45:24 +03:00
Benny Halevy	f625cd76a9	collection_mutation: improve collection_mutation_view formatting Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 18:42:41 +03:00
Benny Halevy	15e8ecb670	mutation_partition: apply_monotonically: do not support schema upgrade Currently, if the input mutation_partition requires schema upgrade, apply_monotonically always silently reverts to being non-preemptible, even if the caller passed is_preemptible::yes. To prevent that from happening, put the burden of upgrading the mutation_partition schema on the caller, which is today the apply() methods, which are synchronous anyhow. With that, we reduce the proliferation of the `apply_monotonically` overloads and keep only the low level one (which could potentially be private as well, as it's called only from within the mutation/ source files and from tests) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 18:42:41 +03:00
Benny Halevy	e5ca65f78b	test/perf: report also log_allocations/op Currently perf-simple-query --write ignores log allocations that happen on the memtable apply path. This change adds tracking and accounting of the number of log allocations, and reporting thereof. For reference, here's the output of build/release/scylla perf-simple-query --write --default-log-level=error --random-seed=1 -c 1 ``` random-seed=1 enable-cache=1 Running test with config: {partitions=10000, concurrency=100, mode=write, frontend=cql, query_single_key=no, counters=no} Disabling auto compaction 78073.55 tps ( 59.4 allocs/op, 16.3 logallocs/op, 14.3 tasks/op, 52991 insns/op, 0 errors) 77263.59 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53282 insns/op, 0 errors) 79913.07 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53295 insns/op, 0 errors) 79554.32 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53284 insns/op, 0 errors) 79151.53 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53289 insns/op, 0 errors) median 79151.53 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53289 insns/op, 0 errors) median absolute deviation: 761.54 maximum: 79913.07 minimum: 77263.59 ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 18:42:41 +03:00
Avi Kivity	e0d597348b	Merge 'Remove sstable_directory::_sstable_dir member' from Pavel Emelyanov Different sstable storage backends use slightly different notion of what sstable location is. Filesystem storage knows its `/var/lib/data/ks/cf-uuid/state` path, while s3 storage keeps only this path's part without state (and even that's not very accurate, because bucket prefix is missing as well as "/var/lib/data" prefix is not needed and eventually should be omitted). Nonetheless, the sstable_directory still keeps the filesystem-like path, while it's really only needed by the filesystem lister. This PR removes it. Closes scylladb/scylladb#18496 * github.com:scylladb/scylladb: sstable_directory: Remove _sstable_dir member sstable_directory: Create sstable path with make_path() when logging sstable_directory: Use make_path to construct filesystem lister sstable_directory: Move some logging around	2024-05-02 17:52:21 +03:00
Patryk Jędrzejczak	b8e3bf4b09	topology_coordinator: clear obsolete generations earlier We want to clear CDC generations that are no longer needed (because all writes are already using a new generation) so they don't take space and are not sent during snapshot transfers (see e.g. scylladb/scylladb#17545). The condition used previously was that we clear generations which were closed (i.e., a new generation started at this time) more than 24h ago. This is a safe choice, but too conservative: we could easily end up with a large number of obsolete generations if we boot multiple nodes during 24h (which is especially easy to do with tablets.) Change this bound from 24h to `5s + ring_delay`. The choice is explained in a comment in the code. Also, prevent `test_cdc_generation_clearing` from being flaky by firing the `increase_cdc_generation_leeway` error injection on the server being the topology coordinator. Ref: scylladb/scylladb#17545	2024-05-02 12:46:33 +02:00
Patryk Jędrzejczak	f61c50baa4	test: test_raft_snapshot_request: improve the last assertion The last assertion in the test is very sensitive to changes. The constant has already been increased from 0 to 1 due to flakiness. The old comment explains it. In the following patch, we change the CDC generation publisher so that it clears the obsolete CDC generations earlier. This change would make this assertion flaky again. After restarting the servers, the new topology coordinator could remove the first generation if it became obsolete. This operation appends a new entry to the log. If it happened after triggering snapshot, the assertion could fail with `2 <= 1`. We could increase the constant again to unflake the test, but we'd better improve it once and for all. We change the assertion so that it's not sensitive to changes in the code based on Raft. The explanation is in the new comment.	2024-05-02 12:46:33 +02:00
Patryk Jędrzejczak	44791a849e	test: test_raft_snapshot_request: find raft leader after restart Finding the new Raft leader after restart simplifies the test and makes it easier to reason about. There are two improvements: - we only need to wait until the leader appends a command, so the read barrier becomes unnecessary, - we only need to trigger snapshot on the leader. We also use the knowledge about the leader in the following patch.	2024-05-02 12:46:33 +02:00
Patryk Jędrzejczak	41198998c5	test: test_raft_snapshot_request: simplify appended_command We shorten the code and remove the unused `log_size` variable.	2024-05-02 12:46:31 +02:00
Yaron Kaikov	2cf7cc1ea5	scylla_setup: Remove jmx and tools packages from being verified Following `b8634fb244`, the machine image started to fail with the following error: ``` 10:44:59 googlecompute.gce: scylla-jmx package is not installed. 10:44:59 ==> googlecompute.gce: Traceback (most recent call last): 10:44:59 ==> googlecompute.gce: File "/home/ubuntu/scylla_install_image", line 135, in <module> 10:44:59 ==> googlecompute.gce: run('/opt/scylladb/scripts/scylla_setup --no-coredump-setup --no-sysconfig-setup --no-raid-setup --no-io-setup --no-ec2-check --no-swap-setup --no-cpuscaling-setup --no-ntp-setup', shell=True, check=True) 10:44:59 ==> googlecompute.gce: File "/usr/lib/python3.10/subprocess.py", line 526, in run 10:44:59 ==> googlecompute.gce: raise CalledProcessError(retcode, process.args, 10:44:59 ==> googlecompute.gce: subprocess.CalledProcessError: Command '/opt/scylladb/scripts/scylla_setup --no-coredump-setup --no-sysconfig-setup --no-raid-setup --no-io-setup --no-ec2-check --no-swap-setup --no-cpuscaling-setup --no-ntp-setup' returned non-zero exit status 1. ``` It seems we no longer need to verify that jmx and tools-java packages are installed. Closes scylladb/scylladb#18494	2024-05-02 13:30:50 +03:00
Pavel Emelyanov	b8f9eeb82b	sstable_directory: Remove _sstable_dir member It's no longer in use. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-02 13:12:59 +03:00
Pavel Emelyanov	608762adda	sstable_directory: Create sstable path with make_path() when logging The sstable_directory::sstable_filename() should generate a name of an sstable for log messages. It's not accurate, because it silently assumes that the filename is on local storage, which might not be the case. Fixing it is a large change, so for now replace _sstable_dir with explicit call to make_path(). The change is idempotent, as _sstable_dir is initialized with the result of make_path() call in constructor. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-02 13:12:59 +03:00
Pavel Emelyanov	07c1df575e	sstable_directory: Use make_path to construct filesystem lister The _sstable_dir is used currently, but it's initialized with make_path() result anyway. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-02 13:12:59 +03:00
Pavel Emelyanov	ef98777b27	sstable_directory: Move some logging around At the beginning of .process() method there's a log message stating which path and which storage are being processed. That's not really nice, because, e.g. filesystem lister may skip processing quarantine directory. Also, the registry lister doesn't list entries by their _sstable_dir, but rather by its _location (spoiler: dir = location / state). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-02 13:08:28 +03:00
Ferenc Szili	90634b419c	sstable: added cluster feature for dead rows and range tombstones Previously, writing into system.large_partitions was done by calling record_large_partition(). In order to write different data based on the cluster feature flag, another level of indirection was added by calling _record_large_partitions which is initialized to a lambda which calls internal_record_large_partitions(). This function does not record the values of the two new columns (dead_rows and range_tombstones). After the cluster feature flag becomes true, _record_large_partitions is set to a lambda which calls internal_record_large_partitions_all_data() which records the values of the two new columns.	2024-05-02 11:49:46 +02:00
Ferenc Szili	b06af5b2b9	sstable: write dead_rows count to system.large_partitions	2024-05-02 11:49:10 +02:00
Ferenc Szili	63e724c974	sstable: added counter for dead rows	2024-05-02 11:49:10 +02:00
Nadav Har'El	5558143014	test/alternator: test addressing LSI using REST API The name of the Scylla table backing an Alternator LSI looks like basename:!lsiname. Some REST API clients (including Scylla Manager) when they send a "!" character in the REST API request may decide to "URL encode" it - convert it to %21. Because of a Seastar bug (https://github.com/scylladb/seastar/issues/725) Scylla's REST API server forgets to do the URL decoding, which leads to the REST API request failing to address the LSI table. This patch introduces a test for this bug, which fails without the Seastar issue being fixed, and passes afterwards (i.e., after the previous patch that starts to use the new, fixed, Seastar API). The test creates an LSI, uses the REST API to find its name and then tries to call some REST API ("compaction_strategy") on this table name, after deliberately URL-encoding it. Refs #5883. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-05-02 12:33:54 +03:00
Nadav Har'El	1aacfdf460	REST API: stop using deprecated, buggy, path parameter The API req->param["name"] to access parameters in the path part of the URL was buggy - it forgot to do URL decoding and the result of our use of it in Scylla was bugs like #5883 - where special characters in certain REST API requests got botched up (encoded by the client, then not decoded by the server). The solution is to replace all uses of req->param["name"] by the new req->get_path_param("name"), which does the decoding correctly. Unfortunately we needed to change 104 (!) callers in this patch, but the transformation is mostly mechanical and there are no functional changes in this patch. Another set of changes was to bring req, not req->param, to a few functions that want to get the path param. This patch avoids the numerous deprecation warnings we had before, and more importantly, it fixes #5883. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-05-02 12:33:46 +03:00
Jan Ciolek	59b7920b0b	view_update_generator: add get_storage_proxy() During view generation we would like to be able to access information about the current state of view update backlogs, but this information is kept inside storage_proxy. A reference to storage_proxy is kept inside view_update_generator, so the easiest way to get access to it from the view update code is by adding a public getter there. There's already a similar getter for replica::database: get_db(), so it's in line with the rest of the code. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2024-05-02 10:59:55 +02:00
Jan Ciolek	4c5cfc7683	storage_proxy: make view backlog getters public Storage proxy maintains information about both local and remote view update backlogs. This information might also be useful outside of storage_proxy, so let's expose the functions that allow access to backlog information. There aren't any implementation quirks that would make it unsafe to make the functions public, the worst that can happen is that someone causes a lot of atomic operations by repeatedly calling get_view_update_backlog(). Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2024-05-02 10:59:55 +02:00
Pavel Emelyanov	67736b5cd3	Reapply "Merge 'Drain view_builder in generic drain' from ScyllaDB" This reverts commit `9c2a836607`.	2024-05-02 08:16:14 +03:00
Pavel Emelyanov	d47053266b	view: Abort pending view updates when draining When view builder is drained (it now happens very early, but next patch moves this into regular drain) it waits for all on-going view build steps to complete. This includes waiting for any outstanding proxy view writes to complete as well. View writes in proxy have very high timeout of 5 minutes but they are cancellable. However, cancelling of such writes happens in proxy's drain_on_shutdown() call which, in turn, happens pretty late on shutdown. Effectively, by the time it happens all view writes must have completed already, so stop-time cancelling doesn't really work nowadays. Next patch makes view builder drain happen a bit later during shutdown, namely -- _after_ shutting down messaging service. When it happens that late, non-working view writes cancellation becomes critical, as view builder drain hangs for aforementioned 5 minutes. This patch explicitly cancels all view writes when view builder stops. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-02 08:16:12 +03:00
Kefu Chai	f183f5aa80	Update seastar submodule * seastar 2b43417d...b73e5e7d (11): > treewide: inherit from formatter<string_view> not formatter<std::string_view> > CMakeLists.txt: Apply CXX deprecated flags conditionally > tls: add assignment operator for gnutls_datum > tls: s/get0()/get()/ > io_queue: do not reference moved variable > TLS: use helper function in get_distinguished_name & get_alt_name_information > TLS: Add support for TLS1.3 session tickets > iotune: ignore shards with id above max_iodepth > core/future: remove a template parameter from set_callback() > util: with_file_input_stream: always close file > core/sleep: Use more raii-sh aproach to maintain sleeper Fixes #5181 Closes scylladb/scylladb#18491	2024-05-02 07:35:42 +03:00
Takuya ASADA	b8634fb244	dist: stop installing scylla-tools, scylla-jmx by default Since we added native nodetool, we no longer need to install scylla-tools and scylla-jmx, drop them from scylla metapackage and make it optional package. Closes #18472 Closes scylladb/scylladb#18487	2024-05-01 22:15:40 +03:00
Kefu Chai	af5674211d	redis/server.hh: suppress -Wimplicit-fallthrough from protocol_parser.hh when compiling the tree with clang-18 and ragel 6.10, the compiler warns like: ``` /usr/local/bin/cmake -E __run_co_compile --tidy="clang-tidy-18;--checks=-*,bugprone-use-after-move;--extra-arg-before=--driver-mode=g++" --source=/home/runner/work/scylladb/scylladb/redis/controller.cc -- /usr/bin/clang++-18 -DBOOST_NO_CXX98_FUNCTION_BASE -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -I/home/runner/work/scylladb/scylladb -I/home/runner/work/scylladb/scylladb/build/gen -I/home/runner/work/scylladb/scylladb/seastar/include -I/home/runner/work/scylladb/scylladb/build/seastar/gen/include -I/home/runner/work/scylladb/scylladb/build/seastar/gen/src -isystem /home/runner/work/scylladb/scylladb/cooking/include -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/runner/work/scylladb/scylladb=. 
-march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT redis/CMakeFiles/redis.dir/controller.cc.o -MF redis/CMakeFiles/redis.dir/controller.cc.o.d -o redis/CMakeFiles/redis.dir/controller.cc.o -c /home/runner/work/scylladb/scylladb/redis/controller.cc error: too many errors emitted, stopping now [clang-diagnostic-error] Error: /home/runner/work/scylladb/scylladb/build/gen/redis/protocol_parser.hh:110:1: error: unannotated fall-through between switch labels [clang-diagnostic-implicit-fallthrough] 110 \| case 1: \| ^ /home/runner/work/scylladb/scylladb/build/gen/redis/protocol_parser.hh:110:1: note: insert 'FMT_FALLTHROUGH;' to silence this warning 110 \| case 1: \| ^ \| FMT_FALLTHROUGH; ``` since we have `-Werror`, the warnings like this are considered as error, hence the build fails. in order to address this failure, let's silence this warning when including this generated header file. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18447	2024-05-01 18:47:24 +03:00
Kefu Chai	08d1362f80	utils/chunked_vector: fix some typos in comment Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18486	2024-05-01 16:38:43 +03:00
Nadav Har'El	4e78e2d506	test/cql-pytest, cdc: add test for what happens when log name is taken In our CDC implementation, the CDC log table for table "xyz" is always called "xyz_scylla_cdc_log". If this table name is taken, and the user tries to create a table "xyz" with CDC enabled - or enable CDC on the table "xyz", the creation/enabling should fail gracefully, with a clear error message. This test verifies this. The new test passes - the code is already correct. I just wanted to verify that it is (and to prevent future regressions). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#18485	2024-05-01 14:46:19 +03:00
Pavel Emelyanov	5d992a4f01	proxy: Remove declaration of nonexisting view_update_write_response_handler class Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18417	2024-05-01 10:15:41 +03:00
Botond Dénes	65a385f5d0	Merge 'Relax the way view builder code checks if a table exists' from Pavel Emelyanov There are two places that workaround db.column_family_exists() call with some fancy exceptions-catching lambda. This PR makes things simpler. Closes scylladb/scylladb#18441 * github.com:scylladb/scylladb: view: Open-code one line lambda checking if table exists view: Use non-throwoing check if a table exists	2024-05-01 10:14:58 +03:00
Kefu Chai	94ac0799d9	build: cmake: link scylla_tracing against scylla-main because tracing/trace_keyspace_helper.cc references symbols defined by table_helper, which is in turn provided by scylla-main, we should link tracing_tracing against scylla-main. otherwise we could have following link failure: ``` ./build/./tracing/trace_keyspace_helper.cc:214: error: undefined reference to 'table_helper::setup_keyspace(cql3::query_processor&, service::migration_manager&, std::basic_string_view<char, std::char_traits<char> >, seastar::basic_sstring<char, unsigned int, 15u, true>, service::query_state&, std::vector<table_helper, std::allocator<table_helper> >)' ./build/./tracing/trace_keyspace_helper.cc:396: error: undefined reference to 'table_helper::cache_table_info(cql3::query_processor&, service::migration_manager&, service::query_state&)' ./table_helper.hh:92: error: undefined reference to 'table_helper::insert(cql3::query_processor&, service::migration_manager&, service::query_state&, seastar::noncopyable_function<cql3::query_options ()>)' ./table_helper.hh:92: error: undefined reference to 'table_helper::insert(cql3::query_processor&, service::migration_manager&, service::query_state&, seastar::noncopyable_function<cql3::query_options ()>)' ./table_helper.hh:92: error: undefined reference to 'table_helper::insert(cql3::query_processor&, service::migration_manager&, service::query_state&, seastar::noncopyable_function<cql3::query_options ()>)' ./table_helper.hh:92: error: undefined reference to 'table_helper::insert(cql3::query_processor&, service::migration_manager&, service::query_state&, seastar::noncopyable_function<cql3::query_options ()>)' clang++-18: error: linker command failed with exit code 1 (use -v to see invocation) ninja: build stopped: subcommand failed. ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18455	2024-05-01 10:08:11 +03:00
Kefu Chai	f0d12df7fc	reloc: create $BUILDDIR for getting its path when building with CMake, there is a use case where the $BUILDDIR is not created yet, when `reloc/build_rpm.sh` is launched. in order to enable us to run this script without creating $BUILDDIR first, let's create this directory first. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18464	2024-05-01 09:52:17 +03:00
Kefu Chai	8168f02550	raft_group_registry: do not use moved variable clang-tidy warns like: ``` [628/713] Building CXX object service/CMakeFiles/service.dir/raft/raft_group_registry.cc.o Warning: /home/runner/work/scylladb/scylladb/service/raft/raft_group_registry.cc:543:66: warning: 'id' used after it was moved [bugprone-use-after-move] 543 \| auto& rate_limit = _rate_limits.try_get_recent_entry(id, std::chrono::minutes(5)); \| ^ /home/runner/work/scylladb/scylladb/service/raft/raft_group_registry.cc:539:19: note: move occurred here 539 \| auto dst_id = raft::server_id{std::move(id)}; \| ^ ``` this is a false alarm. as the type of `id` is actually `utils::UUID` which is a struct enclosing two `int64_t` variables. and we don't define a move constructor for `utils::UUID`. so the value of `id` is intact after being moved away. but it is still confusing at the first glance, as we are indeed referencing a moved-away variable. so in order to reduce the confusion and to silence the warning, let's just not `std::move(id)`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18449	2024-05-01 09:45:12 +03:00
Kefu Chai	bd0d246b57	tools/scylla-nodetool: implement the resetlocalschema command Fixes #18468 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18470	2024-05-01 08:49:11 +03:00
Raphael S. Carvalho	b980634ff2	test: Verify tablet cleanup is properly retried on failure Doesn't test only coordinator ability to retry on failure, but also that replica will be able to properly continue cleanup of a storage group from where it left off (when failure happened), not leave any sstables behind. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18426	2024-04-30 19:27:17 +02:00
Raphael S. Carvalho	62b1cfa89c	topology_coordinator: Fix synchronization of tablet split with other concurrent ops Finalization of tablet split was only synchronizing with migrations, but that's not enough as we want to make sure that all processes like repair complete first as they might hold erm and therefore will be working with a "stale" version of token metadata. For synchronization to work properly, handling of tablet split finalize will now take over the state machine, when possible, and execute a global token metadata barrier to guarantee that update in topology by split won't cause problems. Repair for example could be writing a sstable with stale metadata, and therefore, could generate a sstable that spans multiple tablets. We don't want that to happen, therefore we need the barrier. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18380	2024-04-30 19:23:28 +02:00
Botond Dénes	525553aa41	SCYLLA-VERSION-GEN: warn against using - or _ in custom version names Doing so is a pitfall that will make one waste a lot of time rebuilding the packages, just because at the end it turns out that the version has illegal characters in it. The author of this patch has certainly fallen into this pitfall a lot of times. Closes scylladb/scylladb#18429	2024-04-30 18:14:51 +03:00
Avi Kivity	ea15ddc7dc	Merge 'Fix population of non-normal sstables from registry' from Pavel Emelyanov On boot sstables are populated from normal location as well as from quarantine and staging. It turned out that sstables listed in registry (S3-backed ones) are not populated from non-normal states. Closes scylladb/scylladb#18439 * github.com:scylladb/scylladb: test: Add test for how quarantined sstables registry entries are loaded sstable_directory: Use sstable location to initialize registry lister	2024-04-30 18:10:11 +03:00
Avi Kivity	329b135b5e	Merge 'chunked_vector: fix use after free in emplace back' from Benny Halevy Currently, push_back or emplace_back reallocate the last chunk before constructing the new element. If the arg passed to push_back/emplace_back is a reference to an existing element in the vector, reallocating the last chunk will invalidate the arg reference before it is used. This patch changes the order when reallocating the last chunk in reserve_for_emplace_back: First, a new chunk_ptr is allocated. Then, the back_element is emplaced in the newly allocated array. And only then, existing elements in the current last chunk are migrated to the new chunk. Eventually, the new chunk replaces the existing chunk. If no reservation is required, the back element is emplaced "in place" in the current last chunk. Fixes scylladb/scylladb#18072 Closes scylladb/scylladb#18073 * github.com:scylladb/scylladb: test: chunked_managed_vector_test: add test_push_back_using_existing_element utils: chunked_vector: reserve_for_emplace_back: emplace before migrating existing elements utils: chunked_vector: push_back: call emplace_back utils: chunked_vector: define min_chunk_capacity utils: chunked*vector: use std::clamp	2024-04-30 18:09:04 +03:00
David Garcia	f62197ee1e	docs: enable concurrent downloads Downloads chunks of 10 CSV concurrently to speed up doc builds. Closes scylladb/scylladb#18469	2024-04-30 16:13:40 +03:00
Raphael S. Carvalho	d7a01598ce	tools: Make sstable shard-of efficient by loading minimum to compute owners Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18440	2024-04-30 16:10:58 +03:00
Gleb Natapov	f2b0a5e9e1	storage_service: do not take API lock for removenode operation if topology coordinator is enabled Topology coordinator serializes operations internally, so there is no need to have an external lock. Fixes: scylladb/scylladb#17681	2024-04-30 15:13:50 +03:00
Gleb Natapov	0a7101923c	test: return file mark from wait_for that points after the found string Returning file mark allows to start searching from the point where the previous string was found.	2024-04-30 15:06:32 +03:00
Kefu Chai	3a1ceb96d7	utils: UUID_gen: include <atomic> in UUID_gen.cc, we are using `std::atomic<int64_t>` in `make_thread_local_node()`, but this template is not defined by any of the included headers. but we should include used headers to be self-contained. when compiling on ubuntu:jammy with libstdc++-13, we have following error: ``` /usr/local/bin/cmake -E __run_co_compile --tidy="clang-tidy-18;--checks=-*,bugprone-use-after-move;--extra-arg-before=--driver-mode=g++" --source=/home/runner/work/scylladb/scylladb/utils/UUID_gen.cc -- /usr/bin/clang++-18 -DBOOST_ALL_NO_LIB -DBOOST_NO_CXX98_FUNCTION_BASE -DBOOST_REGEX_DYN_LINK -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -I/home/runner/work/scylladb/scylladb -I/home/runner/work/scylladb/scylladb/seastar/include -I/home/runner/work/scylladb/scylladb/build/seastar/gen/include -I/home/runner/work/scylladb/scylladb/build/seastar/gen/src -isystem /home/runner/work/scylladb/scylladb/cooking/include -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overl Error: /home/runner/work/scylladb/scylladb/utils/UUID_gen.cc:29:33: error: implicit instantiation of undefined template 'std::atomic<long>' [clang-diagnostic-error] 29 \| static std::atomic<int64_t> thread_id_counter; \| ^ /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/shared_ptr_atomic.h:361:11: note: template is declared here 361 \| class atomic; \| ^ ``` so, in this change, we include `<atomic>` to address this build failure. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18387	2024-04-30 09:07:22 +03:00
Kefu Chai	6a73c911e3	tools: lua_sstable_consumer.cc: be compatible with Lua 5.3's lua_resume() in Lua 5.3, lua_resume() only accepts three parameters, while in Lua 5.4, this function accepts four parameters. so in order to be compatible with Lua 5.3, we should not pass the 4th parameter to this function. a macro is defined to conditionally pass this parameter based on the Lua's version. see https://www.lua.org/manual/5.3/manual.html#lua_resume Refs `5b5b8b3264` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18450	2024-04-30 09:06:25 +03:00
Botond Dénes	0ace90ad04	test: add test for cleaning up cached querier on tablet migration Check that a cached querier, which exists prior to a migration, will be cleaned up afterwards. This reproduces #18110. The test fails before the fix for the above and passes afterwards.	2024-04-30 01:47:16 -04:00
Botond Dénes	64c817462e	querier: allow injecting cache entry ttl by error injector To allow making tests more robust by setting TTL to a very large value, when the test relies on entries being present for a given time.	2024-04-30 01:47:16 -04:00
Botond Dénes	03995d9397	replica/table: cleanup_tablet(): clear inactive reads for the tablet To avoid any resource surviving the cleanup, via some inactive read pinning it. This can cause data resurrection if the tablet is later migrated back and the pinned data source is added back to the tablet.	2024-04-30 01:47:16 -04:00
Botond Dénes	a062e3f650	replica/database: introduce clear_inactive_reads_for_tablet() To be used on the tablet cleanup path, to clear any inactive read which might be related to the cleaned-up tablet.	2024-04-30 01:44:03 -04:00
Botond Dénes	338af5055c	replica/database: introduce foreach_reader_concurrency_semaphore Currently we have a single method -- detach_column_family() -- which does something with each semaphore. Soon there will be another one. Introduce a method to do something with all semaphores, to make this smoother. Enterprise has a different set of semaphores, and this will reduce friction.	2024-04-30 01:43:56 -04:00
Botond Dénes	3c813fbb99	reader_concurrency_semaphore: add range param to evict_inactive_reads_for_table() When the new optional parameter has a value, evict only inactive reads, whose ranges overlap with the provided range. The range for the inactive read is provided in `register_inactive_read()`. If the inactive read has no range, overlap is assumed and the read is evicted. This will be used to evict all inactive reads that could potentially use a cleaned-up tablet.	2024-04-30 01:31:08 -04:00
Botond Dénes	9e7a957ffb	reader_concurrency_semaphore: allow storing a range with the inactive reader This allows specifying the range the inactive read is reading from. To be used in the next patch to selectively evict inactive reads whose range overlaps with a certain (tablet) range.	2024-04-30 01:31:08 -04:00
Botond Dénes	67684308d1	reader_concurrency_semaphore: avoid detach() in inactive_read_handle::abandon() inactive_read_handle::abandon() evicts and destroys the inactive-read, so it is not left behind. Currently, while doing so, it triggers the inactive_read's own version of abandon(): detach(). The two interact badly when the inactive_read_handle stores the last permit instance, causing (so far benign) use-after-free. Prevent triggering detach() to avoid this bad interaction altogether.	2024-04-30 01:31:08 -04:00
Piotr Dulikowski	35f456c483	Merge 'Extend `ALTER TABLE ... DROP` to allow specifying timestamp of column drop' from Michał Jadwiszczak In order to correctly restore schema from `DESC SCHEMA WITH INTERNALS`, we need a way to drop a column with a timestamp in the past. Example: - table t(a int pk, b int) - insert some data1 - drop column b - add column b int - insert some data2 If the sstables weren't compacted, after restoring the schema from description: - we will lose column b in data2 if we simply do `ALTER TABLE t DROP b` and `ALTER TABLE t ADD b int` - we will resurrect column b in data1 if we skip dropping and re-adding the column Test for this: https://github.com/scylladb/scylla-dtest/pull/4122 Fixes #16482 Closes scylladb/scylladb#18115 * github.com:scylladb/scylladb: docs/cql: update ALTER TABLE docs test/cqlpytest: add test for prepared `ALTER TABLE ... DROP ... USING TIMESTAMP ?` test/cql-pytest: remove `xfail` from alter table with timestamp tests cql3/statements: extend `ALTER TABLE ... DROP` to allow specifying timestamp of column drop cql3/statements: pass `query_options` to `prepare_schema_mutations()` cql3/statements: add bound terms to alter table statement cql3/statements: split alter_table_statement into raw and prepared schema: allow to specify timestamp of dropped column	2024-04-29 14:05:05 +02:00
Piotr Dulikowski	dec652de9e	test: topology: test that upgrade succeeds after recent removal Adds a regression test for scylladb/scylladb#18198 - start a two node cluster in legacy topology mode, use nodetool removenode on one of the nodes, upgrade the remaining 1-node cluster and observe that it succeeds.	2024-04-29 13:33:40 +02:00
Piotr Dulikowski	cb4a4f2caf	topology_coordinator: compute cluster size correctly during upgrade During upgrade to raft topology, information about service levels is copied from the legacy tables in system_distributed to the raft-managed tables of group 0. system_distributed has RF=3, so if the cluster has only one or two nodes we should use lower consistency level than ALL - and the current procedure does exactly that, it selects QUORUM in case of two nodes and ONE in case of only one node. The cluster size is determined based on the call to _gossiper.num_endpoints(). Despite its name, gossiper::num_endpoints() does not necessarily return the number of nodes in the cluster but rather the number of endpoint states in gossiper (this behavior is documented in a comment near the declaration of this function). In some cases, e.g. after gossiper-based nodetool remove, the state might be kept for some time after removal (3 days in this case). The consequence of this is that gossiper::num_endpoints() might return more than the current number of nodes during upgrade, and that in turn might cause migration of data from one table to another to fail - causing the upgrade procedure to get stuck if there is only 1 or two nodes in the cluster. In order to fix this, use token_metadata::get_all_endpoints() as a measure of the cluster size. Fixes: scylladb/scylladb#18198	2024-04-29 13:26:29 +02:00
Takuya ASADA	af0c0ee8af	configure.py: revert changing builddir as absolute path On `be3776ec2a`, we changed outdir to absolute path. This causes "unknown target" error when we build Scylla using the relative path something like "ninja build/dev/scylla", since the target name becomes an absolute path. Revert the change to be able to build with the relative path. Also, change optimized_clang.sh to use relative path for --builddir, since we reference "../../$builddir/SCYLLA-*-FILE" when we build submodule, it won't work with absolute path. Fixes #18321 Closes scylladb/scylladb#18338	2024-04-29 09:35:21 +03:00
Kefu Chai	4433d2e10e	build: cmake: let iotune depends on config specific file before this change, in order to build `${iotune_path}`, we use the rule to build `app_iotune` but this target is built using the default build type, see https://cmake.org/cmake/help/latest/variable/CMAKE_DEFAULT_BUILD_TYPE.html#variable:CMAKE_DEFAULT_BUILD_TYPE so, if we want to build `${iotune_path}` for the configuration which is not listed as the first item in `CMAKE_CONFIGURATION_TYPES`, we would end up with copying a nonexistent file. to address this issue, we override this behavior using the `$<OUTPUT_CONFIG:...>` generator-expression. so that we can depend on non-unique path. and the file-level dependency between ${iotune_path} and $<CONFIG>/iotune can be established. see also https://cmake.org/cmake/help/latest/generator/Ninja%20Multi-Config.html#custom-commands Refs #2717 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18395	2024-04-29 09:06:39 +03:00
Kefu Chai	f03f69ad4f	partition_version: move the base class in move ctor before this change, `partition_version` uses a hand-crafted move constructor. but it suffers from the warning from clang-tidy, which believes there is a use-after-move issue, as the inner instance of its parent class is constructed using `anchorless_list_base_hook(std::move(pv))`, and its other member variables are initialized like `_partition(std::move(pv._partition))` `std::move(pv)` does not do anything, but indicates `pv` may be moved from. and what is moved away is but the part belonging to its parent class. so this issue is benign. but, it's still annoying. as we need to tell the genuine issues reported by clang-tidy from the false alarms. so we have at least two options: - stop using clang-tidy - ignore this warning - silence this warning using LINT direction in a comment - use another way to implement the move constructor in this change, we just cast the moved instance to its base class and move it instead, this should appease clang-tidy. Fixes #18354 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18359	2024-04-28 18:34:45 +02:00
Dawid Medrek	bf802e99eb	docs: Update Hinted Handoff documentation We briefly explain the process of migration of Hinted Handoff to host IDs, the rationale for it, consequences, and possible side effects.	2024-04-28 01:22:59 +02:00
Dawid Medrek	46ab22f805	db/hints: Add endpoint_downtime_not_bigger_than() We add an auxiliary function checking if a node hasn't been down for too long. Although `gms::gossiper` already exposes a function responsible for that, it requires that its argument be an IP address. That's the reason we add a new function.	2024-04-28 01:22:59 +02:00
Dawid Medrek	0ef8d67d32	db/hints: Migrate hinted handoff when cluster feature is enabled These changes migrate hinted handoff to using host ID as soon as the corresponding cluster feature is enabled. When a node starts, it defaults to creating directories naming them after IP addresses. When the whole cluster has upgraded to a version of Scylla that can handle directories representing host IDs, we perform a migration of the IP folders, i.e. we try to rename them to host IDs. Invalid directories, i.e. those that represent neither an IP address, nor a host ID, are removed. During the migration, hinted handoff is disabled. It is necessary because we have to modify the disk's contents, so new hints cannot be saved until the migration finishes.	2024-04-28 01:22:57 +02:00
Dawid Medrek	58784cd8db	db/hints: Handle arbitrary directories in resource manager Before these changes, resource manager only handled the case when directories it browsed represented valid host IDs. However, since before migrating hinted handoff to using host IDs we still name directories after IP addresses, that would lead to exceptions that shouldn't happen. We make resource manager handle directories of arbitrary names correctly.	2024-04-27 22:31:07 +02:00
Dawid Medrek	ee84e810ca	db/hints: Start using hint_directory_manager We start keeping track of mappings IP - host ID. The mappings are between endpoint managers (identified by host IDs) and the hint directories managed by them (represented by IP addresses). This is a prelude to handling IP directories by the hint shard manager. The structure should only be used by the hint manager before it's migrated to using host IDs. The reason for that is that we rely on the information obtained from the structure, but it might not make sense later on. When we start creating directories named after host IDs and there are no longer directories representing IP addresses, there is no relation between host IDs and IPs -- just because the structure is supposed to keep track between endpoint managers and hint directories that represent IP addresses. If they represent host IDs, the connection between the two is lost. Still using the data structure could lead to bugs, e.g. if we tried to associate a given endpoint manager's host ID with its corresponding IP address from locator::token_metadata, it could happen that two different host IDs would be bound to the same IP address by the data structure: node A has IP I1, node A changes its IP to I2, node B changes its IP to I1. Though nodes A and B have different host IDs (because they are unique), the code would try to save hints towards node B in node A's hint directory, which should NOT happen. Relying on the data structure is thus only safe before migrating hinted handoff to using host IDs. It may happen that we save a hint in the hint directory of the wrong node indeed, but since migration to using host IDs is a process that only happens once, it's a price we are ready to pay. It's only imperative to prevent it from happening in normal circumstances.	2024-04-27 22:31:07 +02:00
Dawid Medrek	aa4b06a895	db/hints: Enforce providing IP in get_ep_manager() We drop the default argument in the function's signature. Also, we adjust the code of change_host_filter() to be able to perform calls to get_ep_manager().	2024-04-27 22:31:07 +02:00
Dawid Medrek	d0f58736c8	db/hints: Introduce hint_directory_manager This commit introduces a new class responsible for keeping track of mappings IP-host ID. Before hinted handoff is migrated to using host IDs, hint directories still have to represent IP addresses. However, since we identify endpoint managers by host IDs already, we need to be able to associate them with the directories they manage. This class serves this purpose.	2024-04-27 22:31:07 +02:00
Dawid Medrek	f9af01852d	db/hints/resource_manager: Update function description The current description of the function `space_watchdog::scan_one_ep_dir` is not up-to-date with the function's signature. This commit updates it.	2024-04-27 22:31:07 +02:00
Dawid Medrek	59d49c5219	db/hints: Coroutinize space_watchdog::scan_one_ep_dir()	2024-04-27 22:31:07 +02:00
Dawid Medrek	8fd9c80387	db/hints: Expose update lock of space watchdog We expose the update lock of space watchdog to be able to prevent it from scanning hint directories. It will be necessary in an upcoming commit when we will be renaming hint directories and possibly removing some of them. Race conditions are unacceptable, so resource manager cannot be able to access the directory during that time.	2024-04-27 22:31:07 +02:00
Dawid Medrek	934e4bb45e	db/hints: Add function for migrating hint directories to host ID We add a function that will be used while migrating hinted handoff to using host IDs. It iterates over existing hint directories and tries to rename them to the corresponding host IDs. In case of a failure, we remove it so that at the end of its execution the only remaining directories are those that represent host IDs.	2024-04-27 22:31:04 +02:00
Dawid Medrek	e36f853f9b	db/hints: Take both IP and host ID when storing hints The store_hint() method starts taking both an IP and a host ID as its arguments. The rationale for the change is that, depending on the stage of the cluster (before an upgrade to the host-ID-based hinted handoff or after it), we might need to create a directory representing either an IP address or a host ID. Because locator::topology can change between the moment we obtain the host ID we pass and the moment the function is executed, we need to pass both parameters explicitly to ensure consistency between them.	2024-04-27 20:35:58 +02:00
Dawid Medrek	063d4d5e91	db/hints: Prepare initializing endpoint managers for migrating from IP to host ID We extract the initialization of endpoint managers from the start method of the hint manager to a separate function and make it handle directories that represent either IP addresses, or host IDs; other directories are ignored. It's necessary because before Scylla is upgraded to a version that uses host-ID-based hinted handoff, we need to continue only managing IP directories. When Scylla has been upgraded, we will need to handle host ID directories. It may also happen that after an upgrade (but not before it), Scylla fails while renaming the directories, so we end up with some of them representing IP address, and some representing host IDs. After these changes, the code handles that scenario as well.	2024-04-27 20:35:53 +02:00
Dawid Medrek	cfd03fe273	db/hints: Migrate to locator::host_id We change the type of node identifiers used within the module and fix compilation. Directories storing hints to specific nodes are now represented by host IDs instead of IPs.	2024-04-26 22:44:04 +02:00
Dawid Medrek	1af7fa74e8	db/hints: Remove noexcept in do_send_one_mutation() While the function is marked as noexcept, the returned future can in fact store an exception. We remove the specifier to reflect the actual behavior of the function.	2024-04-26 22:44:04 +02:00
Dawid Medrek	54ae9797b9	service: Add locator::host_id to on_leave_cluster We extend the function endpoint_lifecycle_subscriber::on_leave_cluster by another argument -- locator::host_id. It's more convenient to have a consistent pair of IP and host ID.	2024-04-26 22:44:03 +02:00
Dawid Medrek	a36387d942	service: Fix indentation	2024-04-26 22:44:03 +02:00
Dawid Medrek	c585444c60	db/hints: Fix indentation	2024-04-26 22:44:03 +02:00
Pavel Emelyanov	7f2742893e	view: Open-code one line lambda checking if table exists Continuation of the previous patch. The lambda in question used to be a heavyweight(y) code, but now it's one-liner. And it's only called once, so no more point in keeping it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-26 20:19:38 +03:00
Pavel Emelyanov	a3e76f9c93	view: Use non-throwing check if a table exists Two places in view code check if a table exists by finding its schema ID and catching the no_such_column_family exception. That's a bit heavyweight; database has a column_family_exists() method for such cases. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-26 20:17:35 +03:00
Pavel Emelyanov	5e23493d25	test: Add test for how quarantined sstables registry entries are loaded Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-26 16:54:43 +03:00
Pavel Emelyanov	ba512c52a5	sstable_directory: Use sstable location to initialize registry lister When populating sstables on boot a bunch of sstable_directory objects is created. For each sstable there come three -- one for normal, quarantine and staging state. Each is initialized with sstable location (which is now a datadir/ks_name/cf_name-and-uuid) and the desired state (an enum class). When created, the directory object wires up component lister, depending on which storage options are provided. For local sstables a legacy filesystem lister is created and it's initialized with a path where to search for files -- location + / + string(state). But for s3 sstables, which keep their entries in registry, the lister is erroneously initialized with the same location + / + string(state) value. The mistake is that sstables in registry keep location and state in different columns, so for any state the lister should query registry with the same location value (then it filters entries by state on its own). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-26 16:36:47 +03:00
Kamil Braun	d8313dda43	Merge 'db: config: move consistent-topology-changes out of experimental and make it the default for new clusters' from Patryk Jędrzejczak We move consistent cluster management out of experimental and make it the default for new clusters in 6.0. In code, we make the `consistent-topology-changes` flag unused and assumed to be true. In 6.0, the topology upgrade procedure will be manual and voluntary, so some clusters will still be using the gossip-based topology even though they support the raft-based topology. Therefore, we need to continue testing the gossip-based topology. This is possible by using the `force-gossip-topology-changes` flag introduced in scylladb/scylladb#18284. Ref scylladb/scylladb#17802 Closes scylladb/scylladb#18285 * github.com:scylladb/scylladb: docs: raft.rst: update after removing consistent-topology-changes treewide: fix indentation after the previous patch db: config: make consistent-topology-changes unused test: lib: single_node_cql_env: restart a node in noninitial run_in_thread calls test: test_read_required_hosts: run with force-gossip-topology-changes storage_service: join_cluster: replace force_gossip_based_join with force-gossip-topology-changes storage_service: join_token_ring: fix finish_setup_after_join calls	2024-04-26 14:45:29 +02:00
Botond Dénes	b96f28356a	Merge 'api/storage_service: convert runtime_error from repair to http error' from Kefu Chai in `set_repair()`, despite that the repair is performed asynchronously, we check the options specified by client immediately, and throw `std::runtime_error`, if any of them is not supported. before this change, these unhandled exceptions are translated to HTTP 500 error by the underlying HTTP router. but this is misleading, as these errors are caused by client, not server. in this change, we handle the `runtime_error`, and translate them into `httpd::bad_param_exception`, so that the client can have HTTP 400 (Bad Request) instead of HTTP 500 (Internal Server Error), and with informative error message. for instance, if we apply repair with "small_table_optimization" enabled on a keyspace with tablets enabled. we should have an HTTP error 400 with "The small_table_optimization option is not supported for tablet repair" as the body of the error. this would be much more helpful. Closes scylladb/scylladb#18389 * github.com:scylladb/scylladb: api/storage_service: convert runtime_error from repair to http error repair: change runtime_error to invalid_argument in do_repair_start() api/storage_service: coroutinize set_repair()	2024-04-26 13:27:51 +03:00
Patryk Jędrzejczak	3a100cd16c	test: test_raft_recovery_stuck: ensure raft upgrade procedure failed We have log browsing in test.py now, so we can fix this TODO easily. Closes scylladb/scylladb#18425	2024-04-26 10:16:49 +02:00
Asias He	62a9ecff51	repair: Cleanup repair history status entry for tablet The entry in the repair history map that is used to track repair status internally for each repair job should be removed after the repair job is done. We do the same for vnode repairs. This patch adds the automatic history cleanup code that was missed in the initial tablet repair support in commit `54239514af`, which did not support repair history updates back then. Refs #17046 Closes scylladb/scylladb#18434	2024-04-26 10:56:45 +03:00
Botond Dénes	044fd7a3ec	Merge 'Move some view updating methods from table to view_update_generator' from Pavel Emelyanov The populate_views() and generate_and_propagate_view_updates() both naturally belong to view_update_generator -- they don't need anything special from table itself, but rather depend on some internals of the v.u.generator itself. Moving them there lets removing the view concurrency semaphore from keyspace and table, thus reducing the cross-components dependencies. Closes scylladb/scylladb#18421 * github.com:scylladb/scylladb: replica: Do not carry view concurrency semaphore pointer around view: Get concurrency semaphore via database, not table view_update_generator: Mark mutate_MV() private view: Move view_update_generator methods' code view: Move table::generate_and_propagate_view_updates into view code view: Move table::populate_views() into view_update_generator class	2024-04-26 10:55:38 +03:00
Botond Dénes	d566eec89a	Merge 'treewide: remove {dclocal_,}read_repair_chance options' from Kefu Chai dclocal_read_repair_chance and read_repair_chance have been removed in Cassandra 3.11 and 4.x, see https://issues.apache.org/jira/browse/CASSANDRA-13910. if we expose these properties via DDL, Cassandra would fail to consume the CQL statement creating the table when performing migration from Scylla to Cassandra 4.x, as the latter does not understand these properties anymore. currently the default values of `dc_local_read_repair_chance` and `read_repair_chance` are both "0". so they are practically disabled, unless user deliberately set them to a value greater than 0. also, as a side effect, Cassandra 4.x has better support of Python3. the cqlsh shipped along with Cassandra 3.11.16 only supports python2.7, see https://github.com/apache/cassandra/blob/cassandra-3.11.16/bin/cqlsh.py it errors out if the system only provides python3 with the error of ``` No appropriate python interpreter found. ``` but modern linux systems do not provide python2 anymore. so, in this change, we deprecate these two options. Fixes #3502 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18087 * github.com:scylladb/scylladb: docs: drop documents related to {,dclocal_}read_repair_chance treewide: remove {dclocal_,}read_repair_chance options	2024-04-26 10:48:47 +03:00
Michał Chojnowski	c1146314a1	docs: clarify that `DELETE` can be used with `USING TIMEOUT` The current text seems to suggest that `USING TIMEOUT` doesn't work with `DELETE` and `BATCH`. But that's wrong. Closes scylladb/scylladb#18424	2024-04-26 10:48:17 +03:00
Pavel Emelyanov	4ac30e5337	view-builder: Print correct exception in build step exception handler Inside .handle_exception() continuation std::current_exception() doesn't work, there's std::exception ex argument to handler's lambda instead. Fixes #18423 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18349	2024-04-26 09:58:45 +03:00
Kefu Chai	0bbaded4ce	api/storage_service: convert runtime_error from repair to http error in `set_repair()`, despite that the repair is performed asynchronously, we check the options specified by client immediately, and throw `std::runtime_error`, if any of them is not supported. before this change, these unhandled exceptions are translated to HTTP 500 error by the underlying HTTP router. but this is misleading, as these errors are caused by client, not server. and the error message is missing in the HTTP error message when performing the translation. in this change, we handle the `runtime_error`, and translate them into `httpd::bad_param_exception`, so that the client can have HTTP 400 (Bad Request) instead of HTTP 500 (Internal Server Error), and with informative error message. for instance, if we apply repair with "small_table_optimization" enabled on a keyspace with tablets enabled. we should have an HTTP error 400 with "The small_table_optimization option is not supported for tablet repair" as the body of the error. this would be much more helpful. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-04-26 14:25:15 +08:00
Kefu Chai	9de9f401a1	repair: change runtime_error to invalid_argument in do_repair_start() if an error is caused by the option provided by user, it would be better to throw an `std::invalid_argument` instead of `std::runtime_error`, so that the caller can make a better decision when handling the thrown exceptions. so, in this change, we change the exceptions raised directly in `repair_service::do_repair_start()` from `std::runtime_error` to `std::invalid_argument`. please note, in the lambda named `host2ip`, since the hostname is not provided by user, we are not changing the exception type in that lambda. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-04-26 14:24:45 +08:00
Kefu Chai	d737ba1ab2	api/storage_service: coroutinize set_repair() before this change, `set_repair()` uses a lambda for handling the client-side requests. and this works great. but the underlying `repair_start()` throws if any of the given options is not sane. and we don't handle any of these throw exceptions in `set_repair()`, from client's point of view, it would get an HTTP 500 error code, which implies an "Internal Server Error". but actually, we should blame the client for the error, not the server. so, to prepare the error handling, let's take the opportunity to coroutinize the lambda handling the request, so that we can handle the exception in a more elegant way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-04-26 14:24:03 +08:00
Michał Jadwiszczak	7f839f727e	docs/cql: update ALTER TABLE docs	2024-04-26 07:01:08 +02:00
Michał Jadwiszczak	7cbce78480	test/cqlpytest: add test for prepared `ALTER TABLE ... DROP ... USING TIMESTAMP ?`	2024-04-26 07:01:02 +02:00
Botond Dénes	7cbe5c78b4	install.sh: use the native nodetool directly * tools/java b810e8b00e...4ee15fd9ea (1): > install.sh: don't install nodetool into /usr/bin Add a bin/nodetool and install it to bin/ in install.sh. This script simply forwards to scylla nodetool and it is the replacement for the Java nodetool, which is dropped from the java-tools's install.sh, in the submodule update also included in this patch. With this change, we now hardwire the usage of the native nodetool, as the nodetool, with the intermediary nodetool wrapper script removed from the picture. Bash completion was copied from the java tools repository and it is now installed by the scylla package, together with nodetool. The Java nodetool is still available as a fall-back, in case the native nodetool has problems, at the path of /opt/scylladb/share/cassandra/bin/nodetool. Testing I tested upgrades on a DEB and RPM distro: Ubuntu and Fedora. First I installed scylla-5.4, then I installed the packages for this PR. On Ubuntu, I had to use dpkg -i --auto-deconfigure, otherwise, dpkg would refuse to install the new packages because they break the old ones. No extra flags were required on Fedora. In both cases, /usr/bin/nodetool was changed from a thunk calling the Java nodetool (from 5.4) to the native launcher script from this PR. /opt/scylladb/share/cassandra/bin/nodetool remained in place and still works after the upgrade. I also verified that --nonroot installs also work. Nodetool works both when called with an absolute path, or when ~/scylladb/bin is added to $PATH. Fixes: #18226 Fixes: #17412 Closes scylladb/scylladb#18255 [avi: reset submodule to actual hash we ended up with]	2024-04-25 22:52:00 +03:00
Michał Jadwiszczak	27a4331dcd	test/cql-pytest: remove `xfail` from alter table with timestamp tests Previous patch introduced `ALTER TABLE ... DROP ... USING TIMESTAMP ...` so those tests should no longer fail. Refs #9929	2024-04-25 21:27:40 +02:00
Michał Jadwiszczak	80f0357436	cql3/statements: extend `ALTER TABLE ... DROP` to allow specifying timestamp of column drop	2024-04-25 21:27:40 +02:00
Michał Jadwiszczak	7dc0d068c0	cql3/statements: pass `query_options` to `prepare_schema_mutations()` The object is needed to get timestamp from attributes (in a case when the statement was prepared with parameter marker).	2024-04-25 21:27:40 +02:00
Michał Jadwiszczak	998a65a4f6	cql3/statements: add bound terms to alter table statement Until now, alter table couldn't take any parameter marker, so the bound terms were always 0. Adding `USING TIMESTAMP` to `ALTER TABLE ... DROP` also adds the possibility to prepare an alter table statement with a parameter marker.	2024-04-25 21:27:40 +02:00
Michał Jadwiszczak	d268641c27	cql3/statements: split alter_table_statement into raw and prepared Currently alter table doesn't prepare any parameters so raw statement and prepared one could be the same class. Later commit will add attributes to the statement, which needs to be prepared, that's why I'm splitting.	2024-04-25 21:27:40 +02:00
Michał Jadwiszczak	1c5563ba44	schema: allow to specify timestamp of dropped column In order to drop a column with a specified timestamp, we need to allow it in our schema class.	2024-04-25 21:27:40 +02:00
Avi Kivity	c2b8ca7d71	Merge 'cql3: statements: change default tombstone_gc mode for tablets' from Aleksandra Martyniuk Repair may miss some tablets that migrated across nodes. So if tombstones expire after some timeout, then we can have data resurrection. Set default tombstone_gc mode to "repair" for tables which use tablets (if repair is required). Fixes: #16627. Closes scylladb/scylladb#18013 * github.com:scylladb/scylladb: test: check default value of tombstone_gc test: topology: move some functions to util.py cql3: statements: change default tombstone_gc mode for tablets	2024-04-25 19:18:37 +03:00
Lakshmi Narayanan Sreethar	6af2659b57	sstables: reclaim_memory_from_components: do not update _recognised_components When reclaiming memory from bloom filters, do not remove them from _recognised_components, as that leads to the on-disk filter component being left back on disk when the SSTable is deleted. Fixes #18398 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#18400	2024-04-25 19:15:59 +03:00
Raphael S. Carvalho	4a5fdc5814	table: Remove outdated FIXME about sstable spanning multiple tablets The FIXME was added back then because we thought the interface of compaction_group_for_sstable might have to be adjusted if a sstable were allowed to temporarily span multiple tablets until it's split, but we have gone a different path. If a sstable's key range incorrectly spans more than one tablet, that will be considered a bug and an exception is thrown. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18410	2024-04-25 17:21:11 +03:00
Marcin Maliszkiewicz	7085339f72	cql3: test: include get_mutations_internal log in test.py We have a concurrent modification conflict in tests and suspect duplicated requests, but since we don't log successful requests we have no way to verify if that's the case. The get_mutations_internal log will help to tell which nodes are trying to push auth or service levels mutations into raft. Refs scylladb/scylladb#18319 Closes scylladb/scylladb#18413	2024-04-25 17:17:53 +03:00
Botond Dénes	0234b4542a	Merge '[github] add PR template and action to verify PR tasks was completed' from Yaron Kaikov Today with the backport automation, the developer added the relevant backport label, but without any explanation of why. Adding the PR template with a placeholder for the developer to add his decision about backport yes or no. The placeholder is marked as a task, so once the explanation is added, the task must be checked as completed. Also adding another check to the PR summary will make it clear to the maintainer/reviewer if the developer explained about backport. Closes scylladb/scylladb#18275 * github.com:scylladb/scylladb: [github] add action to verify PR tasks was completed [github] add PR template	2024-04-25 17:14:50 +03:00
Pavel Emelyanov	18cc2cfa31	replica: Generalize snapshot details for single table/snapshot dir There are two places that get total:live stats for a table snapshot -- database::get_snapshot_details() and table::get_snapshot_details(). Both do a pretty similar thing -- walk the table/snapshots/ directory, then each found sub-directory, and accumulate the found files' sizes into the snapshot details structure. Both try to tell total from live sizes by checking whether an sstable component found in snapshots is present in the table datadir. The database code does it in a more correct way -- it not only checks the file's presence, but also checks whether it's a hardlink to the snapshot file, while the table code just checks if a file of the same name exists. This patch does both -- makes both database and table call the same helper method for a single snapshot's details, and makes the generalized version use the more elaborate collision check, thus fixing the per-table details getting behavior. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18347	2024-04-25 17:12:42 +03:00
Asias He	1ca779d287	streaming: Fix use after move in fire_stream_event The event is used in a loop. Found by clang-tidy: ``` streaming/stream_result_future.cc:80:49: warning: 'event' used after it was moved [bugprone-use-after-move] listener->handle_stream_event(std::move(event)); ^ streaming/stream_result_future.cc:80:39: note: move occurred here listener->handle_stream_event(std::move(event)); ^ streaming/stream_result_future.cc:80:49: note: the use happens in a later loop iteration than the move listener->handle_stream_event(std::move(event)); ^ ``` Fixes #18332 Closes scylladb/scylladb#18333	2024-04-25 16:48:54 +03:00
Botond Dénes	2c8bd99cd4	Merge 'Coroutinize view_builder::stop()' from Pavel Emelyanov It's pretty straightforward, but prior to that, exception handling needs some care Closes scylladb/scylladb#18378 * github.com:scylladb/scylladb: view-builder: Coroutinize stop() view_builder: Do not try to handle step join exceptions on stop	2024-04-25 16:48:25 +03:00
Kefu Chai	014a069ed2	build: cmake: require {fmt} >= 9.0.0 we are using `fmt::ostream_formatter` which was introduced in {fmt} v9.0.0, see https://github.com/fmtlib/fmt/releases/tag/9.0.0 . before this change, we depend on Seastar to find {fmt}. but the minimal version of {fmt} required by Seastar is 5.0.0, which cannot fulfill the needs to build scylladb. in this change, we find {fmt} package in scylla, and specify the minimal required version of 9.0.0, so the build can fail at the configuration time. {fmt} v8 could be still being used by users. for instance, ubuntu:jammy comes with libfmt-dev 8.1.1. and ubuntu:jammy is EOL in Apr 2027, see https://ubuntu.com/about/release-cycle . Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18386	2024-04-25 16:35:08 +03:00
Amnon Heiman	dfea50a7e9	db/config.cc add metric family config from file Metric family config lets a user configure the metric family aggregate labels. This patch modifies the existing relable-config from file to accept metric family config. Similar to the existing relable_config, it adds a metric_family_configs section. For example, the following configuration demonstrates changing aggregate labels by name and regular expression. ``` metric_family_configs: - name: storage_service aggregate_labels: [shard] - regex: (storage_proxy.*) aggregate_labels: [shard, scheduling_group_name] ``` Signed-off-by: Amnon Heiman <amnon@scylladb.com> Closes scylladb/scylladb#18339	2024-04-25 16:03:39 +03:00
Kefu Chai	e9b31cb4c1	test: locator_topology: s/get0()/get()/ this change addresses the leftover of `9e8805bb49` Refs `9e8805bb49` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18390	2024-04-25 16:03:01 +03:00
Patryk Jędrzejczak	55b011902e	docs: raft.rst: update after removing consistent-topology-changes	2024-04-25 14:33:21 +02:00
Patryk Jędrzejczak	0d428a3857	treewide: fix indentation after the previous patch	2024-04-25 14:33:21 +02:00
Patryk Jędrzejczak	3a34bb18cd	db: config: make consistent-topology-changes unused We make the `consistent-topology-changes` experimental feature unused and assumed to be true in 6.0. We remove code branches that executed if `consistent-topology-changes` was disabled.	2024-04-25 14:33:21 +02:00
Patryk Jędrzejczak	77342ffb34	test: lib: single_node_cql_env: restart a node in noninitial run_in_thread calls In the following commit, we make the `consistent-topology-changes` experimental feature unused. Then, all unit tests in the boost suite will start using the raft-based topology by default. Unfortunately, tests with multiple `single_node_cql_env::run_in_thread` calls (usually coming from the `do_with_cql_env_thread` calls) would fail. In a noninitial `run_in_thread` call, a node is started as if it booted for the first time. On the other hand, it has its persistent state from previous boots. Hence, the node can behave strangely and unexpectedly. In particular, `SYSTEM.TOPOLOGY` is not empty and the assertion that expects it to be empty when we boot for the first time fails. We fix this issue by making noninitial `run_in_thread` calls behave as normal restarts. After this change, `test_schema_digest_does_not_change_with_disabled_features` starts failing. This test copies the data directory before booting for the first time, so the new `_sys_ks.local().build_bootstrap_info().get();` makes the node incorrectly think it restarts. Then, after noticing it is not a part of group 0, the node would start the raft upgrade procedure if we didn't run it in the raft RECOVERY mode. This procedure would get stuck because it depends on messaging being enabled even if the node communicates only with itself and messaging is disabled in boost tests.	2024-04-25 14:33:21 +02:00
Patryk Jędrzejczak	88038d958a	test: test_read_required_hosts: run with force-gossip-topology-changes In one of the following commits, we make the `consistent-topology-changes` experimental feature unused. Then, all unit tests in the boost suite will start using the raft-based topology by default. Unfortunately, some tests would start failing and `test_read_required_hosts` is one of them. `tablet_cql_test_config` in `tablets_test.cc` doesn't use `consistent-topology-changes`, so all test cases in this file run incorrectly with the gossip-based topology changes. With `consistent-topology-changes`, only `test_read_required_hosts` fails. The failure happens on `auto table2 = add_table(e).get();`: ``` ERROR 2024-04-17 11:14:16,083 [shard 0:main] load_balancer - Replica 9b94d710-fbfb-11ee-9c4f-448617b47e11:0 of tablet 9b94d713-fbfb-11ee-9c4f-448617b47e11:0 not found in topology ``` This test case needs to be investigated and rewritten so that it passes with the raft-based topology. However, we don't want this issue to block the process of making the `consistent-topology-changes` experimental feature unused. We leave a FIXME and we will open a new issue to track it.	2024-04-25 14:33:21 +02:00
Patryk Jędrzejczak	213f2f6882	storage_service: join_cluster: replace force_gossip_based_join with force-gossip-topology-changes The `force_gossip_based_join` error injection does exactly what we expect from `force-gossip-topology-changes` so we can do a simple replacement. We prefer a flag over an error injection because we will use it a lot in CI jobs' configurations, some tests, manual testing etc. It's much more convenient. Moreover, the flag can be used in the release mode, so we re-enable all tests that were disabled in release mode only because of using the `force_gossip_based_join` error injection. The name of the `force-gossip-topology-changes` flag suggests that using it should always successfully force the gossip-based topology or, if forcing is not possible, the booting should fail. We don't want a node with `force-gossip-topology-changes=true` that silently boots in the raft-topology mode. We achieve it by throwing a runtime error from `join_cluster` in two cases: - the node is restarting in the cluster that is using raft topology - the node is joining the cluster that is using raft topology	2024-04-25 14:33:21 +02:00
Patryk Jędrzejczak	d6ee540efc	storage_service: join_token_ring: fix finish_setup_after_join calls The `topology_change_enabled` parameter of `finish_setup_after_join` is used underneath to enable pulling raft topology snapshots in two cases: - when the node joins the cluster that uses the raft-based topology, - when the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES feature is enabled. The first case happens in the first changed call. `_raft_experimental_topology` always equals true there. The second call was incorrect as it could enable pulling snapshots before SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES was enabled. It could cause problems during rolling upgrade to 6.0. For more information see `07aba3abc4`.	2024-04-25 14:33:21 +02:00
Yaron Kaikov	5e63f74984	[github] add action to verify PR tasks was completed Adding another check to the PR summary will make it clear to the maintainer/reviewer if the developer explained about backport	2024-04-25 15:24:22 +03:00
Botond Dénes	aaa76d4c0e	Merge 'Getting per-table snapshot size is racy wrt creating new snapshots' from Pavel Emelyanov The API endpoint in question calls table::get_snapshot_detail() which just walks table/snapshots/ directory. This can clash with creating a new snapshot. The database-wide walk is guarded with snapshot-ctl's locking, and so should the per-table API be. Closes scylladb/scylladb#18414 * github.com:scylladb/scylladb: snapshot: Get per-table snapshot size under snapshot lock snapshot: Move per-table snap API to other snapshot endpoints	2024-04-25 14:57:52 +03:00
Kefu Chai	e5b30ae2ad	partition_version: do not rereference moved variable in `partition_entry::apply_to_incomplete()`, we pass `dst_snp` and `std::move(dst_snp)` to build the capture variable list of a lambda, but the order of evaluation of these variables are unspecified. fortunately, we haven't run into any issues at this moment. but this is not future-proof. so, let's avoid this by storing a reference of the dereferenced smart pointer, and use it later on. this issue is identified by clang-tidy: ``` /home/kefu/dev/scylladb/mutation/partition_version.cc:500:53: warning: 'dst_snp' used after it was moved [bugprone-use-after-move] 500 \| cur = partition_snapshot_row_cursor(s, dst_snp), \| ^ /home/kefu/dev/scylladb/mutation/partition_version.cc:502:23: note: move occurred here 502 \| dst_snp = std::move(dst_snp), \| ^ /home/kefu/dev/scylladb/mutation/partition_version.cc:500:53: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated 500 \| cur = partition_snapshot_row_cursor(s, dst_snp), \| ^ /home/kefu/dev/scylladb/mutation/partition_version.cc:501:57: warning: 'src_snp' used after it was moved [bugprone-use-after-move] 501 \| src_cur = partition_snapshot_row_cursor(s, src_snp, can_move), \| ^ /home/kefu/dev/scylladb/mutation/partition_version.cc:504:23: note: move occurred here 504 \| src_snp = std::move(src_snp), \| ^ /home/kefu/dev/scylladb/mutation/partition_version.cc:501:57: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated 501 \| src_cur = partition_snapshot_row_cursor(s, *src_snp, can_move), \| ^ ``` Fixes #18360 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18361	2024-04-25 14:57:52 +03:00
Pavel Emelyanov	8aaa09ee97	replica: Do not carry view concurrency semaphore pointer around Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-25 14:27:43 +03:00
Pavel Emelyanov	2ee7c41139	view: Get concurrency semaphore via database, not table The _view_update_concurrency_sem field on database propagates itself via keyspace config down to table config and view_update_generator then grabs one via table:: helper. That's overkill: view_update_generator has a direct reference to the database and can get this semaphore from there. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-25 14:25:57 +03:00
Pavel Emelyanov	3d8b572d96	view_update_generator: Mark mutate_MV() private Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-25 14:25:40 +03:00
Pavel Emelyanov	bc4552740f	view: Move view_update_generator methods' code Now when the two methods belong to another class, move the code itself to db/view , where the class itself resides. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-25 14:24:20 +03:00
Pavel Emelyanov	c2bf6b43b2	view: Move table::generate_and_propagate_view_updates into view code Similarly to populate_views() method, this one also naturally belongs to view_update_generator class. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-25 14:20:06 +03:00
Pavel Emelyanov	670c7c925c	view: Move table::populate_views() into view_update_generator class The method in question has little to do with table, effectively it only needs stats and concurrency semaphore. And the semaphore in question is obtained from table indirectly, it really resides on database. On the other hand, the method carries lots of bits from db::view, e.g. the view_update_builder class, memory_usage_of() helper and a bit more. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-25 14:17:20 +03:00
Kefu Chai	e5bcea6718	docs: drop documents related to {,dclocal_}read_repair_chance since "read_repair_chance" and "dclocal_read_repair_chance" are removed, and not supported anymore. let's stop documenting them. Refs #3502 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-04-25 17:15:27 +08:00
Kefu Chai	c323c93fa4	treewide: remove {dclocal_,}read_repair_chance options dclocal_read_repair_chance and read_repair_chance have been removed in Cassandra 3.11 and 4.x, see https://issues.apache.org/jira/browse/CASSANDRA-13910. if we expose the properties via DDL, Cassandra would fail to consume the CQL statement creating the table when performing migration from Scylla to Cassandra 4.x, as the latter does not understand these properties anymore. currently the default values of `dc_local_read_repair_chance` and `read_repair_chance` are both "0". so this is practically disabled, unless the user deliberately sets them to a value greater than 0. also, as a side effect, Cassandra 4.x has better support of Python3. the cqlsh shipped along with Cassandra 3.11.16 only supports python2.7, see https://github.com/apache/cassandra/blob/cassandra-3.11.16/bin/cqlsh.py it errors out if the system only provides python3 with the error of ``` No appropriate python interpreter found. ``` but modern linux systems do not provide python2 anymore. so, in this change, we deprecate these two options. Fixes #3502 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-04-25 17:15:27 +08:00
Botond Dénes	ca26899c36	Merge 'sstable: large data handler needs to count range tombstones as rows' from Ferenc Szili When issuing warnings about partitions with the number of rows above a configured threshold, the large partitions handler does not take into consideration the number of range tombstone markers in the total rows count. This fix adds the number of range tombstone markers to the total number of rows and saves this total in system.large_partitions.rows (if it is above the threshold). It also adds a new column range_tombstones to the system.large_partitions table which only contains the number of range tombstone markers for the given partition. This PR fixes the first part of issue #13968 It does not cover distinguishing between live and dead rows. A subsequent PR will handle that. Closes scylladb/scylladb#18346 * github.com:scylladb/scylladb: sstables: add docs changes for system.large_partitions sstable: large data handler needs to count range tombstones as rows	2024-04-25 11:38:30 +03:00
Pavel Emelyanov	e97abfc473	tablets: Fix indentation after flat-hash-map patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18364	2024-04-25 11:36:37 +03:00
Kefu Chai	0b5a861961	build: cmake: reference build_mode with ${scylla_build_mode_${CMAKE_BUILD_TYPE}} before this change, if we generate the building system with plain `Ninja`, instead of `Ninja Multi-Config` using cmake, the build fails, because `${scylla_build_mode_${CMAKE_BUILD_TYPE}}` is not defined. so the profile used for building the rust library would be "rust-", which does not match any of the profiles defined by `Cargo.toml`. in this change, we use `$CMAKE_BUILD_TYPE` instead of "$config". as the former is defined for non-multi generator. while the latter is. see https://cmake.org/cmake/help/latest/generator/Ninja%20Multi-Config.html with this change, we are able to generate the building system properly with the "Ninja" generator. if we just want to run some static analyzer against the source tree or just want to build scylladb with a single configuration, the "Ninja" generator is a good fit. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18353	2024-04-25 10:51:54 +03:00
Pavel Emelyanov	ae4c1c44ec	snapshot: Get per-table snapshot size under snapshot lock Walking per-table snapshot directory without lock is racy. There's snapshot-ctl locking that's used to get db-wide snapshot details, it should be used to get per-table snapshot details too Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-25 10:05:51 +03:00
Pavel Emelyanov	186b36165e	snapshot: Move per-table snap API to other snapshot endpoints So that they are collected in one place and to facilitate next patch that's going to use snapshot-ctl for per-table API too Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-25 10:05:01 +03:00
Anna Stuchlik	b5d256a991	doc: add Scylla Doctor to the docs This commit adds the description and usage instructions of Scylla Doctor to the "How to Report a ScyllaDB Problem" page. Scylla Doctor replaces Health Check Report, so the description of and references to the latter are removed with this commit. Fixes https://github.com/scylladb/scylladb/issues/16276 Closes scylladb/scylladb#17617	2024-04-25 09:50:38 +03:00
Asias He	037bba0ca1	repair: Turn on off_strategy_updater for tablet repair The off_strategy_updater is used during repair to update the automatic off strategy timer so off_strategy compaction starts automatically only after repair finishes. We still use off_strategy for tablets. So we should still turn on the updater. The update logic is used for vnode tables. We can share the code with vnode table instead of copying, but since there is a possibility we could disable off_strategy for tablets. We'd better postpone the code sharing as follow ups. If later, we decide to disable off_strategy for tablets, we can remove the updater for tablet. Fixes #18196 Closes scylladb/scylladb#18266	2024-04-25 09:03:07 +03:00
Kamil Braun	3363f6e1e8	Merge 'Fix write failures during node replace with same IP with topology over raft' from Gleb Currently a new node is marked as alive too late, after it is already reported as a pending node. The patch series changes the replace procedure to be the same as what node_ops do: first stop reporting the IP of the node that is being replaced as a natural replica for writes, then mark the IP as alive, and only after that report the IP as a pending endpoint. Fixes: scylladb/scylladb#17421 * 'gleb/17421-fix-v2' of github.com:scylladb/scylla-dev: test_replace_reuse_ip: add data plane load sync_raft_topology_nodes: make replace procedure similar to nodeops one storage_service: topology_coordinator: fix indentation after previous patch storage_service: topology coordinator: drop ring check in node_state::replacing state	2024-04-24 17:09:01 +02:00
Petr Gusev	bc98774f83	test_replace_reuse_ip: add data plane load In this commit we enhance test_replace_reuse_ip to reproduce #17421. We create a test table and run insert queries on it while the first node is being replaced. In this form the test fails without the fix from the previous commit. Some insert requests fail with [Unavailable exception] "Cannot achieve consistency level for cl QUORUM...".	2024-04-24 16:59:24 +03:00
Gleb Natapov	4614fedd22	sync_raft_topology_nodes: make replace procedure similar to nodeops one In replace-with-same-ip a new node calls gossiper.start_gossiping from join_token_ring with the 'advertise' parameter set to false. This means that this node will fail echo RPC-s from other nodes, making it appear as not alive to them. The node changes this only in storage_service::join_node_response_handler, when the topology coordinator notifies it that it's actually allowed to join the cluster. The node calls _gossiper.advertise_to_nodes({}), and only from this moment other nodes can see it as alive. The problem is that topology coordinator sends this notification in topology::transition_state::join_group0 state. In this state nodes of the cluster already see the new node as pending, they react with calling tmpr->add_replacing_endpoint and update_topology_change_info when they process the corresponding raft notification in sync_raft_topology_nodes. When the new token_metadata is published, assure_sufficient_live_nodes sees the new node in pending_endpoints. All of this happens before the new node has handled the successful join notification, so it's not alive yet. Suppose we had a cluster with three nodes and we're replacing one of them with a fourth node. For cl=quorum assure_sufficient_live_nodes throws if live < need + pending, which in our case becomes 2 < 2 + 1. The end effect is that during replace-with-same-ip data plane requests can fail with unavailable_exception, breaking availability. The patch makes boot procedure more similar to node ops one. It splits the marking of a node as "being replaced" and adding it to pending set into different steps and marks it as alive in the middle. So when the node is in topology::transition_state::join_group0 state it is marked as "being replaced" which means it will no longer be used for reads and writes. Then, in the next state, new node is marked as alive and is added to pending list.
fixes scylladb/scylladb#17421	2024-04-24 16:59:22 +03:00
Kamil Braun	1297b9a322	mutation: mutation_by_size_splitter: skip last mutation if it's empty Currently, the last mutation emitted by split_mutation could be empty. It can happen as follows: - consume range tombstone change at pos `1` with some timestamp - consume clustering row at pos `2` - flush: this will create mutation with range tombstone (1, 2) and clustering row at 2 - consume range tombstone change at pos `2` with no timestamp (i.e. closing rtc) - end of partition since the closing rtc has the same position as the clustering row, no additional range tombstone will be emitted -- the only necessary range tombstone was already emitted in the previous mutation. On the other hand, `test_split_mutations` expects all emitted mutations to be non-empty, which is a sane expectation for this function. The test caught a case like this with random-seed=629157129. Fix this by skipping the last mutation if it turns out to be empty. Fixes: scylladb/scylladb#18042 Closes scylladb/scylladb#18375	2024-04-24 16:25:31 +03:00
Raphael S. Carvalho	71682aebdd	storage_service: Fix use-after-move in storage_service::node_ops_cmd_handler ``` service/storage_service.cc:4288:62: warning: 'req' used after it was moved [bugprone-use-after-move] node_ops_insert(ops_uuid, coordinator, std::move(req.ignore_nodes), [this, coordinator, req = std::move(req)] () mutable { ^ service/storage_service.cc:4288:107: note: move occurred here node_ops_insert(ops_uuid, coordinator, std::move(req.ignore_nodes), [this, coordinator, req = std::move(req)] () mutable { ^ service/storage_service.cc:4288:62: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated node_ops_insert(ops_uuid, coordinator, std::move(req.ignore_nodes), [this, coordinator, req = std::move(req)] () mutable { ^ ``` if evaluation order is right-to-left (GCC), req is moved first, and req.ignore_nodes will be empty, so nodes that should be ignored will still be considered, potentially resulting in a failure during replace. https://godbolt.org/z/jPcM6GEx1 courtesy of clang-tidy. Fixes #18324. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18366	2024-04-24 15:36:28 +03:00
Aleksandra Martyniuk	06f6aaf2cf	test: check default value of tombstone_gc Add a test which checks whether default tombstone_gc value is properly set and if it does not override previous setting.	2024-04-24 10:57:51 +02:00
Aleksandra Martyniuk	e0d498716a	test: topology: move some functions to util.py Move functions marked with asynccontextmanager from test/topology/test_mv.py to test/topology/util.py so that they can be used in other tests.	2024-04-24 10:57:51 +02:00
Aleksandra Martyniuk	58f72f9019	cql3: statements: change default tombstone_gc mode for tablets Currently, if tombstone_gc mode isn't specified for a table, then "timeout" is used by default. With tablets, running "nodetool repair -pr" may miss a tablet if it migrated across the nodes. Then, if we expire tombstones for ranges that weren't repaired, we may get data resurrection. Set default tombstone_gc mode value for DDLs that don't specify it. It's set to "repair" for tables which use tablets unless they use local replication strategy or rf = 1. Otherwise it's set to "timeout".	2024-04-24 10:42:10 +02:00
Kamil Braun	8876b9b0ef	test/pylib: random_tables: use IF NOT EXISTS when creating keyspace Due to Python driver's unexpected behavior, "CREATE KEYSPACE" statement may sometimes get executed twice (scylladb/python-driver#317), leading to "Keyspace ... already exists" error in our tests (scylladb/scylladb#17654). Work around this by using "IF NOT EXISTS". Fixes: scylladb/scylladb#17654 Closes scylladb/scylladb#18368	2024-04-24 10:09:26 +03:00
Pavel Emelyanov	1b1b86809d	view-builder: Coroutinize stop() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-23 20:43:42 +03:00
Pavel Emelyanov	eaf78fca04	view_builder: Do not try to handle step join exceptions on stop Commit `23c891923e` (main: make sure view_builder doesn't propagate semaphore errors) ignored some exceptions that could pop up from the _build_step/do_build_step() serialized action, since they are "benign" on stop. Later there came `b56b10a4bb` (view_builder: do_build_step: handle unexpected exceptions) that plugged any exception from the action in question, regardless of whether they happen on stop or at run time. Apparently, the latter commit supersedes the former. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-23 20:26:14 +03:00
Anna Stuchlik	c0e4f3e646	doc: include OSS-specific info as separate files This commit excludes OSS-specific links and content added in https://github.com/scylladb/scylladb/pull/17624 to separate files and adds the include directive `.. scylladb_include_flag::` to include these files in the doc source files. Reason: Adding the link to the Open Source upgrade guide (/upgrade/upgrade-opensource/upgrade-guide-from-5.4-to-6.0/enable-consistent-topology) breaks the Enterprise documentation because the Enterprise docs don't contain that upgrade guide. We must add separate files for OSS and Enterprise to prevent failing the Enterprise build and breaking the links. Closes scylladb/scylladb#18372	2024-04-23 16:59:05 +02:00
Raphael S. Carvalho	fa2dc5aefa	sstables: Fix use-after-move in an error path of FS-based sstable writer ``` sstables/storage.cc:152:21: warning: 'file_path' used after it was moved [bugprone-use-after-move] remove_file(file_path).get(); ^ sstables/storage.cc:145:64: note: move occurred here auto w = file_writer(output_stream<char>(std::move(sink)), std::move(file_path)); ``` It's a regression when TOC is found for a new sstable, and we try to delete temporary TOC. courtesy of clang-tidy. Fixes #18323. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18367	2024-04-23 17:19:55 +03:00
Pavel Emelyanov	f5f57dc817	table: No need to open directory in snapshot_exists() In order to check if a snapshot of a certain name exists the checking method opens the directory. This can be done with a more lightweight call. Also, though not critical, it forgets to close it. Coroutinize the method while at it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18365	2024-04-23 17:19:24 +03:00
Botond Dénes	572003c469	Merge 'Cleanup the way snapshot details are propagated via API' from Pavel Emelyanov There's a database::get_snapshot_details() method that returns collection of all snapshots for all ks.cf out there and there are several snapshot_details aux structures around it. This PR keeps only one "details" and cleans up the way it propagates from database up to the respective API calls. Closes scylladb/scylladb#18317 * github.com:scylladb/scylladb: snapshot_ctl: Brush up true_snapshots_size() internals snapshot_ctl: Remove unused details struct snapshot_ctl: No double recoding of details database,snapshots: Move database::snapshot_details into snapshot_ctl database,snapshots: Make database::get_snapshot_details() return map, not vector table,snapshots: Move table::snapshot_details into snapshot_ctl	2024-04-23 16:28:25 +03:00
Kefu Chai	9e8805bb49	repair, transport: s/get0()/get()/ `future::get0()` was deprecated in favor of `future::get()`. so let's use the latter instead. this change silences a `-Wdeprecated` warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18357	2024-04-23 15:48:54 +03:00
Kefu Chai	4fd9b2a791	reader: silence false-positive use-after-move warning when compiling with clang-tidy, it warns: ``` [6/9] Building CXX object readers/CMakeFiles/readers.dir/multishard.cc.o /home/kefu/dev/scylladb/readers/multishard.cc:84:53: warning: 'fut_and_result' used after it was moved [bugprone-use-after-move] 84 \| auto result = std::get<1>(std::move(fut_and_result)); \| ^ /home/kefu/dev/scylladb/readers/multishard.cc:79:34: note: move occurred here 79 \| _read_ahead_future = std::get<0>(std::move(fut_and_result)); \| ^ ``` but this warning is but a false alarm, as we are not really moving away the whole tuple, we are just moving an element out of it. but clang-tidy cannot tell which element we are actually moving. so, silence both places of `std::move()`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18363	2024-04-23 15:47:50 +03:00
Botond Dénes	5a1e3b25d0	Merge 'Sanitize sstables::directory_semaphore usage' from Pavel Emelyanov The semaphore in question is used to limit parallelism of manipulations with table's sstables. It's currently used in two places -- sstable_directory (mainly on boot) and by table::take_snapshot() to take snapshot. For the latter, there's also a database -> sharded<directory_semaphore> reference. This PR sanitizes the semaphore usage. The results are - directory_semaphore no longer needs to friend several classes that mess with its internals - database no longer references directory_semaphore Closes scylladb/scylladb#18281 * github.com:scylladb/scylladb: database: Keep local directory_semaphore to initialize sstables managers database: Don't reference directory_semaphore table: Use directory semaphore from sstables manager table: Indentation fix after previous patch table: Use directory_semaphore for rate-limited snapshot taking sstables: Move directory_semaphore::parallel_for_each() to header sstables: Move parallel_for_each_restricted to directory_semaphore table: Use smp::all_cpus() to iterate over all CPUs locally	2024-04-23 13:54:52 +03:00
Kefu Chai	ab4de1f470	auth: move fmt::formatter<auth::resource_kind> up before this change, `fmt::formatter<auth::resource_kind>` is located at line 250 in this file, but it is used at line 130. so, {fmt} is not able to find it: ``` /usr/include/fmt/core.h:2593:45: error: implicit instantiation of undefined template 'fmt::detail::type_is_unformattable_for<auth::resource_kind, char>' 2593 \| type_is_unformattable_for<T, char_type> _; \| ^ /usr/include/fmt/core.h:2656:23: note: in instantiation of function template specialization 'fmt::detail::parse_format_specs<auth::resource_kind, fmt::detail::compile_parse_context<char>>' requested here 2656 \| parse_funcs_{&parse_format_specs<Args, parse_context_type>...} {} \| ^ /usr/include/fmt/core.h:2787:47: note: in instantiation of member function 'fmt::detail::format_string_checker<char, auth::resource_kind, auth::resource_kind>::format_string_checker' requested here 2787 \| detail::parse_format_string<true>(str_, checker(s)); \| ^ /home/kefu/dev/scylladb/auth/resource.hh:130:29: note: in instantiation of function template specialization 'fmt::basic_format_string<char, auth::resource_kind &, auth::resource_kind &>::basic_format_string<char[65], 0>' requested here 130 \| seastar::format("This resource has kind '{}', but was expected to have kind '{}'.", actual, expected)) { \| ^ /usr/include/fmt/core.h:1578:45: note: template is declared here 1578 \| template <typename T, typename Char> struct type_is_unformattable_for; \| ^ ``` in this change, `fmt::formatter<auth::resource_kind>` is moved up to where `auth::resource_kind` is defined. so that it can be used by its caller. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18316	2024-04-23 12:11:17 +03:00
Kefu Chai	48048c2f94	utils/to_string: include fmt/std.h if fmt >= v10 in to_string.hh, we define the specialization of `fmt::formatter<std::optional<T>>`, which is available in {fmt} v10 and up. to avoid conditionally including `utils/to_string.hh` and `fmt/std.h` in all source files formatting `std::optional<T>` using {fmt}, let's include `fmt/std.h` if {fmt}'s version is greater than or equal to 10. in future, we should drop the specialization and use `fmt/std.h` directly. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18325	2024-04-23 12:09:05 +03:00
Kefu Chai	e2d5054c53	types: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18326	2024-04-23 12:08:23 +03:00
Pavel Emelyanov	4445ee9a55	Merge 'install-dependencies.sh: add more dependencies for debian' from Kefu Chai in this changeset, we install `libxxhash-dev` and `cargo` for debian, and install cxxbridge for all distros, so that at least debian can be built without further preparations after running `install-dependencies.sh`. Closes scylladb/scylladb#18335 * github.com:scylladb/scylladb: install-dependencies.sh: move cargo out of fedora branch install-dependencies: install cargo and wabt for debian install-dependencies.sh: add libxxhash-dev for debian	2024-04-23 12:04:47 +03:00
Lakshmi Narayanan Sreethar	de6570e1ec	serializer_impl, sstables: fix build failure due to missing includes When building scylla with cmake, it fails due to missing includes in serializer_impl.hh and sstables/compress.hh files. Fix that by adding the appropriate include files. Fixes #18343 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#18344	2024-04-23 12:03:51 +03:00
Kefu Chai	826f413cad	thrift: avoid use-after-move in `make_non_overlapping_ranges()` in handler.cc, `make_non_overlapping_ranges()` references a moved instance of `ColumnSlice` when something unexpected happens, in order to format the error message in an exception. the move constructor of `ColumnSlice` is default-generated, so the members' move constructors are used to construct the new instance in the move constructor. this could lead to undefined behavior when dereferencing the moved-from instance. in this change, in order to avoid the use-after-move, let's keep a copy of the referenced member variables and reference them when formatting the error message in the exception. this use-after-move issue was introduced in `822a315dfa`, which implemented `get_multi_slice` verb and this piece in the first place. since both 5.2 and 5.4 include this commit, we should backport this change to them. Refs `822a315dfa` Fixes #18356 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18358	2024-04-23 12:02:09 +03:00
Kefu Chai	ad2c26824a	main: do not reference moved variable before this change, we dereference `linfo` after moving it away. and clang-tidy warns us like ``` [19/171] Building CXX object CMakeFiles/scylla.dir/main.cc.o /home/kefu/dev/scylladb/main.cc:559:12: warning: 'linfo' used after it was moved [bugprone-use-after-move] 559 \| return linfo.host_id; \| ^ /home/kefu/dev/scylladb/main.cc:558:36: note: move occurred here 558 \| sys_ks.local().save_local_info(std::move(linfo), snitch.local()->get_location(), broadcast_address, broadcast_rpc_address).get(); \| ^ ``` the default-generated move constructor of `local_info` uses the default-generated move constructor of `locator::host_id`, which in turn use the default-generated move constructor of `utils::tagged_uuid<struct host_id_tag>`, and then `utils::UUID` 's move constructor. since `UUID` does not contain any moveable resources, what it has is but two `int64_t` member variables. so this is a benign issue. but still, it is distracting. in this change, we keep the value of `host_id` locally, and return it instead to silence this warning, and to improve the maintainability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18362	2024-04-23 11:58:58 +03:00
Patryk Jędrzejczak	14911051ee	db: config: introduce force-gossip-topology-changes We are going to make the `consistent-topology-changes` experimental feature unused in 6.0. However, the topology upgrade procedure will be manual and voluntary, so some 6.0 clusters will be using the gossip-based topology. Therefore, we need to continue testing the gossip-based topology. The solution is introducing a new flag, `force-gossip-topology-changes`, that will enforce the gossip-based topology in a fresh cluster. In this patch, we only introduce the parameter without any effect. Here is the explanation. Making `consistent-topology-changes` unused and introducing `force-gossip-topology-changes` requires adjustments in scylla-dtest. We want to merge changes to scylladb and scylla-dtest in a way that ensures all tests are run correctly during the whole process. If we merged all changes to scylladb first, before merging the scylla-dtest changes, all tests would run with the raft-based topology and the ones excluded in the raft-based topology would fail. We also can't merge all changes to scylla-dtest first. However, we can follow this plan: 1. scylladb: merge this patch 2. scylla-dtest: start using `force-gossip-topology-changes` in jobs that run without the raft-based topology 3. scylladb: merge the rest of the changes 4. scylla-dtest: merge the rest of the changes Ref scylladb/scylladb#17802 Closes scylladb/scylladb#18284	2024-04-23 09:42:46 +02:00
Botond Dénes	275ed9a9bc	replica/mutation_dump: create_underlying_mutation_sources(): remove false move transformed_cr is moved in a loop, in each iteration. This is harmless because the variable is const and the move has no effect, yet it is confusing to readers and triggers false positives in clang-tidy (moved-from object reused). Remove it. Fixes: #18322 Closes scylladb/scylladb#18348	2024-04-23 01:21:36 +02:00
Kamil Braun	e9285e5c04	Merge 'various fixes for topology coordinator' from Gleb The series contains fixes for some problems found during scalability testing and one clean up patch. Ref: scylladb/scylladb#17545 * 'gleb/topology-fixes-v4' of github.com:scylladb/scylla-dev: gossiper: disable status check for endpoints in raft mode storage_service: introduce a setter for topology_change_kind topology coordinator: drop unused structure storage_service: yield in get_system_mutations	2024-04-22 17:37:47 +02:00
Calle Wilund	82d97da3e0	commitlog: Remove (benign) use-after-move Fixes #18329 named_file::assign call uses old object "known_size" after a move of the object. While this is wholly ok, since the attribute accessed will not be modified/destroyed by the move, it causes warnings in "tidy" runs, and might confuse or cause real errors should impl. change. Closes scylladb/scylladb#18337	2024-04-22 17:20:19 +03:00
Ferenc Szili	c528597a84	sstables: add docs changes for system.large_partitions This commit updates the documentation changes for the new column range_tombstones in system.large_partitions	2024-04-22 15:25:41 +02:00
Ferenc Szili	98bec4e02a	sstable: large data handler needs to count range tombstones as rows When issuing warnings about partitions with the number of rows above a configured threshold, the large partitions handler does not take into consideration the number of range tombstone markers in the total rows count. This fix adds the number of range tombstone markers to the total number of rows and saves this total in system.large_partitions.rows (if it is above the threshold). It also adds a new column range_tombstones to the system.large_partitions table which only contains the number of range tombstone markers for the given partition. This PR fixes the first part of issue #13968 It does not cover distinguishing between live and dead rows. A subsequent PR will handle that.	2024-04-22 15:24:18 +02:00
Kefu Chai	ff04375016	main: drop unused namespace alias `fs` namespace alias was introduced in `ff4d8b6e85`, but we don't use it anymore. so let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18308	2024-04-22 13:50:28 +03:00
Nadav Har'El	59b40484c8	Update seastar submodule * seastar 8fabb30a...2b43417d (6): > future: deprecate future::get0() > build: do not export valgrind with export() > http: deprecate buggy path param[] > http/request: add get_path_param method > http/request: get_query_param refactor > http/util: add path_decode method Refs #5883 (fixes https://github.com/scylladb/seastar/issues/725 and provides a new API to read the decoded paths). Closes scylladb/scylladb#18297	2024-04-22 11:12:49 +03:00
Kefu Chai	85406a450c	install-dependencies.sh: move cargo out of fedora branch so that we install cxxbridge-cmd on all distros, and cxxbridge is available when building scylladb. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-04-22 15:41:20 +08:00
Kefu Chai	835742af6d	install-dependencies: install cargo and wabt for debian cargo is used for installing cxxbridge-cmd, which is in turn used when building the cxx bindings for the rust modules. so we need it on all distros. in this change, we add cargo for debian. so that we don't have build failure like: ``` CMake Error at rust/CMakeLists.txt:32 (find_program): Could not find CXXBRIDGE using the following names: cxxbridge ``` for similar reason, we also need wabt, which provides wasm2wat, without which, we'd have ``` CMake Error at test/resource/wasm/CMakeLists.txt:1 (find_program): Could not find WASM2WAT using the following names: wasm2wat ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-04-22 15:41:20 +08:00
Kefu Chai	a70a288627	install-dependencies.sh: add libxxhash-dev for debian libxxhash is used for building on both fedora and debian. `xxhash-devel` is already listed in `fedora_packages`, we should have its counterpart in `debian_base_packages`. otherwise the build on debian and its derivatives could fail like ``` CMake Error at /usr/local/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:230 (message): Could NOT find xxHash (missing: xxhash_LIBRARY xxhash_INCLUDE_DIR) (found version "") Call Stack (most recent call first): /usr/local/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE) cmake/FindxxHash.cmake:30 (find_package_handle_standard_args) CMakeLists.txt:75 (find_package) ``` if we are using CMake to generate the building system. if we use `configure.py` to generate `build.ninja`, the build would fail at build time. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-04-22 15:22:51 +08:00
Gleb Natapov	0c77e96b0b	storage_service: topology_coordinator: fix indentation after previous patch	2024-04-21 18:53:21 +03:00
Gleb Natapov	b8ee8911ca	storage_service: topology coordinator: drop ring check in node_state::replacing state Always modify topology metadata in node_state::replacing state. There is no dependency on the ring value at all.	2024-04-21 18:53:04 +03:00
Gleb Natapov	06e6ed09ed	gossiper: disable status check for endpoints in raft mode Gossiper automatically removes endpoints that do not have tokens in normal state and either do not send gossiper updates or are dead for a long time. We do not need this with topology coordinator mode since in this mode the coordinator is responsible to manage the set of nodes in the cluster. In addition the patch disables quarantined endpoint maintenance in gossiper in raft mode and uses left node list from the topology coordinator to ignore updates for nodes that are no longer part of the topology.	2024-04-21 16:36:07 +03:00
Gleb Natapov	0e3f92fa49	storage_service: introduce a setter for topology_change_kind In the next patch we will extend it to have other side effects.	2024-04-21 16:36:07 +03:00
Gleb Natapov	040c6ca0c1	topology coordinator: drop unused structure	2024-04-21 16:36:07 +03:00
Gleb Natapov	d0a00f3489	storage_service: yield in get_system_mutations Yield in a loop that converts a result to canonical_mutation. We observed stalls for very large tables.	2024-04-21 16:36:07 +03:00
Avi Kivity	87b08c957f	Merge 'treewide: drop `FMT_DEPRECATED_OSTREAM` macro and homebrew range formatters' from Kefu Chai before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we include `fmt/ranges.h` and/or `fmt/std.h` for formatting the container types, like vector, map optional and variant using {fmt} instead of the homebrew formatter based on operator<<. with this change, the changes adding fmt::formatter and the changes using ostream formatter explicitly, we are allowed to drop `FMT_DEPRECATED_OSTREAM` macro. Refs scylladb#13245 Closes scylladb/scylladb#17968 * github.com:scylladb/scylladb: treewide: do not define FMT_DEPRECATED_OSTREAM treewide: include fmt/ranges.h and/or fmt/std.h utils/managed_bytes: add support for fmt::to_string() to bytes and friends	2024-04-20 22:25:00 +03:00
Mikołaj Grzebieluch	65cfb9b4e0	storage_service: skip wait_for_gossip_to_settle if topology changes are based on raft Waiting for gossip to settle slows down the bootstrap of the cluster. It is safe to disable it if the topology is based on Raft. Fixes scylladb/scylladb#16055 Closes scylladb/scylladb#17960	2024-04-20 17:56:51 +02:00
Pavel Emelyanov	67a408447f	snapshot_ctl: Brush up true_snapshots_size() internals Previous patches broke indentation in this method. Fix it by shortening the summation loop with the help of std::accumulate() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-19 21:06:06 +03:00
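The summation described in this commit can be sketched as follows. This is only an illustration of replacing a loop with `std::accumulate()`: the `snapshot_details` struct, its `live` field, and `sum_live_sizes()` are hypothetical names, not the actual `snapshot_ctl` code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <numeric>
#include <string>

// Hypothetical stand-in for the per-snapshot details; not ScyllaDB's type.
struct snapshot_details {
    std::int64_t total;
    std::int64_t live;
};

// Sums the `live` field over all snapshots with std::accumulate instead of
// a hand-written loop. The initial value fixes the accumulator type.
std::int64_t sum_live_sizes(const std::map<std::string, snapshot_details>& snapshots) {
    return std::accumulate(snapshots.begin(), snapshots.end(), std::int64_t{0},
        [] (std::int64_t sum, const auto& entry) {
            return sum + entry.second.live;
        });
}

// Usage check with made-up numbers.
inline void sum_live_sizes_example() {
    std::map<std::string, snapshot_details> snapshots{{"s1", {10, 3}}, {"s2", {20, 4}}};
    assert(sum_live_sizes(snapshots) == 7);
}
```

Passing `std::int64_t{0}` rather than a plain `0` matters: `std::accumulate` deduces the accumulator type from the initial value, so an `int` there would silently narrow the sum.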
Pavel Emelyanov	50add3314d	snapshot_ctl: Remove unused details struct Now the details are manipulated via some other structs and this one can just be removed Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-19 20:04:34 +03:00
Pavel Emelyanov	e8f10be12e	snapshot_ctl: No double recoding of details Currently database::get_snapshot_details() returns a collection of snapshots. The snapshot_ctl converts this collection into similarly looking one with slightly different structures inside. The resulting collection is converted one more time on the API layer into another similarly looking map. This patch removes the intermediate conversion. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-19 20:04:32 +03:00
Pavel Emelyanov	8ec3f057a8	database,snapshots: Move database::snapshot_details into snapshot_ctl Similarly to how it looks like for table::snapshot_details Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-19 20:04:29 +03:00
Pavel Emelyanov	f6bc283bbb	database,snapshots: Make database::get_snapshot_details() return map, not vector So that it's in-sync with table::get_snapshot_details(). Next patches will improve this place even further. Also, there can be many snapshots and vector can grow large, but that's less of an issue here. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-19 20:04:25 +03:00
Pavel Emelyanov	a36c13beb3	table,snapshots: Move table::snapshot_details into snapshot_ctl Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-19 19:59:34 +03:00
Kefu Chai	372a4d1b79	treewide: do not define FMT_DEPRECATED_OSTREAM since we do not rely on FMT_DEPRECATED_OSTREAM to define the fmt::formatter for us anymore, let's stop defining `FMT_DEPRECATED_OSTREAM`. in this change, * utils: drop the range formatters in to_string.hh and to_string.c, as we don't use them anymore. and the tests for them in test/boost/string_format_test.cc are removed accordingly. * utils: use fmt to print chunk_vector and small_vector. as we are not able to print the elements using operator<< anymore after switching to {fmt} formatters. * test/boost: specialize fmt::details::is_std_string_like<bytes> due to a bug in {fmt} v9, {fmt} fails to format a range whose element type is `basic_sstring<uint8_t>`, as it considers it as a string-like type, but `basic_sstring<uint8_t>`'s char type is signed char, not char. this issue does not exist in {fmt} v10, so, in this change, we add a workaround to explicitly specialize the type trait to assure that {fmt} format this type using its `fmt::formatter` specialization instead of trying to format it as a string. also, {fmt}'s generic ranges formatter calls the pair formatter's `set_brackets()` and `set_separator()` methods when printing the range, but operator<< based formatter does not provide these method, we have to include this change in the change switching to {fmt}, otherwise the change specializing `fmt::details::is_std_string_like<bytes>` won't compile. * test/boost: in tests, we use `BOOST_REQUIRE_EQUAL()` and its friends for comparing values. but without the operator<< based formatters, Boost.Test would not be able to print them. after removing the homebrew formatters, we need to use the generic `boost_test_print_type()` helper to do this job. so we are including `test_utils.hh` in tests so that we can print the formattable types. * treewide: add "#include "utils/to_string.hh" where `fmt::formatter<optional<>>` is used. 
* configure.py: do not define FMT_DEPRECATED_OSTREAM * cmake: do not define FMT_DEPRECATED_OSTREAM Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-04-19 22:57:36 +08:00
Kefu Chai	a439ebcfce	treewide: include fmt/ranges.h and/or fmt/std.h before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we include `fmt/ranges.h` and/or `fmt/std.h` for formatting the container types, like vector, map optional and variant using {fmt} instead of the homebrew formatter based on operator<<. with this change, the changes adding fmt::formatter and the changes using ostream formatter explicitly, we are allowed to drop `FMT_DEPRECATED_OSTREAM` macro. Refs scylladb#13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-04-19 22:56:16 +08:00
Kefu Chai	01f13850cb	utils/managed_bytes: add support for fmt::to_string() to bytes and friends in `3835ebfcdc`, `fmt::formatter` were added to `bytes` and friends, but their `format()` methods were intentionally implemented as plain methods, which only accept `fmt::format_context`. it was a deliberate decision. the intention was to reduce the usage of template, to speed up the compilation at the expense of dropping the support of other appenders, notably the one used by `fmt::to_string()`, where the type of "format_context" is not a `fmt::format_context`, but a string appender. but it turns out we still have users in tests using `fmt::to_string()`, to convert, for instance, `bytes` to `std::string`, so, to make their life easier, we add the templated `format()` to these types. an alternative is to change the callers to use something like `fmt::format("{}", v)`, which is less convenient though. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-04-19 22:56:13 +08:00
Kefu Chai	5ab527e669	main: do not echo parsed options when calling scylla interactively in `2f0f53ac`, we added logging of parsed command line options so that we can see how scylla is launched in case it fails to boot. but when scylla is called interactively in a console, this echo is a little bit annoying. see following console session ```console $ scylla --help-loggers Scylla version 5.5.0~dev-0.20240419.3c9651adf297 with build-id 7dd6a110e608535e5c259a03548eda6517ab4bde starting ... command used: "./RelWithDebInfo/scylla --help-loggers" pid: 996503 parsed command line options: [help-loggers] Available loggers: BatchStatement LeveledManifest alter_keyspace alter_table ... ``` so in this change, we check if the stdin is associated with a terminal device, if that is the case, we don't print the scylla version, parsed command line and pid. and the interactive session looks like: ```console $ scylla --help-loggers Available loggers: BatchStatement LeveledManifest alter_keyspace alter_table ``` no more distracting information printed. the original behavior can be tested like: ```console $ : \| ./RelWithDebInfo/scylla --help-loggers ``` assuming scylla is always launched with systemd, which connects stdin to /dev/null. see https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#Logging%20and%20Standard%20Input/Output . so this behavior is preserved with this change. Refs #4203 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18309	2024-04-19 15:00:05 +03:00
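The terminal check described in this commit can be sketched with POSIX `isatty()`. This is a minimal illustration, not the code from `main.cc`: the helper names `is_terminal()` and `should_print_banner()` are assumptions made for the example.

```cpp
#include <cassert>
#include <unistd.h>

// Returns true when the given file descriptor refers to a terminal device.
bool is_terminal(int fd) {
    return ::isatty(fd) == 1;
}

// Hypothetical policy helper: print the startup banner only when stdin is
// not a terminal, for example when systemd connects it to /dev/null.
bool should_print_banner(int stdin_fd = STDIN_FILENO) {
    return !is_terminal(stdin_fd);
}

// Usage check: the read end of a pipe is never a terminal, which mirrors
// the `: | scylla ...` invocation from the commit message.
inline void should_print_banner_example() {
    int fds[2];
    if (::pipe(fds) != 0) {
        return;  // could not create a pipe; nothing to check
    }
    assert(should_print_banner(fds[0]));
    ::close(fds[0]);
    ::close(fds[1]);
}
```

Taking the descriptor as a parameter keeps the check testable without depending on how the test process itself was launched.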
Raphael S. Carvalho	223214439b	compaction: Disconsider active tables in the hourly compaction reevaluation This hourly reevaluation is there to help tablets that have very low write activity, which can go a long time without flushing a memtable, and it's important to reevaluate compaction as data can get expired. Today it can happen that we reevaluate a table that is being compacted actively, which is a waste of CPU as the reevaluation will happen anyway when there are changes to sstable set. This waste can be amplified with a significant tablet count in a given shard. Eventually, we could make the reevaluation time per table based on expiration histogram, but until we get there, let's avoid this waste by only reevaluating tables that are compaction idle for more than 1h. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18280	2024-04-19 14:33:40 +03:00
Pavel Emelyanov	ba58b71eea	database: Keep local directory_semaphore to initialize sstables managers Now database is constructed with sharded<directory_semaphore>, but it no longer needs sharded, local is enough. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-19 13:53:57 +03:00
Pavel Emelyanov	53909da390	database: Don't reference directory_semaphore It was only used by table taking snapshot code. Now it uses sstables manager's reference and database can stop carrying it around. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-19 13:53:57 +03:00
Pavel Emelyanov	be5bc38cde	table: Use directory semaphore from sstables manager It's natural for a table to iterate over its sstables, get the semaphore from the manager of sstables, not from database. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-19 13:53:57 +03:00
Pavel Emelyanov	7e7dd2649b	table: Indentation fix after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-19 13:53:57 +03:00
Pavel Emelyanov	2fced3c557	table: Use directory_semaphore for rate-limited snapshot taking The table::take_snapshot() limits its parallelism with the help of directory semaphore already, but implements it "by hand". There's already parallel_for_each() method on the dir.sem. class that does exactly that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-19 13:53:57 +03:00
Pavel Emelyanov	6514c67fae	sstables: Move directory_semaphore::parallel_for_each() to header It's a template and in order to use it in other .cc files it's more convenient to move it into a header file Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-19 13:53:57 +03:00
Pavel Emelyanov	ad1a9d4c11	sstables: Move parallel_for_each_restricted to directory_semaphore In order not to make sstable_directory mess with private members of this class. Next patch will also make use of this new method. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-19 13:53:57 +03:00
Pavel Emelyanov	0d2178202d	table: Use smp::all_cpus() to iterate over all CPUs locally Currently it uses irange(0, smp::count0), but seastar provides convenient helper call for the very same range object. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-19 13:53:57 +03:00
Kefu Chai	a5dae74aee	doc: update `nodetool setlogginglevel` sample output with most recent loggers list in order to reduce the confusion like: > I cannot find foobar in the list, is it supported? also, take this opportunity to use "console" instead of "shell" for rendering the code block. it's a better fit in this case. since we are using pygment for syntax highlighting, see https://pygments.org/docs/lexers/#pygments.lexers.shell.BashSessionLexer for details on the "console" lexer. and add a prompt before the command line, so that "console" lexer can render the command line and output better. also, add a note explaining that user should refer the output of `scylla` to see the list of logger classes. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18311	2024-04-19 13:25:39 +03:00
Kefu Chai	c04654e865	storage_service: capture this explicitly clang-19 complains with `-Wdeprecated-this-capture`: ``` /home/kefu/dev/scylladb/service/storage_service.cc:5837:22: error: implicit capture of 'this' with a capture default of '=' is deprecated [-Werror,-Wdeprecated-this-capture] 5837 \| auto* node = get_token_metadata().get_topology().find_node(dst.host); \| ^ /home/kefu/dev/scylladb/service/storage_service.cc:5830:44: note: add an explicit capture of 'this' to capture '*this' by reference 5830 \| co_await transit_tablet(table, token, [=] (const locator::tablet_map& tmap, api::timestamp_type write_timestamp) { \| ^ \| , this ``` since https://open-std.org/JTC1/SC22/WG21/docs/papers/2018/p0806r2.html was approved, see https://eel.is/c++draft/depr.capture.this. and newer versions of C++ compilers implemented it, so we need to capture `this` explicitly to be more standard compliant, and to be more future-proof in this regard. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18306	2024-04-19 10:05:57 +03:00
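The fix described in this commit can be sketched on a small example. The `counter` class and `make_adder()` are hypothetical and unrelated to `storage_service`; the point is only the `[=, this]` capture list, which is the C++20 spelling the compiler note suggests.

```cpp
#include <cassert>
#include <functional>

// Hypothetical class for illustration; not ScyllaDB code.
class counter {
    int _base = 10;
public:
    // With a plain [=] the use of _base below would capture `this`
    // implicitly, which C++20 deprecates and clang reports under
    // -Wdeprecated-this-capture. Listing `this` explicitly next to the
    // capture default keeps the same behavior without the warning.
    std::function<int(int)> make_adder(int step) {
        return [=, this] (int x) { return _base + step * x; };
    }
};

// Usage check: 10 + 2 * 3.
inline void make_adder_example() {
    counter c;
    assert(c.make_adder(2)(3) == 16);
}
```

Note that `[=, this]` still captures the object by reference through the pointer, so the lambda must not outlive the object; `[=, *this]` would copy the object instead.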
Kefu Chai	168ade72f8	treewide: replace formatter<std::string_view> with formatter<string_view> in {fmt} before v10, it provides the specialization of `fmt::formatter<..>` for `std::string_view` as well as the specialization of `fmt::formatter<..>` for `fmt::string_view` which is an implementation builtin in {fmt} for compatibility of pre-C++17. and this type is used even if the code is compiled with C++ standard greater than or equal to C++17. also, before v10, the `fmt::formatter<std::string_view>::format()` is defined so it accepts `std::string_view`. after v10, `fmt::formatter<std::string_view>` still exists, but it is now defined using `format_as()` machinery, so its `format()` method does not actually accept `std::string_view`, it accepts `fmt::string_view`, as the former can be converted to `fmt::string_view`. this is why we can inherit from `fmt::formatter<std::string_view>` and use `formatter<std::string_view>::format(foo, ctx);` to implement the `format()` method with {fmt} v9, but we cannot do this with {fmt} v10, and we would have following compilation failure: ``` FAILED: service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o /home/kefu/.local/bin/clang++ -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend 
-Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -MF service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o.d -o service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -c /home/kefu/dev/scylladb/service/topology_state_machine.cc /home/kefu/dev/scylladb/service/topology_state_machine.cc:254:41: error: no matching member function for call to 'format' 254 \| return formatter<std::string_view>::format(it->second, ctx); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ /usr/include/fmt/core.h:2759:22: note: candidate function template not viable: no known conversion from 'seastar::basic_sstring<char, unsigned int, 15>' to 'const fmt::basic_string_view<char>' for 1st argument 2759 \| FMT_CONSTEXPR auto format(const T& val, FormatContext& ctx) const \| ^ ~~~~~~~~~~~~ ``` because the inherited `format()` method actually comes from `fmt::formatter<fmt::string_view>`. to reduce the confusion, in this change, we just inherit from `fmt::format<string_view>`, where `string_view` is actually `fmt::string_view`. this follows the document at https://fmt.dev/latest/api.html#formatting-user-defined-types, and since there is less indirection under the hood -- we do not use the specialization created by `FMT_FORMAT_AS` which inherit from `formatter<fmt::string_view>`, hopefully this can improve the compilation speed a little bit. also, this change addresses the build failure with {fmt} v10. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18299	2024-04-19 07:44:07 +03:00
Avi Kivity	6e487a49aa	Merge 'toolchain: support building an optimized clang' from Takuya ASADA This is the complete version of #12786, since I took over the issue from @mykaul. Updates from the original version are: - Support ARM64 build (disable BOLT for now since it isn't functioning) - Changed toolchain settings so that current scylla can be built (support WASM, etc) - Stop git cloning the scylladb repo; create a git-archive of the current scylla directory and import it - Update Clang to 17.0.6 - Save entire clang directory for BUILD mode, not just /usr/bin/clang binary - Implemented INSTALL_PREBUILT mode to install a prebuilt image which was built in BUILD mode Note that this patch drops cross-build support of frozen toolchain, since building clang and scylla multiple times in qemu-user-static would be very slow and not usable. Instead, we should build the image for each architecture natively. ---- This is a different way attempting to combine building an optimized clang (using LTO, PGO and BOLT, based on compiling ScyllaDB) to dbuild. Per Avi's request, there are 3 options: skip this phase (which is the current default), build it and build + install it to the default path. Fixes: #10985 Fixes: scylladb/scylla-enterprise#2539 Closes scylladb/scylladb#17196 * github.com:scylladb/scylladb: toolchain: support building an optimized clang configure.py: add --build-dir option	2024-04-18 19:20:23 +00:00
Anna Stuchlik	a3481a4566	doc: document the system_auth_v2 feature This commit includes updates related to replacing system_auth with system_auth_v2. - The keyspace name system_auth is renamed to system_auth_v2. - The procedures are updated to account for system_auth_v2. - No longer required system_auth RF changes are removed from procedures. - The information is added that if the consistent topology updates feature was not enabled upon upgrade from 5.4, there are limitations or additional steps to do (depending on the procedure). The files with that kind of information are to be found in _common folders and included as needed. - The upgrade guide has been updated to reflect system_auth_v2 and related impacts. Closes scylladb/scylladb#18077	2024-04-18 18:33:49 +02:00
Kefu Chai	21b03d2ce3	topology_coordinator: remove unused variable when compiling the tree with clang-19, it complains: ``` /home/kefu/dev/scylladb/service/topology_coordinator.cc:1968:31: error: variable 'reject' set but not used [-Werror,-Wunused-but-set-variable] 1968 \| if (auto* reject = std::get_if<join_node_response_params::rejected>(&validation_result)) { \| ^ 1 error generated. ``` so, despite that we evaluate the assignment statement to see it evaluates to true or false, the compiler still believes that the variable is not used. probably, the value of the statement is not dependent on the value of the value being assigned. either way, let's use `std::holds_alternative<..>` instead of `std::get_if<..>`, to silence this warning, and the code is a little bit more compacted, in the sense of less tokens in the `if` statement. in order to be self-contained, we take the opportunity to include `<variant>` in this source file, as a function declared in this header is used. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18291	2024-04-18 18:04:56 +03:00
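The difference between the two calls can be sketched as follows. The `accepted` and `rejected` structs and both helper functions are hypothetical stand-ins; they only illustrate when `std::holds_alternative` is enough and when `std::get_if` is still needed.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical stand-ins for the alternatives of a validation result.
struct accepted {};
struct rejected { std::string reason; };
using validation_result = std::variant<accepted, rejected>;

// Only the active alternative matters here, so std::holds_alternative
// avoids declaring a pointer variable that is set but never read.
bool is_rejected(const validation_result& r) {
    return std::holds_alternative<rejected>(r);
}

// std::get_if remains the right tool when the payload is actually used.
std::string rejection_reason(const validation_result& r) {
    if (const auto* rej = std::get_if<rejected>(&r)) {
        return rej->reason;
    }
    return {};
}

// Usage check with made-up values.
inline void validation_result_example() {
    validation_result bad = rejected{"unsupported feature"};
    assert(is_rejected(bad));
    assert(rejection_reason(bad) == "unsupported feature");
}
```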
Amnon Heiman	e8410848a8	Update seastar submodule This patch updates the seastar submodule to get the latest safety patch for the metric layer. The latest patch allows manipulating metric_families early in the start-up process and is safer if someone chooses to aggregate summaries. * seastar f3058414...8fabb30a (4): > stall-analyser: improve stall pattern matching > TLS: Move background BYE handshake to engine::run_in_background > metrics.cc: Safer set_metric_family_configs > src/core/metrics.cc: handle SUMMARY add operator Closes scylladb/scylladb#18293	2024-04-18 18:02:28 +03:00
Tomasz Grabiec	393cb54c01	Merge 'Generalize tablet transition API calls' from Pavel Emelyanov The recently added add_tablet_replica and del_tablet_replica API calls copy a big portion of the existing move_tablet API call's logic. This PR generalizes the common parts Closes scylladb/scylladb#18272 * github.com:scylladb/scylladb: tablets: Generalize transition mutations preparation tablets: Generalize tablet-already-in-transition check tablets: Generalize raft communications for tablet transition API calls tablets: Drop src vs dst equality check from move_tablet()	2024-04-18 14:42:10 +02:00
Anna Stuchlik	ad81f9f56a	doc: replace Scylla with ScyllaDB in Glossary This commit replaces "Scylla" with "ScyllaDB" on the Glossary page. The product has been rebranded as "ScyllaDB". Closes scylladb/scylladb#18296	2024-04-18 14:59:23 +03:00
Kamil Braun	9c2a836607	Revert "Merge 'Drain view_builder in generic drain' from ScyllaDB" This reverts commit `298a7fcbf2`, reversing changes made to `5cf53e670d`. The change made CI flaky. Fixes: scylladb/scylladb#18278	2024-04-18 11:50:41 +02:00
Yaron Kaikov	44d1ffe86b	[github] add PR template Today with the backport automation, the developer adds the relevant backport label, but without any explanation of why. This adds a PR template with a placeholder for the developer to explain the backport decision (yes or no). The placeholder is marked as a task, so once the explanation is added, the task must be checked as completed	2024-04-17 15:40:32 +03:00
Pavel Emelyanov	1b2cd56bcc	tablets: Generalize transition mutations preparation Tablet transition handlers prepare two mutations -- one for tablets table, that sets transition state, transition mode and few others; and another one for topology table that "activates" the tablet_migration state for topology coordinator. The latter is common to all three handlers. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-17 12:01:51 +03:00
Pavel Emelyanov	3beccb8165	tablets: Generalize tablet-already-in-transition check Continuation of the previous patch -- there's a common sanity check of tablet transition API handlers, namely that this tablet is not in transition already. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-17 12:01:02 +03:00
Pavel Emelyanov	14923812ad	tablets: Generalize raft communications for tablet transition API calls There are three transition calls -- move, add replica and del replica -- and all three work similarly. In a loop they try to get guard for raft operation, then perform sanity checks on topology state, then prepare mutations and then try to apply them to raft. After the loop finishes all three wait for transition for the given tablet to complete. This patch generalizes the raft kicking loop and the transition completion waiting code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-17 11:59:03 +03:00
Pavel Emelyanov	c4d538320e	tablets: Drop src vs dst equality check from move_tablet() The code here looks like this if src.host == dst.host throw "Local migration not possible" if src == dst co_return; The 2nd check is apparently never satisfied -- if src == dst this means that src.host == dst.host and it should have thrown already Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-17 11:57:10 +03:00
Marcin Maliszkiewicz	7e749cd848	auth: don't run legacy migrations on auth-v2 startup We won't run: - old pre auth-v1 migration code - code creating auth-v1 tables We will keep running: - code creating default rows - code creating auth-v1 keyspace (needed due to cqlsh legacy hack, it errors when executing `list roles` or `list users` if there is no system_auth keyspace, it does support case when there is no expected tables)	2024-04-15 12:09:39 +02:00
Marcin Maliszkiewicz	d40ff81c5b	auth: fix indent in password_authenticator::start	2024-04-15 12:09:32 +02:00
Marcin Maliszkiewicz	3e8cf20b98	auth: remove unused service::has_existing_legacy_users func	2024-04-15 12:09:32 +02:00
Benny Halevy	a7c5fccab9	test: chunked_managed_vector_test: add test_push_back_using_existing_element chunked_managed_vector isn't susceptible to #18072 since the elements it keeps are managed_ref<T> and those must be constructed by the caller, before reallocation takes place, so it's safer with that respect. The unit test is added to verify that and prevent regressions in the future. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-04-11 14:34:50 +03:00
Benny Halevy	2afc584f08	utils: chunked_vector: reserve_for_emplace_back: emplace before migrating existing elements Currently, push_back or emplace_back reallocate the last chunk before constructing the new element. If the arg passed to push_back/emplace_back is a reference to an existing element in the vector, reallocating the last chunk will invalidate the arg reference before it is used. This patch changes the order when reallocating the last chunk in reserve_for_emplace_back: First, a new chunk_ptr is allocated. Then, the back_element is emplaced in the newly allocated array. And only then, existing elements in the current last chunk are migrated to the new chunk. Eventually, the new chunk replaces the existing chunk. If no reservation is required, the back element is emplaced "in place" in the current last chunk. Fixes scylladb/scylladb#18072 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-04-11 14:34:48 +03:00
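The ordering described in this commit can be sketched with a much simpler container. `tiny_vector` below is a hypothetical single-buffer vector, not `chunked_vector`; it only shows the three steps in the order the commit message gives them (allocate, place the new element, then migrate the old ones).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical single-buffer container; chunked_vector is more involved.
class tiny_vector {
    std::unique_ptr<std::string[]> _data;
    std::size_t _size = 0;
    std::size_t _capacity = 0;
public:
    void push_back(const std::string& value) {
        if (_size == _capacity) {
            std::size_t new_capacity = _capacity ? _capacity * 2 : 1;
            auto new_data = std::make_unique<std::string[]>(new_capacity);
            // 1. Copy the new element first, while `value` is still intact
            //    even if it refers to an element of the old storage.
            new_data[_size] = value;
            // 2. Only then migrate the existing elements.
            for (std::size_t i = 0; i < _size; ++i) {
                new_data[i] = std::move(_data[i]);
            }
            // 3. Finally replace the old storage.
            _data = std::move(new_data);
            _capacity = new_capacity;
        } else {
            _data[_size] = value;
        }
        ++_size;
    }
    const std::string& operator[](std::size_t i) const { return _data[i]; }
    std::size_t size() const { return _size; }
};

// Usage check: push a reference to an existing element while growing.
inline void tiny_vector_example() {
    tiny_vector v;
    v.push_back("a string long enough to live on the heap, not inline");
    v.push_back(v[0]);  // capacity 1 -> 2, argument aliases v[0]
    assert(v.size() == 2);
    assert(v[1] == v[0]);
}
```

If steps 1 and 2 were swapped, `value` would refer to a moved-from element by the time it is copied, which is the kind of bug the commit fixes.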
Benny Halevy	2c0e40a21f	utils: chunked_vector: push_back: call emplace_back When pushing an element with a value referencing an existing element in the vector, we currently risk use-after-free when that element gets moved to a reallocated chunk, if capacity needs to be reserved, thereby invalidating the reference to the existing element before it is used. This patch prepares for fixing that in the emplace path by converging to a single code path. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-04-11 14:33:43 +03:00
Benny Halevy	882bb21903	utils: chunked_vector: define min_chunk_capacity Expose the number of items in the first allocated chunk. This will be used by a unit test in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-04-11 14:33:43 +03:00
Benny Halevy	e066f81cb3	utils: chunked*vector: use std::clamp It is available in the std library since C++17. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-04-11 14:33:43 +03:00
Yaniv Kaul	bd34f2fe46	toolchain: support building an optimized clang This is a different way attempting to combine building an optimized clang (using LTO, PGO and BOLT, based on compiling ScyllaDB) to dbuild. Per Avi's request, there are 3 options: skip this phase (which is the current default), build it and build + install it to the default path. Fixes: #10985 Fixes: scylladb/scylla-enterprise#2539	2024-04-08 22:53:59 +09:00
Takuya ASADA	be3776ec2a	configure.py: add --build-dir option Add --build-dir option to specify build directory. This is needed for optimized clang support, since it requires to build Scylla in tools/toolchain/prepare, w/o deleting current build/ directory.	2024-04-01 18:35:42 +09:00

1552 changed files with 36282 additions and 24926 deletions

1

.gitattributes vendored

View File

@@ -1,3 +1,4 @@
 *.cc diff=cpp
 *.hh diff=cpp
 *.svg binary
 docs/_static/api/js/* binary

									
20

.github/clang-include-cleaner.json vendored Normal file

View File

				@@ -0,0 +1,20 @@
				{
				    "problemMatcher": [
				        {
				            "owner": "clang-include-cleaner",
				            "severity": "error",
				            "pattern": [
				                {
				                    "regexp": "^([^\\-\\+].*)$",
				                    "file": 1
				                },
				                {
				                    "regexp": "^(-\\s+[^\\s]+)\\s+@Line:(\\d+)$",
				                    "line": 2,
				                    "message": 1,
				                    "loop": true
				                }
				            ]
				        }
				    ]
				}

									
18

.github/clang-matcher.json vendored Normal file

View File

				@@ -0,0 +1,18 @@
				{
				    "problemMatcher": [
				        {
				            "owner": "clang",
				            "pattern": [
				                {
				                    "regexp": "^([^:]+):(\\d+):(\\d+):\\s+(warning|error):\\s+(.*?)\\s+\\[(.*?)\\]$",
				                    "file": 1,
				                    "line": 2,
				                    "column": 3,
				                    "severity": 4,
				                    "message": 5,
				                    "code": 6
				                }
				            ]
				        }
				    ]
				}
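
To see what the `clang` problem matcher above extracts, its regular expression can be tried outside of GitHub Actions. The sketch below applies the same pattern with `std::regex`; the `diagnostic` struct and `parse_clang_diagnostic()` are hypothetical names, and the field order follows the group numbers in the JSON (file, line, column, severity, message, code).

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical record mirroring the capture groups of the matcher.
struct diagnostic {
    std::string file;
    int line;
    int column;
    std::string severity;
    std::string message;
    std::string code;
};

// Applies the matcher's regular expression to a single output line.
// Returns std::nullopt when the line is not a diagnostic with a [code] tag.
std::optional<diagnostic> parse_clang_diagnostic(const std::string& text) {
    static const std::regex pattern(
        R"(^([^:]+):(\d+):(\d+):\s+(warning|error):\s+(.*?)\s+\[(.*?)\]$)");
    std::smatch m;
    if (!std::regex_match(text, m, pattern)) {
        return std::nullopt;
    }
    return diagnostic{m[1].str(), std::stoi(m[2].str()), std::stoi(m[3].str()),
                      m[4].str(), m[5].str(), m[6].str()};
}

// Usage check with a diagnostic quoted in one of the commit messages above.
inline void parse_clang_diagnostic_example() {
    auto d = parse_clang_diagnostic(
        "service/topology_coordinator.cc:1968:31: error: variable 'reject' "
        "set but not used [-Werror,-Wunused-but-set-variable]");
    assert(d.has_value());
    assert(d->line == 1968 && d->column == 31);
    assert(d->code == "-Werror,-Wunused-but-set-variable");
}
```

Because the pattern requires a trailing `[...]` group, diagnostics printed without a warning flag, as well as summary lines such as `1 error generated.`, are not matched.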

									
25

.github/mergify.yml vendored

View File

				@@ -65,3 +65,28 @@ pull_request_rules:
				          - branch-5.4
				        assignees:
				          - "{{ author }}"
				  - name: Automate backport pull request 6.0
				    conditions:
				      - or:
				        - closed
				        - merged
				      - or:
				          - base=master
				          - base=next
				      - label=backport/6.0 # The PR must have this label to trigger the backport
				      - label=promoted-to-master
				    actions:
				      copy:
				        title: "[Backport 6.0] {{ title }}"
				        body: |
				          {{ body }}
				          {% for c in commits %}
				          (cherry picked from commit {{ c.sha }})
				          {% endfor %}
				           Refs #{{number}}
				        branches:
				          - branch-6.0
				        assignees:
				          - "{{ author }}"

1

.github/pull_request_template.md vendored Normal file

View File

				@@ -0,0 +1 @@
				**Please replace this line with justification for the backport/\* labels added to this PR**

									
										186

.github/scripts/auto-backport.py
									
										vendored
									
										Executable file
									
												View File
												
				@@ -0,0 +1,186 @@

				#!/usr/bin/env python3

				import argparse

				import os

				import re

				import sys

				import tempfile

				import logging

				from github import Github, GithubException

				from git import Repo, GitCommandError

				logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

				try:

				    github_token = os.environ["GITHUB_TOKEN"]

				except KeyError:

				    print("Please set the 'GITHUB_TOKEN' environment variable")

				    sys.exit(1)

				def is_pull_request():

				    return '--pull-request' in sys.argv[1:]

				def parse_args():

				    parser = argparse.ArgumentParser()

				    parser.add_argument('--repo', type=str, required=True, help='Github repository name')

				    parser.add_argument('--base-branch', type=str, default='refs/heads/master', help='Base branch')

				    parser.add_argument('--commits', default=None, type=str, help='Range of promoted commits.')

				    parser.add_argument('--pull-request', type=int, help='Pull request number to be backported')

				    parser.add_argument('--head-commit', type=str, required=is_pull_request(), help='The HEAD of target branch after the pull request specified by --pull-request is merged')

				    return parser.parse_args()

				def create_pull_request(repo, new_branch_name, base_branch_name, pr, backport_pr_title, commits, is_draft=False):

				    pr_body = f'{pr.body}\n\n'

				    for commit in commits:

				        pr_body += f'- (cherry picked from commit {commit})\n\n'

				    pr_body += f'Parent PR: #{pr.number}'

				    try:

				        backport_pr = repo.create_pull(

				            title=backport_pr_title,

				            body=pr_body,

				            head=f'scylladbbot:{new_branch_name}',

				            base=base_branch_name,

				            draft=is_draft

				        )

				        logging.info(f"Pull request created: {backport_pr.html_url}")

				        backport_pr.add_to_assignees(pr.user)

				        if is_draft:

				            backport_pr.add_to_labels("conflicts")

				            pr_comment = f"@{pr.user.login} - This PR was marked as draft because it has conflicts\n"

				            pr_comment += "Please resolve them and mark this PR as ready for review"

				            backport_pr.create_issue_comment(pr_comment)

				        logging.info(f"Assigned PR to original author: {pr.user}")

				        return backport_pr

				    except GithubException as e:

				        if 'A pull request already exists' in str(e):

				            logging.warning(f'A pull request already exists for {pr.user}:{new_branch_name}')

				        else:

				            logging.error(f'Failed to create PR: {e}')

				def get_pr_commits(repo, pr, stable_branch, start_commit=None):

				    commits = []

				    if pr.merged:

				        merge_commit = repo.get_commit(pr.merge_commit_sha)

				        if len(merge_commit.parents) > 1:  # Check if this merge commit includes multiple commits

				            commits.append(pr.merge_commit_sha)

				        else:

				            if start_commit:

				                promoted_commits = repo.compare(start_commit, stable_branch).commits

				            else:

				                promoted_commits = repo.get_commits(sha=stable_branch)

				            for commit in pr.get_commits():

				                for promoted_commit in promoted_commits:

				                    commit_title = commit.commit.message.splitlines()[0]

				                    # In Scylla-pkg and scylla-dtest, for example,

				                    # we don't create a merge commit for a PR with multiple commits,

				                    # according to the GitHub API, the last commit will be the merge commit,

				                    # which is not what we need when backporting (we need all the commits).

				                    # So here, we are validating the correct SHA for each commit so we can cherry-pick

				                    if promoted_commit.commit.message.startswith(commit_title):

				                        commits.append(promoted_commit.sha)

				    elif pr.state == 'closed':

				        events = pr.get_issue_events()

				        for event in events:

				            if event.event == 'closed':

				                commits.append(event.commit_id)

				    return commits

				def create_pr_comment_and_remove_label(pr, comment_body):

				    labels = pr.get_labels()

				    pattern = re.compile(r"backport/\d+\.\d+$")

				    for label in labels:

				        if pattern.match(label.name):

				            print(f"Removing label: {label.name}")

				            comment_body += f'- {label.name}\n'

				            pr.remove_from_labels(label)

				    pr.create_issue_comment(comment_body)

				def backport(repo, pr, version, commits, backport_base_branch):

				    new_branch_name = f'backport/{pr.number}/to-{version}'

				    backport_pr_title = f'[Backport {version}] {pr.title}'

				    repo_url = f'https://scylladbbot:{github_token}@github.com/{repo.full_name}.git'

				    fork_repo = f'https://scylladbbot:{github_token}@github.com/scylladbbot/{repo.name}.git'

				    with (tempfile.TemporaryDirectory() as local_repo_path):

				        try:

				            repo_local = Repo.clone_from(repo_url, local_repo_path, branch=backport_base_branch)

				            repo_local.git.checkout(b=new_branch_name)

				            is_draft = False

				            for commit in commits:

				                try:

				                    repo_local.git.cherry_pick(commit, '-m1', '-x')

				                except GitCommandError as e:

				                    logging.warning(f'Cherry-pick conflict on commit {commit}: {e}')

				                    is_draft = True

				                    repo_local.git.add(A=True)

				                    repo_local.git.cherry_pick('--continue')

				            if not repo.private and not repo.has_in_collaborators(pr.user.login):

				                repo.add_to_collaborators(pr.user.login, permission="push")

				                comment = f':warning:  @{pr.user.login} you have been added as collaborator to scylladbbot fork '

				                comment += f'Please check your inbox and approve the invitation, once it is done, please add the backport labels again'

				                create_pr_comment_and_remove_label(pr, comment)

				                return

				            repo_local.git.push(fork_repo, new_branch_name, force=True)

				            create_pull_request(repo, new_branch_name, backport_base_branch, pr, backport_pr_title, commits,

				                                is_draft=is_draft)

				        except GitCommandError as e:

				            logging.warning(f"GitCommandError: {e}")

				def main():

				    args = parse_args()

				    base_branch = args.base_branch.split('/')[2]

				    promoted_label = 'promoted-to-master'

				    repo_name = args.repo

				    if 'scylla-enterprise' in args.repo:

				        promoted_label = 'promoted-to-enterprise'

				    stable_branch = base_branch

				    backport_branch = 'branch-'

				    backport_label_pattern = re.compile(r'backport/\d+\.\d+$')

				    g = Github(github_token)

				    repo = g.get_repo(repo_name)

				    closed_prs = []

				    start_commit = None

				    if args.commits:

				        start_commit, end_commit = args.commits.split('..')

				        commits = repo.compare(start_commit, end_commit).commits

				        for commit in commits:

				            match = re.search(rf"Closes .*#([0-9]+)", commit.commit.message, re.IGNORECASE)

				            if match:

				                pr_number = int(match.group(1))

				                pr = repo.get_pull(pr_number)

				                closed_prs.append(pr)

				    if args.pull_request:

				        start_commit = args.head_commit

				        pr = repo.get_pull(args.pull_request)

				        closed_prs = [pr]

				    for pr in closed_prs:

				        labels = [label.name for label in pr.labels]

				        backport_labels = [label for label in labels if backport_label_pattern.match(label)]

				        if promoted_label not in labels:

				            print(f'no {promoted_label} label: {pr.number}')

				            continue

				        if not backport_labels:

				            print(f'no backport label: {pr.number}')

				            continue

				        commits = get_pr_commits(repo, pr, stable_branch, start_commit)

				        logging.info(f"Found PR #{pr.number} with commit {commits} and the following labels: {backport_labels}")

				        for backport_label in backport_labels:

				            version = backport_label.replace('backport/', '')

				            backport_base_branch = backport_label.replace('backport/', backport_branch)

				            backport(repo, pr, version, commits, backport_base_branch)

				if __name__ == "__main__":

				    main()
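The label handling at the end of `main()` can be looked at in isolation. The following is a minimal sketch and not part of the commit: `plan_backports` is a hypothetical helper name, and it reuses only the label pattern and the `backport/` to `branch-` substitution shown in the script above, without touching GitHub or git.

```python
import re

# Same pattern the script uses to recognise backport labels.
BACKPORT_LABEL = re.compile(r'backport/\d+\.\d+$')


def plan_backports(pr_number, labels):
    """Return (version, base_branch, new_branch_name) for each backport label.

    Hypothetical helper for illustration; it mirrors the string handling in
    main() and backport() only.
    """
    plans = []
    for label in labels:
        if not BACKPORT_LABEL.match(label):
            continue  # skips e.g. 'backport/6.0-done' and unrelated labels
        version = label.replace('backport/', '')
        base_branch = label.replace('backport/', 'branch-')
        new_branch = f'backport/{pr_number}/to-{version}'
        plans.append((version, base_branch, new_branch))
    return plans


if __name__ == '__main__':
    # The PR number and labels here are made-up inputs.
    labels = ['promoted-to-master', 'backport/6.1', 'backport/6.0-done']
    for plan in plan_backports(21508, labels):
        print(plan)
```

Because the pattern ends in `$`, a label that has already been rewritten to `backport/x.y-done` is not picked up again.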

									
										82

.github/scripts/label_promoted_commits.py
									
										vendored
									
												
				@@ -1,9 +1,9 @@

				import requests

				from github import Github

				import argparse

				import re

				import sys

				import os

				from github import Github

				from github.GithubException import UnknownObjectException

				try:

				    github_token = os.environ["GITHUB_TOKEN"]

				@@ -16,43 +16,71 @@ def parser():

				    parser = argparse.ArgumentParser()

				    parser.add_argument('--repository', type=str, required=True,

				                        help='Github repository name (e.g., scylladb/scylladb)')

				    parser.add_argument('--commit_before_merge', type=str, required=True, help='Git commit ID to start labeling from ('

				                                                                               'newest commit).')

				    parser.add_argument('--commit_after_merge', type=str, required=True,

				                        help='Git commit ID to end labeling at (oldest '

				                             'commit, exclusive).')

				    parser.add_argument('--update_issue', type=bool, default=False, help='Set True to update issues when backport was '

				                                                                         'done')

				    parser.add_argument('--label', type=str, required=True, help='Label to use')

				    parser.add_argument('--commits', type=str, required=True, help='Range of promoted commits.')

				    parser.add_argument('--label', type=str, default='promoted-to-master', help='Label to use')

				    parser.add_argument('--ref', type=str, required=True, help='PR target branch')

				    return parser.parse_args()

				def add_comment_and_close_pr(pr, comment):

				    if pr.state == 'open':

				        pr.create_issue_comment(comment)

				        pr.edit(state="closed")

				def mark_backport_done(repo, ref_pr_number, branch):

				    pr = repo.get_pull(int(ref_pr_number))

				    label_to_remove = f'backport/{branch}'

				    label_to_add = f'{label_to_remove}-done'

				    current_labels = [label.name for label in pr.get_labels()]

				    if label_to_remove in current_labels:

				        pr.remove_from_labels(label_to_remove)

				    if label_to_add not in current_labels:

				        pr.add_to_labels(label_to_add)

				def main():

				    # This script is triggered by a push event to either the master branch or a branch named branch-x.y (where x and y represent version numbers). Based on the pushed branch, the script performs the following actions:

				    # - When ref branch is `master`, it will add the `promoted-to-master` label, which we need later for the auto backport process

				    # - When ref branch is `branch-x.y` (which means we backported a patch), it will replace in the original PR the `backport/x.y` label with `backport/x.y-done` and will close the backport PR (Since GitHub close only the one referring to default branch)

				    args = parser()

				    pr_pattern = re.compile(r'Closes .*#([0-9]+)')

				    target_branch = re.search(r'branch-(\d+\.\d+)', args.ref)

				    g = Github(github_token)

				    repo = g.get_repo(args.repository, lazy=False)

				    commits = repo.compare(head=args.commit_after_merge, base=args.commit_before_merge)

				    start_commit, end_commit = args.commits.split('..')

				    commits = repo.compare(start_commit, end_commit).commits

				    processed_prs = set()

				    # Print commit information

				    for commit in commits.commits:

				        print(commit.sha)

				    for commit in commits:

				        print(f'Commit sha is: {commit.sha}')

				        match = pr_pattern.search(commit.commit.message)

				        if match:

				            pr_number = match.group(1)

				            url = f'https://api.github.com/repos/{args.repository}/issues/{pr_number}/labels'

				            data = {

				                "labels": [f'{args.label}']

				            }

				            headers = {

				                "Authorization": f"token {github_token}",

				                "Accept": "application/vnd.github.v3+json"

				            }

				            response = requests.post(url, headers=headers, json=data)

				            if response.ok:

				                print(f"Label added successfully to {url}")

				            pr_number = int(match.group(1))

				            if pr_number in processed_prs:

				                continue

				            if target_branch:

				                pr = repo.get_pull(pr_number)

				                branch_name = target_branch[1]

				                refs_pr = re.findall(r'Parent PR: (?:#|https.*?)(\d+)', pr.body)

				                if refs_pr:

				                    print(f'branch-{target_branch.group(1)}, pr number is: {pr_number}')

				                    # 1. change the backport label of the parent PR to note that

				                    #    we've merged the corresponding backport PR

				                    # 2. close the backport PR and leave a comment on it to note

				                    #    that it has been merged with a certain git commit.

				                    ref_pr_number = refs_pr[0]

				                    mark_backport_done(repo, ref_pr_number, branch_name)

				                    comment = f'Closed via {commit.sha}'

				                    add_comment_and_close_pr(pr, comment)

				            else:

				                print(f"No label was added to {url}")

				                try:

				                    pr = repo.get_pull(pr_number)

				                    pr.add_to_labels('promoted-to-master')

				                    print(f'master branch, pr number is: {pr_number}')

				                except UnknownObjectException:

				                    print(f'{pr_number} is not a PR but an issue, no need to add label')

				            processed_prs.add(pr_number)

				if __name__ == "__main__":
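The two regular expressions that drive this script can be exercised on their own. This is a hedged sketch, assuming commit messages carry a `Closes ...#N` trailer and backport PR bodies carry the `Parent PR: #N` line that `auto-backport.py` writes. The function names `closed_pr_number` and `parent_pr_number` are hypothetical, and the `or ''` guard is an addition of this sketch (the script passes `pr.body` directly).

```python
import re

# Patterns copied from label_promoted_commits.py as shown in the diff.
PR_PATTERN = re.compile(r'Closes .*#([0-9]+)')
PARENT_PATTERN = re.compile(r'Parent PR: (?:#|https.*?)(\d+)')


def closed_pr_number(commit_message):
    """Return the PR number named in a 'Closes ...#N' trailer, or None."""
    match = PR_PATTERN.search(commit_message)
    return int(match.group(1)) if match else None


def parent_pr_number(pr_body):
    """Return the first 'Parent PR:' number in a backport PR body, or None."""
    refs = PARENT_PATTERN.findall(pr_body or '')  # tolerate an empty body
    return int(refs[0]) if refs else None


if __name__ == '__main__':
    # Illustrative inputs only.
    message = 'service: fix pager\n\nFixes: #22620\n\nCloses scylladb/scylladb#22622'
    body = 'Some fix\n\n- (cherry picked from commit abc123)\n\nParent PR: #22622'
    print(closed_pr_number(message), parent_pr_number(body))
```

Since `.*` is greedy, a single line that contains several `#N` references after `Closes` resolves to the last one on that line.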

									
										55

.github/workflows/add-label-when-promoted.yaml
									
										vendored
									
												
				@@ -4,6 +4,11 @@ on:

				  push:

				    branches:

				      - master

				      - branch-*.*

				      - enterprise

				    pull_request_target:

				      types: [labeled]

				      branches: [master, next, enterprise]

				jobs:

				  check-commit:

				@@ -12,15 +17,55 @@ jobs:

				      pull-requests: write

				      issues: write

				    steps:

				      - name: Dump GitHub context

				        env:

				          GITHUB_CONTEXT: ${{ toJson(github) }}

				        run: echo "$GITHUB_CONTEXT"

				      - name: Set Default Branch

				        id: set_branch

				        run: |

				          if [[ "${{ github.repository }}" == *enterprise* ]]; then

				            echo "DEFAULT_BRANCH=enterprise" >> $GITHUB_ENV

				          else

				            echo "DEFAULT_BRANCH=master" >> $GITHUB_ENV

				          fi

				      - name: Checkout repository

				        uses: actions/checkout@v4

				        with:

				          repository: ${{ github.repository }}

				          ref: ${{ env.DEFAULT_BRANCH }}

				          token: ${{ secrets.AUTO_BACKPORT_TOKEN }}

				          fetch-depth: 0  # Fetch all history for all tags and branches

				      - name: Set up Git identity

				        run: |

				          git config --global user.name "GitHub Action"

				          git config --global user.email "action@github.com"

				          git config --global merge.conflictstyle diff3

				      - name: Install dependencies

				        run: sudo apt-get install -y python3-github

				        run: sudo apt-get install -y python3-github python3-git

				      - name: Run python script

				        if: github.event_name == 'push'

				        env:

				          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

				        run: python .github/scripts/label_promoted_commits.py --commit_before_merge ${{ github.event.before }} --commit_after_merge ${{ github.event.after }} --repository ${{ github.repository }} --label promoted-to-master

				          GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}

				        run: python .github/scripts/label_promoted_commits.py  --commits ${{ github.event.before }}..${{ github.sha }} --repository ${{ github.repository }} --ref ${{ github.ref }}

				      - name: Run auto-backport.py when promotion completed

				        if: ${{ github.event_name == 'push' && github.ref == format('refs/heads/{0}', env.DEFAULT_BRANCH) }}

				        env:

				          GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}

				        run: python .github/scripts/auto-backport.py --repo ${{ github.repository }} --base-branch ${{ github.ref }} --commits ${{ github.event.before }}..${{ github.sha }}

				      - name: Check if label starts with 'backport/' and contains digits

				        id: check_label

				        run: |

				          label_name="${{ github.event.label.name }}"

				          if [[ "$label_name" =~ ^backport/[0-9]+\.[0-9]+$ ]]; then

				            echo "Label matches backport/X.X pattern."

				            echo "backport_label=true" >> $GITHUB_OUTPUT

				          else

				            echo "Label does not match the required pattern."

				            echo "backport_label=false" >> $GITHUB_OUTPUT

				          fi

				      - name: Run auto-backport.py when label was added

				        if: ${{ github.event_name == 'pull_request_target' && steps.check_label.outputs.backport_label == 'true' && github.event.pull_request.state == 'closed' }}

				        env:

				          GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}

				        run: python .github/scripts/auto-backport.py --repo ${{ github.repository }} --base-branch ${{ github.ref }} --pull-request ${{ github.event.pull_request.number }} --head-commit ${{ github.event.pull_request.base.sha }}
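The `if:` conditions on the two `auto-backport.py` steps can be restated as a small decision function. This is only a sketch of the conditions as written in the workflow above: `backport_mode` is a hypothetical name, and the return values `'commits'` and `'pull-request'` simply stand for the `--commits` and `--pull-request` invocations.

```python
import re

# Same check the workflow performs in bash on github.event.label.name.
LABEL_RE = re.compile(r'^backport/[0-9]+\.[0-9]+$')


def backport_mode(event_name, ref, default_branch, label_name=None, pr_state=None):
    """Return which auto-backport.py invocation the step conditions select.

    'commits' for a push to the default branch, 'pull-request' for a matching
    label added to a closed PR, None when neither step would run.
    """
    if event_name == 'push' and ref == f'refs/heads/{default_branch}':
        return 'commits'
    if (event_name == 'pull_request_target'
            and label_name is not None
            and LABEL_RE.match(label_name)
            and pr_state == 'closed'):
        return 'pull-request'
    return None


if __name__ == '__main__':
    print(backport_mode('push', 'refs/heads/master', 'master'))
    print(backport_mode('pull_request_target', 'refs/heads/master', 'master',
                        label_name='backport/6.1', pr_state='closed'))
```

A push to a `branch-x.y` branch selects neither invocation; in that case only `label_promoted_commits.py` runs.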

									
										35

.github/workflows/build-scylla.yaml
									
										vendored
									
										Normal file
									
												
				@@ -0,0 +1,35 @@

				name: Build Scylla

				on:

				  workflow_call:

				    inputs:

				      build_mode:

				        description: 'the build mode'

				        type: string

				        required: true

				    outputs:

				      md5sum:

				        description: 'the md5sum for scylla executable'

				        value: ${{ jobs.build.outputs.md5sum }}

				jobs:

				  build:

				    runs-on: ubuntu-latest

				    # be consistent with tools/toolchain/image

				    container: scylladb/scylla-toolchain:fedora-40-20240621

				    outputs:

				      md5sum: ${{ steps.checksum.outputs.md5sum }}

				    steps:

				      - uses: actions/checkout@v4

				        with:

				          submodules: recursive

				      - name: Generate the building system

				        run: |

				          git config --global --add safe.directory $GITHUB_WORKSPACE

				          ./configure.py --mode ${{ inputs.build_mode }} --with scylla

				      - run: |

				          ninja build/${{ inputs.build_mode }}/scylla

				      - id: checksum

				        run: |

				          checksum=$(md5sum build/${{ inputs.build_mode }}/scylla | cut -c -32)

				          echo "md5sum=$checksum" >> $GITHUB_OUTPUT

									
										65

.github/workflows/clang-nightly.yaml
									
										vendored
									
										Normal file
									
												
				@@ -0,0 +1,65 @@

				name: clang-nightly

				on:

				  schedule:

				    # only at 5AM Saturday

				    - cron: '0 5 * * SAT'

				env:

				  # use the development branch explicitly

				  CLANG_VERSION: 19

				  BUILD_DIR: build

				permissions: {}

				# cancel the in-progress run upon a repush

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: true

				jobs:

				  clang-dev:

				    name: Build with clang nightly

				    runs-on: ubuntu-latest

				    container: fedora:40

				    strategy:

				      matrix:

				        build_type:

				          - Debug

				          - RelWithDebInfo

				          - Dev

				    steps:

				      - run: |

				          sudo dnf -y install git

				      - uses: actions/checkout@v4

				        with:

				          submodules: true

				      - name: Install build dependencies

				        run: |

				          # use the copr repo for llvm snapshot builds, see

				          # https://copr.fedorainfracloud.org/coprs/g/fedora-llvm-team/llvm-snapshots/

				          sudo dnf -y install 'dnf-command(copr)'

				          sudo dnf copr enable -y @fedora-llvm-team/llvm-snapshots

				          # do not install java dependencies, which is not only not used here

				          sed -i.orig \

				            -e '/tools\/.*\/install-dependencies.sh/d' \

				            -e 's/(minio_download_jobs)/(true)/' \

				            ./install-dependencies.sh

				          sudo ./install-dependencies.sh

				          sudo dnf -y install lld

				      - name: Generate the building system

				        run: |

				          cmake                                         \

				            -DCMAKE_BUILD_TYPE=${{ matrix.build_type }} \

				            -DCMAKE_C_COMPILER=clang-$CLANG_VERSION     \

				            -DCMAKE_CXX_COMPILER=clang++-$CLANG_VERSION \

				            -G Ninja                                    \

				            -B $BUILD_DIR                               \

				            -S .

				      # see https://github.com/actions/toolkit/blob/main/docs/problem-matchers.md

				      - run: |

				          echo "::add-matcher::.github/clang-matcher.json"

				      - run: |

				          cmake --build $BUILD_DIR --target scylla

				      - run: |

				          echo "::remove-matcher owner=clang::"

									
										67

.github/workflows/clang-tidy.yaml
									
										vendored
									
										Normal file
									
												
				@@ -0,0 +1,67 @@

				name: clang-tidy

				on:

				  pull_request:

				    branches:

				      - master

				    paths-ignore:

				      - '**/*.rst'

				      - '**/*.md'

				      - 'docs/**'

				      - '.github/**'

				  workflow_dispatch:

				  schedule:

				    # only at 5AM Saturday

				    - cron: '0 5 * * SAT'

				env:

				  BUILD_TYPE: RelWithDebInfo

				  BUILD_DIR: build

				  CLANG_TIDY_CHECKS: '-*,bugprone-use-after-move'

				permissions: {}

				# cancel the in-progress run upon a repush

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: true

				jobs:

				  read-toolchain:

				    uses: ./.github/workflows/read-toolchain.yaml

				  clang-tidy:

				    name: Run clang-tidy

				    needs:

				      - read-toolchain

				    runs-on: ubuntu-latest

				    container: ${{ needs.read-toolchain.outputs.image }}

				    steps:

				      - env:

				          IMAGE: ${{ needs.read-toolchain.outputs.image }}

				        run: |

				          echo ${{ needs.read-toolchain.outputs.image }}

				      - uses: actions/checkout@v4

				        with:

				          submodules: true

				      - run: |

				          sudo dnf -y install clang-tools-extra

				      - name: Generate the building system

				        run: |

				          cmake                                         \

				            -DCMAKE_BUILD_TYPE=$BUILD_TYPE              \

				            -DCMAKE_C_COMPILER=clang                    \

				            -DScylla_USE_LINKER=ld.lld                  \

				            -DCMAKE_CXX_COMPILER=clang++                \

				            -DCMAKE_EXPORT_COMPILE_COMMANDS=ON          \

				            -DCMAKE_CXX_CLANG_TIDY="clang-tidy;--checks=$CLANG_TIDY_CHECKS" \

				            -G Ninja                                    \

				            -B $BUILD_DIR                               \

				            -S .

				      # see https://github.com/actions/toolkit/blob/main/docs/problem-matchers.md

				      - run: |

				          echo "::add-matcher::.github/clang-matcher.json"

				      - name: Build with clang-tidy enabled

				        run: |

				          cmake --build $BUILD_DIR --target scylla

				      - run: |

				          echo "::remove-matcher owner=clang::"

									
										2

.github/workflows/codespell.yaml
									
										vendored
									
												
				@@ -14,4 +14,4 @@ jobs:

				        with:

				          only_warn: 1

				          ignore_words_list: "ans,datas,fo,ser,ue,crate,nd,reenable,strat,stap,te,raison"

				          skip: "./.git,./build,./tools,*.js,*.thrift,*.lock,./test,./licenses,./redis/lolwut.cc,*.svg"

				          skip: "./.git,./build,./tools,*.js,*.lock,./test,./licenses,./redis/lolwut.cc,*.svg"

									
										80

.github/workflows/iwyu.yaml
									
										vendored
									
										Normal file
									
												
				@@ -0,0 +1,80 @@

				name: iwyu

				on:

				  pull_request:

				    branches:

				      - master

				env:

				  BUILD_TYPE: RelWithDebInfo

				  BUILD_DIR: build

				  CLEANER_OUTPUT_PATH: build/clang-include-cleaner.log

				  CLEANER_DIRS: test/unit exceptions alternator api auth cdc compaction

				permissions: {}

				# cancel the in-progress run upon a repush

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: true

				jobs:

				  read-toolchain:

				    uses: ./.github/workflows/read-toolchain.yaml

				  clang-include-cleaner:

				    name: "Analyze #includes in source files"

				    needs:

				      - read-toolchain

				    runs-on: ubuntu-latest

				    container: ${{ needs.read-toolchain.outputs.image }}

				    steps:

				      - uses: actions/checkout@v4

				        with:

				          submodules: true

				      - run: |

				          sudo dnf -y install clang-tools-extra

				      - name: Generate compilation database

				        run: |

				          cmake                                         \

				            -DCMAKE_BUILD_TYPE=$BUILD_TYPE              \

				            -DCMAKE_C_COMPILER=clang                    \

				            -DCMAKE_CXX_COMPILER=clang++                \

				            -DCMAKE_EXPORT_COMPILE_COMMANDS=ON          \

				            -G Ninja                                    \

				            -B $BUILD_DIR                               \

				            -S .

				      - name: Build headers

				        run: |

				          swagger_targets=''

				          for f in api/api-doc/*.json; do

				            if test "${f#*.}" = json; then

				              name=$(basename "$f" .json)

				              if test $name != swagger20_header; then

				                swagger_targets+=" scylla_swagger_gen_$name"

				              fi

				            fi

				          done

				          cmake                                         \

				            --build build                               \

				             --target seastar_http_request_parser       \

				             --target idl-sources                       \

				             --target $swagger_targets

				      - run: |

				          echo "::add-matcher::.github/clang-include-cleaner.json"

				      - name: clang-include-cleaner

				        run: |

				          for d in $CLEANER_DIRS; do

				            find $d \( -name '*.cc' -o -name '*.hh' \)    \

				              -exec echo {} \;                            \

				              -exec clang-include-cleaner                 \

				                --ignore-headers=seastarx.hh              \

				                --print=changes                           \

				                -p $BUILD_DIR                             \

				                {} \; | tee --append $CLEANER_OUTPUT_PATH

				          done

				      - run: |

				          echo "::remove-matcher owner=clang-include-cleaner::"

				      - uses: actions/upload-artifact@v4

				        with:

				          name: Logs (clang-include-cleaner)

				          path: "./${{ env.CLEANER_OUTPUT_PATH }}"
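The shell loop in the "Build headers" step derives one CMake target per swagger JSON file. A rough Python equivalent is sketched below, assuming the same naming rule (`scylla_swagger_gen_<name>`) and the same exclusion of `swagger20_header`. `swagger_targets` is a hypothetical helper, and the file names in the example are illustrative.

```python
from pathlib import PurePosixPath


def swagger_targets(paths):
    """Map api/api-doc/*.json paths to scylla_swagger_gen_<name> target names."""
    targets = []
    for path in paths:
        # Equivalent of "${f#*.}": everything after the first dot in the path.
        _, _, after_first_dot = path.partition('.')
        if after_first_dot != 'json':
            continue  # e.g. 'foo.def.json' leaves 'def.json' and is skipped
        # Equivalent of: basename "$f" .json
        name = PurePosixPath(path).name[:-len('.json')]
        if name != 'swagger20_header':
            targets.append(f'scylla_swagger_gen_{name}')
    return targets


if __name__ == '__main__':
    example = [
        'api/api-doc/storage_service.json',
        'api/api-doc/swagger20_header.json',
        'api/api-doc/utils.def.json',
    ]
    print(swagger_targets(example))
```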

									
										23

.github/workflows/read-toolchain.yaml
									
										vendored
									
										Normal file
									
												
				@@ -0,0 +1,23 @@

				name: Read Toolchain

				on:

				  workflow_call:

				    outputs:

				      image:

				        description: "the toolchain docker image"

				        value: ${{ jobs.read-toolchain.outputs.image }}

				jobs:

				  read-toolchain:

				    runs-on: ubuntu-latest

				    outputs:

				      image: ${{ steps.read.outputs.image }}

				    steps:

				      - uses: actions/checkout@v4

				        with:

				          sparse-checkout: tools/toolchain/image

				          sparse-checkout-cone-mode: false

				      - id: read

				        run: |

				          image=$(cat tools/toolchain/image)

				          echo "image=$image" >> $GITHUB_OUTPUT

									
										34

.github/workflows/reproducible-build.yaml
									
										vendored
									
										Normal file
									
												
				@@ -0,0 +1,34 @@

				name: Check Reproducible Build

				on:

				  schedule:

				    # 5AM every friday

				    - cron: '0 5 * * FRI'

				permissions: {}

				env:

				  BUILD_MODE: release

				jobs:

				  build-a:

				    uses: ./.github/workflows/build-scylla.yaml

				    with:

				      build_mode: release

				  build-b:

				    uses: ./.github/workflows/build-scylla.yaml

				    with:

				      build_mode: release

				  compare-checksum:

				    runs-on: ubuntu-latest

				    needs:

				      - build-a

				      - build-b

				    steps:

				      - env:

				          CHECKSUM_A: ${{needs.build-a.outputs.md5sum}}

				          CHECKSUM_B: ${{needs.build-b.outputs.md5sum}}

				        run: |

				          if [ $CHECKSUM_A != $CHECKSUM_B ]; then                             \

				            echo "::error::mismatched checksums: $CHECKSUM_A != $CHECKSUM_B"; \

				            exit 1;                                                           \

				          fi
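The reproducibility check comes down to hashing the same artifact from two independent builds and comparing the digests. A minimal local sketch of that comparison follows. The helper names `md5sum` and `same_build` are hypothetical, and the workflow itself uses the `md5sum` command line tool rather than Python.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, like `md5sum <file> | cut -c -32`."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # Read in chunks so a large executable is not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def same_build(path_a, path_b):
    """True when both artifacts hash to the same MD5 digest."""
    return md5sum(path_a) == md5sum(path_b)
```

A call such as `same_build('a/build/release/scylla', 'b/build/release/scylla')` would compare two local build trees; those paths are placeholders.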

									
										50

.github/workflows/seastar.yaml
									
										vendored
									
										Normal file
									
												
				@@ -0,0 +1,50 @@

				name: Build with the latest Seastar

				on:

				  schedule:

				    # 5AM everyday

				    - cron: '0 5 * * *'

				permissions: {}

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: true

				env:

				  BUILD_DIR: build

				jobs:

				  build-with-the-latest-seastar:

				    runs-on: ubuntu-latest

				    # be consistent with tools/toolchain/image

				    container: scylladb/scylla-toolchain:fedora-40-20240621

				    strategy:

				      matrix:

				        build_type:

				          - Debug

				          - RelWithDebInfo

				          - Dev

				    steps:

				      - uses: actions/checkout@v4

				        with:

				          submodules: true

				      - run: |

				          rm -rf seastar

				      - uses: actions/checkout@v4

				        with:

				          repository: scylladb/seastar

				          submodules: true

				          path: seastar

				      - name: Generate the building system

				        run: |

				          git config --global --add safe.directory $GITHUB_WORKSPACE

				          cmake                                         \

				            -DCMAKE_BUILD_TYPE=${{ matrix.build_type }} \

				            -DCMAKE_C_COMPILER=clang                    \

				            -DCMAKE_CXX_COMPILER=clang++                \

				            -G Ninja                                    \

				            -B $BUILD_DIR                               \

				            -S .

				      - run: |

				          cmake --build $BUILD_DIR --target scylla

									
										6

.github/workflows/sync-labels.yaml
									
												
				@@ -16,6 +16,10 @@ jobs:

				      pull-requests: write

				      issues: write

				    steps:

				      - name: Dump GitHub context

				        env:

				          GITHUB_CONTEXT: ${{ toJson(github) }}

				        run: echo "$GITHUB_CONTEXT"

				      - name: Checkout repository

				        uses: actions/checkout@v4

				        with:

				@@ -33,7 +37,7 @@ jobs:

				        run: python .github/scripts/sync_labels.py --repo ${{ github.repository }} --number ${{ github.event.number }} --action ${{ github.event.action }}

				      - name: Pull request labeled or unlabeled event

				        if: github.event_name == 'pull_request' && startsWith(github.event.label.name, 'backport/')

				        if: github.event_name == 'pull_request_target' && startsWith(github.event.label.name, 'backport/')

				        env:

				          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

				        run: python .github/scripts/sync_labels.py --repo ${{ github.repository }} --number ${{ github.event.number }} --action ${{ github.event.action }} --label ${{ github.event.label.name }}

4

.gitignore

@@ -3,6 +3,7 @@
 .settings
 build
 build.ninja
 build.ninja.new
 cscope.*
 /debian/
 dist/ami/files/*.rpm
@@ -18,7 +19,7 @@ CMakeLists.txt.user
 *.egg-info
 __pycache__CMakeLists.txt.user
 .gdbinit
 resources
 /resources
 .pytest_cache
 /expressions.tokens
 tags
@@ -30,3 +31,4 @@ compile_commands.json
 .ccls-cache/
 .mypy_cache
 .envrc
 clang_build

5

.gitmodules

@@ -1,11 +1,14 @@
 [submodule "seastar"]
 	path = seastar
 	url = ../seastar
 	url = ../scylla-seastar
 	ignore = dirty
 [submodule "swagger-ui"]
 	path = swagger-ui
 	url = ../scylla-swagger-ui
 	ignore = dirty
 [submodule "abseil"]
 	path = abseil
 	url = ../abseil-cpp
 [submodule "scylla-jmx"]
 	path = tools/jmx
 	url = ../scylla-jmx

									
										55

CMakeLists.txt
									
												
				@@ -42,21 +42,48 @@ else()

				        COMMENT "List configured modes")

				endif()

				add_compile_definitions(

				    FMT_DEPRECATED_OSTREAM)

				include(limit_jobs)

				# Configure Seastar compile options to align with Scylla

				set(CMAKE_CXX_STANDARD "20" CACHE INTERNAL "")

				set(CMAKE_CXX_STANDARD "23" CACHE INTERNAL "")

				set(CMAKE_CXX_EXTENSIONS ON CACHE INTERNAL "")

				set(CMAKE_CXX_SCAN_FOR_MODULES OFF CACHE INTERNAL "")

				set(CMAKE_CXX_VISIBILITY_PRESET hidden)

				set(Seastar_TESTING ON CACHE BOOL "" FORCE)

				set(Seastar_API_LEVEL 7 CACHE STRING "" FORCE)

				set(Seastar_DEPRECATED_OSTREAM_FORMATTERS OFF CACHE BOOL "" FORCE)

				set(Seastar_APPS ON CACHE BOOL "" FORCE)

				set(Seastar_EXCLUDE_APPS_FROM_ALL ON CACHE BOOL "" FORCE)

				set(Seastar_EXCLUDE_TESTS_FROM_ALL ON CACHE BOOL "" FORCE)

				set(Seastar_UNUSED_RESULT_ERROR ON CACHE BOOL "" FORCE)

				add_subdirectory(seastar)

				set(ABSL_PROPAGATE_CXX_STD ON CACHE BOOL "" FORCE)

				find_package(Sanitizers QUIET)

				set(sanitizer_cxx_flags

				    $<$<IN_LIST:$<CONFIG>,Debug;Sanitize>:$<TARGET_PROPERTY:Sanitizers::address,INTERFACE_COMPILE_OPTIONS>;$<TARGET_PROPERTY:Sanitizers::undefined_behavior,INTERFACE_COMPILE_OPTIONS>>)

				if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")

				    set(ABSL_GCC_FLAGS ${sanitizer_cxx_flags})

				elseif(CMAKE_CXX_COMPILER_ID STREQUAL "Clang")

				    set(ABSL_LLVM_FLAGS ${sanitizer_cxx_flags})

				endif()

				set(ABSL_DEFAULT_LINKOPTS

				    $<$<IN_LIST:$<CONFIG>,Debug;Sanitize>:$<TARGET_PROPERTY:Sanitizers::address,INTERFACE_LINK_LIBRARIES>;$<TARGET_PROPERTY:Sanitizers::undefined_behavior,INTERFACE_LINK_LIBRARIES>>)

				add_subdirectory(abseil)

				add_library(absl-headers INTERFACE)

				target_include_directories(absl-headers SYSTEM INTERFACE

				    "${PROJECT_SOURCE_DIR}/abseil")

				add_library(absl::headers ALIAS absl-headers)

				# Exclude absl::strerror from the default "all" target since it's not

				# used in Scylla build and, moreover, makes use of deprecated glibc APIs,

				# such as sys_nerr, which are not exposed from "stdio.h" since glibc 2.32,

				# which happens to be the case for recent Fedora distribution versions.

				#

				# Need to use the internal "absl_strerror" target name instead of namespaced

				# variant because `set_target_properties` does not understand the latter form,

				# unfortunately.

				set_target_properties(absl_strerror PROPERTIES EXCLUDE_FROM_ALL TRUE)

				# System libraries dependencies

				find_package(Boost REQUIRED

				@@ -68,13 +95,13 @@ target_link_libraries(Boost::regex

				find_package(Lua REQUIRED)

				find_package(ZLIB REQUIRED)

				find_package(ICU COMPONENTS uc i18n REQUIRED)

				find_package(absl COMPONENTS hash raw_hash_set REQUIRED)

				find_package(fmt 9.0.0 REQUIRED)

				find_package(libdeflate REQUIRED)

				find_package(libxcrypt REQUIRED)

				find_package(Snappy REQUIRED)

				find_package(RapidJSON REQUIRED)

				find_package(Thrift REQUIRED)

				find_package(xxHash REQUIRED)

				find_package(zstd REQUIRED)

				set(scylla_gen_build_dir "${CMAKE_BINARY_DIR}/gen")

				file(MAKE_DIRECTORY "${scylla_gen_build_dir}")

				@@ -82,6 +109,14 @@ file(MAKE_DIRECTORY "${scylla_gen_build_dir}")

				include(add_version_library)

				generate_scylla_version()

				add_library(scylla-zstd STATIC

				    zstd.cc)

				target_link_libraries(scylla-zstd

				  PRIVATE

				    db

				    Seastar::seastar

				    zstd::libzstd)

				add_library(scylla-main STATIC)

				target_sources(scylla-main

				  PRIVATE

				@@ -120,11 +155,13 @@ target_sources(scylla-main

				    timeout_config.cc

				    unimplemented.cc

				    validation.cc

				    vint-serialization.cc

				    zstd.cc)

				    vint-serialization.cc)

				target_link_libraries(scylla-main

				  PRIVATE

				    "$<LINK_LIBRARY:WHOLE_ARCHIVE,scylla-zstd>"

				    db

				    absl::headers

				    absl::btree

				    absl::hash

				    absl::raw_hash_set

				    Seastar::seastar

				@@ -169,7 +206,6 @@ add_subdirectory(dht)

				add_subdirectory(gms)

				add_subdirectory(idl)

				add_subdirectory(index)

				add_subdirectory(interface)

				add_subdirectory(lang)

				add_subdirectory(locator)

				add_subdirectory(message)

				@@ -187,7 +223,6 @@ add_subdirectory(service)

				add_subdirectory(sstables)

				add_subdirectory(streaming)

				add_subdirectory(test)

				add_subdirectory(thrift)

				add_subdirectory(tools)

				add_subdirectory(tracing)

				add_subdirectory(transport)

				@@ -228,7 +263,6 @@ target_link_libraries(scylla PRIVATE

				    sstables

				    streaming

				    test-perf

				    thrift

				    tools

				    tracing

				    transport

				@@ -237,6 +271,7 @@ target_link_libraries(scylla PRIVATE

				target_link_libraries(scylla PRIVATE

				    seastar

				    absl::headers

				    Boost::program_options)

				target_include_directories(scylla PRIVATE

									
										2

HACKING.md
									
												
				@@ -199,7 +199,7 @@ The `scylla.yaml` file in the repository by default writes all database data to

				Scylla has a number of requirements for the file-system and operating system to operate ideally and at peak performance. However, during development, these requirements can be relaxed with the `--developer-mode` flag.

				Additionally, when running on under-powered platforms like portable laptops, the `--overprovisined` flag is useful.

				Additionally, when running on under-powered platforms like portable laptops, the `--overprovisioned` flag is useful.

				On a development machine, one might run Scylla as

									
										6

README.md
									
												
				@@ -65,11 +65,13 @@ $ ./tools/toolchain/dbuild ./build/release/scylla --help

				## Testing

				[![Build with the latest Seastar](https://github.com/scylladb/scylladb/actions/workflows/seastar.yaml/badge.svg)](https://github.com/scylladb/scylladb/actions/workflows/seastar.yaml) [![Check Reproducible Build](https://github.com/scylladb/scylladb/actions/workflows/reproducible-build.yaml/badge.svg)](https://github.com/scylladb/scylladb/actions/workflows/reproducible-build.yaml) [![clang-nightly](https://github.com/scylladb/scylladb/actions/workflows/clang-nightly.yaml/badge.svg)](https://github.com/scylladb/scylladb/actions/workflows/clang-nightly.yaml)

				See [test.py manual](docs/dev/testing.md).

				## Scylla APIs and compatibility

				By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and

				Thrift. There is also support for the API of Amazon DynamoDB™,

				By default, Scylla is compatible with Apache Cassandra and its API - CQL.

				There is also support for the API of Amazon DynamoDB™,

				which needs to be enabled and configured in order to be used. For more

				information on how to enable the DynamoDB™ API in Scylla,

				and the current compatibility of this feature as well as Scylla-specific extensions, see

9

SCYLLA-VERSION-GEN


@@ -78,7 +78,7 @@ fi
 # Default scylla product/version tags
 PRODUCT=scylla
 VERSION=5.5.0-dev
 VERSION=6.1.6
 if test -f version
 then
@@ -88,10 +88,13 @@ else
 	SCYLLA_VERSION=$VERSION
 	if [ -z "$SCYLLA_RELEASE" ]; then
 		GIT_COMMIT=$(git -C "$SCRIPT_DIR" log --pretty=format:'%h' -n 1 --abbrev=12)
 		# For custom package builds, replace "0" with "counter.your_name",
 		# For custom package builds, replace "0" with "counter.yourname",
 		# where counter starts at 1 and increments for successive versions.
 		# This ensures that the package manager will select your custom
 		# package over the standard release.
 		# Do not use any special characters like - or _ in the name above!
 		# These characters either have special meaning or are illegal in
 		# version strings.
 		SCYLLA_BUILD=0
 		SCYLLA_RELEASE=$SCYLLA_BUILD.$DATE.$GIT_COMMIT
 	elif [ -f "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" ]; then
@@ -101,7 +104,7 @@ else
 fi
 if [ -f "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" ]; then
 	GIT_COMMIT_FILE=$(cat "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" |cut -d . -f 3)
 	GIT_COMMIT_FILE=$(cat "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" | rev | cut -d . -f 1 | rev)
 	if [ "$GIT_COMMIT" = "$GIT_COMMIT_FILE" ]; then
 		exit 0
 	fi
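
Why the last field rather than the third: the release string is assembled as `$SCYLLA_BUILD.$DATE.$GIT_COMMIT`, and a custom `SCYLLA_BUILD` such as `1.yourname` contains a dot of its own, which pushes the commit hash out of field 3. A minimal C++ sketch of the two extraction strategies follows. The helper names `nth_field` and `last_field` are hypothetical, and the sample release strings in the comments are made up for illustration, not taken from the script.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper roughly mirroring `cut -d . -f N` (N is 1-based).
// This sketch returns an empty string when the field does not exist.
std::string nth_field(std::string_view s, char sep, std::size_t n) {
    std::size_t start = 0;
    for (std::size_t i = 1; i < n; ++i) {
        const std::size_t pos = s.find(sep, start);
        if (pos == std::string_view::npos) {
            return {};
        }
        start = pos + 1;
    }
    const std::size_t end = s.find(sep, start);
    const std::size_t len = (end == std::string_view::npos) ? end : end - start;
    return std::string(s.substr(start, len));
}

// Hypothetical helper mirroring `rev | cut -d . -f 1 | rev`:
// everything after the last separator, or the whole string if there is none.
std::string last_field(std::string_view s, char sep) {
    const std::size_t pos = s.rfind(sep);
    return std::string(pos == std::string_view::npos ? s : s.substr(pos + 1));
}

// Made-up sample values:
//   default build: "0.20250210.3146a09638ab"          -> field 3 is the commit
//   custom build:  "1.yourname.20250210.3146a09638ab" -> field 3 is the date
// last_field() yields the commit in both cases.
```

Taking the last field keeps the comparison against `$GIT_COMMIT` stable no matter how many dot-separated components the build counter contributes.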

1

abseil Submodule

Submodule abseil added at d7aaad83b4

									
										3

alternator/CMakeLists.txt
									
												
				@@ -27,7 +27,8 @@ target_link_libraries(alternator

				  cql3

				  idl

				  Seastar::seastar

				  xxHash::xxhash)

				  xxHash::xxhash

				  absl::headers)

				check_headers(check-headers alternator

				  GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)

									
										17

alternator/auth.cc
									
												
				@@ -19,6 +19,7 @@

				#include "alternator/executor.hh"

				#include "cql3/selection/selection.hh"

				#include "cql3/result_set.hh"

				#include "types/types.hh"

				#include <seastar/core/coroutine.hh>

				namespace alternator {

				@@ -31,11 +32,12 @@ future<std::string> get_key_from_roles(service::storage_proxy& proxy, auth::serv

				    dht::partition_range_vector partition_ranges{dht::partition_range(dht::decorate_key(*schema, pk))};

				    std::vector<query::clustering_range> bounds{query::clustering_range::make_open_ended_both_sides()};

				    const column_definition* salted_hash_col = schema->get_column_definition(bytes("salted_hash"));

				    if (!salted_hash_col) {

				    const column_definition* can_login_col = schema->get_column_definition(bytes("can_login"));

				    if (!salted_hash_col || !can_login_col) {

				        co_await coroutine::return_exception(api_error::unrecognized_client(format("Credentials cannot be fetched for: {}", username)));

				    }

				    auto selection = cql3::selection::selection::for_columns(schema, {salted_hash_col});

				    auto partition_slice = query::partition_slice(std::move(bounds), {}, query::column_id_vector{salted_hash_col->id}, selection->get_query_options());

				    auto selection = cql3::selection::selection::for_columns(schema, {salted_hash_col, can_login_col});

				    auto partition_slice = query::partition_slice(std::move(bounds), {}, query::column_id_vector{salted_hash_col->id, can_login_col->id}, selection->get_query_options());

				    auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice,

				            proxy.get_max_result_size(partition_slice), query::tombstone_limit(proxy.get_tombstone_limit()));

				    auto cl = auth::password_authenticator::consistency_for_user(username);

				@@ -51,7 +53,14 @@ future<std::string> get_key_from_roles(service::storage_proxy& proxy, auth::serv

				    if (result_set->empty()) {

				        co_await coroutine::return_exception(api_error::unrecognized_client(format("User not found: {}", username)));

				    }

				    const managed_bytes_opt& salted_hash = result_set->rows().front().front(); // We only asked for 1 row and 1 column

				    const auto& result = result_set->rows().front();

				    bool can_login = result[1] && value_cast<bool>(boolean_type->deserialize(*result[1]));

				    if (!can_login) {

				        // This is a valid role name, but has "login=False" so should not be

				        // usable for authentication (see #19735).

				        co_await coroutine::return_exception(api_error::unrecognized_client(format("Role {} has login=false so cannot be used for login", username)));

				    }

				    const managed_bytes_opt& salted_hash = result.front();

				    if (!salted_hash) {

				        co_await coroutine::return_exception(api_error::unrecognized_client(format("No password found for user: {}", username)));

				    }
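
The order of checks in this hunk (row exists, then `can_login`, then password hash) can be sketched with plain standard-library types. `role_row`, `lookup_result` and `check_role` are hypothetical names used only for illustration; the real code reads `managed_bytes_opt` cells from a query result and raises `api_error::unrecognized_client` instead of returning an enum.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for one row of the roles table, in the column
// order requested by the query: salted_hash first, can_login second.
struct role_row {
    std::optional<std::string> salted_hash; // may be null
    std::optional<bool> can_login;          // may be null
};

enum class lookup_result { ok, user_not_found, login_disabled, no_password };

// Hypothetical function mirroring the sequence of checks in the diff.
lookup_result check_role(const std::optional<role_row>& row) {
    if (!row) {
        return lookup_result::user_not_found;
    }
    // A null can_login cell is treated like false, matching
    // `result[1] && value_cast<bool>(...)` in the diff.
    if (!row->can_login.value_or(false)) {
        return lookup_result::login_disabled;
    }
    if (!row->salted_hash) {
        return lookup_result::no_password;
    }
    return lookup_result::ok;
}
```

Checking `can_login` before the hash means a role with `login=false` is rejected even if it has a password set.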

									
										14

alternator/controller.cc
									
												
				@@ -32,8 +32,10 @@ controller::controller(

				        sharded<service::memory_limiter>& memory_limiter,

				        sharded<auth::service>& auth_service,

				        sharded<qos::service_level_controller>& sl_controller,

				        const db::config& config)

				    : _gossiper(gossiper)

				        const db::config& config,

				        seastar::scheduling_group sg)

				    : protocol_server(sg)

				    , _gossiper(gossiper)

				    , _proxy(proxy)

				    , _mm(mm)

				    , _sys_dist_ks(sys_dist_ks)

				@@ -62,7 +64,9 @@ std::vector<socket_address> controller::listen_addresses() const {

				}

				future<> controller::start_server() {

				    return seastar::async([this] {

				    seastar::thread_attributes attr;

				    attr.sched_group = _sched_group;

				    return seastar::async(std::move(attr), [this] {

				        _listen_addresses.clear();

				        auto preferred = _config.listen_interface_prefer_ipv6() ? std::make_optional(net::inet_address::family::INET6) : std::nullopt;

				@@ -156,7 +160,9 @@ future<> controller::stop_server() {

				}

				future<> controller::request_stop_server() {

				    return stop_server();

				    return with_scheduling_group(_sched_group, [this] {

				        return stop_server();

				    });

				}

				}

									
										3

alternator/controller.hh
									
												
				@@ -80,7 +80,8 @@ public:

				        sharded<service::memory_limiter>& memory_limiter,

				        sharded<auth::service>& auth_service,

				        sharded<qos::service_level_controller>& sl_controller,

				        const db::config& config);

				        const db::config& config,

				        seastar::scheduling_group sg);

				    virtual sstring name() const override;

				    virtual sstring protocol() const override;

									
										12

alternator/executor.cc
									
												
				@@ -6,8 +6,10 @@

				 * SPDX-License-Identifier: AGPL-3.0-or-later

				 */

				#include <fmt/ranges.h>

				#include <seastar/core/sleep.hh>

				#include "alternator/executor.hh"

				#include "cdc/log.hh"

				#include "db/config.hh"

				#include "log.hh"

				#include "schema/schema_builder.hh"

				@@ -1250,7 +1252,7 @@ future<executor::request_return_type> executor::update_table(client_state& clien

				        auto schema = builder.build();

				        auto m = co_await service::prepare_column_family_update_announcement(p.local(), schema, false,  std::vector<view_ptr>(), group0_guard.write_timestamp());

				        auto m = co_await service::prepare_column_family_update_announcement(p.local(), schema,  std::vector<view_ptr>(), group0_guard.write_timestamp());

				        co_await mm.announce(std::move(m), std::move(group0_guard), format("alternator-executor: update {} table", tab->cf_name()));

				@@ -4438,8 +4440,10 @@ future<executor::request_return_type> executor::list_tables(client_state& client

				    auto tables = _proxy.data_dictionary().get_tables(); // hold on to temporary, table_names isn't a container, it's a view

				    auto table_names = tables

				            | boost::adaptors::filtered([] (data_dictionary::table t) {

				                        return t.schema()->ks_name().find(KEYSPACE_NAME_PREFIX) == 0 && !t.schema()->is_view();

				            | boost::adaptors::filtered([this] (data_dictionary::table t) {

				                        return t.schema()->ks_name().find(KEYSPACE_NAME_PREFIX) == 0 &&

				                            !t.schema()->is_view() &&

				                            !cdc::is_log_for_some_table(_proxy.local_db(), t.schema()->ks_name(), t.schema()->cf_name());

				                    })

				            | boost::adaptors::transformed([] (data_dictionary::table t) {

				                        return t.schema()->cf_name();

				@@ -4575,7 +4579,7 @@ static lw_shared_ptr<keyspace_metadata> create_keyspace_metadata(std::string_vie

				    // used by default on new Alternator tables. Change this initialization

				    // to 0 to enable tablets by default, with automatic number of tablets.

				    std::optional<unsigned> initial_tablets;

				    if (sp.get_db().local().get_config().check_experimental(db::experimental_features_t::feature::TABLETS)) {

				    if (sp.get_db().local().get_config().enable_tablets()) {

				        auto it = tags_map.find(INITIAL_TABLETS_TAG_KEY);

				        if (it != tags_map.end()) {

				            // Tag set. If it's a valid number, use it. If not - e.g., it's

									
										1

alternator/executor.hh
									
												
				@@ -9,7 +9,6 @@

				#pragma once

				#include <seastar/core/future.hh>

				#include <seastar/http/httpd.hh>

				#include "seastarx.hh"

				#include <seastar/json/json_elements.hh>

				#include <seastar/core/sharded.hh>

									
										4

alternator/expressions.cc
									
												
				@@ -28,7 +28,7 @@

				namespace alternator {

				template <typename Func, typename Result = std::result_of_t<Func(expressionsParser&)>>

				template <typename Func, typename Result = std::invoke_result_t<Func, expressionsParser&>>

				static Result do_with_parser(std::string_view input, Func&& f) {

				    expressionsLexer::InputStreamType input_stream{

				        reinterpret_cast<const ANTLR_UINT8*>(input.data()),

				@@ -43,7 +43,7 @@ static Result do_with_parser(std::string_view input, Func&& f) {

				    return result;

				}

				template <typename Func, typename Result = std::result_of_t<Func(expressionsParser&)>>

				template <typename Func, typename Result = std::invoke_result_t<Func, expressionsParser&>>

				static Result parse(const char* input_name, std::string_view input, Func&& f) {

				    if (input.length() > 4096) {

				        throw expressions_syntax_error(format("{} expression size {} exceeds allowed maximum 4096.",
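
`std::result_of_t` was deprecated in C++17 and removed in C++20, which is presumably why this file moves to `std::invoke_result_t` alongside the bump of `CMAKE_CXX_STANDARD` to 23. A small sketch of the replacement spelling, using a hypothetical `fake_parser` and `with_parser` in place of the ANTLR-generated `expressionsParser` and the real `do_with_parser`:

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the generated parser type.
struct fake_parser {
    std::string text;
};

// Old spelling: std::result_of_t<Func(fake_parser&)>
// New spelling: the callable type and its argument types are passed as
// separate template arguments.
template <typename Func, typename Result = std::invoke_result_t<Func, fake_parser&>>
Result with_parser(std::string input, Func&& f) {
    fake_parser p{std::move(input)};
    return std::forward<Func>(f)(p);
}
```

For example, `with_parser("abc", [](fake_parser& p) { return p.text.size(); })` deduces `Result` as `std::size_t`.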

									
										3

alternator/expressions_types.hh
									
												
				@@ -66,7 +66,6 @@ public:

				    std::vector<std::variant<std::string, unsigned>>& operators() {

				        return _operators;

				    }

				    friend std::ostream& operator<<(std::ostream&, const path&);

				};

				// When an expression is first parsed, all constants are references, like

				@@ -256,6 +255,6 @@ public:

				} // namespace parsed

				} // namespace alternator

				template <> struct fmt::formatter<alternator::parsed::path> : fmt::formatter<std::string_view> {

				template <> struct fmt::formatter<alternator::parsed::path> : fmt::formatter<string_view> {

				    auto format(const alternator::parsed::path&, fmt::format_context& ctx) const -> decltype(ctx.out());

				};

									
										24

alternator/server.cc
									
												
				@@ -8,6 +8,7 @@

				#include "alternator/server.hh"

				#include "log.hh"

				#include <fmt/ranges.h>

				#include <seastar/http/function_handlers.hh>

				#include <seastar/http/short_streams.hh>

				#include <seastar/core/coroutine.hh>

				@@ -20,6 +21,8 @@

				#include "utils/rjson.hh"

				#include "auth.hh"

				#include <cctype>

				#include <string_view>

				#include <utility>

				#include "service/storage_proxy.hh"

				#include "gms/gossiper.hh"

				#include "utils/overloaded_functor.hh"

				@@ -33,8 +36,6 @@ using reply = http::reply;

				namespace alternator {

				static constexpr auto TARGET = "X-Amz-Target";

				inline std::vector<std::string_view> split(std::string_view text, char separator) {

				    std::vector<std::string_view> tokens;

				    if (text == "") {

				@@ -210,8 +211,13 @@ protected:

				        sstring local_dc = topology.get_datacenter();

				        std::unordered_set<gms::inet_address> local_dc_nodes = topology.get_datacenter_endpoints().at(local_dc);

				        for (auto& ip : local_dc_nodes) {

				            if (_gossiper.is_alive(ip)) {

				                rjson::push_back(results, rjson::from_string(ip.to_sstring()));

				            // Note that it's not enough for the node to be is_alive() - a

				            // node joining the cluster is also "alive" but not responsive to

				            // requests. We need alive *and* normal. See #19694, #21538.

				            if (_gossiper.is_alive(ip) && _gossiper.is_normal(ip)) {

				                // Use the gossiped broadcast_rpc_address if available instead

				                // of the internal IP address "ip". See discussion in #18711.

				                rjson::push_back(results, rjson::from_string(_gossiper.get_rpc_address(ip)));

				            }

				        }

				        rep->set_status(reply::status_type::ok);

				@@ -384,10 +390,10 @@ static tracing::trace_state_ptr maybe_trace_query(service::client_state& client_

				future<executor::request_return_type> server::handle_api_request(std::unique_ptr<request> req) {

				    _executor._stats.total_operations++;

				    sstring target = req->get_header(TARGET);

				    std::vector<std::string_view> split_target = split(target, '.');

				    //NOTICE(sarna): Target consists of Dynamo API version followed by a dot '.' and operation type (e.g. CreateTable)

				    std::string op = split_target.empty() ? std::string() : std::string(split_target.back());

				    sstring target = req->get_header("X-Amz-Target");

				    // target is DynamoDB API version followed by a dot '.' and operation type (e.g. CreateTable)

				    auto dot = target.find('.');

				    std::string_view op = (dot == sstring::npos) ? std::string_view() : std::string_view(target).substr(dot+1);

				    // JSON parsing can allocate up to roughly 2x the size of the raw

				    // document, + a couple of bytes for maintenance.

				    // TODO: consider the case where req->content_length is missing. Maybe

				@@ -633,7 +639,7 @@ future<> server::json_parser::stop() {

				const char* api_error::what() const noexcept {

				    if (_what_string.empty()) {

				        _what_string = format("{} {}: {}", static_cast<int>(_http_code), _type, _msg);

				        _what_string = format("{} {}: {}", std::to_underlying(_http_code), _type, _msg);

				    }

				    return _what_string.c_str();

				}
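
The `X-Amz-Target` parsing in `handle_api_request` can be sketched in isolation. This is an illustration with standard-library types rather than Seastar's `sstring`, and the function name `operation_from_target` is hypothetical. `DynamoDB_20120810.CreateTable` is the header value DynamoDB clients send for `CreateTable`.

```cpp
#include <string_view>

// Returns the operation name: everything after the first '.' in the
// X-Amz-Target header value, or an empty view if there is no dot.
// Hypothetical free function; the diff does this inline on an sstring.
constexpr std::string_view operation_from_target(std::string_view target) {
    const auto dot = target.find('.');
    return dot == std::string_view::npos ? std::string_view() : target.substr(dot + 1);
}

// Compile-time check of the common case.
static_assert(operation_from_target("DynamoDB_20120810.CreateTable") == "CreateTable");
```

Unlike the removed `split()`-based code, which took the last dot-separated token, this keeps everything after the first dot and avoids building a vector of tokens. The returned view points into the caller's buffer, so the header string must outlive it.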

									
										1

alternator/stats.hh
									
												
				@@ -11,7 +11,6 @@

				#include <cstdint>

				#include <seastar/core/metrics_registration.hh>

				#include "utils/estimated_histogram.hh"

				#include "utils/histogram.hh"

				#include "cql3/stats.hh"

									
										18

alternator/streams.cc
									
												
				@@ -233,11 +233,8 @@ struct shard_id {

				    // dynamo specifies shardid as max 65 chars. 

				    friend std::ostream& operator<<(std::ostream& os, const shard_id& id) {

				        boost::io::ios_flags_saver fs(os);

				        return os << marker << std::hex  

				            << id.time.time_since_epoch().count()

				            << ':' << id.id.to_bytes()

				            ;

				        fmt::print(os, "{} {:x}:{}", marker, id.time.time_since_epoch().count(), id.id.to_bytes());

				        return os;

				    }

				};

				@@ -779,7 +776,7 @@ struct event_id {

				    cdc::stream_id stream;

				    utils::UUID timestamp;

				    static const auto marker = 'E';

				    static constexpr auto marker = 'E';

				    event_id(cdc::stream_id s, utils::UUID ts)

				        : stream(s)

				@@ -787,10 +784,8 @@ struct event_id {

				    {}

				    friend std::ostream& operator<<(std::ostream& os, const event_id& id) {

				        boost::io::ios_flags_saver fs(os);

				        return os << marker << std::hex << id.stream.to_bytes()

				            << ':' << id.timestamp

				            ;

				        fmt::print(os, "{}{}:{}", marker, id.stream.to_bytes(), id.timestamp);

				        return os;

				    }

				};

				}

				@@ -1057,9 +1052,6 @@ void executor::add_stream_options(const rjson::value& stream_specification, sche

				    if (stream_enabled->GetBool()) {

				        auto db = sp.data_dictionary();

				        if (!db.features().cdc) {

				            throw api_error::validation("StreamSpecification: streams (CDC) feature not enabled in cluster.");

				        }

				        if (!db.features().alternator_streams) {

				            throw api_error::validation("StreamSpecification: alternator streams feature not enabled in cluster.");

				        }

									
										106

alternator/ttl.cc
									
												
				@@ -26,6 +26,7 @@

				#include "log.hh"

				#include "gc_clock.hh"

				#include "replica/database.hh"

				#include "service/client_state.hh"

				#include "service_permit.hh"

				#include "timestamp.hh"

				#include "service/storage_proxy.hh"

				@@ -312,7 +313,7 @@ static size_t random_offset(size_t min, size_t max) {

				// this range's primary node is down. For this we need to return not just

				// a list of this node's secondary ranges - but also the primary owner of

				// each of those ranges.

				static std::vector<std::pair<dht::token_range, gms::inet_address>> get_secondary_ranges(

				static future<std::vector<std::pair<dht::token_range, gms::inet_address>>> get_secondary_ranges(

				        const locator::effective_replication_map_ptr& erm,

				        gms::inet_address ep) {

				    const auto& tm = *erm->get_token_metadata_ptr();

				@@ -323,6 +324,7 @@ static std::vector<std::pair<dht::token_range, gms::inet_address>> get_secondary

				    }

				    auto prev_tok = sorted_tokens.back();

				    for (const auto& tok : sorted_tokens) {

				        co_await coroutine::maybe_yield();

				        inet_address_vector_replica_set eps = erm->get_natural_endpoints(tok);

				        if (eps.size() <= 1 || eps[1] != ep) {

				            prev_tok = tok;

				@@ -350,7 +352,7 @@ static std::vector<std::pair<dht::token_range, gms::inet_address>> get_secondary

				        }

				        prev_tok = tok;

				    }

				    return ret;

				    co_return ret;

				}

				@@ -383,63 +385,66 @@ static std::vector<std::pair<dht::token_range, gms::inet_address>> get_secondary

				// the chances of covering all ranges during a scan when restarts occur.

				// A more deterministic way would be to regularly persist the scanning state,

				// but that incurs overhead that we want to avoid if not needed.

				enum primary_or_secondary_t {primary, secondary};

				template<primary_or_secondary_t primary_or_secondary>

				class token_ranges_owned_by_this_shard {

				    // ranges_holder_primary holds just the primary ranges themselves

				    class ranges_holder_primary {

				        const dht::token_range_vector _token_ranges;

				     public:

				        ranges_holder_primary(const locator::vnode_effective_replication_map_ptr& erm, gms::gossiper& g, gms::inet_address ep)

				            : _token_ranges(erm->get_primary_ranges(ep)) {}

				        std::size_t size() const { return _token_ranges.size(); }

				        const dht::token_range& operator[](std::size_t i) const {

				            return _token_ranges[i];

				        }

				        bool should_skip(std::size_t i) const {

				            return false;

				        }

				    };

				    // ranges_holder<secondary> holds the secondary token ranges plus each

				    // range's primary owner, needed to implement should_skip().

				    class ranges_holder_secondary {

				        std::vector<std::pair<dht::token_range, gms::inet_address>> _token_ranges;

				        gms::gossiper& _gossiper;

				     public:

				        ranges_holder_secondary(const locator::effective_replication_map_ptr& erm, gms::gossiper& g, gms::inet_address ep)

				            : _token_ranges(get_secondary_ranges(erm, ep))

				            , _gossiper(g) {}

				        std::size_t size() const { return _token_ranges.size(); }

				        const dht::token_range& operator[](std::size_t i) const {

				            return _token_ranges[i].first;

				        }

				        // range i should be skipped if its primary owner is alive.

				        bool should_skip(std::size_t i) const {

				            return _gossiper.is_alive(_token_ranges[i].second);

				        }

				    };

				//

				// FIXME: Check if this algorithm is safe with tablet migration.

				// https://github.com/scylladb/scylladb/issues/16567

				// ranges_holder_primary holds just the primary ranges themselves

				class ranges_holder_primary {

				    dht::token_range_vector _token_ranges;

				public:

				    explicit ranges_holder_primary(dht::token_range_vector token_ranges) : _token_ranges(std::move(token_ranges)) {}

				    static future<ranges_holder_primary> make(const locator::vnode_effective_replication_map_ptr& erm, gms::inet_address ep) {

				        co_return ranges_holder_primary(co_await erm->get_primary_ranges(ep));

				    }

				    std::size_t size() const { return _token_ranges.size(); }

				    const dht::token_range& operator[](std::size_t i) const {

				        return _token_ranges[i];

				    }

				    bool should_skip(std::size_t i) const {

				        return false;

				    }

				};

				// ranges_holder<secondary> holds the secondary token ranges plus each

				// range's primary owner, needed to implement should_skip().

				class ranges_holder_secondary {

				    std::vector<std::pair<dht::token_range, gms::inet_address>> _token_ranges;

				    const gms::gossiper& _gossiper;

				public:

				    explicit ranges_holder_secondary(std::vector<std::pair<dht::token_range, gms::inet_address>> token_ranges, const gms::gossiper& g)

				        : _token_ranges(std::move(token_ranges))

				        , _gossiper(g) {}

				    static future<ranges_holder_secondary> make(const locator::effective_replication_map_ptr& erm, gms::inet_address ep, const gms::gossiper& g) {

				        co_return ranges_holder_secondary(co_await get_secondary_ranges(erm, ep), g);

				    }

				    std::size_t size() const { return _token_ranges.size(); }

				    const dht::token_range& operator[](std::size_t i) const {

				        return _token_ranges[i].first;

				    }

				    // range i should be skipped if its primary owner is alive.

				    bool should_skip(std::size_t i) const {

				        return _gossiper.is_alive(_token_ranges[i].second);

				    }

				};

				template<class primary_or_secondary_t>

				class token_ranges_owned_by_this_shard {

				    schema_ptr _s;

				    locator::effective_replication_map_ptr _erm;

				    // _token_ranges will contain a list of token ranges owned by this node.

				    // We'll further need to split each such range to the pieces owned by

				    // the current shard, using _intersecter.

				    using ranges_holder = std::conditional_t<

				            primary_or_secondary == primary_or_secondary_t::primary,

				            ranges_holder_primary,

				            ranges_holder_secondary>;

				    const ranges_holder _token_ranges;

				    const primary_or_secondary_t _token_ranges;

				    // NOTICE: _range_idx is used modulo _token_ranges size when accessing

				    // the data to ensure that it doesn't go out of bounds

				    size_t _range_idx;

				    size_t _end_idx;

				    std::optional<dht::selective_token_range_sharder> _intersecter;

				public:

				    token_ranges_owned_by_this_shard(replica::database& db, gms::gossiper& g, schema_ptr s)

				    token_ranges_owned_by_this_shard(schema_ptr s, primary_or_secondary_t token_ranges)

				        :  _s(s)

				        , _erm(s->table().get_effective_replication_map())

				        , _token_ranges(db.find_keyspace(s->ks_name()).get_vnode_effective_replication_map(),

				                g, _erm->get_topology().my_address())

				        , _token_ranges(std::move(token_ranges))

				        , _range_idx(random_offset(0, _token_ranges.size() - 1))

				        , _end_idx(_range_idx + _token_ranges.size())

				    {

				@@ -495,6 +500,7 @@ struct scan_ranges_context {

				    bytes column_name;

				    std::optional<std::string> member;

				    service::client_state internal_client_state;

				    ::shared_ptr<cql3::selection::selection> selection;

				    std::unique_ptr<service::query_state> query_state_ptr;

				    std::unique_ptr<cql3::query_options> query_options;

				@@ -504,6 +510,7 @@ struct scan_ranges_context {

				        : s(s)

				        , column_name(column_name)

				        , member(member)

				        , internal_client_state(service::client_state::internal_tag())

				    {

				        // FIXME: don't read the entire items - read only parts of it.

				        // We must read the key columns (to be able to delete) and also

				@@ -522,10 +529,9 @@ struct scan_ranges_context {

				        std::vector<query::clustering_range> ck_bounds{query::clustering_range::make_open_ended_both_sides()};

				        auto partition_slice = query::partition_slice(std::move(ck_bounds), {}, std::move(regular_columns), opts);

				        command = ::make_lw_shared<query::read_command>(s->id(), s->version(), partition_slice, proxy.get_max_result_size(partition_slice), query::tombstone_limit(proxy.get_tombstone_limit()));

				        executor::client_state client_state{executor::client_state::internal_tag()};

				        tracing::trace_state_ptr trace_state;

				        // NOTICE: empty_service_permit is used because the TTL service has fixed parallelism

				        query_state_ptr = std::make_unique<service::query_state>(client_state, trace_state, empty_service_permit());

				        query_state_ptr = std::make_unique<service::query_state>(internal_client_state, trace_state, empty_service_permit());

				        // FIXME: What should we do on multi-DC? Will we run the expiration on the same ranges on all

				        // DCs or only once for each range? If the latter, we need to change the CLs in the

				        // scanner and deleter.

				@@ -721,7 +727,9 @@ static future<bool> scan_table(

				    expiration_stats.scan_table++;

				    // FIXME: need to pace the scan, not do it all at once.

				    scan_ranges_context scan_ctx{s, proxy, std::move(column_name), std::move(member)};

				    token_ranges_owned_by_this_shard<primary> my_ranges(db.real_database(), gossiper, s);

				    auto erm = db.real_database().find_keyspace(s->ks_name()).get_vnode_effective_replication_map();

				    auto my_address = erm->get_topology().my_address();

				    token_ranges_owned_by_this_shard my_ranges(s, co_await ranges_holder_primary::make(erm, my_address));

				    while (std::optional<dht::partition_range> range = my_ranges.next_partition_range()) {

				        // Note that because of issue #9167 we need to run a separate

				        // query on each partition range, and can't pass several of

				@@ -741,7 +749,7 @@ static future<bool> scan_table(

				    // by tasking another node to take over scanning of the dead node's primary

				    // ranges. What we do here is that this node will also check expiration

				    // on its *secondary* ranges - but only those whose primary owner is down.

				    token_ranges_owned_by_this_shard<secondary> my_secondary_ranges(db.real_database(), gossiper, s);

				    token_ranges_owned_by_this_shard my_secondary_ranges(s, co_await ranges_holder_secondary::make(erm, my_address, gossiper));

				    while (std::optional<dht::partition_range> range = my_secondary_ranges.next_partition_range()) {

				        expiration_stats.secondary_ranges_scanned++;

				        dht::partition_range_vector partition_ranges;
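
The hunks above replace the enum-parameterised holder with two concrete holder classes that share an informal interface (`size()`, `operator[]`, `should_skip()`), and the scanner template is deduced from whichever holder it receives. A minimal, self-contained sketch of that shape follows. It is not the ScyllaDB implementation: `token_range`, `address`, the `std::set` used in place of the gossiper, and the names `primary_holder`, `secondary_holder`, `range_walker` and `collect` are all hypothetical stand-ins, and the random start offset and per-shard splitting of the real class are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for dht::token_range and gms::inet_address.
using token_range = std::pair<long, long>;
using address = std::string;

// Primary holder: every range is scanned, nothing is skipped.
class primary_holder {
    std::vector<token_range> _ranges;
public:
    explicit primary_holder(std::vector<token_range> r) : _ranges(std::move(r)) {}
    std::size_t size() const { return _ranges.size(); }
    const token_range& operator[](std::size_t i) const { return _ranges[i]; }
    bool should_skip(std::size_t) const { return false; }
};

// Secondary holder: remembers each range's primary owner and skips a range
// while that owner is alive.
class secondary_holder {
    std::vector<std::pair<token_range, address>> _ranges;
    const std::set<address>& _alive; // assumption: replaces gossiper.is_alive()
public:
    secondary_holder(std::vector<std::pair<token_range, address>> r,
                     const std::set<address>& alive)
        : _ranges(std::move(r)), _alive(alive) {}
    std::size_t size() const { return _ranges.size(); }
    const token_range& operator[](std::size_t i) const { return _ranges[i].first; }
    bool should_skip(std::size_t i) const { return _alive.count(_ranges[i].second) > 0; }
};

// Walks any holder that exposes size(), operator[] and should_skip().
template <class Holder>
class range_walker {
    Holder _holder;
    std::size_t _idx = 0;
public:
    explicit range_walker(Holder h) : _holder(std::move(h)) {}
    std::optional<token_range> next() {
        while (_idx < _holder.size()) {
            std::size_t i = _idx++;
            if (!_holder.should_skip(i)) {
                return _holder[i];
            }
        }
        return std::nullopt;
    }
};

// Collects everything a walker yields for the given holder.
template <class Holder>
std::vector<token_range> collect(Holder h) {
    range_walker<Holder> w(std::move(h));
    std::vector<token_range> out;
    while (auto r = w.next()) {
        out.push_back(*r);
    }
    return out;
}
```

With this layout the walker does not need to know how a holder was built, which is roughly what lets the hunk move the asynchronous construction into the static `make()` factories.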

									
										3

api/CMakeLists.txt
									
												
				@@ -72,7 +72,8 @@ target_link_libraries(api

				  idl

				  wasmtime_bindings

				  Seastar::seastar

				  xxHash::xxhash)

				  xxHash::xxhash

				  absl::headers)

				check_headers(check-headers api

				  GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)

									
										4

api/api-doc/collectd.json
									
												
				@@ -67,7 +67,7 @@

				               "parameters":[

				                  {

				                     "name":"pluginid",

				                     "description":"The plugin ID, describe the component the metric belongs to. Examples are cache, thrift, etc'. Regex are supported.The plugin ID, describe the component the metric belong to. Examples are: cache, thrift etc'. regex are supported",

				                     "description":"The plugin ID, describe the component the metric belongs to. Examples are cache and alternator, etc'. Regex are supported.",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				@@ -199,4 +199,4 @@

				         }

				      }

				   }

				}

				}

									
										56

api/api-doc/error_injection.json
									
												
				@@ -63,6 +63,28 @@

				                     "paramType":"path"

				                  }

				               ]

				            },

				            {

				               "method":"GET",

				               "summary":"Read the state of an injection from all shards",

				               "type":"array",

				               "items":{

				                  "type":"error_injection_info"

				               },

				               "nickname":"read_injection",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"injection",

				                     "description":"injection name",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"path"

				                  }

				               ]

				            }

				         ]

				      },

				@@ -152,5 +174,39 @@

				            }

				         }

				      }

				   },

				   "models":{

				      "mapper":{

				         "id":"mapper",

				         "description":"A key value mapping",

				         "properties":{

				            "key":{

				               "type":"string",

				               "description":"The key"

				            },

				            "value":{

				               "type":"string",

				               "description":"The value"

				            }

				         }

				      },

				       "error_injection_info":{

				         "id":"error_injection_info",

				         "description":"Information about an error injection",

				         "properties":{

				            "enabled":{

				               "type":"boolean",

				               "description":"Is the error injection enabled"

				            },

				            "parameters":{

				               "type":"array",

				               "items":{

				                  "type":"mapper"

				               },

				               "description":"The parameter values"

				            }

				         },

				         "required":["enabled"]

				      }

				   }

				}

									
										32

api/api-doc/raft.json
									
												
				@@ -62,6 +62,38 @@

				               ]

				            }

				         ]

				      },

				      {

				         "path": "/raft/read_barrier",

				         "operations": [

				            {

				               "method": "POST",

				               "summary": "Triggers read barrier for the given Raft group to wait for previously committed commands in this group to be applied locally. For example, can be used on group 0 to wait for the node to obtain latest schema changes.",

				               "type": "string",

				               "nickname": "read_barrier",

				               "produces": [

				                  "application/json"

				               ],

				               "parameters": [

				                  {

				                     "name": "group_id",

				                     "description": "The ID of the group. When absent, group0 is used.",

				                     "required": false,

				                     "allowMultiple": false,

				                     "type": "string",

				                     "paramType": "query"

				                  },

				                  {

				                     "name": "timeout",

				                     "description": "Timeout in seconds after which the endpoint returns a failure. If not provided, 60s is used.",

				                     "required": false,

				                     "allowMultiple": false,

				                     "type": "long",

				                     "paramType": "query"

				                  }

				               ]

				            }

				         ]

				      }

				   ]

				}

									
										68

api/api-doc/storage_service.json
									
												
				@@ -90,7 +90,7 @@

				         "operations":[

				            {

				               "method":"GET",

				               "summary":"Returns a list of the tokens endpoint mapping",

				               "summary":"Returns a list of the tokens endpoint mapping, provide keyspace and cf param to get tablet mapping",

				               "type":"array",

				               "items":{

				                  "type":"mapper"

				@@ -100,6 +100,22 @@

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"keyspace",

				                     "description":"The keyspace to provide the tablet mapping for",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"query"

				                  },

				                  {

				                     "name":"cf",

				                     "description":"The table to provide the tablet mapping for",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"query"

				                  }

				               ]

				            }

				         ]

				@@ -1673,33 +1689,11 @@

				      {

				         "path":"/storage_service/rpc_server",

				         "operations":[

				            {

				               "method":"DELETE",

				               "summary":"Allows a user to disable thrift",

				               "type":"void",

				               "nickname":"stop_rpc_server",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				               ]

				            },

				            {

				               "method":"POST",

				               "summary":"allows a user to re-enable thrift",

				               "type":"void",

				               "nickname":"start_rpc_server",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				               ]

				            },

				            {

				               "method":"GET",

				               "summary":"Determine if thrift is running",

				               "type":"boolean",

				               "nickname":"is_rpc_server_running",

				               "nickname":"is_thrift_server_running",

				               "produces":[

				                  "application/json"

				               ],

				@@ -1897,6 +1891,14 @@

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"query"

				                  },

				                  {

				                     "name":"force",

				                     "description":"Enforce the source_dc option, even if it unsafe to use for rebuild",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"boolean",

				                     "paramType":"query"

				                  }

				               ]

				            }

				@@ -2054,7 +2056,7 @@

				         "operations":[

				            {

				               "method":"POST",

				               "summary":"Enables/Disables tracing for the whole system. Only thrift requests can start tracing currently",

				               "summary":"Enables/Disables tracing for the whole system.",

				               "type":"void",

				               "nickname":"set_trace_probability",

				               "produces":[

				@@ -2726,6 +2728,22 @@

				            }

				         ]

				      },

				      {

				         "path":"/storage_service/quiesce_topology",

				         "operations":[

				            {

				               "nickname":"quiesce_topology",

				               "method":"POST",

				               "summary":"Waits until there are no ongoing topology operations. Guarantees that topology operations which started before the call are finished after the call. This doesn't consider requested but not started operations. Such operations may start after the call succeeds.",

				               "type":"void",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				               ]

				            }

				         ]

				      },

				      {

				         "path":"/storage_service/metrics/total_hints",

				         "operations":[

									
										15

api/api-doc/system.json
									
												
				@@ -194,6 +194,21 @@

				               "parameters":[]

				            }

				         ]

				      },

				      {

				         "path":"/system/highest_supported_sstable_version",

				         "operations":[

				            {

				               "method":"GET",

				               "summary":"Get highest supported sstable version",

				               "type":"string",

				               "nickname":"get_highest_supported_sstable_version",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[]

				            }

				         ]

				      }

				   ]

				}

									
										2

api/api-doc/utils.json
									
												
				@@ -75,7 +75,7 @@

				               "items":{

				                  "type":"double"

				               },

				               "description":"One, five and fifteen mintues rates"

				               "description":"One, five and fifteen minutes rates"

				            },

				            "mean_rate": {

				               "type":"double",

									
										52

api/api.cc
									
												
				@@ -71,6 +71,8 @@ future<> set_server_init(http_context& ctx) {

				        rb->register_function(r, "error_injection",

				            "The error injection API");

				        set_error_injection(ctx, r);

				        rb->register_function(r, "storage_proxy",

				                "The storage proxy API");

				    });

				}

				@@ -81,6 +83,10 @@ future<> set_server_config(http_context& ctx, const db::config& cfg) {

				    });

				}

				future<> unset_server_config(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_config(ctx, r); });

				}

				static future<> register_api(http_context& ctx, const sstring& api_name,

				        const sstring api_desc,

				        std::function<void(http_context& ctx, routes& r)> f) {

				@@ -100,12 +106,12 @@ future<> unset_transport_controller(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_transport_controller(ctx, r); });

				}

				future<> set_rpc_controller(http_context& ctx, thrift_controller& ctl) {

				    return ctx.http_server.set_routes([&ctx, &ctl] (routes& r) { set_rpc_controller(ctx, r, ctl); });

				future<> set_thrift_controller(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { set_thrift_controller(ctx, r); });

				}

				future<> unset_rpc_controller(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_rpc_controller(ctx, r); });

				future<> unset_thrift_controller(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_thrift_controller(ctx, r); });

				}

				future<> set_server_storage_service(http_context& ctx, sharded<service::storage_service>& ss, service::raft_group0_client& group0_client) {

				@@ -118,6 +124,14 @@ future<> unset_server_storage_service(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_storage_service(ctx, r); });

				}

				future<> set_load_meter(http_context& ctx, service::load_meter& lm) {

				    return ctx.http_server.set_routes([&ctx, &lm] (routes& r) { set_load_meter(ctx, r, lm); });

				}

				future<> unset_load_meter(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_load_meter(ctx, r); });

				}

				future<> set_server_sstables_loader(http_context& ctx, sharded<sstables_loader>& sst_loader) {

				    return ctx.http_server.set_routes([&ctx, &sst_loader] (routes& r) { set_sstables_loader(ctx, r, sst_loader); });

				}

				@@ -180,10 +194,21 @@ future<> unset_server_snitch(http_context& ctx) {

				}

				future<> set_server_gossip(http_context& ctx, sharded<gms::gossiper>& g) {

				    return register_api(ctx, "gossiper",

				    co_await register_api(ctx, "gossiper",

				                "The gossiper API", [&g] (http_context& ctx, routes& r) {

				                    set_gossiper(ctx, r, g.local());

				                });

				    co_await register_api(ctx, "failure_detector",

				                "The failure detector API", [&g] (http_context& ctx, routes& r) {

				                    set_failure_detector(ctx, r, g.local());

				                });

				}

				future<> unset_server_gossip(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) {

				        unset_gossiper(ctx, r);

				        unset_failure_detector(ctx, r);

				    });

				}

				future<> set_server_column_family(http_context& ctx, sharded<db::system_keyspace>& sys_ks) {

				@@ -208,10 +233,7 @@ future<> unset_server_messaging_service(http_context& ctx) {

				}

				future<> set_server_storage_proxy(http_context& ctx, sharded<service::storage_proxy>& proxy) {

				    return register_api(ctx, "storage_proxy",

				                "The storage proxy API", [&proxy] (http_context& ctx, routes& r) {

				                    set_storage_proxy(ctx, r, proxy);

				                });

				    return ctx.http_server.set_routes([&ctx, &proxy] (routes& r) { set_storage_proxy(ctx, r, proxy); });

				}

				future<> unset_server_storage_proxy(http_context& ctx) {

				@@ -245,16 +267,6 @@ future<> unset_hinted_handoff(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_hinted_handoff(ctx, r); });

				}

				future<> set_server_gossip_settle(http_context& ctx, sharded<gms::gossiper>& g) {

				    auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);

				    return ctx.http_server.set_routes([rb, &ctx, &g](routes& r) {

				        rb->register_function(r, "failure_detector",

				                "The failure detector API");

				        set_failure_detector(ctx, r, g.local());

				    });

				}

				future<> set_server_compaction_manager(http_context& ctx) {

				    auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);

				@@ -346,7 +358,7 @@ void req_params::process(const request& req) {

				            continue;

				        }

				        try {

				            ent.value = req.param[name];

				            ent.value = req.get_path_param(name);

				        } catch (std::out_of_range&) {

				            throw httpd::bad_param_exception(fmt::format("Mandatory parameter '{}' was not provided", name));

				        }

									
										16

api/api_init.hh
									
												
				@@ -46,7 +46,6 @@ class snitch_ptr;

				} // namespace locator

				namespace cql_transport { class controller; }

				class thrift_controller;

				namespace db {

				class snapshot_ctl;

				class config;

				@@ -77,17 +76,16 @@ struct http_context {

				    sstring api_doc;

				    httpd::http_server_control http_server;

				    distributed<replica::database>& db;

				    service::load_meter& lmeter;

				    http_context(distributed<replica::database>& _db,

				            service::load_meter& _lm)

				            : db(_db), lmeter(_lm)

				    http_context(distributed<replica::database>& _db)

				            : db(_db)

				    {

				    }

				};

				future<> set_server_init(http_context& ctx);

				future<> set_server_config(http_context& ctx, const db::config& cfg);

				future<> unset_server_config(http_context& ctx);

				future<> set_server_snitch(http_context& ctx, sharded<locator::snitch_ptr>& snitch);

				future<> unset_server_snitch(http_context& ctx);

				future<> set_server_storage_service(http_context& ctx, sharded<service::storage_service>& ss, service::raft_group0_client&);

				@@ -100,8 +98,8 @@ future<> set_server_repair(http_context& ctx, sharded<repair_service>& repair);

				future<> unset_server_repair(http_context& ctx);

				future<> set_transport_controller(http_context& ctx, cql_transport::controller& ctl);

				future<> unset_transport_controller(http_context& ctx);

				future<> set_rpc_controller(http_context& ctx, thrift_controller& ctl);

				future<> unset_rpc_controller(http_context& ctx);

				future<> set_thrift_controller(http_context& ctx);

				future<> unset_thrift_controller(http_context& ctx);

				future<> set_server_authorization_cache(http_context& ctx, sharded<auth::service> &auth_service);

				future<> unset_server_authorization_cache(http_context& ctx);

				future<> set_server_snapshot(http_context& ctx, sharded<db::snapshot_ctl>& snap_ctl);

				@@ -109,6 +107,7 @@ future<> unset_server_snapshot(http_context& ctx);

				future<> set_server_token_metadata(http_context& ctx, sharded<locator::shared_token_metadata>& tm);

				future<> unset_server_token_metadata(http_context& ctx);

				future<> set_server_gossip(http_context& ctx, sharded<gms::gossiper>& g);

				future<> unset_server_gossip(http_context& ctx);

				future<> set_server_column_family(http_context& ctx, sharded<db::system_keyspace>& sys_ks);

				future<> unset_server_column_family(http_context& ctx);

				future<> set_server_messaging_service(http_context& ctx, sharded<netw::messaging_service>& ms);

				@@ -119,7 +118,6 @@ future<> set_server_stream_manager(http_context& ctx, sharded<streaming::stream_

				future<> unset_server_stream_manager(http_context& ctx);

				future<> set_hinted_handoff(http_context& ctx, sharded<service::storage_proxy>& p);

				future<> unset_hinted_handoff(http_context& ctx);

				future<> set_server_gossip_settle(http_context& ctx, sharded<gms::gossiper>& g);

				future<> set_server_cache(http_context& ctx);

				future<> set_server_compaction_manager(http_context& ctx);

				future<> set_server_done(http_context& ctx);

				@@ -131,5 +129,7 @@ future<> set_server_tasks_compaction_module(http_context& ctx, sharded<service::

				future<> unset_server_tasks_compaction_module(http_context& ctx);

				future<> set_server_raft(http_context&, sharded<service::raft_group_registry>&);

				future<> unset_server_raft(http_context&);

				future<> set_load_meter(http_context& ctx, service::load_meter& lm);

				future<> unset_load_meter(http_context& ctx);

				}

									
										15

api/authorization_cache.hh
									
												
				@@ -8,11 +8,20 @@

				#pragma once

				#include "api.hh"

				#include <seastar/core/sharded.hh>

				namespace seastar::httpd {

				class routes;

				}

				namespace auth {

				class service;

				}

				namespace api {

				void set_authorization_cache(http_context& ctx, httpd::routes& r, sharded<auth::service> &auth_service);

				void unset_authorization_cache(http_context& ctx, httpd::routes& r);

				struct http_context;

				void set_authorization_cache(http_context& ctx, seastar::httpd::routes& r, seastar::sharded<auth::service> &auth_service);

				void unset_authorization_cache(http_context& ctx, seastar::httpd::routes& r);

				}

									
										7

api/cache_service.cc
									
												
				@@ -7,6 +7,7 @@

				 */

				#include "cache_service.hh"

				#include "api/api.hh"

				#include "api/api-doc/cache_service.json.hh"

				#include "column_family.hh"

				@@ -195,9 +196,9 @@ void set_cache_service(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cs::get_row_capacity.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return ctx.db.map_reduce0([](replica::database& db) -> uint64_t {

				            return memory::stats().total_memory();

				    cs::get_row_capacity.set(r, [] (std::unique_ptr<http::request> req) {

				        return seastar::map_reduce(smp::all_cpus(), [] (int cpu) {

				            return make_ready_future<uint64_t>(memory::stats().total_memory());

				        }, uint64_t(0), std::plus<uint64_t>()).then([](const int64_t& res) {

				            return make_ready_future<json::json_return_type>(res);

				        });

									
										7

api/cache_service.hh
									
												
				@@ -8,10 +8,13 @@

				#pragma once

				#include "api.hh"

				namespace seastar::httpd {

				class routes;

				}

				namespace api {

				void set_cache_service(http_context& ctx, httpd::routes& r);

				struct http_context;

				void set_cache_service(http_context& ctx, seastar::httpd::routes& r);

				}

									
										4

api/collectd.cc
									
												
				@@ -54,7 +54,7 @@ static const char* str_to_regex(const sstring& v) {

				void set_collectd(http_context& ctx, routes& r) {

				    cd::get_collectd.set(r, [](std::unique_ptr<request> req) {

				        auto id = ::make_shared<scollectd::type_instance_id>(req->param["pluginid"],

				        auto id = ::make_shared<scollectd::type_instance_id>(req->get_path_param("pluginid"),

				                req->get_query_param("instance"), req->get_query_param("type"),

				                req->get_query_param("type_instance"));

				@@ -91,7 +91,7 @@ void set_collectd(http_context& ctx, routes& r) {

				    });

				    cd::enable_collectd.set(r, [](std::unique_ptr<request> req) -> future<json::json_return_type> {

				        std::regex plugin(req->param["pluginid"].c_str());

				        std::regex plugin(req->get_path_param("pluginid").c_str());

				        std::regex instance(str_to_regex(req->get_query_param("instance")));

				        std::regex type(str_to_regex(req->get_query_param("type")));

				        std::regex type_instance(str_to_regex(req->get_query_param("type_instance")));

									
										327

api/column_family.cc
									
												
				@@ -6,9 +6,11 @@

				 * SPDX-License-Identifier: AGPL-3.0-or-later

				 */

				#include <fmt/ranges.h>

				#include "column_family.hh"

				#include "api/api.hh"

				#include "api/api-doc/column_family.json.hh"

				#include "api/api-doc/storage_service.json.hh"

				#include <vector>

				#include <seastar/http/exception.hh>

				#include "sstables/sstables.hh"

				@@ -28,6 +30,7 @@ using namespace httpd;

				using namespace json;

				namespace cf = httpd::column_family_json;

				namespace ss = httpd::storage_service_json;

				std::tuple<sstring, sstring> parse_fully_qualified_cf_name(sstring name) {

				    auto pos = name.find("%3A");

				@@ -79,6 +82,65 @@ future<json::json_return_type>  get_cf_stats(http_context& ctx,

				    }, std::plus<int64_t>());

				}

				static future<json::json_return_type> set_tables(http_context& ctx, const sstring& keyspace, std::vector<sstring> tables, std::function<future<>(replica::table&)> set) {

				    if (tables.empty()) {

				        tables = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());

				    }

				    return do_with(keyspace, std::move(tables), [&ctx, set] (const sstring& keyspace, const std::vector<sstring>& tables) {

				        return ctx.db.invoke_on_all([&keyspace, &tables, set] (replica::database& db) {

				            return parallel_for_each(tables, [&db, &keyspace, set] (const sstring& table) {

				                replica::table& t = db.find_column_family(keyspace, table);

				                return set(t);

				            });

				        });

				    }).then([] {

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				}

				class autocompaction_toggle_guard {

				    replica::database& _db;

				public:

				    autocompaction_toggle_guard(replica::database& db) : _db(db) {

				        assert(this_shard_id() == 0);

				        if (!_db._enable_autocompaction_toggle) {

				            throw std::runtime_error("Autocompaction toggle is busy");

				        }

				        _db._enable_autocompaction_toggle = false;

				    }

				    autocompaction_toggle_guard(const autocompaction_toggle_guard&) = delete;

				    autocompaction_toggle_guard(autocompaction_toggle_guard&&) = default;

				    ~autocompaction_toggle_guard() {

				        assert(this_shard_id() == 0);

				        _db._enable_autocompaction_toggle = true;

				    }

				};

				static future<json::json_return_type> set_tables_autocompaction(http_context& ctx, const sstring &keyspace, std::vector<sstring> tables, bool enabled) {

				    apilog.info("set_tables_autocompaction: enabled={} keyspace={} tables={}", enabled, keyspace, tables);

				    return ctx.db.invoke_on(0, [&ctx, keyspace, tables = std::move(tables), enabled] (replica::database& db) {

				        auto g = autocompaction_toggle_guard(db);

				        return set_tables(ctx, keyspace, tables, [enabled] (replica::table& cf) {

				            if (enabled) {

				                cf.enable_auto_compaction();

				            } else {

				                return cf.disable_auto_compaction();

				            }

				            return make_ready_future<>();

				        }).finally([g = std::move(g)] {});

				    });

				}

				static future<json::json_return_type> set_tables_tombstone_gc(http_context& ctx, const sstring &keyspace, std::vector<sstring> tables, bool enabled) {

				    apilog.info("set_tables_tombstone_gc: enabled={} keyspace={} tables={}", enabled, keyspace, tables);

				    return set_tables(ctx, keyspace, std::move(tables), [enabled] (replica::table& t) {

				        t.set_tombstone_gc_enabled(enabled);

				        return make_ready_future<>();

				    });

				}

				static future<json::json_return_type>  get_cf_stats_count(http_context& ctx, const sstring& name,

				        utils::timed_rate_moving_average_summary_and_histogram replica::column_family_stats::*f) {

				    return map_reduce_cf(ctx, name, int64_t(0), [f](const replica::column_family& cf) {

				@@ -304,6 +366,14 @@ ratio_holder filter_recent_false_positive_as_ratio_holder(const sstables::shared

				    return ratio_holder(f + sst->filter_get_recent_true_positive(), f);

				}

				uint64_t accumulate_on_active_memtables(replica::table& t, noncopyable_function<uint64_t(replica::memtable& mt)> action) {

				    uint64_t ret = 0;

				    t.for_each_active_memtable([&] (replica::memtable& mt) {

				        ret += action(mt);

				    });

				    return ret;

				}

				void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace>& sys_ks) {

				    cf::get_column_family_name.set(r, [&ctx] (const_req req){

				        std::vector<sstring> res;

				@@ -338,14 +408,14 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_memtable_columns_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], uint64_t{0}, [](replica::column_family& cf) {

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed(std::mem_fn(&replica::memtable::partition_count)), uint64_t(0));

				        return map_reduce_cf(ctx, req->get_path_param("name"), uint64_t{0}, [](replica::column_family& cf) {

				            return accumulate_on_active_memtables(cf, std::mem_fn(&replica::memtable::partition_count));

				        }, std::plus<>());

				    });

				    cf::get_all_memtable_columns_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, uint64_t{0}, [](replica::column_family& cf) {

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed(std::mem_fn(&replica::memtable::partition_count)), uint64_t(0));

				            return accumulate_on_active_memtables(cf, std::mem_fn(&replica::memtable::partition_count));

				        }, std::plus<>());

				    });

				@@ -358,34 +428,34 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {

				                return active_memtable->region().occupancy().total_space();

				            }), uint64_t(0));

				        return map_reduce_cf(ctx, req->get_path_param("name"), int64_t(0), [](replica::column_family& cf) {

				            return accumulate_on_active_memtables(cf, [] (replica::memtable& active_memtable) {

				                return active_memtable.region().occupancy().total_space();

				            });

				        }, std::plus<int64_t>());

				    });

				    cf::get_all_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, int64_t(0), [](replica::column_family& cf) {

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {

				                return active_memtable->region().occupancy().total_space();

				            }), uint64_t(0));

				            return accumulate_on_active_memtables(cf, [] (replica::memtable& active_memtable) {

				                return active_memtable.region().occupancy().total_space();

				            });

				        }, std::plus<int64_t>());

				    });

				    cf::get_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {

				                return active_memtable->region().occupancy().used_space();

				            }), uint64_t(0));

				        return map_reduce_cf(ctx, req->get_path_param("name"), int64_t(0), [](replica::column_family& cf) {

				            return accumulate_on_active_memtables(cf, [] (replica::memtable& active_memtable) {

				                return active_memtable.region().occupancy().used_space();

				            });

				        }, std::plus<int64_t>());

				    });

				    cf::get_all_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, int64_t(0), [](replica::column_family& cf) {

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {

				                return active_memtable->region().occupancy().used_space();

				            }), uint64_t(0));

				            return accumulate_on_active_memtables(cf, [] (replica::memtable& active_memtable) {

				                return active_memtable.region().occupancy().used_space();

				            });

				        }, std::plus<int64_t>());

				    });

				@@ -399,7 +469,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    cf::get_cf_all_memtables_off_heap_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        warn(unimplemented::cause::INDEXES);

				        return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {

				        return map_reduce_cf(ctx, req->get_path_param("name"), int64_t(0), [](replica::column_family& cf) {

				            return cf.occupancy().total_space();

				        }, std::plus<int64_t>());

				    });

				@@ -415,7 +485,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    cf::get_cf_all_memtables_live_data_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        warn(unimplemented::cause::INDEXES);

				        return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {

				        return map_reduce_cf(ctx, req->get_path_param("name"), int64_t(0), [](replica::column_family& cf) {

				            return cf.occupancy().used_space();

				        }, std::plus<int64_t>());

				    });

				@@ -423,14 +493,14 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    cf::get_all_cf_all_memtables_live_data_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        warn(unimplemented::cause::INDEXES);

				        return map_reduce_cf(ctx, int64_t(0), [](replica::column_family& cf) {

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {

				                return active_memtable->region().occupancy().used_space();

				            }), uint64_t(0));

				            return accumulate_on_active_memtables(cf, [] (replica::memtable& active_memtable) {

				                return active_memtable.region().occupancy().used_space();

				            });

				        }, std::plus<int64_t>());

				    });

				    cf::get_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_stats(ctx,req->param["name"] ,&replica::column_family_stats::memtable_switch_count);

				        return get_cf_stats(ctx,req->get_path_param("name") ,&replica::column_family_stats::memtable_switch_count);

				    });

				    cf::get_all_memtable_switch_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				@@ -439,7 +509,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    // FIXME: this refers to partitions, not rows.

				    cf::get_estimated_row_size_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](replica::column_family& cf) {

				        return map_reduce_cf(ctx, req->get_path_param("name"), utils::estimated_histogram(0), [](replica::column_family& cf) {

				            utils::estimated_histogram res(0);

				            for (auto sstables = cf.get_sstables(); auto& i : *sstables) {

				                res.merge(i->get_stats_metadata().estimated_partition_size);

				@@ -451,7 +521,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    // FIXME: this refers to partitions, not rows.

				    cf::get_estimated_row_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {

				        return map_reduce_cf(ctx, req->get_path_param("name"), int64_t(0), [](replica::column_family& cf) {

				            uint64_t res = 0;

				            for (auto sstables = cf.get_sstables(); auto& i : *sstables) {

				                res += i->get_stats_metadata().estimated_partition_size.count();

				@@ -462,7 +532,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_estimated_column_count_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](replica::column_family& cf) {

				        return map_reduce_cf(ctx, req->get_path_param("name"), utils::estimated_histogram(0), [](replica::column_family& cf) {

				            utils::estimated_histogram res(0);

				            for (auto sstables = cf.get_sstables(); auto& i : *sstables) {

				                res.merge(i->get_stats_metadata().estimated_cells_count);

				@@ -479,7 +549,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_pending_flushes.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_stats(ctx,req->param["name"] ,&replica::column_family_stats::pending_flushes);

				        return get_cf_stats(ctx,req->get_path_param("name") ,&replica::column_family_stats::pending_flushes);

				    });

				    cf::get_all_pending_flushes.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				@@ -487,7 +557,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_read.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_stats_count(ctx,req->param["name"] ,&replica::column_family_stats::reads);

				        return get_cf_stats_count(ctx,req->get_path_param("name") ,&replica::column_family_stats::reads);

				    });

				    cf::get_all_read.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				@@ -495,7 +565,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_write.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_stats_count(ctx, req->param["name"] ,&replica::column_family_stats::writes);

				        return get_cf_stats_count(ctx, req->get_path_param("name") ,&replica::column_family_stats::writes);

				    });

				    cf::get_all_write.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				@@ -503,19 +573,19 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_histogram(ctx, req->param["name"], &replica::column_family_stats::reads);

				        return get_cf_histogram(ctx, req->get_path_param("name"), &replica::column_family_stats::reads);

				    });

				    cf::get_read_latency_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_rate_and_histogram(ctx, req->param["name"], &replica::column_family_stats::reads);

				        return get_cf_rate_and_histogram(ctx, req->get_path_param("name"), &replica::column_family_stats::reads);

				    });

				    cf::get_read_latency.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_stats_sum(ctx,req->param["name"] ,&replica::column_family_stats::reads);

				        return get_cf_stats_sum(ctx,req->get_path_param("name") ,&replica::column_family_stats::reads);

				    });

				    cf::get_write_latency.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_stats_sum(ctx, req->param["name"] ,&replica::column_family_stats::writes);

				        return get_cf_stats_sum(ctx, req->get_path_param("name") ,&replica::column_family_stats::writes);

				    });

				    cf::get_all_read_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				@@ -527,11 +597,11 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_histogram(ctx, req->param["name"], &replica::column_family_stats::writes);

				        return get_cf_histogram(ctx, req->get_path_param("name"), &replica::column_family_stats::writes);

				    });

				    cf::get_write_latency_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_rate_and_histogram(ctx, req->param["name"], &replica::column_family_stats::writes);

				        return get_cf_rate_and_histogram(ctx, req->get_path_param("name"), &replica::column_family_stats::writes);

				    });

				    cf::get_all_write_latency_histogram_depricated.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				@@ -543,7 +613,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_pending_compactions.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], int64_t(0), [](replica::column_family& cf) {

				        return map_reduce_cf(ctx, req->get_path_param("name"), int64_t(0), [](replica::column_family& cf) {

				            return cf.estimate_pending_compactions();

				        }, std::plus<int64_t>());

				    });

				@@ -555,7 +625,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_stats(ctx, req->param["name"], &replica::column_family_stats::live_sstable_count);

				        return get_cf_stats(ctx, req->get_path_param("name"), &replica::column_family_stats::live_sstable_count);

				    });

				    cf::get_all_live_ss_table_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				@@ -563,11 +633,11 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_unleveled_sstables.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_unleveled_sstables(ctx, req->param["name"]);

				        return get_cf_unleveled_sstables(ctx, req->get_path_param("name"));

				    });

				    cf::get_live_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return sum_sstable(ctx, req->param["name"], false);

				        return sum_sstable(ctx, req->get_path_param("name"), false);

				    });

				    cf::get_all_live_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				@@ -575,7 +645,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_total_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return sum_sstable(ctx, req->param["name"], true);

				        return sum_sstable(ctx, req->get_path_param("name"), true);

				    });

				    cf::get_all_total_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				@@ -584,7 +654,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    // FIXME: this refers to partitions, not rows.

				    cf::get_min_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], INT64_MAX, min_partition_size, min_int64);

				        return map_reduce_cf(ctx, req->get_path_param("name"), INT64_MAX, min_partition_size, min_int64);

				    });

				    // FIXME: this refers to partitions, not rows.

				@@ -594,7 +664,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    // FIXME: this refers to partitions, not rows.

				    cf::get_max_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], int64_t(0), max_partition_size, max_int64);

				        return map_reduce_cf(ctx, req->get_path_param("name"), int64_t(0), max_partition_size, max_int64);

				    });

				    // FIXME: this refers to partitions, not rows.

				@@ -605,7 +675,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    // FIXME: this refers to partitions, not rows.

				    cf::get_mean_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        // Cassandra 3.x mean values are truncated as integrals.

				        return map_reduce_cf(ctx, req->param["name"], integral_ratio_holder(), mean_partition_size, std::plus<integral_ratio_holder>());

				        return map_reduce_cf(ctx, req->get_path_param("name"), integral_ratio_holder(), mean_partition_size, std::plus<integral_ratio_holder>());

				    });

				    // FIXME: this refers to partitions, not rows.

				@@ -615,7 +685,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (replica::column_family& cf) {

				        return map_reduce_cf(ctx, req->get_path_param("name"), uint64_t(0), [] (replica::column_family& cf) {

				            auto sstables = cf.get_sstables();

				            return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				                return s + sst->filter_get_false_positive();

				@@ -633,7 +703,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_recent_bloom_filter_false_positives.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (replica::column_family& cf) {

				        return map_reduce_cf(ctx, req->get_path_param("name"), uint64_t(0), [] (replica::column_family& cf) {

				            auto sstables = cf.get_sstables();

				            return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				                return s + sst->filter_get_recent_false_positive();

				@@ -651,7 +721,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], ratio_holder(), [] (replica::column_family& cf) {

				        return map_reduce_cf(ctx, req->get_path_param("name"), ratio_holder(), [] (replica::column_family& cf) {

				            return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_false_positive_as_ratio_holder), ratio_holder());

				        }, std::plus<>());

				    });

				@@ -663,7 +733,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_recent_bloom_filter_false_ratio.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], ratio_holder(), [] (replica::column_family& cf) {

				        return map_reduce_cf(ctx, req->get_path_param("name"), ratio_holder(), [] (replica::column_family& cf) {

				            return boost::accumulate(*cf.get_sstables() | boost::adaptors::transformed(filter_recent_false_positive_as_ratio_holder), ratio_holder());

				        }, std::plus<>());

				    });

				@@ -675,7 +745,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (replica::column_family& cf) {

				        return map_reduce_cf(ctx, req->get_path_param("name"), uint64_t(0), [] (replica::column_family& cf) {

				            auto sstables = cf.get_sstables();

				            return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				                return s + sst->filter_size();

				@@ -693,7 +763,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_bloom_filter_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (replica::column_family& cf) {

				        return map_reduce_cf(ctx, req->get_path_param("name"), uint64_t(0), [] (replica::column_family& cf) {

				            auto sstables = cf.get_sstables();

				            return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				                return s + sst->filter_memory_size();

				@@ -711,7 +781,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (replica::column_family& cf) {

				        return map_reduce_cf(ctx, req->get_path_param("name"), uint64_t(0), [] (replica::column_family& cf) {

				            auto sstables = cf.get_sstables();

				            return std::accumulate(sstables->begin(), sstables->end(), uint64_t(0), [](uint64_t s, auto& sst) {

				                return s + sst->get_summary().memory_footprint();

				@@ -734,7 +804,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				        // The off-heap memory calculation is not implemented yet.

				        // Returning 0 is incorrect; it is a workaround

				        // until the memory calculation is available.

				        //auto id = get_uuid(req->param["name"], ctx.db.local());

				        //auto id = get_uuid(req->get_path_param("name"), ctx.db.local());

				        return make_ready_future<json::json_return_type>(0);

				    });

				@@ -747,7 +817,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    cf::get_speculative_retries.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        //auto id = get_uuid(req->param["name"], ctx.db.local());

				        //auto id = get_uuid(req->get_path_param("name"), ctx.db.local());

				        return make_ready_future<json::json_return_type>(0);

				    });

				@@ -760,32 +830,14 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    cf::get_key_cache_hit_rate.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        //auto id = get_uuid(req->param["name"], ctx.db.local());

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cf::get_true_snapshots_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        auto uuid = get_uuid(req->param["name"], ctx.db.local());

				        return ctx.db.local().find_column_family(uuid).get_snapshot_details().then([](

				                const std::unordered_map<sstring, replica::column_family::snapshot_details>& sd) {

				            int64_t res = 0;

				            for (auto i : sd) {

				                res += i.second.total;

				            }

				            return make_ready_future<json::json_return_type>(res);

				        });

				    });

				    cf::get_all_true_snapshots_size.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        //auto id = get_uuid(req->get_path_param("name"), ctx.db.local());

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cf::get_row_cache_hit_out_of_range.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        //auto id = get_uuid(req->param["name"], ctx.db.local());

				        //auto id = get_uuid(req->get_path_param("name"), ctx.db.local());

				        return make_ready_future<json::json_return_type>(0);

				    });

				@@ -796,7 +848,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_row_cache_hit.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf_raw(ctx, req->param["name"], utils::rate_moving_average(), [](const replica::column_family& cf) {

				        return map_reduce_cf_raw(ctx, req->get_path_param("name"), utils::rate_moving_average(), [](const replica::column_family& cf) {

				            return cf.get_row_cache().stats().hits.rate();

				        }, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {

				            return make_ready_future<json::json_return_type>(meter_to_json(m));

				@@ -812,7 +864,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_row_cache_miss.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf_raw(ctx, req->param["name"], utils::rate_moving_average(), [](const replica::column_family& cf) {

				        return map_reduce_cf_raw(ctx, req->get_path_param("name"), utils::rate_moving_average(), [](const replica::column_family& cf) {

				            return cf.get_row_cache().stats().misses.rate();

				        }, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {

				            return make_ready_future<json::json_return_type>(meter_to_json(m));

				@@ -829,102 +881,120 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_cas_prepare.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const replica::column_family& cf) {

				        return map_reduce_cf_time_histogram(ctx, req->get_path_param("name"), [](const replica::column_family& cf) {

				            return cf.get_stats().cas_prepare.histogram();

				        });

				    });

				    cf::get_cas_propose.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const replica::column_family& cf) {

				        return map_reduce_cf_time_histogram(ctx, req->get_path_param("name"), [](const replica::column_family& cf) {

				            return cf.get_stats().cas_accept.histogram();

				        });

				    });

				    cf::get_cas_commit.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const replica::column_family& cf) {

				        return map_reduce_cf_time_histogram(ctx, req->get_path_param("name"), [](const replica::column_family& cf) {

				            return cf.get_stats().cas_learn.histogram();

				        });

				    });

				    cf::get_sstables_per_read_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->param["name"], utils::estimated_histogram(0), [](replica::column_family& cf) {

				        return map_reduce_cf(ctx, req->get_path_param("name"), utils::estimated_histogram(0), [](replica::column_family& cf) {

				            return cf.get_stats().estimated_sstable_per_read;

				        },

				        utils::estimated_histogram_merge, utils_json::estimated_histogram());

				    });

				    cf::get_tombstone_scanned_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_histogram(ctx, req->param["name"], &replica::column_family_stats::tombstone_scanned);

				        return get_cf_histogram(ctx, req->get_path_param("name"), &replica::column_family_stats::tombstone_scanned);

				    });

				    cf::get_live_scanned_histogram.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cf_histogram(ctx, req->param["name"], &replica::column_family_stats::live_scanned);

				        return get_cf_histogram(ctx, req->get_path_param("name"), &replica::column_family_stats::live_scanned);

				    });

				    cf::get_col_update_time_delta_histogram.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        //auto id = get_uuid(req->param["name"], ctx.db.local());

				        //auto id = get_uuid(req->get_path_param("name"), ctx.db.local());

				        std::vector<double> res;

				        return make_ready_future<json::json_return_type>(res);

				    });

				    cf::get_auto_compaction.set(r, [&ctx] (const_req req) {

				        auto uuid = get_uuid(req.param["name"], ctx.db.local());

				        auto uuid = get_uuid(req.get_path_param("name"), ctx.db.local());

				        replica::column_family& cf = ctx.db.local().find_column_family(uuid);

				        return !cf.is_auto_compaction_disabled_by_user();

				    });

				    cf::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        apilog.info("column_family/enable_auto_compaction: name={}", req->param["name"]);

				        return ctx.db.invoke_on(0, [&ctx, req = std::move(req)] (replica::database& db) {

				            auto g = replica::database::autocompaction_toggle_guard(db);

				            return foreach_column_family(ctx, req->param["name"], [](replica::column_family &cf) {

				                cf.enable_auto_compaction();

				            }).then([g = std::move(g)] {

				                return make_ready_future<json::json_return_type>(json_void());

				            });

				        });

				        apilog.info("column_family/enable_auto_compaction: name={}", req->get_path_param("name"));

				        auto [ks, cf] = parse_fully_qualified_cf_name(req->get_path_param("name"));

				        validate_table(ctx, ks, cf);

				        return set_tables_autocompaction(ctx, ks, {std::move(cf)}, true);

				    });

				    cf::disable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        apilog.info("column_family/disable_auto_compaction: name={}", req->param["name"]);

				        return ctx.db.invoke_on(0, [&ctx, req = std::move(req)] (replica::database& db) {

				            auto g = replica::database::autocompaction_toggle_guard(db);

				            return foreach_column_family(ctx, req->param["name"], [](replica::column_family &cf) {

				                return cf.disable_auto_compaction();

				            }).then([g = std::move(g)] {

				                return make_ready_future<json::json_return_type>(json_void());

				            });

				        });

				        apilog.info("column_family/disable_auto_compaction: name={}", req->get_path_param("name"));

				        auto [ks, cf] = parse_fully_qualified_cf_name(req->get_path_param("name"));

				        validate_table(ctx, ks, cf);

				        return set_tables_autocompaction(ctx, ks, {std::move(cf)}, false);

				    });

				    ss::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        auto keyspace = validate_keyspace(ctx, req);

				        auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");

				        apilog.info("enable_auto_compaction: keyspace={} tables={}", keyspace, tables);

				        return set_tables_autocompaction(ctx, keyspace, tables, true);

				    });

				    ss::disable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        auto keyspace = validate_keyspace(ctx, req);

				        auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");

				        apilog.info("disable_auto_compaction: keyspace={} tables={}", keyspace, tables);

				        return set_tables_autocompaction(ctx, keyspace, tables, false);

				    });

				    cf::get_tombstone_gc.set(r, [&ctx] (const_req req) {

				        auto uuid = get_uuid(req.param["name"], ctx.db.local());

				        auto uuid = get_uuid(req.get_path_param("name"), ctx.db.local());

				        replica::table& t = ctx.db.local().find_column_family(uuid);

				        return t.tombstone_gc_enabled();

				    });

				    cf::enable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        apilog.info("column_family/enable_tombstone_gc: name={}", req->param["name"]);

				        return foreach_column_family(ctx, req->param["name"], [](replica::table& t) {

				            t.set_tombstone_gc_enabled(true);

				        }).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				        apilog.info("column_family/enable_tombstone_gc: name={}", req->get_path_param("name"));

				        auto [ks, cf] = parse_fully_qualified_cf_name(req->get_path_param("name"));

				        validate_table(ctx, ks, cf);

				        return set_tables_tombstone_gc(ctx, ks, {std::move(cf)}, true);

				    });

				    cf::disable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        apilog.info("column_family/disable_tombstone_gc: name={}", req->param["name"]);

				        return foreach_column_family(ctx, req->param["name"], [](replica::table& t) {

				            t.set_tombstone_gc_enabled(false);

				        }).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				        apilog.info("column_family/disable_tombstone_gc: name={}", req->get_path_param("name"));

				        auto [ks, cf] = parse_fully_qualified_cf_name(req->get_path_param("name"));

				        validate_table(ctx, ks, cf);

				        return set_tables_tombstone_gc(ctx, ks, {std::move(cf)}, false);

				    });

				    ss::enable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        auto keyspace = validate_keyspace(ctx, req);

				        auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");

				        apilog.info("enable_tombstone_gc: keyspace={} tables={}", keyspace, tables);

				        return set_tables_tombstone_gc(ctx, keyspace, tables, true);

				    });

				    ss::disable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        auto keyspace = validate_keyspace(ctx, req);

				        auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");

				        apilog.info("disable_tombstone_gc: keyspace={} tables={}", keyspace, tables);

				        return set_tables_tombstone_gc(ctx, keyspace, tables, false);

				    });

				    cf::get_built_indexes.set(r, [&ctx, &sys_ks](std::unique_ptr<http::request> req) {

				        auto ks_cf = parse_fully_qualified_cf_name(req->param["name"]);

				        auto ks_cf = parse_fully_qualified_cf_name(req->get_path_param("name"));

				        auto&& ks = std::get<0>(ks_cf);

				        auto&& cf_name = std::get<1>(ks_cf);

				        return sys_ks.local().load_view_build_progress().then([ks, cf_name, &ctx](const std::vector<db::system_keyspace_view_build_progress>& vb) mutable {

				@@ -962,7 +1032,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_compression_ratio.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        auto uuid = get_uuid(req->param["name"], ctx.db.local());

				        auto uuid = get_uuid(req->get_path_param("name"), ctx.db.local());

				        return ctx.db.map_reduce(sum_ratio<double>(), [uuid](replica::database& db) {

				            replica::column_family& cf = db.find_column_family(uuid);

				@@ -973,21 +1043,21 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_read_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const replica::column_family& cf) {

				        return map_reduce_cf_time_histogram(ctx, req->get_path_param("name"), [](const replica::column_family& cf) {

				            return cf.get_stats().reads.histogram();

				        });

				    });

				    cf::get_write_latency_estimated_histogram.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return map_reduce_cf_time_histogram(ctx, req->param["name"], [](const replica::column_family& cf) {

				        return map_reduce_cf_time_histogram(ctx, req->get_path_param("name"), [](const replica::column_family& cf) {

				            return cf.get_stats().writes.histogram();

				        });

				    });

				    cf::set_compaction_strategy_class.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        sstring strategy = req->get_query_param("class_name");

				        apilog.info("column_family/set_compaction_strategy_class: name={} strategy={}", req->param["name"], strategy);

				        return foreach_column_family(ctx, req->param["name"], [strategy](replica::column_family& cf) {

				        apilog.info("column_family/set_compaction_strategy_class: name={} strategy={}", req->get_path_param("name"), strategy);

				        return foreach_column_family(ctx, req->get_path_param("name"), [strategy](replica::column_family& cf) {

				            cf.set_compaction_strategy(sstables::compaction_strategy::type(strategy));

				        }).then([] {

				                return make_ready_future<json::json_return_type>(json_void());

				@@ -995,7 +1065,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_compaction_strategy_class.set(r, [&ctx](const_req req) {

				        return ctx.db.local().find_column_family(get_uuid(req.param["name"], ctx.db.local())).get_compaction_strategy().name();

				        return ctx.db.local().find_column_family(get_uuid(req.get_path_param("name"), ctx.db.local())).get_compaction_strategy().name();

				    });

				    cf::set_compression_parameters.set(r, [](std::unique_ptr<http::request> req) {

				@@ -1011,7 +1081,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::get_sstable_count_per_level.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return map_reduce_cf_raw(ctx, req->param["name"], std::vector<uint64_t>(), [](const replica::column_family& cf) {

				        return map_reduce_cf_raw(ctx, req->get_path_param("name"), std::vector<uint64_t>(), [](const replica::column_family& cf) {

				            return cf.sstable_count_per_level();

				        }, concat_sstable_count_per_level).then([](const std::vector<uint64_t>& res) {

				            return make_ready_future<json::json_return_type>(res);

				@@ -1020,7 +1090,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    cf::get_sstables_for_key.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        auto key = req->get_query_param("key");

				        auto uuid = get_uuid(req->param["name"], ctx.db.local());

				        auto uuid = get_uuid(req->get_path_param("name"), ctx.db.local());

				        return ctx.db.map_reduce0([key, uuid] (replica::database& db) -> future<std::unordered_set<sstring>> {

				            auto sstables = co_await db.find_column_family(uuid).get_sstables_by_partition_key(key);

				@@ -1036,7 +1106,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    cf::toppartitions.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        auto name = req->param["name"];

				        auto name = req->get_path_param("name");

				        auto [ks, cf] = parse_fully_qualified_cf_name(name);

				        api::req_param<std::chrono::milliseconds, unsigned> duration{*req, "duration", 1000ms};

				@@ -1063,7 +1133,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				        }

				        auto [ks, cf] = parse_fully_qualified_cf_name(*params.get("name"));

				        auto flush = params.get_as<bool>("flush_memtables").value_or(true);

				        apilog.info("column_family/force_major_compaction: name={} flush={}", req->param["name"], flush);

				        apilog.info("column_family/force_major_compaction: name={} flush={}", req->get_path_param("name"), flush);

				        auto keyspace = validate_keyspace(ctx, ks);

				        std::vector<table_info> table_infos = {table_info{

				@@ -1156,8 +1226,6 @@ void unset_column_family(http_context& ctx, routes& r) {

				    cf::get_speculative_retries.unset(r);

				    cf::get_all_speculative_retries.unset(r);

				    cf::get_key_cache_hit_rate.unset(r);

				    cf::get_true_snapshots_size.unset(r);

				    cf::get_all_true_snapshots_size.unset(r);

				    cf::get_row_cache_hit_out_of_range.unset(r);

				    cf::get_all_row_cache_hit_out_of_range.unset(r);

				    cf::get_row_cache_hit.unset(r);

				@@ -1174,6 +1242,13 @@ void unset_column_family(http_context& ctx, routes& r) {

				    cf::get_auto_compaction.unset(r);

				    cf::enable_auto_compaction.unset(r);

				    cf::disable_auto_compaction.unset(r);

				    ss::enable_auto_compaction.unset(r);

				    ss::disable_auto_compaction.unset(r);

				    cf::get_tombstone_gc.unset(r);

				    cf::enable_tombstone_gc.unset(r);

				    cf::disable_tombstone_gc.unset(r);

				    ss::enable_tombstone_gc.unset(r);

				    ss::disable_tombstone_gc.unset(r);

				    cf::get_built_indexes.unset(r);

				    cf::get_compression_metadata_off_heap_memory_used.unset(r);

				    cf::get_compression_parameters.unset(r);

									
api/column_family.hh
				@@ -9,7 +9,6 @@

				#pragma once

				#include "replica/database.hh"

				#include <seastar/core/future-util.hh>

				#include <seastar/json/json_elements.hh>

				#include <any>

				#include "api/api_init.hh"

									
api/compaction_manager.cc
				@@ -7,6 +7,7 @@

				 */

				#include <seastar/core/coroutine.hh>

				#include <seastar/coroutine/exception.hh>

				#include "compaction_manager.hh"

				#include "compaction/compaction_manager.hh"

				@@ -110,7 +111,7 @@ void set_compaction_manager(http_context& ctx, routes& r) {

				    });

				    cm::stop_keyspace_compaction.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto ks_name = validate_keyspace(ctx, req->param);

				        auto ks_name = validate_keyspace(ctx, req);

				        auto table_names = parse_tables(ks_name, ctx, req->query_parameters, "tables");

				        if (table_names.empty()) {

				            table_names = map_keys(ctx.db.local().find_keyspace(ks_name).metadata().get()->cf_meta_data());

				@@ -153,10 +154,13 @@ void set_compaction_manager(http_context& ctx, routes& r) {

				    });

				    cm::get_compaction_history.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        std::function<future<>(output_stream<char>&&)> f = [&ctx](output_stream<char>&& s) {

				            return do_with(output_stream<char>(std::move(s)), true, [&ctx] (output_stream<char>& s, bool& first){

				                return s.write("[").then([&ctx, &s, &first] {

				                    return ctx.db.local().get_compaction_manager().get_compaction_history([&s, &first](const db::compaction_history_entry& entry) mutable {

				        std::function<future<>(output_stream<char>&&)> f = [&ctx] (output_stream<char>&& out) -> future<> {

				            auto s = std::move(out);

				            bool first = true;

				            std::exception_ptr ex;

				            try {

				                co_await s.write("[");

				                co_await ctx.db.local().get_compaction_manager().get_compaction_history([&s, &first](const db::compaction_history_entry& entry) mutable -> future<> {

				                        cm::history h;

				                        h.id = fmt::to_string(entry.id);

				                        h.ks = std::move(entry.ks);

				@@ -170,18 +174,21 @@ void set_compaction_manager(http_context& ctx, routes& r) {

				                            e.value = it.second;

				                            h.rows_merged.push(std::move(e));

				                        }

				                        auto fut = first ? make_ready_future<>() : s.write(", ");

				                        if (!first) {

				                            co_await s.write(", ");

				                        }

				                        first = false;

				                        return fut.then([&s, h = std::move(h)] {

				                            return formatter::write(s, h);

				                        });

				                    }).then([&s] {

				                        return s.write("]").then([&s] {

				                            return s.close();

				                        });

				                        co_await formatter::write(s, h);

				                    });

				                });

				            });

				                co_await s.write("]");

				                co_await s.flush();

				            } catch (...) {

				                ex = std::current_exception();

				            }

				            co_await s.close();

				            if (ex) {

				                co_await coroutine::return_exception_ptr(std::move(ex));

				            }

				        };

				        return make_ready_future<json::json_return_type>(std::move(f));

				    });
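The rewritten `get_compaction_history` handler above records any failure in a `std::exception_ptr`, closes the output stream unconditionally, and only then rethrows. The sketch below shows the same shape in plain standard C++, without Seastar or coroutines. `recording_sink` and `write_array` are hypothetical names made up for this illustration and are not part of ScyllaDB or Seastar.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an output stream that must always be closed.
struct recording_sink {
    std::string data;
    bool closed = false;
    void write(const std::string& s) { data += s; }
    void close() { closed = true; }
};

// Writes `entries` as a JSON-style array. `format_entry` may throw.
// The sink is closed on both the success and the failure path, and a
// captured exception is rethrown only after close() has run.
inline void write_array(recording_sink& out,
                        const std::vector<int>& entries,
                        const std::function<std::string(int)>& format_entry) {
    assert(!out.closed);  // writing to an already closed sink is a caller bug
    std::exception_ptr ex;
    bool first = true;
    try {
        out.write("[");
        for (int e : entries) {
            if (!first) {
                out.write(", ");  // separator before every entry but the first
            }
            first = false;
            out.write(format_entry(e));
        }
        out.write("]");
    } catch (...) {
        ex = std::current_exception();  // remember the failure, do not unwind yet
    }
    out.close();  // runs whether or not the body threw
    if (ex) {
        std::rethrow_exception(ex);
    }
}
```

Deferring the rethrow until after `close()` matters more in the coroutine version, where closing is itself an awaited operation and cannot be performed from inside the `catch` block.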

									
api/config.cc
				@@ -6,8 +6,11 @@

				 * SPDX-License-Identifier: AGPL-3.0-or-later

				 */

				#include "api/api.hh"

				#include "api/config.hh"

				#include "api/api-doc/config.json.hh"

				#include "api/api-doc/storage_proxy.json.hh"

				#include "replica/database.hh"

				#include "db/config.hh"

				#include <sstream>

				#include <boost/algorithm/string/replace.hpp>

				@@ -15,6 +18,7 @@

				namespace api {

				using namespace seastar::httpd;

				namespace sp = httpd::storage_proxy_json;

				template<class T>

				json::json_return_type get_json_return_type(const T& val) {

				@@ -92,7 +96,7 @@ void set_config(std::shared_ptr < api_registry_builder20 > rb, http_context& ctx

				    });

				    cs::find_config_id.set(r, [&cfg] (const_req r) {

				        auto id = r.param["id"];

				        auto id = r.get_path_param("id");

				        for (auto&& cfg_ref : cfg.values()) {

				            auto&& cfg = cfg_ref.get();

				            if (id == cfg.name()) {

				@@ -101,6 +105,102 @@ void set_config(std::shared_ptr < api_registry_builder20 > rb, http_context& ctx

				        }

				        throw bad_param_exception(sstring("No such config entry: ") + id);

				    });

				    sp::get_rpc_timeout.set(r, [&cfg](const_req req)  {

				        return cfg.request_timeout_in_ms()/1000.0;

				    });

				    sp::set_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(seastar::json::json_void());

				    });

				    sp::get_read_rpc_timeout.set(r, [&cfg](const_req req)  {

				        return cfg.read_request_timeout_in_ms()/1000.0;

				    });

				    sp::set_read_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(seastar::json::json_void());

				    });

				    sp::get_write_rpc_timeout.set(r, [&cfg](const_req req)  {

				        return cfg.write_request_timeout_in_ms()/1000.0;

				    });

				    sp::set_write_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(seastar::json::json_void());

				    });

				    sp::get_counter_write_rpc_timeout.set(r, [&cfg](const_req req)  {

				        return cfg.counter_write_request_timeout_in_ms()/1000.0;

				    });

				    sp::set_counter_write_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(seastar::json::json_void());

				    });

				    sp::get_cas_contention_timeout.set(r, [&cfg](const_req req)  {

				        return cfg.cas_contention_timeout_in_ms()/1000.0;

				    });

				    sp::set_cas_contention_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(seastar::json::json_void());

				    });

				    sp::get_range_rpc_timeout.set(r, [&cfg](const_req req)  {

				        return cfg.range_request_timeout_in_ms()/1000.0;

				    });

				    sp::set_range_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(seastar::json::json_void());

				    });

				    sp::get_truncate_rpc_timeout.set(r, [&cfg](const_req req)  {

				        return cfg.truncate_request_timeout_in_ms()/1000.0;

				    });

				    sp::set_truncate_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(seastar::json::json_void());

				    });

				}

				void unset_config(http_context& ctx, routes& r) {

				    cs::find_config_id.unset(r);

				    sp::get_rpc_timeout.unset(r);

				    sp::set_rpc_timeout.unset(r);

				    sp::get_read_rpc_timeout.unset(r);

				    sp::set_read_rpc_timeout.unset(r);

				    sp::get_write_rpc_timeout.unset(r);

				    sp::set_write_rpc_timeout.unset(r);

				    sp::get_counter_write_rpc_timeout.unset(r);

				    sp::set_counter_write_rpc_timeout.unset(r);

				    sp::get_cas_contention_timeout.unset(r);

				    sp::set_cas_contention_timeout.unset(r);

				    sp::get_range_rpc_timeout.unset(r);

				    sp::set_range_rpc_timeout.unset(r);

				    sp::get_truncate_rpc_timeout.unset(r);

				    sp::set_truncate_rpc_timeout.unset(r);

				}

				}

									
api/config.hh
				@@ -14,4 +14,5 @@

				namespace api {

				void set_config(std::shared_ptr<httpd::api_registry_builder20> rb, http_context& ctx, httpd::routes& r, const db::config& cfg, bool first = false);

				void unset_config(http_context& ctx, httpd::routes& r);

				}

									
api/error_injection.cc
				@@ -7,10 +7,8 @@

				 */

				#include "api/api-doc/error_injection.json.hh"

				#include "api/api.hh"

				#include "api/api_init.hh"

				#include <seastar/http/exception.hh>

				#include "log.hh"

				#include "utils/error_injection.hh"

				#include "utils/rjson.hh"

				#include <seastar/core/future-util.hh>

				@@ -24,7 +22,7 @@ namespace hf = httpd::error_injection_json;

				void set_error_injection(http_context& ctx, routes& r) {

				    hf::enable_injection.set(r, [](std::unique_ptr<request> req) {

				        sstring injection = req->param["injection"];

				        sstring injection = req->get_path_param("injection");

				        bool one_shot = req->get_query_param("one_shot") == "True";

				        auto params = req->content;

				@@ -56,7 +54,7 @@ void set_error_injection(http_context& ctx, routes& r) {

				    });

				    hf::disable_injection.set(r, [](std::unique_ptr<request> req) {

				        sstring injection = req->param["injection"];

				        sstring injection = req->get_path_param("injection");

				        auto& errinj = utils::get_local_injector();

				        return errinj.disable_on_all(injection).then([] {

				@@ -64,6 +62,32 @@ void set_error_injection(http_context& ctx, routes& r) {

				        });

				    });

				    hf::read_injection.set(r, [](std::unique_ptr<request> req) -> future<json::json_return_type> {

				        const sstring injection = req->get_path_param("injection");

				        std::vector<error_injection_json::error_injection_info> error_injection_infos(smp::count, error_injection_json::error_injection_info{});

				        co_await smp::invoke_on_all([&] {

				            auto& info = error_injection_infos[this_shard_id()];

				            auto& errinj = utils::get_local_injector();

				            const auto enabled = errinj.is_enabled(injection);

				            info.enabled = enabled;

				            if (!enabled) {

				                return;

				            }

				            std::vector<error_injection_json::mapper> parameters;

				            for (const auto& p : errinj.get_injection_parameters(injection)) {

				                error_injection_json::mapper param;

				                param.key = p.first;

				                param.value = p.second;

				                parameters.push_back(std::move(param));

				            }

				            info.parameters = std::move(parameters);

				        });

				        co_return json::json_return_type(error_injection_infos);

				    });

				    hf::disable_on_all.set(r, [](std::unique_ptr<request> req) {

				        auto& errinj = utils::get_local_injector();

				        return errinj.disable_on_all().then([] {

				@@ -72,7 +96,7 @@ void set_error_injection(http_context& ctx, routes& r) {

				    });

				    hf::message_injection.set(r, [](std::unique_ptr<request> req) {

				        sstring injection = req->param["injection"];

				        sstring injection = req->get_path_param("injection");

				        auto& errinj = utils::get_local_injector();

				        return errinj.receive_message_on_all(injection).then([] {

				            return make_ready_future<json::json_return_type>(json::json_void());

									
api/failure_detector.cc
				@@ -66,7 +66,7 @@ void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {

				        return g.container().invoke_on(0, [] (gms::gossiper& g) {

				            std::map<sstring, sstring> nodes_status;

				            g.for_each_endpoint_state([&] (const gms::inet_address& node, const gms::endpoint_state&) {

				                nodes_status.emplace(node.to_sstring(), g.is_alive(node) ? "UP" : "DOWN");

				                nodes_status.emplace(fmt::to_string(node), g.is_alive(node) ? "UP" : "DOWN");

				            });

				            return make_ready_future<json::json_return_type>(map_to_key_value<fd::mapper>(nodes_status));

				        });

				@@ -81,9 +81,9 @@ void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {

				    fd::get_endpoint_state.set(r, [&g] (std::unique_ptr<request> req) {

				        return g.container().invoke_on(0, [req = std::move(req)] (gms::gossiper& g) {

				            auto state = g.get_endpoint_state_ptr(gms::inet_address(req->param["addr"]));

				            auto state = g.get_endpoint_state_ptr(gms::inet_address(req->get_path_param("addr")));

				            if (!state) {

				                return make_ready_future<json::json_return_type>(format("unknown endpoint {}", req->param["addr"]));

				                return make_ready_future<json::json_return_type>(format("unknown endpoint {}", req->get_path_param("addr")));

				            }

				            std::stringstream ss;

				            g.append_endpoint_state(ss, *state);

				@@ -99,5 +99,16 @@ void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {

				    });

				}

				void unset_failure_detector(http_context& ctx, routes& r) {

				    fd::get_all_endpoint_states.unset(r);

				    fd::get_up_endpoint_count.unset(r);

				    fd::get_down_endpoint_count.unset(r);

				    fd::get_phi_convict_threshold.unset(r);

				    fd::get_simple_states.unset(r);

				    fd::set_phi_convict_threshold.unset(r);

				    fd::get_endpoint_state.unset(r);

				    fd::get_endpoint_phi_values.unset(r);

				}

				}

									
api/failure_detector.hh
				@@ -19,5 +19,6 @@ class gossiper;

				namespace api {

				void set_failure_detector(http_context& ctx, httpd::routes& r, gms::gossiper& g);

				void unset_failure_detector(http_context& ctx, httpd::routes& r);

				}

									
api/gossiper.cc
				@@ -32,21 +32,21 @@ void set_gossiper(http_context& ctx, routes& r, gms::gossiper& g) {

				    });

				    httpd::gossiper_json::get_endpoint_downtime.set(r, [&g] (std::unique_ptr<request> req) -> future<json::json_return_type> {

				        gms::inet_address ep(req->param["addr"]);

				        gms::inet_address ep(req->get_path_param("addr"));

				        // synchronize unreachable_members on all shards

				        co_await g.get_unreachable_members_synchronized();

				        co_return g.get_endpoint_downtime(ep);

				    });

				    httpd::gossiper_json::get_current_generation_number.set(r, [&g] (std::unique_ptr<http::request> req) {

				        gms::inet_address ep(req->param["addr"]);

				        gms::inet_address ep(req->get_path_param("addr"));

				        return g.get_current_generation_number(ep).then([] (gms::generation_type res) {

				            return make_ready_future<json::json_return_type>(res.value());

				        });

				    });

				    httpd::gossiper_json::get_current_heart_beat_version.set(r, [&g] (std::unique_ptr<http::request> req) {

				        gms::inet_address ep(req->param["addr"]);

				        gms::inet_address ep(req->get_path_param("addr"));

				        return g.get_current_heart_beat_version(ep).then([] (gms::version_type res) {

				            return make_ready_future<json::json_return_type>(res.value());

				        });

				@@ -54,21 +54,31 @@ void set_gossiper(http_context& ctx, routes& r, gms::gossiper& g) {

				    httpd::gossiper_json::assassinate_endpoint.set(r, [&g](std::unique_ptr<http::request> req) {

				        if (req->get_query_param("unsafe") != "True") {

				            return g.assassinate_endpoint(req->param["addr"]).then([] {

				            return g.assassinate_endpoint(req->get_path_param("addr")).then([] {

				                return make_ready_future<json::json_return_type>(json_void());

				            });

				        }

				        return g.unsafe_assassinate_endpoint(req->param["addr"]).then([] {

				        return g.unsafe_assassinate_endpoint(req->get_path_param("addr")).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    httpd::gossiper_json::force_remove_endpoint.set(r, [&g](std::unique_ptr<http::request> req) {

				        gms::inet_address ep(req->param["addr"]);

				        gms::inet_address ep(req->get_path_param("addr"));

				        return g.force_remove_endpoint(ep, gms::null_permit_id).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				}

				void unset_gossiper(http_context& ctx, routes& r) {

				    httpd::gossiper_json::get_down_endpoint.unset(r);

				    httpd::gossiper_json::get_live_endpoint.unset(r);

				    httpd::gossiper_json::get_endpoint_downtime.unset(r);

				    httpd::gossiper_json::get_current_generation_number.unset(r);

				    httpd::gossiper_json::get_current_heart_beat_version.unset(r);

				    httpd::gossiper_json::assassinate_endpoint.unset(r);

				    httpd::gossiper_json::force_remove_endpoint.unset(r);

				}

				}

									
api/gossiper.hh
				@@ -19,5 +19,6 @@ class gossiper;

				namespace api {

				void set_gossiper(http_context& ctx, httpd::routes& r, gms::gossiper& g);

				void unset_gossiper(http_context& ctx, httpd::routes& r);

				}

									
api/messaging_service.cc
				@@ -146,7 +146,7 @@ void set_messaging_service(http_context& ctx, routes& r, sharded<netw::messaging

				    });

				    hf::inject_disconnect.set(r, [&ms] (std::unique_ptr<request> req) -> future<json::json_return_type> {

				        auto ip = msg_addr(req->param["ip"]);

				        auto ip = msg_addr(req->get_path_param("ip"));

				        co_await ms.invoke_on_all([ip] (netw::messaging_service& ms) {

				            ms.remove_rpc_client(ip);

				        });

									
api/raft.cc
				@@ -8,10 +8,10 @@

				#include <seastar/core/coroutine.hh>

				#include "api/api.hh"

				#include "api/api-doc/raft.json.hh"

				#include "service/raft/raft_group_registry.hh"

				#include "log.hh"

				using namespace seastar::httpd;

				@@ -19,34 +19,44 @@ extern logging::logger apilog;

				namespace api {

				struct http_context;

				namespace r = httpd::raft_json;

				using namespace json;

				namespace {

				::service::raft_timeout get_request_timeout(const http::request& req) {

				    return std::invoke([timeout_str = req.get_query_param("timeout")] {

				        if (timeout_str.empty()) {

				            return ::service::raft_timeout{};

				        }

				        auto dur = std::stoll(timeout_str);

				        if (dur <= 0) {

				            throw bad_param_exception{"Timeout must be a positive number."};

				        }

				        return ::service::raft_timeout{.value = lowres_clock::now() + std::chrono::seconds{dur}};

				    });

				}

				}  // namespace

				void set_raft(http_context&, httpd::routes& r, sharded<service::raft_group_registry>& raft_gr) {

				    r::trigger_snapshot.set(r, [&raft_gr] (std::unique_ptr<http::request> req) -> future<json_return_type> {

				-        raft::group_id gid{utils::UUID{req->param["group_id"]}};

				-        auto timeout_dur = std::invoke([timeout_str = req->get_query_param("timeout")] {

				-            if (timeout_str.empty()) {

				-                return std::chrono::seconds{60};

				-            }

				-            auto dur = std::stoll(timeout_str);

				-            if (dur <= 0) {

				-                throw std::runtime_error{"Timeout must be a positive number."};

				-            }

				-            return std::chrono::seconds{dur};

				-        });

				+        raft::group_id gid{utils::UUID{req->get_path_param("group_id")}};

				+        auto timeout = get_request_timeout(*req);

				        std::atomic<bool> found_srv{false};

				-        co_await raft_gr.invoke_on_all([gid, timeout_dur, &found_srv] (service::raft_group_registry& raft_gr) -> future<> {

				-            auto* srv = raft_gr.find_server(gid);

				-            if (!srv) {

				+        co_await raft_gr.invoke_on_all([gid, timeout, &found_srv] (service::raft_group_registry& raft_gr) -> future<> {

				+            if (!raft_gr.find_server(gid)) {

				                co_return;

				            }

				            found_srv = true;

				-            abort_on_expiry aoe(lowres_clock::now() + timeout_dur);

				            apilog.info("Triggering Raft group {} snapshot", gid);

				-            auto result = co_await srv->trigger_snapshot(&aoe.abort_source());

				+            auto srv = raft_gr.get_server_with_timeouts(gid);

				+            auto result = co_await srv.trigger_snapshot(nullptr, timeout);

				            if (result) {

				                apilog.info("New snapshot for Raft group {} created", gid);

				            } else {

				@@ -55,30 +65,72 @@ void set_raft(http_context&, httpd::routes& r, sharded<service::raft_group_regis

				        });

				        if (!found_srv) {

				-            throw std::runtime_error{fmt::format("Server for group ID {} not found", gid)};

				+            throw bad_param_exception{fmt::format("Server for group ID {} not found", gid)};

				        }

				        co_return json_void{};

				    });

				    r::get_leader_host.set(r, [&raft_gr] (std::unique_ptr<http::request> req) -> future<json_return_type> {

				-        return smp::submit_to(0, [&] {

				-            auto& srv = std::invoke([&] () -> raft::server& {

				-                if (req->query_parameters.contains("group_id")) {

				-                    raft::group_id id{utils::UUID{req->get_query_param("group_id")}};

				-                    return raft_gr.local().get_server(id);

				-                } else {

				-                    return raft_gr.local().group0();

				-                }

				+        if (!req->query_parameters.contains("group_id")) {

				+            const auto leader_id = co_await raft_gr.invoke_on(0, [] (service::raft_group_registry& raft_gr) {

				+                auto& srv = raft_gr.group0();

				+                return srv.current_leader();

				            });

				-            return json_return_type(srv.current_leader().to_sstring());

				+            co_return json_return_type{leader_id.to_sstring()};

				+        }

				+        const raft::group_id gid{utils::UUID{req->get_query_param("group_id")}};

				+        std::atomic<bool> found_srv{false};

				+        std::atomic<raft::server_id> leader_id = raft::server_id::create_null_id();

				+        co_await raft_gr.invoke_on_all([gid, &found_srv, &leader_id] (service::raft_group_registry& raft_gr) {

				+            if (raft_gr.find_server(gid)) {

				+                found_srv = true;

				+                leader_id = raft_gr.get_server(gid).current_leader();

				+            }

				+            return make_ready_future<>();

				        });

				+        if (!found_srv) {

				+            throw bad_param_exception{fmt::format("Server for group ID {} not found", gid)};

				+        }

				+        co_return json_return_type(leader_id.load().to_sstring());

				    });

				+    r::read_barrier.set(r, [&raft_gr] (std::unique_ptr<http::request> req) -> future<json_return_type> {

				+        auto timeout = get_request_timeout(*req);

				+        if (!req->query_parameters.contains("group_id")) {

				+            // Read barrier on group 0 by default

				+            co_await raft_gr.invoke_on(0, [timeout] (service::raft_group_registry& raft_gr) {

				+                return raft_gr.group0_with_timeouts().read_barrier(nullptr, timeout);

				+            });

				+            co_return json_void{};

				+        }

				+        raft::group_id gid{utils::UUID{req->get_query_param("group_id")}};

				+        std::atomic<bool> found_srv{false};

				+        co_await raft_gr.invoke_on_all([gid, timeout, &found_srv] (service::raft_group_registry& raft_gr) {

				+            if (!raft_gr.find_server(gid)) {

				+                return make_ready_future<>();

				+            }

				+            found_srv = true;

				+            return raft_gr.get_server_with_timeouts(gid).read_barrier(nullptr, timeout);

				+        });

				+        if (!found_srv) {

				+            throw bad_param_exception{fmt::format("Server for group ID {} not found", gid)};

				+        }

				+        co_return json_void{};

				+    });

				}

				void unset_raft(http_context&, httpd::routes& r) {

				    r::trigger_snapshot.unset(r);

				    r::get_leader_host.unset(r);

				+    r::read_barrier.unset(r);

				}

				}

									
										2

api/scrub_status.hh
									
												
				@@ -6,6 +6,8 @@

				 * SPDX-License-Identifier: AGPL-3.0-or-later

				 */

				#pragma once

				namespace api {

				enum class scrub_status {

									
										92

api/storage_proxy.cc
									
												
				@@ -13,7 +13,6 @@

				#include "api/api-doc/utils.json.hh"

				#include "db/config.hh"

				#include "utils/histogram.hh"

				#include "replica/database.hh"

				#include <seastar/core/scheduling_specific.hh>

				namespace api {

				@@ -259,83 +258,6 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_pr

				        return make_ready_future<json::json_return_type>(0);

				    });

				-    sp::get_rpc_timeout.set(r, [&ctx](const_req req)  {

				-        return ctx.db.local().get_config().request_timeout_in_ms()/1000.0;

				-    });

				-    sp::set_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				-        //TBD

				-        unimplemented();

				-        auto enable = req->get_query_param("timeout");

				-        return make_ready_future<json::json_return_type>(json_void());

				-    });

				-    sp::get_read_rpc_timeout.set(r, [&ctx](const_req req)  {

				-        return ctx.db.local().get_config().read_request_timeout_in_ms()/1000.0;

				-    });

				-    sp::set_read_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				-        //TBD

				-        unimplemented();

				-        auto enable = req->get_query_param("timeout");

				-        return make_ready_future<json::json_return_type>(json_void());

				-    });

				-    sp::get_write_rpc_timeout.set(r, [&ctx](const_req req)  {

				-        return ctx.db.local().get_config().write_request_timeout_in_ms()/1000.0;

				-    });

				-    sp::set_write_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				-        //TBD

				-        unimplemented();

				-        auto enable = req->get_query_param("timeout");

				-        return make_ready_future<json::json_return_type>(json_void());

				-    });

				-    sp::get_counter_write_rpc_timeout.set(r, [&ctx](const_req req)  {

				-        return ctx.db.local().get_config().counter_write_request_timeout_in_ms()/1000.0;

				-    });

				-    sp::set_counter_write_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				-        //TBD

				-        unimplemented();

				-        auto enable = req->get_query_param("timeout");

				-        return make_ready_future<json::json_return_type>(json_void());

				-    });

				-    sp::get_cas_contention_timeout.set(r, [&ctx](const_req req)  {

				-        return ctx.db.local().get_config().cas_contention_timeout_in_ms()/1000.0;

				-    });

				-    sp::set_cas_contention_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				-        //TBD

				-        unimplemented();

				-        auto enable = req->get_query_param("timeout");

				-        return make_ready_future<json::json_return_type>(json_void());

				-    });

				-    sp::get_range_rpc_timeout.set(r, [&ctx](const_req req)  {

				-        return ctx.db.local().get_config().range_request_timeout_in_ms()/1000.0;

				-    });

				-    sp::set_range_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				-        //TBD

				-        unimplemented();

				-        auto enable = req->get_query_param("timeout");

				-        return make_ready_future<json::json_return_type>(json_void());

				-    });

				-    sp::get_truncate_rpc_timeout.set(r, [&ctx](const_req req)  {

				-        return ctx.db.local().get_config().truncate_request_timeout_in_ms()/1000.0;

				-    });

				-    sp::set_truncate_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				-        //TBD

				-        unimplemented();

				-        auto enable = req->get_query_param("timeout");

				-        return make_ready_future<json::json_return_type>(json_void());

				-    });

				    sp::reload_trigger_classes.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				@@ -516,20 +438,6 @@ void unset_storage_proxy(http_context& ctx, routes& r) {

				    sp::get_max_hints_in_progress.unset(r);

				    sp::set_max_hints_in_progress.unset(r);

				    sp::get_hints_in_progress.unset(r);

				-    sp::get_rpc_timeout.unset(r);

				-    sp::set_rpc_timeout.unset(r);

				-    sp::get_read_rpc_timeout.unset(r);

				-    sp::set_read_rpc_timeout.unset(r);

				-    sp::get_write_rpc_timeout.unset(r);

				-    sp::set_write_rpc_timeout.unset(r);

				-    sp::get_counter_write_rpc_timeout.unset(r);

				-    sp::set_counter_write_rpc_timeout.unset(r);

				-    sp::get_cas_contention_timeout.unset(r);

				-    sp::set_cas_contention_timeout.unset(r);

				-    sp::get_range_rpc_timeout.unset(r);

				-    sp::set_range_rpc_timeout.unset(r);

				-    sp::get_truncate_rpc_timeout.unset(r);

				-    sp::set_truncate_rpc_timeout.unset(r);

				    sp::reload_trigger_classes.unset(r);

				    sp::get_read_repair_attempted.unset(r);

				    sp::get_read_repair_repaired_blocking.unset(r);

									
										327

api/storage_service.cc
									
												
				@@ -26,6 +26,7 @@

				#include <boost/algorithm/string/trim_all.hpp>

				#include <boost/algorithm/string/case_conv.hpp>

				#include <boost/functional/hash.hpp>

				+#include <fmt/ranges.h>

				#include "service/raft/raft_group0_client.hh"

				#include "service/storage_service.hh"

				#include "service/load_meter.hh"

				@@ -35,6 +36,7 @@

				#include <seastar/http/exception.hh>

				#include <seastar/core/coroutine.hh>

				#include <seastar/coroutine/parallel_for_each.hh>

				+#include <seastar/coroutine/exception.hh>

				#include "repair/row_level.hh"

				#include "locator/snitch_base.hh"

				#include "column_family.hh"

				@@ -47,12 +49,12 @@

				#include "db/extensions.hh"

				#include "db/snapshot-ctl.hh"

				#include "transport/controller.hh"

				-#include "thrift/controller.hh"

				#include "locator/token_metadata.hh"

				#include "cdc/generation_service.hh"

				#include "locator/abstract_replication_strategy.hh"

				#include "sstables_loader.hh"

				#include "db/view/view_builder.hh"

				+#include "utils/user_provided_param.hh"

				using namespace seastar::httpd;

				using namespace std::chrono_literals;

				@@ -63,6 +65,7 @@ namespace api {

				namespace ss = httpd::storage_service_json;

				namespace sp = httpd::storage_proxy_json;

				+namespace cf = httpd::column_family_json;

				using namespace json;

				sstring validate_keyspace(const http_context& ctx, sstring ks_name) {

				@@ -72,11 +75,15 @@ sstring validate_keyspace(const http_context& ctx, sstring ks_name) {

				    throw bad_param_exception(replica::no_such_keyspace(ks_name).what());

				}

				-sstring validate_keyspace(const http_context& ctx, const parameters& param) {

				-    return validate_keyspace(ctx, param["keyspace"]);

				+sstring validate_keyspace(const http_context& ctx, const std::unique_ptr<http::request>& req) {

				+    return validate_keyspace(ctx, req->get_path_param("keyspace"));

				}

				-static void validate_table(const http_context& ctx, sstring ks_name, sstring table_name) {

				+sstring validate_keyspace(const http_context& ctx, const http::request& req) {

				+    return validate_keyspace(ctx, req.get_path_param("keyspace"));

				+}

				+void validate_table(const http_context& ctx, sstring ks_name, sstring table_name) {

				    auto& db = ctx.db.local();

				    try {

				        db.find_column_family(ks_name, table_name);

				@@ -199,14 +206,13 @@ using ks_cf_func = std::function<future<json::json_return_type>(http_context&, s

				static auto wrap_ks_cf(http_context &ctx, ks_cf_func f) {

				    return [&ctx, f = std::move(f)](std::unique_ptr<http::request> req) {

				-        auto keyspace = validate_keyspace(ctx, req->param);

				+        auto keyspace = validate_keyspace(ctx, req);

				        auto table_infos = parse_table_infos(keyspace, ctx, req->query_parameters, "cf");

				        return f(ctx, std::move(req), std::move(keyspace), std::move(table_infos));

				    };

				}

				seastar::future<json::json_return_type> run_toppartitions_query(db::toppartitions_query& q, http_context &ctx, bool legacy_request) {

				-    namespace cf = httpd::column_family_json;

				    return q.scatter().then([&q, legacy_request] {

				        return sleep(q.duration()).then([&q, legacy_request] {

				            return q.gather(q.capacity()).then([&q, legacy_request] (auto topk_results) {

				@@ -236,47 +242,6 @@ seastar::future<json::json_return_type> run_toppartitions_query(db::toppartition

				    });

				}

				-static future<json::json_return_type> set_tables(http_context& ctx, const sstring& keyspace, std::vector<sstring> tables, std::function<future<>(replica::table&)> set) {

				-    if (tables.empty()) {

				-        tables = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());

				-    }

				-    return do_with(keyspace, std::move(tables), [&ctx, set] (const sstring& keyspace, const std::vector<sstring>& tables) {

				-        return ctx.db.invoke_on_all([&keyspace, &tables, set] (replica::database& db) {

				-            return parallel_for_each(tables, [&db, &keyspace, set] (const sstring& table) {

				-                replica::table& t = db.find_column_family(keyspace, table);

				-                return set(t);

				-            });

				-        });

				-    }).then([] {

				-        return make_ready_future<json::json_return_type>(json_void());

				-    });

				-}

				-future<json::json_return_type> set_tables_autocompaction(http_context& ctx, const sstring &keyspace, std::vector<sstring> tables, bool enabled) {

				-    apilog.info("set_tables_autocompaction: enabled={} keyspace={} tables={}", enabled, keyspace, tables);

				-    return ctx.db.invoke_on(0, [&ctx, keyspace, tables = std::move(tables), enabled] (replica::database& db) {

				-        auto g = replica::database::autocompaction_toggle_guard(db);

				-        return set_tables(ctx, keyspace, tables, [enabled] (replica::table& cf) {

				-            if (enabled) {

				-                cf.enable_auto_compaction();

				-            } else {

				-                return cf.disable_auto_compaction();

				-            }

				-            return make_ready_future<>();

				-        }).finally([g = std::move(g)] {});

				-    });

				-}

				-future<json::json_return_type> set_tables_tombstone_gc(http_context& ctx, const sstring &keyspace, std::vector<sstring> tables, bool enabled) {

				-    apilog.info("set_tables_tombstone_gc: enabled={} keyspace={} tables={}", enabled, keyspace, tables);

				-    return set_tables(ctx, keyspace, std::move(tables), [enabled] (replica::table& t) {

				-        t.set_tombstone_gc_enabled(enabled);

				-        return make_ready_future<>();

				-    });

				-}

				future<scrub_info> parse_scrub_options(const http_context& ctx, sharded<db::snapshot_ctl>& snap_ctl, std::unique_ptr<http::request> req) {

				    scrub_info info;

				    auto rp = req_params({

				@@ -342,21 +307,17 @@ future<scrub_info> parse_scrub_options(const http_context& ctx, sharded<db::snap

				}

				void set_transport_controller(http_context& ctx, routes& r, cql_transport::controller& ctl) {

				-    ss::start_native_transport.set(r, [&ctx, &ctl](std::unique_ptr<http::request> req) {

				+    ss::start_native_transport.set(r, [&ctl](std::unique_ptr<http::request> req) {

				        return smp::submit_to(0, [&] {

				-            return with_scheduling_group(ctx.db.local().get_statement_scheduling_group(), [&ctl] {

				-                return ctl.start_server();

				-            });

				+            return ctl.start_server();

				        }).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				-    ss::stop_native_transport.set(r, [&ctx, &ctl](std::unique_ptr<http::request> req) {

				+    ss::stop_native_transport.set(r, [&ctl](std::unique_ptr<http::request> req) {

				        return smp::submit_to(0, [&] {

				-            return with_scheduling_group(ctx.db.local().get_statement_scheduling_group(), [&ctl] {

				-                return ctl.request_stop_server();

				-            });

				+            return ctl.request_stop_server();

				        }).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				@@ -377,44 +338,21 @@ void unset_transport_controller(http_context& ctx, routes& r) {

				    ss::is_native_transport_running.unset(r);

				}

				-void set_rpc_controller(http_context& ctx, routes& r, thrift_controller& ctl) {

				-    ss::stop_rpc_server.set(r, [&ctx, &ctl](std::unique_ptr<http::request> req) {

				-        return smp::submit_to(0, [&] {

				-            return with_scheduling_group(ctx.db.local().get_statement_scheduling_group(), [&ctl] {

				-                return ctl.request_stop_server();

				-            });

				-        }).then([] {

				-            return make_ready_future<json::json_return_type>(json_void());

				-        });

				-    });

				-    ss::start_rpc_server.set(r, [&ctx, &ctl](std::unique_ptr<http::request> req) {

				-        return smp::submit_to(0, [&] {

				-            return with_scheduling_group(ctx.db.local().get_statement_scheduling_group(), [&ctl] {

				-                return ctl.start_server();

				-            });

				-        }).then([] {

				-            return make_ready_future<json::json_return_type>(json_void());

				-        });

				-    });

				-    ss::is_rpc_server_running.set(r, [&ctl] (std::unique_ptr<http::request> req) {

				-        return smp::submit_to(0, [&] {

				-            return !ctl.listen_addresses().empty();

				-        }).then([] (bool running) {

				-            return make_ready_future<json::json_return_type>(running);

				+// NOTE: preserved only for backward compatibility

				+void set_thrift_controller(http_context& ctx, routes& r) {

				+    ss::is_thrift_server_running.set(r, [] (std::unique_ptr<http::request> req) {

				+        return smp::submit_to(0, [] {

				+            return make_ready_future<json::json_return_type>(false);

				        });

				    });

				}

				-void unset_rpc_controller(http_context& ctx, routes& r) {

				-    ss::stop_rpc_server.unset(r);

				-    ss::start_rpc_server.unset(r);

				-    ss::is_rpc_server_running.unset(r);

				+void unset_thrift_controller(http_context& ctx, routes& r) {

				+    ss::is_thrift_server_running.unset(r);

				}

				void set_repair(http_context& ctx, routes& r, sharded<repair_service>& repair) {

				-    ss::repair_async.set(r, [&ctx, &repair](std::unique_ptr<http::request> req) {

				+    ss::repair_async.set(r, [&ctx, &repair](std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        static std::unordered_set<sstring> options = {"primaryRange", "parallelism", "incremental",

				                "jobThreads", "ranges", "columnFamilies", "dataCenters", "hosts", "ignore_nodes", "trace",

				                "startToken", "endToken", "ranges_parallelism", "small_table_optimization"};

				@@ -427,8 +365,7 @@ void set_repair(http_context& ctx, routes& r, sharded<repair_service>& repair) {

				                continue;

				            }

				            if (!options.contains(x.first)) {

				-                return make_exception_future<json::json_return_type>(

				-                        httpd::bad_param_exception(format("option {} is not supported", x.first)));

				+                throw httpd::bad_param_exception(format("option {} is not supported", x.first));

				            }

				        }

				        std::unordered_map<sstring, sstring> options_map;

				@@ -443,10 +380,14 @@ void set_repair(http_context& ctx, routes& r, sharded<repair_service>& repair) {

				        // returns immediately, not waiting for the repair to finish. The user

				        // then has other mechanisms to track the ongoing repair's progress,

				        // or stop it.

				-        return repair_start(repair, validate_keyspace(ctx, req->param),

				-                options_map).then([] (int i) {

				-                    return make_ready_future<json::json_return_type>(i);

				-                });

				+        try {

				+            int res = co_await repair_start(repair, validate_keyspace(ctx, req), options_map);

				+            co_return json::json_return_type(res);

				+        } catch (const std::invalid_argument& e) {

				+            // if the option is not sane, repair_start() throws immediately, so

				+            // convert the exception to an HTTP error

				+            throw httpd::bad_param_exception(e.what());

				+        }

				    });

				    ss::get_active_repair_async.set(r, [&repair] (std::unique_ptr<http::request> req) {

				@@ -526,7 +467,7 @@ void unset_repair(http_context& ctx, routes& r) {

				void set_sstables_loader(http_context& ctx, routes& r, sharded<sstables_loader>& sst_loader) {

				    ss::load_new_ss_tables.set(r, [&ctx, &sst_loader](std::unique_ptr<http::request> req) {

				-        auto ks = validate_keyspace(ctx, req->param);

				+        auto ks = validate_keyspace(ctx, req);

				        auto cf = req->get_query_param("cf");

				        auto stream = req->get_query_param("load_and_stream");

				        auto primary_replica = req->get_query_param("primary_replica_only");

				@@ -557,8 +498,8 @@ void unset_sstables_loader(http_context& ctx, routes& r) {

				void set_view_builder(http_context& ctx, routes& r, sharded<db::view::view_builder>& vb) {

				    ss::view_build_statuses.set(r, [&ctx, &vb] (std::unique_ptr<http::request> req) {

				-        auto keyspace = validate_keyspace(ctx, req->param);

				-        auto view = req->param["view"];

				+        auto keyspace = validate_keyspace(ctx, req);

				+        auto view = req->get_path_param("view");

				        return vb.local().view_build_statuses(std::move(keyspace), std::move(view)).then([] (std::unordered_map<sstring, sstring> status) {

				            std::vector<storage_service_json::mapper> res;

				            return make_ready_future<json::json_return_type>(map_to_key_value(std::move(status), res));

				@@ -584,8 +525,24 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				        return ctx.db.local().commitlog()->active_config().commit_log_location;

				    });

				-    ss::get_token_endpoint.set(r, [&ss] (std::unique_ptr<http::request> req) {

				-        return make_ready_future<json::json_return_type>(stream_range_as_array(ss.local().get_token_to_endpoint_map(), [](const auto& i) {

				+    ss::get_token_endpoint.set(r, [&ctx, &ss] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				+        const auto keyspace_name = req->get_query_param("keyspace");

				+        const auto table_name = req->get_query_param("cf");

				+        std::map<dht::token, gms::inet_address> token_endpoints;

				+        if (keyspace_name.empty() && table_name.empty()) {

				+            token_endpoints = ss.local().get_token_to_endpoint_map();

				+        } else if (!keyspace_name.empty() && !table_name.empty()) {

				+            auto& db = ctx.db.local();

				+            if (!db.has_schema(keyspace_name, table_name)) {

				+                throw bad_param_exception(fmt::format("Failed to find table {}.{}", keyspace_name, table_name));

				+            }

				+            token_endpoints = co_await ss.local().get_tablet_to_endpoint_map(db.find_schema(keyspace_name, table_name)->id());

				+        } else {

				+            throw bad_param_exception("Either provide both keyspace and table (for tablet table) or neither (for vnodes)");

				+        }

				+        co_return json::json_return_type(stream_range_as_array(token_endpoints, [](const auto& i) {

				            storage_service_json::mapper val;

				            val.key = fmt::to_string(i.first);

				            val.value = fmt::to_string(i.second);

				@@ -663,7 +620,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				    });

				    ss::get_range_to_endpoint_map.set(r, [&ctx, &ss](std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				-        auto keyspace = validate_keyspace(ctx, req->param);

				+        auto keyspace = validate_keyspace(ctx, req);

				        auto table = req->get_query_param("cf");

				        auto erm = std::invoke([&]() -> locator::effective_replication_map_ptr {

				@@ -694,7 +651,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				                m.key.push("");

				            }

				            for (const gms::inet_address& address : entry.second) {

				-                m.value.push(address.to_sstring());

				+                m.value.push(fmt::to_string(address));

				            }

				            return m;

				        });

				@@ -703,7 +660,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				    ss::get_pending_range_to_endpoint_map.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				-        auto keyspace = validate_keyspace(ctx, req->param);

				+        auto keyspace = validate_keyspace(ctx, req);

				        std::vector<ss::maplist_mapper> res;

				        return make_ready_future<json::json_return_type>(res);

				    });

				@@ -712,32 +669,19 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				        if (!req->param.exists("keyspace")) {

				            throw bad_param_exception("The keyspace param is not provided");

				        }

				-        auto keyspace = req->param["keyspace"];

				+        auto keyspace = req->get_path_param("keyspace");

				        auto table = req->get_query_param("table");

				        if (!table.empty()) {

				            validate_table(ctx, keyspace, table);

				            return describe_ring_as_json_for_table(ss, keyspace, table);

				        }

				-        return describe_ring_as_json(ss, validate_keyspace(ctx, req->param));

				+        return describe_ring_as_json(ss, validate_keyspace(ctx, req));

				    });

				    ss::get_load.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        return get_cf_stats(ctx, &replica::column_family_stats::live_disk_space_used);

				    });

				-    ss::get_load_map.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				-        return ctx.lmeter.get_load_map().then([] (auto&& load_map) {

				-            std::vector<ss::map_string_double> res;

				-            for (auto i : load_map) {

				-                ss::map_string_double val;

				-                val.key = i.first;

				-                val.value = i.second;

				-                res.push_back(val);

				-            }

				-            return make_ready_future<json::json_return_type>(res);

				-        });

				-    });

				    ss::get_current_generation_number.set(r, [&ss](std::unique_ptr<http::request> req) {

				        auto ep = ss.local().get_token_metadata().get_topology().my_address();

				        return ss.local().gossiper().get_current_generation_number(ep).then([](gms::generation_type res) {

				@@ -746,7 +690,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				    });

				    ss::get_natural_endpoints.set(r, [&ctx, &ss](const_req req) {

				-        auto keyspace = validate_keyspace(ctx, req.param);

				+        auto keyspace = validate_keyspace(ctx, req);

				        return container_to_vec(ss.local().get_natural_endpoints(keyspace, req.get_query_param("cf"),

				                req.get_query_param("key")));

				    });

				@@ -815,7 +759,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				    ss::force_keyspace_cleanup.set(r, [&ctx, &ss](std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto& db = ctx.db;

				-        auto keyspace = validate_keyspace(ctx, req->param);

				+        auto keyspace = validate_keyspace(ctx, req);

				        auto table_infos = parse_table_infos(keyspace, ctx, req->query_parameters, "cf");

				        const auto& rs = db.local().find_keyspace(keyspace).get_replication_strategy();

				        if (rs.get_type() == locator::replication_strategy_type::local || !rs.is_vnode_based()) {

				@@ -910,7 +854,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				    });

				    ss::force_keyspace_flush.set(r, [&ctx](std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				-        auto keyspace = validate_keyspace(ctx, req->param);

				+        auto keyspace = validate_keyspace(ctx, req);

				        auto column_families = parse_tables(keyspace, ctx, req->query_parameters, "cf");

				        apilog.info("perform_keyspace_flush: keyspace={} tables={}", keyspace, column_families);

				        auto& db = ctx.db;

				@@ -1019,7 +963,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				    ss::truncate.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				-        auto keyspace = validate_keyspace(ctx, req->param);

				+        auto keyspace = validate_keyspace(ctx, req);

				        auto column_family = req->get_query_param("cf");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				@@ -1153,7 +1097,16 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				    });

				    ss::rebuild.set(r, [&ss](std::unique_ptr<http::request> req) {

				-        auto source_dc = req->get_query_param("source_dc");

				+        utils::optional_param source_dc;

				+        if (auto source_dc_str = req->get_query_param("source_dc"); !source_dc_str.empty()) {

				+            source_dc.emplace(std::move(source_dc_str)).set_user_provided();

				+        }

				+        if (auto force_str = req->get_query_param("force"); !force_str.empty() && service::loosen_constraints(validate_bool(force_str))) {

				+            if (!source_dc) {

				+                throw bad_param_exception("The `source_dc` option must be provided for using the `force` option");

				+            }

				+            source_dc.set_force();

				+        }

				        apilog.info("rebuild: source_dc={}", source_dc);

				        return ss.local().rebuild(std::move(source_dc)).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				@@ -1163,14 +1116,14 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				    ss::bulk_load.set(r, [](std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        auto path = req->param["path"];

				        auto path = req->get_path_param("path");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    ss::bulk_load_async.set(r, [](std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        auto path = req->param["path"];

				        auto path = req->get_path_param("path");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				@@ -1257,38 +1210,6 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				        }

				    });

				    ss::enable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        auto keyspace = validate_keyspace(ctx, req->param);

				        auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");

				        apilog.info("enable_auto_compaction: keyspace={} tables={}", keyspace, tables);

				        return set_tables_autocompaction(ctx, keyspace, tables, true);

				    });

				    ss::disable_auto_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        auto keyspace = validate_keyspace(ctx, req->param);

				        auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");

				        apilog.info("disable_auto_compaction: keyspace={} tables={}", keyspace, tables);

				        return set_tables_autocompaction(ctx, keyspace, tables, false);

				    });

				    ss::enable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        auto keyspace = validate_keyspace(ctx, req->param);

				        auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");

				        apilog.info("enable_tombstone_gc: keyspace={} tables={}", keyspace, tables);

				        return set_tables_tombstone_gc(ctx, keyspace, tables, true);

				    });

				    ss::disable_tombstone_gc.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        auto keyspace = validate_keyspace(ctx, req->param);

				        auto tables = parse_tables(keyspace, ctx, req->query_parameters, "cf");

				        apilog.info("disable_tombstone_gc: keyspace={} tables={}", keyspace, tables);

				        return set_tables_tombstone_gc(ctx, keyspace, tables, false);

				    });

				    ss::deliver_hints.set(r, [](std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				@@ -1382,7 +1303,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				    });

				    ss::get_effective_ownership.set(r, [&ctx, &ss] (std::unique_ptr<http::request> req) {

				        auto keyspace_name = req->param["keyspace"] == "null" ? "" : validate_keyspace(ctx, req->param);

				        auto keyspace_name = req->get_path_param("keyspace") == "null" ? "" : validate_keyspace(ctx, req);

				        auto table_name = req->get_query_param("cf");

				        if (!keyspace_name.empty()) {

				@@ -1622,6 +1543,11 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				        co_return json_void();

				    });

				    ss::quiesce_topology.set(r, [&ss] (std::unique_ptr<http::request> req) -> future<json_return_type> {

				        co_await ss.local().await_topology_quiesced();

				        co_return json_void();

				    });

				    sp::get_schema_versions.set(r, [&ss](std::unique_ptr<http::request> req)  {

				        return ss.local().describe_schema_versions().then([] (auto result) {

				            std::vector<sp::mapper_list> res;

				@@ -1649,7 +1575,6 @@ void unset_storage_service(http_context& ctx, routes& r) {

				    ss::get_pending_range_to_endpoint_map.unset(r);

				    ss::describe_ring.unset(r);

				    ss::get_load.unset(r);

				    ss::get_load_map.unset(r);

				    ss::get_current_generation_number.unset(r);

				    ss::get_natural_endpoints.unset(r);

				    ss::cdc_streams_check_and_repair.unset(r);

				@@ -1697,10 +1622,6 @@ void unset_storage_service(http_context& ctx, routes& r) {

				    ss::get_trace_probability.unset(r);

				    ss::get_slow_query_info.unset(r);

				    ss::set_slow_query.unset(r);

				    ss::enable_auto_compaction.unset(r);

				    ss::disable_auto_compaction.unset(r);

				    ss::enable_tombstone_gc.unset(r);

				    ss::disable_tombstone_gc.unset(r);

				    ss::deliver_hints.unset(r);

				    ss::get_cluster_name.unset(r);

				    ss::get_partitioner_name.unset(r);

				@@ -1725,39 +1646,67 @@ void unset_storage_service(http_context& ctx, routes& r) {

				    ss::add_tablet_replica.unset(r);

				    ss::del_tablet_replica.unset(r);

				    ss::tablet_balancing_enable.unset(r);

				    ss::quiesce_topology.unset(r);

				    sp::get_schema_versions.unset(r);

				}

				void set_load_meter(http_context& ctx, routes& r, service::load_meter& lm) {

				    ss::get_load_map.set(r, [&lm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto load_map = co_await lm.get_load_map();

				        std::vector<ss::map_string_double> res;

				        for (auto i : load_map) {

				            ss::map_string_double val;

				            val.key = i.first;

				            val.value = i.second;

				            res.push_back(val);

				        }

				        co_return res;

				    });

				}

				void unset_load_meter(http_context& ctx, routes& r) {

				    ss::get_load_map.unset(r);

				}

				void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_ctl) {

				    ss::get_snapshot_details.set(r, [&snap_ctl](std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto result = co_await snap_ctl.local().get_snapshot_details();

				        co_return std::function([res = std::move(result)] (output_stream<char>&& o) -> future<> {

				            auto result = std::move(res);

				            std::exception_ptr ex;

				            output_stream<char> out = std::move(o);

				            bool first = true;

				            try {

				                auto result = std::move(res);

				                bool first = true;

				            co_await out.write("[");

				            for (auto&& map : result) {

				                if (!first) {

				                    co_await out.write(", ");

				                co_await out.write("[");

				                for (auto& [name, details] : result) {

				                    if (!first) {

				                        co_await out.write(", ");

				                    }

				                    std::vector<ss::snapshot> snapshot;

				                    for (auto& cf : details) {

				                        ss::snapshot snp;

				                        snp.ks = cf.ks;

				                        snp.cf = cf.cf;

				                        snp.live = cf.details.live;

				                        snp.total = cf.details.total;

				                        snapshot.push_back(std::move(snp));

				                    }

				                    ss::snapshots all_snapshots;

				                    all_snapshots.key = name;

				                    all_snapshots.value = std::move(snapshot);

				                    co_await all_snapshots.write(out);

				                    first = false;

				                }

				                std::vector<ss::snapshot> snapshot;

				                for (auto& cf : std::get<1>(map)) {

				                    ss::snapshot snp;

				                    snp.ks = cf.ks;

				                    snp.cf = cf.cf;

				                    snp.live = cf.live;

				                    snp.total = cf.total;

				                    snapshot.push_back(std::move(snp));

				                }

				                ss::snapshots all_snapshots;

				                all_snapshots.key = std::get<0>(map);

				                all_snapshots.value = std::move(snapshot);

				                co_await all_snapshots.write(out);

				                first = false;

				                co_await out.write("]");

				                co_await out.flush();

				            } catch (...) {

				              ex = std::current_exception();

				            }

				            co_await out.write("]");

				            co_await out.close();

				            if (ex) {

				                co_await coroutine::return_exception_ptr(std::move(ex));

				            }

				        });

				    });

				@@ -1836,6 +1785,20 @@ void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_

				        co_return json::json_return_type(static_cast<int>(scrub_status::successful));

				    });

				    cf::get_true_snapshots_size.set(r, [&snap_ctl] (std::unique_ptr<http::request> req) {

				        auto [ks, cf] = parse_fully_qualified_cf_name(req->get_path_param("name"));

				        return snap_ctl.local().true_snapshots_size(std::move(ks), std::move(cf)).then([] (int64_t res) {

				            return make_ready_future<json::json_return_type>(res);

				        });

				    });

				    cf::get_all_true_snapshots_size.set(r, [] (std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        return make_ready_future<json::json_return_type>(0);

				    });

				}

				void unset_snapshot(http_context& ctx, routes& r) {

				@@ -1844,6 +1807,8 @@ void unset_snapshot(http_context& ctx, routes& r) {

				    ss::del_snapshot.unset(r);

				    ss::true_snapshots_size.unset(r);

				    ss::scrub.unset(r);

				    cf::get_true_snapshots_size.unset(r);

				    cf::get_all_true_snapshots_size.unset(r);

				}

				}

									
										13

api/storage_service.hh
									
												
				@@ -14,7 +14,6 @@

				#include "db/data_listeners.hh"

				namespace cql_transport { class controller; }

				class thrift_controller;

				namespace db {

				class snapshot_ctl;

				namespace view {

				@@ -40,7 +39,11 @@ sstring validate_keyspace(const http_context& ctx, sstring ks_name);

				// verify that the keyspace parameter is found, otherwise a bad_param_exception exception is thrown

				// containing the description of the respective keyspace error.

				sstring validate_keyspace(const http_context& ctx, const httpd::parameters& param);

				sstring validate_keyspace(const http_context& ctx, const std::unique_ptr<http::request>& req);

				// verify that the table parameter is found, otherwise a bad_param_exception exception is thrown

				// containing the description of the respective table error.

				void validate_table(const http_context& ctx, sstring ks_name, sstring table_name);

				// splits a request parameter assumed to hold a comma-separated list of table names

				// verify that the tables are found, otherwise a bad_param_exception exception is thrown

				@@ -76,10 +79,12 @@ void set_repair(http_context& ctx, httpd::routes& r, sharded<repair_service>& re

				void unset_repair(http_context& ctx, httpd::routes& r);

				void set_transport_controller(http_context& ctx, httpd::routes& r, cql_transport::controller& ctl);

				void unset_transport_controller(http_context& ctx, httpd::routes& r);

				void set_rpc_controller(http_context& ctx, httpd::routes& r, thrift_controller& ctl);

				void unset_rpc_controller(http_context& ctx, httpd::routes& r);

				void set_thrift_controller(http_context& ctx, httpd::routes& r);

				void unset_thrift_controller(http_context& ctx, httpd::routes& r);

				void set_snapshot(http_context& ctx, httpd::routes& r, sharded<db::snapshot_ctl>& snap_ctl);

				void unset_snapshot(http_context& ctx, httpd::routes& r);

				void set_load_meter(http_context& ctx, httpd::routes& r, service::load_meter& lm);

				void unset_load_meter(http_context& ctx, httpd::routes& r);

				seastar::future<json::json_return_type> run_toppartitions_query(db::toppartitions_query& q, http_context &ctx, bool legacy_request = false);

				} // namespace api

									
										4

api/stream_manager.cc
									
												
				@@ -108,7 +108,7 @@ void set_stream_manager(http_context& ctx, routes& r, sharded<streaming::stream_

				    });

				    hs::get_total_incoming_bytes.set(r, [&sm](std::unique_ptr<request> req) {

				        gms::inet_address peer(req->param["peer"]);

				        gms::inet_address peer(req->get_path_param("peer"));

				        return sm.map_reduce0([peer](streaming::stream_manager& sm) {

				            return sm.get_progress_on_all_shards(peer).then([] (auto sbytes) {

				                return sbytes.bytes_received;

				@@ -129,7 +129,7 @@ void set_stream_manager(http_context& ctx, routes& r, sharded<streaming::stream_

				    });

				    hs::get_total_outgoing_bytes.set(r, [&sm](std::unique_ptr<request> req) {

				        gms::inet_address peer(req->param["peer"]);

				        gms::inet_address peer(req->get_path_param("peer"));

				        return sm.map_reduce0([peer] (streaming::stream_manager& sm) {

				            return sm.get_progress_on_all_shards(peer).then([] (auto sbytes) {

				                return sbytes.bytes_sent;

									
										14

api/system.cc
									
												
				@@ -10,6 +10,7 @@

				#include "api/api-doc/system.json.hh"

				#include "api/api-doc/metrics.json.hh"

				#include "replica/database.hh"

				#include "sstables/sstables_manager.hh"

				#include <rapidjson/document.h>

				#include <seastar/core/reactor.hh>

				@@ -122,9 +123,9 @@ void set_system(http_context& ctx, routes& r) {

				    hs::get_logger_level.set(r, [](const_req req) {

				        try {

				            return logging::level_name(logging::logger_registry().get_logger_level(req.param["name"]));

				            return logging::level_name(logging::logger_registry().get_logger_level(req.get_path_param("name")));

				        } catch (std::out_of_range& e) {

				            throw bad_param_exception("Unknown logger name " + req.param["name"]);

				            throw bad_param_exception("Unknown logger name " + req.get_path_param("name"));

				        }

				        // just to keep the compiler happy

				        return sstring();

				@@ -133,9 +134,9 @@ void set_system(http_context& ctx, routes& r) {

				    hs::set_logger_level.set(r, [](const_req req) {

				        try {

				            logging::log_level level = boost::lexical_cast<logging::log_level>(std::string(req.get_query_param("level")));

				            logging::logger_registry().set_logger_level(req.param["name"], level);

				            logging::logger_registry().set_logger_level(req.get_path_param("name"), level);

				        } catch (std::out_of_range& e) {

				            throw bad_param_exception("Unknown logger name " + req.param["name"]);

				            throw bad_param_exception("Unknown logger name " + req.get_path_param("name"));

				        } catch (boost::bad_lexical_cast& e) {

				            throw bad_param_exception("Unknown logging level " + req.get_query_param("level"));

				        }

				@@ -182,6 +183,11 @@ void set_system(http_context& ctx, routes& r) {

				        apilog.info("Profile dumped to {}", profile_dest);

				        return make_ready_future<json::json_return_type>(json::json_return_type(json::json_void()));

				    }) ;

				    hs::get_highest_supported_sstable_version.set(r, [&ctx] (const_req req) {

				        auto& table = ctx.db.local().find_column_family("system", "local");

				        return seastar::to_sstring(table.get_sstables_manager().get_highest_supported_format());

				    });

				}

				}

									
										7

api/system.hh
									
												
				@@ -8,10 +8,13 @@

				#pragma once

				#include "api.hh"

				namespace seastar::httpd {

				class routes;

				}

				namespace api {

				void set_system(http_context& ctx, httpd::routes& r);

				struct http_context;

				void set_system(http_context& ctx, seastar::httpd::routes& r);

				}

									
										87

api/task_manager.cc
									
												
				@@ -7,6 +7,7 @@

				 */

				#include <seastar/core/coroutine.hh>

				#include <seastar/coroutine/exception.hh>

				#include <seastar/http/exception.hh>

				#include "task_manager.hh"

				@@ -23,6 +24,8 @@ namespace tm = httpd::task_manager_json;

				using namespace json;

				using namespace seastar::httpd;

				using task_variant = std::variant<tasks::task_manager::foreign_task_ptr, tasks::task_manager::task::task_essentials>;

				inline bool filter_tasks(tasks::task_manager::task_ptr task, std::unordered_map<sstring, sstring>& query_params) {

				    return (!query_params.contains("keyspace") || query_params["keyspace"] == task->get_status().keyspace) &&

				        (!query_params.contains("table") || query_params["table"] == task->get_status().table);

				@@ -32,7 +35,6 @@ struct full_task_status {

				    tasks::task_manager::task::status task_status;

				    std::string type;

				    tasks::task_manager::task::progress progress;

				    std::string module;

				    tasks::task_id parent_id;

				    tasks::is_abortable abortable;

				    std::vector<std::string> children_ids;

				@@ -99,16 +101,16 @@ future<full_task_status> retrieve_status(const tasks::task_manager::foreign_task

				    s.type = task->type();

				    s.parent_id = task->get_parent_id();

				    s.abortable = task->is_abortable();

				    s.module = task->get_module_name();

				    s.progress.completed = progress.completed;

				    s.progress.total = progress.total;

				    std::vector<std::string> ct{task->get_children().size()};

				    boost::transform(task->get_children(), ct.begin(), [] (const auto& child) {

				    std::vector<std::string> ct = co_await task->get_children().map_each_task<std::string>([] (const tasks::task_manager::foreign_task_ptr& child) {

				        return child->id().to_sstring();

				    }, [] (const tasks::task_manager::task::task_essentials& child) {

				        return child.task_status.id.to_sstring();

				    });

				    s.children_ids = std::move(ct);

				    co_return s;

				}

				};

				void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>& tm, db::config& cfg) {

				    tm::get_modules.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				@@ -123,7 +125,7 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>

				            chunked_stats local_res;

				            tasks::task_manager::module_ptr module;

				            try {

				                module = tm.find_module(req->param["module"]);

				                module = tm.find_module(req->get_path_param("module"));

				            } catch (...) {

				                throw bad_param_exception(fmt::format("{}", std::current_exception()));

				            }

				@@ -138,25 +140,34 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>

				        std::function<future<>(output_stream<char>&&)> f = [r = std::move(res)] (output_stream<char>&& os) -> future<> {

				            auto s = std::move(os);

				            auto res = std::move(r);

				            co_await s.write("[");

				            std::string delim = "";

				            for (auto& v: res) {

				                for (auto& stats: v) {

				                    co_await s.write(std::exchange(delim, ", "));

				                    tm::task_stats ts;

				                    ts = stats;

				                    co_await formatter::write(s, ts);

				            std::exception_ptr ex;

				            try {

				                auto res = std::move(r);

				                co_await s.write("[");

				                std::string delim = "";

				                for (auto& v: res) {

				                    for (auto& stats: v) {

				                        co_await s.write(std::exchange(delim, ", "));

				                        tm::task_stats ts;

				                        ts = stats;

				                        co_await formatter::write(s, ts);

				                    }

				                }

				                co_await s.write("]");

				                co_await s.flush();

				            } catch (...) {

				                ex = std::current_exception();

				            }

				            co_await s.write("]");

				            co_await s.close();

				            if (ex) {

				                co_await coroutine::return_exception_ptr(std::move(ex));

				            }

				        };

				        co_return std::move(f);

				    });

				    tm::get_task_status.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};

				        auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};

				        tasks::task_manager::foreign_task_ptr task;

				        try {

				            task = co_await tasks::task_manager::invoke_on_task(tm, id, std::function([] (tasks::task_manager::task_ptr task) -> future<tasks::task_manager::foreign_task_ptr> {

				@@ -173,13 +184,13 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>

				    });

				    tm::abort_task.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};

				        auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};

				        try {

				            co_await tasks::task_manager::invoke_on_task(tm, id, [] (tasks::task_manager::task_ptr task) -> future<> {

				                if (!task->is_abortable()) {

				                    co_await coroutine::return_exception(std::runtime_error("Requested task cannot be aborted"));

				                }

				                co_await task->abort();

				                task->abort();

				            });

				        } catch (tasks::task_manager::task_not_found& e) {

				            throw bad_param_exception(e.what());

				@@ -188,12 +199,11 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>

				    });

				    tm::wait_task.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};

				        auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};

				        tasks::task_manager::foreign_task_ptr task;

				        try {

				            task = co_await tasks::task_manager::invoke_on_task(tm, id, std::function([] (tasks::task_manager::task_ptr task) {

				                return task->done().then_wrapped([task] (auto f) {

				                    task->unregister_task();

				                    // done() is called only because we want the task to be complete before getting its status.

				                    // The future should be ignored here as the result does not matter.

				                    f.ignore_ready_future();

				@@ -209,8 +219,8 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>

				    tm::get_task_status_recursively.set(r, [&_tm = tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto& tm = _tm;

				        auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};

				        std::queue<tasks::task_manager::foreign_task_ptr> q;

				        auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};

				        std::queue<task_variant> q;

				        utils::chunked_vector<full_task_status> res;

				        tasks::task_manager::foreign_task_ptr task;

				@@ -230,10 +240,33 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>

				        q.push(co_await task.copy());   // Task cannot be moved since we need it to be alive during whole loop execution.

				        while (!q.empty()) {

				            auto& current = q.front();

				            res.push_back(co_await retrieve_status(current));

				            for (auto& child: current->get_children()) {

				                q.push(co_await child.copy());

				            }

				            co_await std::visit(overloaded_functor {

				                [&] (const tasks::task_manager::foreign_task_ptr& task) -> future<> {

				                    res.push_back(co_await retrieve_status(task));

				                    co_await task->get_children().for_each_task([&q] (const tasks::task_manager::foreign_task_ptr& child) -> future<> {

				                        q.push(co_await child.copy());

				                    }, [&] (const tasks::task_manager::task::task_essentials& child) {

				                        q.push(child);

				                        return make_ready_future();

				                    });

				                },

				                [&] (const tasks::task_manager::task::task_essentials& task) -> future<> {

				                    res.push_back(full_task_status{

				                        .task_status = task.task_status,

				                        .type = task.type,

				                        .progress = task.task_progress,

				                        .parent_id = task.parent_id,

				                        .abortable = task.abortable,

				                        .children_ids = boost::copy_range<std::vector<std::string>>(task.failed_children | boost::adaptors::transformed([] (auto& child) {

				                            return child.task_status.id.to_sstring();

				                        }))

				                    });

				                    for (auto& child: task.failed_children) {

				                        q.push(child);

				                    }

				                    return make_ready_future();

				                }

				            }, current);

				            q.pop();

				        }

									
										9

api/task_manager_test.cc
									
												
				@@ -83,20 +83,19 @@ void set_task_manager_test(http_context& ctx, routes& r, sharded<tasks::task_man

				    });

				    tmt::finish_test_task.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto id = tasks::task_id{utils::UUID{req->param["task_id"]}};

				        auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};

				        auto it = req->query_parameters.find("error");

				        bool fail = it != req->query_parameters.end();

				        std::string error = fail ? it->second : "";

				        try {

				            co_await tasks::task_manager::invoke_on_task(tm, id, [fail, error = std::move(error)] (tasks::task_manager::task_ptr task) {

				            co_await tasks::task_manager::invoke_on_task(tm, id, [fail, error = std::move(error)] (tasks::task_manager::task_ptr task) -> future<> {

				                tasks::test_task test_task{task};

				                if (fail) {

				                    test_task.finish_failed(std::make_exception_ptr(std::runtime_error(error)));

				                    co_await test_task.finish_failed(std::make_exception_ptr(std::runtime_error(error)));

				                } else {

				                    test_task.finish();

				                    co_await test_task.finish();

				                }

				                return make_ready_future<>();

				            });

				        } catch (tasks::task_manager::task_not_found& e) {

				            throw bad_param_exception(e.what());

									
										11

api/task_manager_test.hh
									
												
				@@ -11,16 +11,19 @@

				#pragma once

				#include <seastar/core/sharded.hh>

				#include "api.hh"

				namespace tasks {

				class task_manager;

				}

				namespace api {

				namespace seastar::httpd {

				class routes;

				}

				void set_task_manager_test(http_context& ctx, httpd::routes& r, sharded<tasks::task_manager>& tm);

				void unset_task_manager_test(http_context& ctx, httpd::routes& r);

				namespace api {

				struct http_context;

				void set_task_manager_test(http_context& ctx, seastar::httpd::routes& r, seastar::sharded<tasks::task_manager>& tm);

				void unset_task_manager_test(http_context& ctx, seastar::httpd::routes& r);

				}

									
										5

api/tasks.cc
									
												
				@@ -7,6 +7,7 @@

				 */

				#include <seastar/core/coroutine.hh>

				#include <fmt/ranges.h>

				#include "api/api.hh"

				#include "api/storage_service.hh"

				@@ -29,7 +30,7 @@ using ks_cf_func = std::function<future<json::json_return_type>(http_context&, s

				static auto wrap_ks_cf(http_context &ctx, ks_cf_func f) {

				    return [&ctx, f = std::move(f)](std::unique_ptr<http::request> req) {

				        auto keyspace = validate_keyspace(ctx, req->param);

				        auto keyspace = validate_keyspace(ctx, req);

				        auto table_infos = parse_table_infos(keyspace, ctx, req->query_parameters, "cf");

				        return f(ctx, std::move(req), std::move(keyspace), std::move(table_infos));

				    };

				@@ -61,7 +62,7 @@ void set_tasks_compaction_module(http_context& ctx, routes& r, sharded<service::

				    t::force_keyspace_cleanup_async.set(r, [&ctx, &ss](std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto& db = ctx.db;

				        auto keyspace = validate_keyspace(ctx, req->param);

				        auto keyspace = validate_keyspace(ctx, req);

				        auto table_infos = parse_table_infos(keyspace, ctx, req->query_parameters, "cf");

				        apilog.info("force_keyspace_cleanup_async: keyspace={} tables={}", keyspace, table_infos);

				        if (!co_await ss.local().is_cleanup_allowed(keyspace)) {

									
										13

api/tasks.hh
									
												
				@@ -8,11 +8,20 @@

				#pragma once

				#include "api.hh"

				#include "db/config.hh"

				#include <seastar/core/sharded.hh>

				#include "db/snapshot-ctl.hh"

				namespace seastar::httpd {

				class routes;

				}

				namespace service {

				class storage_service;

				}

				namespace api {

				struct http_context;

				void set_tasks_compaction_module(http_context& ctx, httpd::routes& r, sharded<service::storage_service>& ss, sharded<db::snapshot_ctl>& snap_ctl);

				void unset_tasks_compaction_module(http_context& ctx, httpd::routes& r);

									
										2

api/token_metadata.cc
									
												
				@@ -31,7 +31,7 @@ void set_token_metadata(http_context& ctx, routes& r, sharded<locator::shared_to

				    });

				    ss::get_node_tokens.set(r, [&tm] (std::unique_ptr<http::request> req) {

				        gms::inet_address addr(req->param["endpoint"]);

				        gms::inet_address addr(req->get_path_param("endpoint"));

				        auto& local_tm = *tm.local().get();

				        const auto host_id = local_tm.get_host_id_if_known(addr);

				        return make_ready_future<json::json_return_type>(stream_range_as_array(host_id ? local_tm.get_tokens(*host_id): std::vector<dht::token>{}, [](const dht::token& i) {

									
										11

api/token_metadata.hh
									
												
				@@ -9,13 +9,16 @@

				#pragma once

				#include <seastar/core/sharded.hh>

				#include "api/api_init.hh"

				namespace seastar::httpd {

				class routes;

				}

				namespace locator { class shared_token_metadata; }

				namespace api {

				void set_token_metadata(http_context& ctx, httpd::routes& r, sharded<locator::shared_token_metadata>& tm);

				void unset_token_metadata(http_context& ctx, httpd::routes& r);

				struct http_context;

				void set_token_metadata(http_context& ctx, seastar::httpd::routes& r, seastar::sharded<locator::shared_token_metadata>& tm);

				void unset_token_metadata(http_context& ctx, seastar::httpd::routes& r);

				}

									
										1

auth/CMakeLists.txt
									
												
				@@ -30,6 +30,7 @@ target_link_libraries(scylla_auth

				    Seastar::seastar

				    xxHash::xxhash

				  PRIVATE

+				    absl::headers

				    cql3

				    idl

				    wasmtime_bindings

									
auth/allow_all_authenticator.hh (6 lines changed)
												
				@@ -59,15 +59,15 @@ public:

				        return make_ready_future<authenticated_user>(anonymous_user());

				    }

-				    virtual future<> create(std::string_view, const authentication_options& options) override {

+				    virtual future<> create(std::string_view, const authentication_options& options, ::service::group0_batch&) override {

				        return make_ready_future();

				    }

-				    virtual future<> alter(std::string_view, const authentication_options& options) override {

+				    virtual future<> alter(std::string_view, const authentication_options& options, ::service::group0_batch&) override {

				        return make_ready_future();

				    }

-				    virtual future<> drop(std::string_view) override {

+				    virtual future<> drop(std::string_view, ::service::group0_batch&) override {

				        return make_ready_future();

				    }

									
auth/allow_all_authorizer.hh (15 lines changed)
												
				@@ -9,6 +9,7 @@

				#pragma once

				#include "auth/authorizer.hh"

+				#include <seastar/core/future.hh>

				namespace cql3 {

				class query_processor;

				@@ -44,12 +45,12 @@ public:

				        return make_ready_future<permission_set>(permissions::ALL);

				    }

-				    virtual future<> grant(std::string_view, permission_set, const resource&) override {

+				    virtual future<> grant(std::string_view, permission_set, const resource&, ::service::group0_batch&) override {

				        return make_exception_future<>(

				                unsupported_authorization_operation("GRANT operation is not supported by AllowAllAuthorizer"));

				    }

-				    virtual future<> revoke(std::string_view, permission_set, const resource&) override {

+				    virtual future<> revoke(std::string_view, permission_set, const resource&, ::service::group0_batch&) override {

				        return make_exception_future<>(

				                unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));

				    }

				@@ -60,14 +61,12 @@ public:

				                        "LIST PERMISSIONS operation is not supported by AllowAllAuthorizer"));

				    }

-				    virtual future<> revoke_all(std::string_view) override {

-				        return make_exception_future(

-				                unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));

+				    virtual future<> revoke_all(std::string_view, ::service::group0_batch&) override {

+				        return make_ready_future();

				    }

-				    virtual future<> revoke_all(const resource&) override {

-				        return make_exception_future(

-				                unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));

+				    virtual future<> revoke_all(const resource&, ::service::group0_batch&) override {

+				        return make_ready_future();

				    }

				    virtual const resource_set& protected_resources() const override {

									
auth/authenticated_user.hh (2 lines changed)
												
				@@ -50,7 +50,7 @@ inline bool is_anonymous(const authenticated_user& u) noexcept {

				/// The user name, or "anonymous".

				///

				template <>

-				struct fmt::formatter<auth::authenticated_user> : fmt::formatter<std::string_view> {

+				struct fmt::formatter<auth::authenticated_user> : fmt::formatter<string_view> {

				    template <typename FormatContext>

				    auto format(const auth::authenticated_user& u, FormatContext& ctx) const {

				        if (u.name) {

									
auth/authentication_options.hh (6 lines changed)
												
				@@ -48,15 +48,15 @@ public:

				}

				template <>

-				struct fmt::formatter<auth::authentication_option> : fmt::formatter<std::string_view> {

+				struct fmt::formatter<auth::authentication_option> : fmt::formatter<string_view> {

				    template <typename FormatContext>

				    auto format(const auth::authentication_option a, FormatContext& ctx) const {

				        using enum auth::authentication_option;

				        switch (a) {

				        case password:

-				            return formatter<std::string_view>::format("PASSWORD", ctx);

+				            return formatter<string_view>::format("PASSWORD", ctx);

				        case options:

-				            return formatter<std::string_view>::format("OPTIONS", ctx);

+				            return formatter<string_view>::format("OPTIONS", ctx);

				        }

				        std::abort();

				    }

									
auth/authenticator.hh (10 lines changed)
												
				@@ -16,15 +16,15 @@

				#include <optional>

				#include <functional>

				#include <seastar/core/enum.hh>

				#include <seastar/core/future.hh>

				#include <seastar/core/sstring.hh>

				#include <seastar/core/shared_ptr.hh>

				#include "auth/authentication_options.hh"

				#include "auth/resource.hh"

				#include "auth/sasl_challenge.hh"

				#include "service/raft/raft_group0_client.hh"

				namespace db {

				    class config;

				}

				@@ -106,7 +106,7 @@ public:

				    ///

				    /// The options provided must be a subset of `supported_options()`.

				    ///

-				    virtual future<> create(std::string_view role_name, const authentication_options& options) = 0;

+				    virtual future<> create(std::string_view role_name, const authentication_options& options, ::service::group0_batch& mc) = 0;

				    ///

				    /// Alter the authentication record of an existing user.

				@@ -115,12 +115,12 @@ public:

				    ///

				    /// Callers must ensure that the specification of `alterable_options()` is adhered to.

				    ///

-				    virtual future<> alter(std::string_view role_name, const authentication_options& options) = 0;

+				    virtual future<> alter(std::string_view role_name, const authentication_options& options, ::service::group0_batch& mc) = 0;

				    ///

				    /// Delete the authentication record for a user. This will disallow the user from logging in.

				    ///

-				    virtual future<> drop(std::string_view role_name) = 0;

+				    virtual future<> drop(std::string_view role_name, ::service::group0_batch&) = 0;

				    ///

				    /// Query for custom options (those corresponding to \ref authentication_options::options).

									
auth/authorizer.hh (10 lines changed)
												
				@@ -16,10 +16,10 @@

				#include <vector>

				#include <seastar/core/future.hh>

				#include <seastar/core/shared_ptr.hh>

				#include "auth/permission.hh"

				#include "auth/resource.hh"

				#include "service/raft/raft_group0_client.hh"

				#include "seastarx.hh"

				namespace auth {

				@@ -81,14 +81,14 @@ public:

				    ///

				    /// \throws \ref unsupported_authorization_operation if granting permissions is not supported.

				    ///

-				    virtual future<> grant(std::string_view role_name, permission_set, const resource&) = 0;

+				    virtual future<> grant(std::string_view role_name, permission_set, const resource&, ::service::group0_batch&) = 0;

				    ///

				    /// Revoke a set of permissions from a role for a particular \ref resource.

				    ///

				    /// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.

				    ///

-				    virtual future<> revoke(std::string_view role_name, permission_set, const resource&) = 0;

+				    virtual future<> revoke(std::string_view role_name, permission_set, const resource&, ::service::group0_batch&) = 0;

				    ///

				    /// Query for all directly granted permissions.

				@@ -102,14 +102,14 @@ public:

				    ///

				    /// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.

				    ///

-				    virtual future<> revoke_all(std::string_view role_name) = 0;

+				    virtual future<> revoke_all(std::string_view role_name, ::service::group0_batch&) = 0;

				    ///

				    /// Revoke all permissions granted to any role for a particular resource.

				    ///

				    /// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.

				    ///

-				    virtual future<> revoke_all(const resource&) = 0;

+				    virtual future<> revoke_all(const resource&, ::service::group0_batch&) = 0;

				    ///

				    /// System resources used internally as part of the implementation. These are made inaccessible to users.

									
auth/certificate_authenticator.cc (10 lines changed)
												
				@@ -10,8 +10,10 @@

				#include "auth/certificate_authenticator.hh"

				#include <regex>

				#include <fmt/ranges.h>

				#include "utils/class_registrator.hh"

				#include "utils/to_string.hh"

				#include "data_dictionary/data_dictionary.hh"

				#include "cql3/query_processor.hh"

				#include "db/config.hh"

				@@ -74,7 +76,7 @@ auth::certificate_authenticator::certificate_authenticator(cql3::query_processor

				                    continue;

				                } catch (std::out_of_range&) {

				                    // just fallthrough

-				                } catch (std::regex_error&) {

+				                } catch (boost::regex_error&) {

				                    std::throw_with_nested(std::invalid_argument(fmt::format("Invalid query expression: {}", map.at(cfg_query_attr))));

				                }

				            }

				@@ -155,16 +157,16 @@ future<auth::authenticated_user> auth::certificate_authenticator::authenticate(c

				    throw exceptions::authentication_exception("Cannot authenticate using attribute map");

				}

-				future<> auth::certificate_authenticator::create(std::string_view role_name, const authentication_options& options) {

+				future<> auth::certificate_authenticator::create(std::string_view role_name, const authentication_options& options, ::service::group0_batch& mc) {

				    // TODO: should we keep track of roles/enforce existence? Role manager should deal with this...

				    co_return;

				}

-				future<> auth::certificate_authenticator::alter(std::string_view role_name, const authentication_options& options) {

+				future<> auth::certificate_authenticator::alter(std::string_view role_name, const authentication_options& options, ::service::group0_batch& mc) {

				    co_return;

				}

-				future<> auth::certificate_authenticator::drop(std::string_view role_name) {

+				future<> auth::certificate_authenticator::drop(std::string_view role_name, ::service::group0_batch&) {

				    co_return;

				}

									
auth/certificate_authenticator.hh (7 lines changed)
												
				@@ -9,7 +9,6 @@

				#pragma once

				#include <boost/regex.hpp>

				#include "auth/authenticator.hh"

				namespace cql3 {

				@@ -47,9 +46,9 @@ public:

				    future<authenticated_user> authenticate(const credentials_map& credentials) const override;

				    future<std::optional<authenticated_user>> authenticate(session_dn_func) const override;

-				    future<> create(std::string_view role_name, const authentication_options& options) override;

-				    future<> alter(std::string_view role_name, const authentication_options& options) override;

-				    future<> drop(std::string_view role_name) override;

+				    future<> create(std::string_view role_name, const authentication_options& options, ::service::group0_batch& mc) override;

+				    future<> alter(std::string_view role_name, const authentication_options& options, ::service::group0_batch&) override;

+				    future<> drop(std::string_view role_name, ::service::group0_batch&) override;

				    future<custom_options> query_custom_options(std::string_view role_name) const override;

									
auth/common.cc (35 lines changed)
												
				@@ -23,8 +23,6 @@

				#include "service/migration_manager.hh"

				#include "service/raft/group0_state_machine.hh"

				#include "timeout_config.hh"

-				#include "db/config.hh"

-				#include "db/system_auth_keyspace.hh"

				#include "utils/error_injection.hh"

				namespace auth {

				@@ -41,14 +39,14 @@ constinit const std::string_view AUTH_PACKAGE_NAME("org.apache.cassandra.auth.")

				static logging::logger auth_log("auth");

				bool legacy_mode(cql3::query_processor& qp) {

-				    return qp.auth_version < db::system_auth_keyspace::version_t::v2;

+				    return qp.auth_version < db::system_keyspace::auth_version_t::v2;

				}

				std::string_view get_auth_ks_name(cql3::query_processor& qp) {

				    if (legacy_mode(qp)) {

				        return meta::legacy::AUTH_KS;

				    }

-				    return db::system_auth_keyspace::NAME;

+				    return db::system_keyspace::NAME;

				}

				// Func must support being invoked more than once.

				@@ -65,7 +63,7 @@ future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_f

				    }).discard_result();

				}

-				static future<> create_metadata_table_if_missing_impl(

+				static future<> create_legacy_metadata_table_if_missing_impl(

				        std::string_view table_name,

				        cql3::query_processor& qp,

				        std::string_view cql,

				@@ -73,7 +71,7 @@ static future<> create_metadata_table_if_missing_impl(

				    assert(this_shard_id() == 0); // once_among_shards makes sure a function is executed on shard 0 only

				    auto db = qp.db();

-				    auto parsed_statement = cql3::query_processor::parse_statement(cql);

+				    auto parsed_statement = cql3::query_processor::parse_statement(cql, cql3::dialect{});

				    auto& parsed_cf_statement = static_cast<cql3::statements::raw::cf_statement&>(*parsed_statement);

				    parsed_cf_statement.prepare_keyspace(meta::legacy::AUTH_KS);

				@@ -98,12 +96,12 @@ static future<> create_metadata_table_if_missing_impl(

				    }

				}

-				future<> create_metadata_table_if_missing(

+				future<> create_legacy_metadata_table_if_missing(

				        std::string_view table_name,

				        cql3::query_processor& qp,

				        std::string_view cql,

				        ::service::migration_manager& mm) noexcept {

-				    return futurize_invoke(create_metadata_table_if_missing_impl, table_name, qp, cql, mm);

+				    return futurize_invoke(create_legacy_metadata_table_if_missing_impl, table_name, qp, cql, mm);

				}

				::service::query_state& internal_distributed_query_state() noexcept {

				@@ -123,7 +121,7 @@ static future<> announce_mutations_with_guard(

				        ::service::raft_group0_client& group0_client,

				        std::vector<canonical_mutation> muts,

				        ::service::group0_guard group0_guard,

-				        seastar::abort_source* as,

+				        seastar::abort_source& as,

				        std::optional<::service::raft_timeout> timeout) {

				    auto group0_cmd = group0_client.prepare_command(

				        ::service::write_mutations{

				@@ -138,8 +136,8 @@ static future<> announce_mutations_with_guard(

				future<> announce_mutations_with_batching(

				        ::service::raft_group0_client& group0_client,

				        start_operation_func_t start_operation_func,

-				        std::function<mutations_generator(api::timestamp_type& t)> gen,

-				        seastar::abort_source* as,

+				        std::function<::service::mutations_generator(api::timestamp_type t)> gen,

+				        seastar::abort_source& as,

				        std::optional<::service::raft_timeout> timeout) {

				    // account for command's overhead, it's better to use smaller threshold than constantly bounce off the limit

				    size_t memory_threshold = group0_client.max_command_size() * 0.75;

				@@ -190,7 +188,7 @@ future<> announce_mutations(

				        ::service::raft_group0_client& group0_client,

				        const sstring query_string,

				        std::vector<data_value_or_unset> values,

-				        seastar::abort_source* as,

+				        seastar::abort_source& as,

				        std::optional<::service::raft_timeout> timeout) {

				    auto group0_guard = co_await group0_client.start_operation(as, timeout);

				    auto timestamp = group0_guard.write_timestamp();

				@@ -203,4 +201,17 @@ future<> announce_mutations(

				    co_await announce_mutations_with_guard(group0_client, std::move(cmuts), std::move(group0_guard), as, timeout);

				}

+				future<> collect_mutations(

+				        cql3::query_processor& qp,

+				        ::service::group0_batch& collector,

+				        const sstring query_string,

+				        std::vector<data_value_or_unset> values) {

+				    auto muts = co_await qp.get_mutations_internal(

+				            query_string,

+				            internal_distributed_query_state(),

+				            collector.write_timestamp(),

+				            std::move(values));

+				    collector.add_mutations(std::move(muts), format("auth internal statement: {}", query_string));

+				}

				}

									
auth/common.hh (21 lines changed)
												
				@@ -13,12 +13,8 @@

				#include <seastar/core/future.hh>

				#include <seastar/core/abort_source.hh>

				#include <seastar/util/noncopyable_function.hh>

				#include <seastar/core/seastar.hh>

				#include <seastar/core/resource.hh>

				#include <seastar/core/sstring.hh>

				#include <seastar/core/smp.hh>

				#include "schema/schema_registry.hh"

				#include "types/types.hh"

				#include "service/raft/raft_group0_client.hh"

				@@ -70,7 +66,7 @@ future<> once_among_shards(Task&& f) {

				// Func must support being invoked more than once.

				future<> do_after_system_ready(seastar::abort_source& as, seastar::noncopyable_function<future<>()> func);

-				future<> create_metadata_table_if_missing(

+				future<> create_legacy_metadata_table_if_missing(

				        std::string_view table_name,

				        cql3::query_processor&,

				        std::string_view cql,

				@@ -84,16 +80,15 @@ future<> create_metadata_table_if_missing(

				// Execute update query via group0 mechanism, mutations will be applied on all nodes.

				// Use this function when need to perform read before write on a single guard or if

				// you have more than one mutation and potentially exceed single command size limit.

-				using start_operation_func_t = std::function<future<::service::group0_guard>(abort_source*)>;

-				using mutations_generator = coroutine::experimental::generator<mutation>;

+				using start_operation_func_t = std::function<future<::service::group0_guard>(abort_source&)>;

				future<> announce_mutations_with_batching(

				        ::service::raft_group0_client& group0_client,

				        // since we can operate also in topology coordinator context where we need stronger

				        // guarantees than start_operation from group0_client gives we allow to inject custom

				        // function here

				        start_operation_func_t start_operation_func,

-				        std::function<mutations_generator(api::timestamp_type& t)> gen,

-				        seastar::abort_source* as,

+				        std::function<::service::mutations_generator(api::timestamp_type t)> gen,

+				        seastar::abort_source& as,

				        std::optional<::service::raft_timeout> timeout);

				// Execute update query via group0 mechanism, mutations will be applied on all nodes.

				@@ -102,7 +97,13 @@ future<> announce_mutations(

				        ::service::raft_group0_client& group0_client,

				        const sstring query_string,

				        std::vector<data_value_or_unset> values,

-				        seastar::abort_source* as,

+				        seastar::abort_source& as,

				        std::optional<::service::raft_timeout> timeout);

+				// Appends mutations to a collector, they will be applied later on all nodes via group0 mechanism.

+				future<> collect_mutations(

+				        cql3::query_processor& qp,

+				        ::service::group0_batch& collector,

+				        const sstring query_string,

+				        std::vector<data_value_or_unset> values);

				}

									
auth/default_authorizer.cc (146 lines changed)
												
				@@ -9,7 +9,7 @@

				 */

				#include "auth/default_authorizer.hh"

-				#include "db/system_auth_keyspace.hh"

+				#include "db/system_keyspace.hh"

				extern "C" {

				#include <crypt.h>

				@@ -90,9 +90,10 @@ future<> default_authorizer::migrate_legacy_metadata() {

				            return do_with(

				                    row.get_as<sstring>("username"),

				                    parse_resource(row.get_as<sstring>(RESOURCE_NAME)),

-				                    [this, &row](const auto& username, const auto& r) {

+				                    ::service::group0_batch::unused(),

+				                    [this, &row](const auto& username, const auto& r, auto& mc) {

				                const permission_set perms = permissions::from_strings(row.get_set<sstring>(PERMISSIONS_NAME));

-				                return grant(username, perms, r);

+				                return grant(username, perms, r, mc);

				            });

				        }).finally([results] {});

				    }).then([] {

				@@ -103,7 +104,7 @@ future<> default_authorizer::migrate_legacy_metadata() {

				    });

				}

-				future<> default_authorizer::start() {

+				future<> default_authorizer::start_legacy() {

				    static const sstring create_table = fmt::format(

				            "CREATE TABLE {}.{} ("

				            "{} text,"

				@@ -121,7 +122,7 @@ future<> default_authorizer::start() {

				            90 * 24 * 60 * 60); // 3 months.

				    return once_among_shards([this] {

-				        return create_metadata_table_if_missing(

+				        return create_legacy_metadata_table_if_missing(

				                PERMISSIONS_CF,

				                _qp,

				                create_table,

				@@ -144,6 +145,13 @@ future<> default_authorizer::start() {

				    });

				}

+				future<> default_authorizer::start() {

+				    if (legacy_mode(_qp)) {

+				        return start_legacy();

+				    }

+				    return make_ready_future<>();

+				}

				future<> default_authorizer::stop() {

				    _as.request_abort();

				    return _finished.handle_exception_type([](const sleep_aborted&) {}).handle_exception_type([](const abort_requested_exception&) {});

				@@ -178,7 +186,8 @@ default_authorizer::modify(

				        std::string_view role_name,

				        permission_set set,

				        const resource& resource,

-				        std::string_view op) {

+				        std::string_view op,

+				        ::service::group0_batch& mc) {

				    const sstring query = format("UPDATE {}.{} SET {} = {} {} ? WHERE {} = ? AND {} = ?",

				            get_auth_ks_name(_qp),

				            PERMISSIONS_CF,

				@@ -195,17 +204,17 @@ default_authorizer::modify(

				                {permissions::to_strings(set), sstring(role_name), resource.name()},

				                cql3::query_processor::cache_internal::no).discard_result();

				    }

-				    co_return co_await announce_mutations(_qp, _group0_client, query,

-				        {permissions::to_strings(set), sstring(role_name), resource.name()}, &_as, ::service::raft_timeout{});

+				    co_await collect_mutations(_qp, mc, query,

+				            {permissions::to_strings(set), sstring(role_name), resource.name()});

				}

-				future<> default_authorizer::grant(std::string_view role_name, permission_set set, const resource& resource) {

-				    return modify(role_name, std::move(set), resource, "+");

+				future<> default_authorizer::grant(std::string_view role_name, permission_set set, const resource& resource, ::service::group0_batch& mc) {

+				    return modify(role_name, std::move(set), resource, "+", mc);

				}

-				future<> default_authorizer::revoke(std::string_view role_name, permission_set set, const resource& resource) {

-				    return modify(role_name, std::move(set), resource, "-");

+				future<> default_authorizer::revoke(std::string_view role_name, permission_set set, const resource& resource, ::service::group0_batch& mc) {

+				    return modify(role_name, std::move(set), resource, "-", mc);

				}

				future<std::vector<permission_details>> default_authorizer::list_all() const {

				@@ -235,7 +244,7 @@ future<std::vector<permission_details>> default_authorizer::list_all() const {

				    co_return all_details;

				}

-				future<> default_authorizer::revoke_all(std::string_view role_name) {

+				future<> default_authorizer::revoke_all(std::string_view role_name, ::service::group0_batch& mc) {

				    try {

				        const sstring query = format("DELETE FROM {}.{} WHERE {} = ?",

				                get_auth_ks_name(_qp),

				@@ -249,7 +258,7 @@ future<> default_authorizer::revoke_all(std::string_view role_name) {

				                    {sstring(role_name)},

				                    cql3::query_processor::cache_internal::no).discard_result();

				        } else {

-				            co_await announce_mutations(_qp, _group0_client, query, {sstring(role_name)}, &_as, ::service::raft_timeout{});

+				            co_await collect_mutations(_qp, mc, query, {sstring(role_name)});

				        }

				    } catch (exceptions::request_execution_exception& e) {

				        alogger.warn("CassandraAuthorizer failed to revoke all permissions of {}: {}", role_name, e);

				@@ -301,51 +310,88 @@ future<> default_authorizer::revoke_all_legacy(const resource& resource) {

				    });

				}

-				future<> default_authorizer::revoke_all(const resource& resource) {

+				future<> default_authorizer::revoke_all(const resource& resource, ::service::group0_batch& mc) {

				    if (legacy_mode(_qp)) {

				        co_return co_await revoke_all_legacy(resource);

				    }

+				    if (resource.kind() == resource_kind::data &&

+				            data_resource_view(resource).is_keyspace()) {

+				        revoke_all_keyspace_resources(resource, mc);

+				        co_return;

+				    }

				    auto name = resource.name();

-				    try {

-				        auto gen = [this, name] (api::timestamp_type& t) -> mutations_generator {

-				            const sstring query = format("SELECT {} FROM {}.{} WHERE {} = ? ALLOW FILTERING",

-				                    ROLE_NAME,

+				    auto gen = [this, name] (api::timestamp_type t) -> ::service::mutations_generator {

+				        const sstring query = format("SELECT {} FROM {}.{} WHERE {} = ? ALLOW FILTERING",

+				                ROLE_NAME,

+				                get_auth_ks_name(_qp),

+				                PERMISSIONS_CF,

+				                RESOURCE_NAME);

+				        auto res = co_await _qp.execute_internal(

+				                query,

+				                db::consistency_level::LOCAL_ONE,

+				                {name},

+				                cql3::query_processor::cache_internal::no);

+				        for (const auto& r : *res) {

+				            const sstring query = format("DELETE FROM {}.{} WHERE {} = ? AND {} = ?",

				                    get_auth_ks_name(_qp),

				                    PERMISSIONS_CF,

+				                    ROLE_NAME,

				                    RESOURCE_NAME);

-				            auto res = co_await _qp.execute_internal(

+				            auto muts = co_await _qp.get_mutations_internal(

				                    query,

-				                    db::consistency_level::LOCAL_ONE,

-				                    {name},

-				                    cql3::query_processor::cache_internal::no);

-				            for (const auto& r : *res) {

-				                const sstring query = format("DELETE FROM {}.{} WHERE {} = ? AND {} = ?",

-				                        get_auth_ks_name(_qp),

-				                        PERMISSIONS_CF,

-				                        ROLE_NAME,

-				                        RESOURCE_NAME);

-				                auto muts = co_await _qp.get_mutations_internal(

-				                        query,

-				                        internal_distributed_query_state(),

-				                        t,

-				                        {r.get_as<sstring>(ROLE_NAME), name});

-				                if (muts.size() != 1) {

-				                    on_internal_error(alogger,

-				                        format("expecting single delete mutation, got {}", muts.size()));

-				                }

-				                co_yield std::move(muts[0]);

+				                    internal_distributed_query_state(),

+				                    t,

+				                    {r.get_as<sstring>(ROLE_NAME), name});

+				            if (muts.size() != 1) {

+				                on_internal_error(alogger,

+				                    format("expecting single delete mutation, got {}", muts.size()));

				            }

-				        };

-				        const auto timeout = ::service::raft_timeout{};

-				        co_await announce_mutations_with_batching(

-				                _group0_client,

-				                [this, timeout](abort_source* as) { return _group0_client.start_operation(as, timeout); },

-				                std::move(gen),

-				                &_as,

-				            timeout);

-				    } catch (exceptions::request_execution_exception& e) {

-				        alogger.warn("CassandraAuthorizer failed to revoke all permissions on {}: {}", name, e);

-				    }

+				            co_yield std::move(muts[0]);

+				        }

+				    };

+				    mc.add_generator(std::move(gen), "default_authorizer::revoke_all");

				}

+				void default_authorizer::revoke_all_keyspace_resources(const resource& ks_resource, ::service::group0_batch& mc) {

+				    auto ks_name = ks_resource.name();

+				    auto gen = [this, ks_name] (api::timestamp_type t) -> ::service::mutations_generator {

+				        const sstring query = format("SELECT {}, {} FROM {}.{}",

+				                ROLE_NAME,

+				                RESOURCE_NAME,

+				                get_auth_ks_name(_qp),

+				                PERMISSIONS_CF);

+				        auto res = co_await _qp.execute_internal(

+				                query,

+				                db::consistency_level::LOCAL_ONE,

+				                {},

+				                cql3::query_processor::cache_internal::no);

+				        auto ks_prefix = ks_name + "/";

+				        for (const auto& r : *res) {

+				            auto name = r.get_as<sstring>(RESOURCE_NAME);

+				            if (name != ks_name && !name.starts_with(ks_prefix)) {

+				                // r doesn't represent resource related to ks_resource

+				                continue;

+				            }

+				            const sstring query = format("DELETE FROM {}.{} WHERE {} = ? AND {} = ?",

+				                    get_auth_ks_name(_qp),

+				                    PERMISSIONS_CF,

+				                    ROLE_NAME,

+				                    RESOURCE_NAME);

+				            auto muts = co_await _qp.get_mutations_internal(

+				                    query,

+				                    internal_distributed_query_state(),

+				                    t,

+				                    {r.get_as<sstring>(ROLE_NAME), name});

+				            if (muts.size() != 1) {

+				                on_internal_error(alogger,

+				                    format("expecting single delete mutation, got {}", muts.size()));

+				            }

+				            co_yield std::move(muts[0]);

+				        }

+				    };

+				    mc.add_generator(std::move(gen), "default_authorizer::revoke_all_keyspace_resources");

+				}

				const resource_set& default_authorizer::protected_resources() const {
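
Note on the name filter in `revoke_all_keyspace_resources()` above: a permission row is selected when its resource name equals the keyspace resource name or starts with that name followed by `/`. The standalone sketch below illustrates only that comparison. The helper names `belongs_to_keyspace` and `select_for_revoke` and the sample resource names are hypothetical and not part of ScyllaDB, and the sketch uses `std::string::compare` instead of `starts_with` so that it does not require C++20.

```cpp
#include <string>
#include <vector>

// Hypothetical helper mirroring the filter in revoke_all_keyspace_resources():
// keep a permission row when its resource name is the keyspace resource itself
// or starts with "<keyspace resource>/".
bool belongs_to_keyspace(const std::string& resource_name, const std::string& ks_name) {
    const std::string ks_prefix = ks_name + "/";
    return resource_name == ks_name
        || resource_name.compare(0, ks_prefix.size(), ks_prefix) == 0;
}

// Hypothetical helper: returns the resource names that the filter would select.
std::vector<std::string> select_for_revoke(const std::vector<std::string>& names,
                                           const std::string& ks_name) {
    std::vector<std::string> selected;
    for (const auto& name : names) {
        if (belongs_to_keyspace(name, ks_name)) {
            selected.push_back(name);
        }
    }
    return selected;
}
```

Because the prefix includes the trailing `/`, a sample name such as `data/ks10` is not selected when the keyspace resource is `data/ks1`, while `data/ks1` and `data/ks1/t` are.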

									
auth/default_authorizer.hh (14 lines changed)
												
				@@ -47,19 +47,21 @@ public:

				    virtual future<permission_set> authorize(const role_or_anonymous&, const resource&) const override;

-				    virtual future<> grant(std::string_view, permission_set, const resource&) override;

+				    virtual future<> grant(std::string_view, permission_set, const resource&, ::service::group0_batch&) override;

-				    virtual future<> revoke( std::string_view, permission_set, const resource&) override;

+				    virtual future<> revoke( std::string_view, permission_set, const resource&, ::service::group0_batch&) override;

				    virtual future<std::vector<permission_details>> list_all() const override;

				    virtual future<> revoke_all(std::string_view) override;

				    virtual future<> revoke_all(std::string_view, ::service::group0_batch&) override;

				    virtual future<> revoke_all(const resource&) override;

				    virtual future<> revoke_all(const resource&, ::service::group0_batch&) override;

				    virtual const resource_set& protected_resources() const override;

				private:

				    future<> start_legacy();

				    bool legacy_metadata_exists() const;

				    future<> revoke_all_legacy(const resource&);

				@@ -68,7 +70,9 @@ private:

				    future<> migrate_legacy_metadata();

				    future<> modify(std::string_view, permission_set, const resource&, std::string_view);

				    future<> modify(std::string_view, permission_set, const resource&, std::string_view, ::service::group0_batch&);

				    void revoke_all_keyspace_resources(const resource& ks_resource, ::service::group0_batch& mc);

				};

				} /* namespace auth */

									
										14

auth/maintenance_socket_role_manager.cc
									
				@@ -49,23 +49,23 @@ future<T> operation_not_supported_exception(std::string_view operation) {

				        std::runtime_error(format("role manager: {} operation not supported through maintenance socket", operation)));

				}

				future<> maintenance_socket_role_manager::create(std::string_view role_name, const role_config&) {

				future<> maintenance_socket_role_manager::create(std::string_view role_name, const role_config&, ::service::group0_batch&) {

				    return operation_not_supported_exception("CREATE");

				}

				future<> maintenance_socket_role_manager::drop(std::string_view role_name) {

				future<> maintenance_socket_role_manager::drop(std::string_view role_name, ::service::group0_batch& mc) {

				    return operation_not_supported_exception("DROP");

				}

				future<> maintenance_socket_role_manager::alter(std::string_view role_name, const role_config_update&) {

				future<> maintenance_socket_role_manager::alter(std::string_view role_name, const role_config_update&, ::service::group0_batch&) {

				    return operation_not_supported_exception("ALTER");

				}

				future<> maintenance_socket_role_manager::grant(std::string_view grantee_name, std::string_view role_name) {

				future<> maintenance_socket_role_manager::grant(std::string_view grantee_name, std::string_view role_name, ::service::group0_batch& mc) {

				    return operation_not_supported_exception("GRANT");

				}

				future<> maintenance_socket_role_manager::revoke(std::string_view revokee_name, std::string_view role_name) {

				future<> maintenance_socket_role_manager::revoke(std::string_view revokee_name, std::string_view role_name, ::service::group0_batch& mc) {

				    return operation_not_supported_exception("REVOKE");

				}

				@@ -97,11 +97,11 @@ future<role_manager::attribute_vals> maintenance_socket_role_manager::query_attr

				    return operation_not_supported_exception<role_manager::attribute_vals>("QUERY ATTRIBUTE");

				}

				future<> maintenance_socket_role_manager::set_attribute(std::string_view role_name, std::string_view attribute_name, std::string_view attribute_value) {

				future<> maintenance_socket_role_manager::set_attribute(std::string_view role_name, std::string_view attribute_name, std::string_view attribute_value, ::service::group0_batch& mc) {

				    return operation_not_supported_exception("SET ATTRIBUTE");

				}

				future<> maintenance_socket_role_manager::remove_attribute(std::string_view role_name, std::string_view attribute_name) {

				future<> maintenance_socket_role_manager::remove_attribute(std::string_view role_name, std::string_view attribute_name, ::service::group0_batch& mc) {

				    return operation_not_supported_exception("REMOVE ATTRIBUTE");

				}

									
										16

auth/maintenance_socket_role_manager.hh
									
				@@ -10,7 +10,7 @@

				#include "auth/resource.hh"

				#include "auth/role_manager.hh"

				#include "seastar/core/future.hh"

				#include <seastar/core/future.hh>

				namespace cql3 {

				class query_processor;

				@@ -39,15 +39,15 @@ public:

				    virtual future<> stop() override;

				    virtual future<> create(std::string_view role_name, const role_config&) override;

				    virtual future<> create(std::string_view role_name, const role_config&, ::service::group0_batch&) override;

				    virtual future<> drop(std::string_view role_name) override;

				    virtual future<> drop(std::string_view role_name, ::service::group0_batch& mc) override;

				    virtual future<> alter(std::string_view role_name, const role_config_update&) override;

				    virtual future<> alter(std::string_view role_name, const role_config_update&, ::service::group0_batch&) override;

				    virtual future<> grant(std::string_view grantee_name, std::string_view role_name) override;

				    virtual future<> grant(std::string_view grantee_name, std::string_view role_name, ::service::group0_batch& mc) override;

				    virtual future<> revoke(std::string_view revokee_name, std::string_view role_name) override;

				    virtual future<> revoke(std::string_view revokee_name, std::string_view role_name, ::service::group0_batch& mc) override;

				    virtual future<role_set> query_granted(std::string_view grantee_name, recursive_role_query) override;

				@@ -63,9 +63,9 @@ public:

				    virtual future<role_manager::attribute_vals> query_attribute_for_all(std::string_view attribute_name) override;

				    virtual future<> set_attribute(std::string_view role_name, std::string_view attribute_name, std::string_view attribute_value) override;

				    virtual future<> set_attribute(std::string_view role_name, std::string_view attribute_name, std::string_view attribute_value, ::service::group0_batch& mc) override;

				    virtual future<> remove_attribute(std::string_view role_name, std::string_view attribute_name) override;

				    virtual future<> remove_attribute(std::string_view role_name, std::string_view attribute_name, ::service::group0_batch& mc) override;

				};

				}

									
										82

auth/password_authenticator.cc
									
				@@ -132,48 +132,48 @@ future<> password_authenticator::create_default_if_missing() {

				            db::consistency_level::QUORUM,

				            internal_distributed_query_state(),

				            {salted_pwd, _superuser},

				            cql3::query_processor::cache_internal::no).then([](auto&&) {

				            plogger.info("Created default superuser authentication record.");

				        });

				            cql3::query_processor::cache_internal::no);

				        plogger.info("Created default superuser authentication record.");

				    } else {

				        co_await announce_mutations(_qp, _group0_client, query,

				            {salted_pwd, _superuser}, &_as, ::service::raft_timeout{}).then([]() {

				            plogger.info("Created default superuser authentication record.");

				        });

				            {salted_pwd, _superuser}, _as, ::service::raft_timeout{});

				        plogger.info("Created default superuser authentication record.");

				    }

				}

				future<> password_authenticator::start() {

				     return once_among_shards([this] {

				         auto f = create_metadata_table_if_missing(

				                 meta::roles_table::name,

				                 _qp,

				                 meta::roles_table::creation_query(),

				                 _migration_manager);

				    return once_among_shards([this] {

				        _stopped = do_after_system_ready(_as, [this] {

				            return async([this] {

				                if (legacy_mode(_qp)) {

				                    _migration_manager.wait_for_schema_agreement(_qp.db().real_database(), db::timeout_clock::time_point::max(), &_as).get();

				         _stopped = do_after_system_ready(_as, [this] {

				             return async([this] {

				                 _migration_manager.wait_for_schema_agreement(_qp.db().real_database(), db::timeout_clock::time_point::max(), &_as).get();

				                    if (any_nondefault_role_row_satisfies(_qp, &has_salted_hash, _superuser).get()) {

				                        if (legacy_metadata_exists()) {

				                            plogger.warn("Ignoring legacy authentication metadata since nondefault data already exist.");

				                        }

				                 if (any_nondefault_role_row_satisfies(_qp, &has_salted_hash, _superuser).get()) {

				                     if (legacy_metadata_exists()) {

				                         plogger.warn("Ignoring legacy authentication metadata since nondefault data already exist.");

				                     }

				                        return;

				                    }

				                     return;

				                 }

				                    if (legacy_metadata_exists()) {

				                        migrate_legacy_metadata().get();

				                        return;

				                    }

				                }

				                create_default_if_missing().get();

				            });

				        });

				                 if (legacy_metadata_exists()) {

				                     migrate_legacy_metadata().get();

				                     return;

				                 }

				                 create_default_if_missing().get();

				             });

				         });

				         return f;

				     });

				        if (legacy_mode(_qp)) {

				            return create_legacy_metadata_table_if_missing(

				                    meta::roles_table::name,

				                    _qp,

				                    meta::roles_table::creation_query(),

				                    _migration_manager);

				        }

				        return make_ready_future<>();

				    });

				 }

				future<> password_authenticator::stop() {

				@@ -257,7 +257,7 @@ future<authenticated_user> password_authenticator::authenticate(

				    }

				}

				future<> password_authenticator::create(std::string_view role_name, const authentication_options& options) {

				future<> password_authenticator::create(std::string_view role_name, const authentication_options& options, ::service::group0_batch& mc) {

				    if (!options.password) {

				        co_return;

				    }

				@@ -270,12 +270,12 @@ future<> password_authenticator::create(std::string_view role_name, const authen

				                {passwords::hash(*options.password, rng_for_salt), sstring(role_name)},

				                cql3::query_processor::cache_internal::no).discard_result();

				    } else {

				        co_await announce_mutations(_qp, _group0_client, query,

				                {passwords::hash(*options.password, rng_for_salt), sstring(role_name)}, &_as, ::service::raft_timeout{});

				        co_await collect_mutations(_qp, mc, query,

				                {passwords::hash(*options.password, rng_for_salt), sstring(role_name)});

				    }

				}

				future<> password_authenticator::alter(std::string_view role_name, const authentication_options& options) {

				future<> password_authenticator::alter(std::string_view role_name, const authentication_options& options, ::service::group0_batch& mc) {

				    if (!options.password) {

				        co_return;

				    }

				@@ -293,12 +293,12 @@ future<> password_authenticator::alter(std::string_view role_name, const authent

				                {passwords::hash(*options.password, rng_for_salt), sstring(role_name)},

				                cql3::query_processor::cache_internal::no).discard_result();

				    } else {

				        co_await announce_mutations(_qp, _group0_client, query,

				            {passwords::hash(*options.password, rng_for_salt), sstring(role_name)}, &_as, ::service::raft_timeout{});

				        co_await collect_mutations(_qp, mc, query,

				                {passwords::hash(*options.password, rng_for_salt), sstring(role_name)});

				    }

				}

				future<> password_authenticator::drop(std::string_view name) {

				future<> password_authenticator::drop(std::string_view name, ::service::group0_batch& mc) {

				    const sstring query = format("DELETE {} FROM {}.{} WHERE {} = ?",

				            SALTED_HASH,

				            get_auth_ks_name(_qp),

				@@ -311,7 +311,7 @@ future<> password_authenticator::drop(std::string_view name) {

				                {sstring(name)},

				                cql3::query_processor::cache_internal::no).discard_result();

				    } else {

				        co_await announce_mutations(_qp, _group0_client, query, {sstring(name)}, &_as, ::service::raft_timeout{});

				        co_await collect_mutations(_qp, mc, query, {sstring(name)});

				    }

				}

				@@ -329,7 +329,7 @@ const resource_set& password_authenticator::protected_resources() const {

				        credentials_map credentials{};

				        credentials[USERNAME_KEY] = sstring(username);

				        credentials[PASSWORD_KEY] = sstring(password);

				        return this->authenticate(credentials);

				        return authenticate(credentials);

				    });

				}

									
										6

auth/password_authenticator.hh
									
				@@ -64,11 +64,11 @@ public:

				    virtual future<authenticated_user> authenticate(const credentials_map& credentials) const override;

				    virtual future<> create(std::string_view role_name, const authentication_options& options) override;

				    virtual future<> create(std::string_view role_name, const authentication_options& options, ::service::group0_batch& mc) override;

				    virtual future<> alter(std::string_view role_name, const authentication_options& options) override;

				    virtual future<> alter(std::string_view role_name, const authentication_options& options, ::service::group0_batch&) override;

				    virtual future<> drop(std::string_view role_name) override;

				    virtual future<> drop(std::string_view role_name, ::service::group0_batch&) override;

				    virtual future<custom_options> query_custom_options(std::string_view role_name) const override;

									
										1

auth/permissions_cache.cc
									
				@@ -8,6 +8,7 @@

				#include "auth/permissions_cache.hh"

				#include <fmt/ranges.h>

				#include "auth/authorizer.hh"

				#include "auth/service.hh"

									
										2

auth/permissions_cache.hh
									
				@@ -13,8 +13,6 @@

				#include <fmt/core.h>

				#include <seastar/core/future.hh>

				#include <seastar/core/shared_ptr.hh>

				#include <seastar/core/sstring.hh>

				#include "auth/permission.hh"

				#include "auth/resource.hh"

									
										18

auth/resource.cc
									
				@@ -19,11 +19,16 @@

				#include <boost/algorithm/string/split.hpp>

				#include <boost/algorithm/string/classification.hpp>

				#include "cql3/functions/aggregate_function.hh"

				#include "cql3/functions/user_function.hh"

				#include "cql3/util.hh"

				#include "db/marshal/type_parser.hh"

				#include "log.hh"

				namespace auth {

				static logging::logger logger("auth_resource");

				static const std::unordered_map<resource_kind, std::string_view> roots{

				        {resource_kind::data, "data"},

				        {resource_kind::role, "roles"},

				@@ -223,6 +228,15 @@ static sstring decoded_signature_string(std::string_view encoded_signature) {

				            }), ", "));

				}

				resource make_functions_resource(const cql3::functions::function& f) {

				    if (!dynamic_cast<const cql3::functions::user_function*>(&f) &&

				            !dynamic_cast<const cql3::functions::aggregate_function*>(&f)) {

				        on_internal_error(logger, "unsuppported function type");

				    }

				    auto&& sig = auth::encode_signature(f.name().name, f.arg_types());

				    return make_functions_resource(f.name().keyspace, sig);

				}

				functions_resource_view::functions_resource_view(const resource& r) : _resource(r) {

				    if (r._kind != resource_kind::functions) {

				        throw resource_kind_mismatch(resource_kind::functions, r._kind);

				@@ -282,6 +296,10 @@ std::optional<std::string_view> data_resource_view::keyspace() const {

				    return _resource._parts[1];

				}

				bool data_resource_view::is_keyspace() const {

				    return _resource._parts.size() == 2;

				}

				std::optional<std::string_view> data_resource_view::table() const {

				    if (_resource._parts.size() <= 2) {

				        return {};

									
										46

auth/resource.hh
									
				@@ -18,11 +18,11 @@

				#include <unordered_set>

				#include <fmt/core.h>

				#include <boost/range/adaptor/transformed.hpp>

				#include <seastar/core/print.hh>

				#include <seastar/core/sstring.hh>

				#include "auth/permission.hh"

				#include "cql3/functions/function.hh"

				#include "seastarx.hh"

				#include "utils/hash.hh"

				#include "utils/small_vector.hh"

				@@ -41,6 +41,28 @@ enum class resource_kind {

				    data, role, service_level, functions

				};

				}

				template <>

				struct fmt::formatter<auth::resource_kind> : fmt::formatter<string_view> {

				    template <typename FormatContext>

				    auto format(const auth::resource_kind kind, FormatContext& ctx) const {

				        using enum auth::resource_kind;

				        switch (kind) {

				        case data:

				            return formatter<string_view>::format("data", ctx);

				        case role:

				            return formatter<string_view>::format("role", ctx);

				        case service_level:

				            return formatter<string_view>::format("service_level", ctx);

				        case functions:

				            return formatter<string_view>::format("functions", ctx);

				        }

				        std::abort();

				    }

				};

				namespace auth {

				///

				/// Type tag for constructing data resources.

				///

				@@ -144,6 +166,7 @@ public:

				    explicit data_resource_view(const resource& r);

				    std::optional<std::string_view> keyspace() const;

				    bool is_keyspace() const;

				    std::optional<std::string_view> table() const;

				};

				@@ -240,31 +263,14 @@ inline resource make_functions_resource(std::string_view keyspace, std::string_v

				    return resource(functions_resource_t{}, keyspace, function_name, function_signature);

				}

				resource make_functions_resource(const cql3::functions::function& f);

				sstring encode_signature(std::string_view name, std::vector<data_type> args);

				std::pair<sstring, std::vector<data_type>> decode_signature(std::string_view encoded_signature);

				}

				template <>

				struct fmt::formatter<auth::resource_kind> : fmt::formatter<std::string_view> {

				    template <typename FormatContext>

				    auto format(const auth::resource_kind kind, FormatContext& ctx) const {

				        using enum auth::resource_kind;

				        switch (kind) {

				        case data:

				            return formatter<std::string_view>::format("data", ctx);

				        case role:

				            return formatter<std::string_view>::format("role", ctx);

				        case service_level:

				            return formatter<std::string_view>::format("service_level", ctx);

				        case functions:

				            return formatter<std::string_view>::format("functions", ctx);

				        }

				        std::abort();

				    }

				};

				namespace std {

				template <>

									
										15

auth/role_manager.hh
									
				@@ -20,6 +20,7 @@

				#include "auth/resource.hh"

				#include "seastarx.hh"

				#include "exceptions/exceptions.hh"

				#include "service/raft/raft_group0_client.hh"

				namespace auth {

				@@ -107,17 +108,17 @@ public:

				    ///

				    /// \returns an exceptional future with \ref role_already_exists for a role that has previously been created.

				    ///

				    virtual future<> create(std::string_view role_name, const role_config&) = 0;

				    virtual future<> create(std::string_view role_name, const role_config&, ::service::group0_batch&) = 0;

				    ///

				    /// \returns an exceptional future with \ref nonexistant_role if the role does not exist.

				    ///

				    virtual future<> drop(std::string_view role_name) = 0;

				    virtual future<> drop(std::string_view role_name, ::service::group0_batch&) = 0;

				    ///

				    /// \returns an exceptional future with \ref nonexistant_role if the role does not exist.

				    ///

				    virtual future<> alter(std::string_view role_name, const role_config_update&) = 0;

				    virtual future<> alter(std::string_view role_name, const role_config_update&, ::service::group0_batch&) = 0;

				    ///

				    /// Grant `role_name` to `grantee_name`.

				@@ -127,7 +128,7 @@ public:

				    /// \returns an exceptional future with \ref role_already_included if granting the role would be redundant, or

				    /// create a cycle.

				    ///

				    virtual future<> grant(std::string_view grantee_name, std::string_view role_name) = 0;

				    virtual future<> grant(std::string_view grantee_name, std::string_view role_name, ::service::group0_batch& mc) = 0;

				    ///

				    /// Revoke `role_name` from `revokee_name`.

				@@ -136,7 +137,7 @@ public:

				    ///

				    /// \returns an exceptional future with \ref revoke_ungranted_role if the role was not granted.

				    ///

				    virtual future<> revoke(std::string_view revokee_name, std::string_view role_name) = 0;

				    virtual future<> revoke(std::string_view revokee_name, std::string_view role_name, ::service::group0_batch& mc) = 0;

				    ///

				    /// \returns an exceptional future with \ref nonexistant_role if the role does not exist.

				@@ -170,12 +171,12 @@ public:

				    /// Sets `attribute_name` with `attribute_value` for `role_name`.

				    /// \returns an exceptional future with nonexistant_role if the role does not exist.

				    ///

				    virtual future<> set_attribute(std::string_view role_name, std::string_view attribute_name, std::string_view attribute_value) = 0;

				    virtual future<> set_attribute(std::string_view role_name, std::string_view attribute_name, std::string_view attribute_value, ::service::group0_batch& mc) = 0;

				    /// Removes `attribute_name` for `role_name`.

				    /// \returns an exceptional future with nonexistant_role if the role does not exist.

				    /// \note: This is a no-op if the role does not have the named attribute set.

				    ///

				    virtual future<> remove_attribute(std::string_view role_name, std::string_view attribute_name) = 0;

				    virtual future<> remove_attribute(std::string_view role_name, std::string_view attribute_name, ::service::group0_batch& mc) = 0;

				};

				}

									
										5

auth/role_or_anonymous.cc
									
				@@ -10,11 +10,6 @@

				namespace auth {

				std::ostream& operator<<(std::ostream& os, const role_or_anonymous& mr) {

				    os << mr.name.value_or("<anonymous>");

				    return os;

				}

				bool is_anonymous(const role_or_anonymous& mr) noexcept {

				    return !mr.name.has_value();

				}

									
										1

auth/roles-metadata.hh
									
				@@ -8,6 +8,7 @@

				#pragma once

				#include <optional>

				#include <string_view>

				#include <functional>

									
										374

auth/service.cc
									
				@@ -6,6 +6,7 @@

				 * SPDX-License-Identifier: AGPL-3.0-or-later

				 */

				#include <exception>

				#include <seastar/core/coroutine.hh>

				#include "auth/resource.hh"

				#include "auth/service.hh"

				@@ -28,11 +29,12 @@

				#include "db/config.hh"

				#include "db/consistency_level_type.hh"

				#include "db/functions/function_name.hh"

				#include "db/system_auth_keyspace.hh"

				#include "log.hh"

				#include "schema/schema_fwd.hh"

				#include "seastar/core/future.hh"

				#include <seastar/core/future.hh>

				#include <seastar/coroutine/parallel_for_each.hh>

				#include "service/migration_manager.hh"

				#include "service/raft/raft_group0_client.hh"

				#include "timestamp.hh"

				#include "utils/class_registrator.hh"

				#include "locator/abstract_replication_strategy.hh"

				@@ -55,9 +57,10 @@ static logging::logger log("auth_service");

				class auth_migration_listener final : public ::service::migration_listener {

				    authorizer& _authorizer;

				    cql3::query_processor& _qp;

				public:

				    explicit auth_migration_listener(authorizer& a) : _authorizer(a) {

				    explicit auth_migration_listener(authorizer& a, cql3::query_processor& qp) : _authorizer(a),  _qp(qp) {

				    }

				private:

				@@ -77,27 +80,33 @@ private:

				    void on_update_tablet_metadata() override {}

				    void on_drop_keyspace(const sstring& ks_name) override {

				        if (!legacy_mode(_qp)) {

				            // in non legacy path revoke is part of schema change statement execution

				            return;

				        }

				        // Do it in the background.

				        (void)_authorizer.revoke_all(

				                auth::make_data_resource(ks_name)).handle_exception_type([](const unsupported_authorization_operation&) {

				            // Nothing.

				        (void)do_with(::service::group0_batch::unused(), [this, &ks_name] (auto& mc) mutable {

				            return _authorizer.revoke_all(auth::make_data_resource(ks_name), mc);

				        }).handle_exception([] (std::exception_ptr e) {

				            log.error("Unexpected exception while revoking all permissions on dropped keyspace: {}", e);

				        });

				        (void)_authorizer.revoke_all(

				            auth::make_functions_resource(ks_name)).handle_exception_type([](const unsupported_authorization_operation&) {

				            // Nothing.

				        (void)do_with(::service::group0_batch::unused(), [this, &ks_name] (auto& mc) mutable {

				            return _authorizer.revoke_all(auth::make_functions_resource(ks_name), mc);

				        }).handle_exception([] (std::exception_ptr e) {

				            log.error("Unexpected exception while revoking all permissions on functions in dropped keyspace: {}", e);

				        });

				    }

				    void on_drop_column_family(const sstring& ks_name, const sstring& cf_name) override {

				        if (!legacy_mode(_qp)) {

				            // in non legacy path revoke is part of schema change statement execution

				            return;

				        }

				        // Do it in the background.

				        (void)_authorizer.revoke_all(

				                auth::make_data_resource(

				                        ks_name, cf_name)).handle_exception_type([](const unsupported_authorization_operation&) {

				            // Nothing.

				        (void)do_with(::service::group0_batch::unused(), [this, &ks_name, &cf_name] (auto& mc) mutable {

				            return _authorizer.revoke_all(

				                    auth::make_data_resource(ks_name, cf_name), mc);

				        }).handle_exception([] (std::exception_ptr e) {

				            log.error("Unexpected exception while revoking all permissions on dropped table: {}", e);

				        });

				@@ -105,17 +114,26 @@ private:

				    void on_drop_user_type(const sstring& ks_name, const sstring& type_name) override {}

				    void on_drop_function(const sstring& ks_name, const sstring& function_name) override {

				        (void)_authorizer.revoke_all(

				            auth::make_functions_resource(ks_name, function_name)).handle_exception_type([](const unsupported_authorization_operation&) {

				            // Nothing.

				        if (!legacy_mode(_qp)) {

				            // in non legacy path revoke is part of schema change statement execution

				            return;

				        }

				        // Do it in the background.

				        (void)do_with(::service::group0_batch::unused(), [this, &ks_name, &function_name] (auto& mc) mutable {

				            return _authorizer.revoke_all(

				                    auth::make_functions_resource(ks_name, function_name), mc);

				        }).handle_exception([] (std::exception_ptr e) {

				            log.error("Unexpected exception while revoking all permissions on dropped function: {}", e);

				        });

				    }

				    void on_drop_aggregate(const sstring& ks_name, const sstring& aggregate_name) override {

				        (void)_authorizer.revoke_all(

				            auth::make_functions_resource(ks_name, aggregate_name)).handle_exception_type([](const unsupported_authorization_operation&) {

				            // Nothing.

				        if (!legacy_mode(_qp)) {

				            // in non legacy path revoke is part of schema change statement execution

				            return;

				        }

				        (void)do_with(::service::group0_batch::unused(), [this, &ks_name, &aggregate_name] (auto& mc) mutable {

				            return _authorizer.revoke_all(

				                    auth::make_functions_resource(ks_name, aggregate_name), mc);

				        }).handle_exception([] (std::exception_ptr e) {

				            log.error("Unexpected exception while revoking all permissions on dropped aggregate: {}", e);

				        });

				@@ -134,6 +152,7 @@ static future<> validate_role_exists(const service& ser, std::string_view role_n

				service::service(

				        utils::loading_cache_config c,

				        cql3::query_processor& qp,

				        ::service::raft_group0_client& g0,

				        ::service::migration_notifier& mn,

				        std::unique_ptr<authorizer> z,

				        std::unique_ptr<authenticator> a,

				@@ -142,11 +161,12 @@ service::service(

				            : _loading_cache_config(std::move(c))

				            , _permissions_cache(nullptr)

				            , _qp(qp)

				            , _group0_client(g0)

				            , _mnotifier(mn)

				            , _authorizer(std::move(z))

				            , _authenticator(std::move(a))

				            , _role_manager(std::move(r))

				            , _migration_listener(std::make_unique<auth_migration_listener>(*_authorizer))

				            , _migration_listener(std::make_unique<auth_migration_listener>(*_authorizer, qp))

				            , _permissions_cache_cfg_cb([this] (uint32_t) { (void) _permissions_cache_config_action.trigger_later(); })

				            , _permissions_cache_config_action([this] { update_cache_config(); return make_ready_future<>(); })

				            , _permissions_cache_max_entries_observer(_qp.db().get_config().permissions_cache_max_entries.observe(_permissions_cache_cfg_cb))

				@@ -165,6 +185,7 @@ service::service(

				            : service(

				                      std::move(c),

				                      qp,

				                      g0,

				                      mn,

				                      create_object<authorizer>(sc.authorizer_java_name, qp, g0, mm),

				                      create_object<authenticator>(sc.authenticator_java_name, qp, g0, mm),

				@@ -172,7 +193,7 @@ service::service(

				                      used_by_maintenance_socket) {

				}

				future<> service::create_keyspace_if_missing(::service::migration_manager& mm) const {

				future<> service::create_legacy_keyspace_if_missing(::service::migration_manager& mm) const {

				    assert(this_shard_id() == 0); // once_among_shards makes sure a function is executed on shard 0 only

				    auto db = _qp.db();

				@@ -204,8 +225,12 @@ future<> service::start(::service::migration_manager& mm, db::system_keyspace& s

				    // version is set in query processor to be easily available in various places we call auth::legacy_mode check.

				    _qp.auth_version = auth_version;

				    if (!_used_by_maintenance_socket) {

				+        // this legacy keyspace is only used by cqlsh

				+        // it's needed when executing `list roles` or `list users`

				+        // it doesn't affect anything except that cqlsh fails if keyspace

				+        // is not found

				        co_await once_among_shards([this, &mm] {

				-            return create_keyspace_if_missing(mm);

				+            return create_legacy_keyspace_if_missing(mm);

				        });

				    }

				    co_await _role_manager->start();

				@@ -218,6 +243,7 @@ future<> service::start(::service::migration_manager& mm, db::system_keyspace& s

				}

				future<> service::stop() {

				+    _as.request_abort();

				    // Only one of the shards has the listener registered, but let's try to

				    // unregister on each one just to make sure.

				    return _mnotifier.unregister_listener(_migration_listener.get()).then([this] {

				@@ -248,104 +274,86 @@ void service::reset_authorization_cache() {

				    _qp.reset_cache();

				}

				future<bool> service::has_existing_legacy_users() const {

				    if (!_qp.db().has_schema(meta::legacy::AUTH_KS, meta::legacy::USERS_CF)) {

				        return make_ready_future<bool>(false);

				    }

				    static const sstring default_user_query = format("SELECT * FROM {}.{} WHERE {} = ?",

				            meta::legacy::AUTH_KS,

				            meta::legacy::USERS_CF,

				            meta::user_name_col_name);

				    static const sstring all_users_query = format("SELECT * FROM {}.{} LIMIT 1",

				            meta::legacy::AUTH_KS,

				            meta::legacy::USERS_CF);

				    // This logic is borrowed directly from Apache Cassandra. By first checking for the presence of the default user, we

				    // can potentially avoid doing a range query with a high consistency level.

				    return _qp.execute_internal(

				            default_user_query,

				            db::consistency_level::ONE,

				            {meta::DEFAULT_SUPERUSER_NAME},

				            cql3::query_processor::cache_internal::yes).then([this](auto results) {

				        if (!results->empty()) {

				            return make_ready_future<bool>(true);

				        }

				        return _qp.execute_internal(

				                default_user_query,

				                db::consistency_level::QUORUM,

				                {meta::DEFAULT_SUPERUSER_NAME},

				                cql3::query_processor::cache_internal::yes).then([this](auto results) {

				            if (!results->empty()) {

				                return make_ready_future<bool>(true);

				            }

				            return _qp.execute_internal(

				                    all_users_query,

				                    db::consistency_level::QUORUM,

				                    cql3::query_processor::cache_internal::no).then([](auto results) {

				                return make_ready_future<bool>(!results->empty());

				            });

				        });

				    });

				}
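The comment in `has_existing_legacy_users` describes an escalation order: a point lookup for the default superuser at a weak consistency level, the same lookup at a stronger level, and only then a range query. A minimal standalone sketch of that ordering follows. It does not use Seastar or the real query processor; `consistency`, `user_queries`, and `has_any_user` are hypothetical names introduced only for illustration, and the callbacks stand in for the actual CQL queries.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a consistency level; not ScyllaDB's db::consistency_level.
enum class consistency { one, quorum };

// Hypothetical query callbacks. Each returns true when the query yields at least one row.
struct user_queries {
    std::function<bool(const std::string& name, consistency cl)> point_lookup;
    std::function<bool(consistency cl)> any_row;
};

// Escalates from the cheapest probe to the most expensive one and stops at the first hit.
// The optional trace records which probes actually ran, so the ordering can be observed.
inline bool has_any_user(const user_queries& q, const std::string& default_name,
                         std::vector<std::string>* trace = nullptr) {
    auto note = [trace](const char* step) {
        if (trace) {
            trace->push_back(step);
        }
    };
    note("point@ONE");
    if (q.point_lookup(default_name, consistency::one)) {
        return true;
    }
    note("point@QUORUM");
    if (q.point_lookup(default_name, consistency::quorum)) {
        return true;
    }
    // Only reached when both point lookups came back empty.
    note("range@QUORUM");
    return q.any_row(consistency::quorum);
}
```

When the default user exists, only the first probe runs, which is the saving the original comment refers to.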

				future<permission_set>

				service::get_uncached_permissions(const role_or_anonymous& maybe_role, const resource& r) const {

				    if (is_anonymous(maybe_role)) {

				        return _authorizer->authorize(maybe_role, r);

				        co_return co_await _authorizer->authorize(maybe_role, r);

				    }

				    const std::string_view role_name = *maybe_role.name;

				    return has_superuser(role_name).then([this, role_name, &r](bool superuser) {

				        if (superuser) {

				            return make_ready_future<permission_set>(r.applicable_permissions());

				        }

				        //

				        // Aggregate the permissions from all granted roles.

				        //

				        return do_with(permission_set(), [this, role_name, &r](auto& all_perms) {

				            return get_roles(role_name).then([this, &r, &all_perms](role_set all_roles) {

				                return do_with(std::move(all_roles), [this, &r, &all_perms](const auto& all_roles) {

				                    return parallel_for_each(all_roles, [this, &r, &all_perms](std::string_view role_name) {

				                        return _authorizer->authorize(role_name, r).then([&all_perms](permission_set perms) {

				                            all_perms = permission_set::from_mask(all_perms.mask() | perms.mask());

				                        });

				                    });

				                });

				            }).then([&all_perms] {

				                return all_perms;

				            });

				        });

				    auto all_roles = co_await get_roles(role_name);

				    auto superuser = co_await has_superuser(role_name, all_roles);

				    if (superuser) {

				        co_return r.applicable_permissions();

				    }

				    // Aggregate the permissions from all granted roles.

				    permission_set all_perms;

				    co_await coroutine::parallel_for_each(all_roles, [this, &r, &all_perms](std::string_view role_name) -> future<> {

				        auto perms = co_await _authorizer->authorize(role_name, r);

				        all_perms = permission_set::from_mask(all_perms.mask() | perms.mask());

				    });

				    co_return std::move(all_perms);

				}
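The coroutine form of `get_uncached_permissions` reads as two steps: if any granted role is a superuser, return every permission applicable to the resource; otherwise OR together the permission masks of all granted roles. A minimal synchronous sketch of that logic, assuming a plain integer bitmask, is shown here. `mask_t`, `role_info`, and `effective_permissions` are hypothetical names and do not correspond to ScyllaDB's `permission_set` API.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical permission bitmask; the real code uses auth::permission_set.
using mask_t = std::uint32_t;

// Hypothetical flattened view of one granted role.
struct role_info {
    std::string name;
    bool superuser;
    mask_t granted;  // permissions this role holds on the resource in question
};

// Returns the effective permissions for a set of granted roles on one resource.
inline mask_t effective_permissions(const std::vector<role_info>& roles, mask_t applicable) {
    // A superuser among the granted roles short-circuits to everything applicable.
    for (const auto& r : roles) {
        if (r.superuser) {
            return applicable;
        }
    }
    // Otherwise aggregate the permissions from all granted roles.
    mask_t all = 0;
    for (const auto& r : roles) {
        all |= r.granted;
    }
    return all;
}
```

Because the union is a bitwise OR, the result does not depend on the order in which roles are visited, which is what lets the original run the per-role `authorize` calls with `parallel_for_each`.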

				future<permission_set> service::get_permissions(const role_or_anonymous& maybe_role, const resource& r) const {

				    return _permissions_cache->get(maybe_role, r);

				}

				future<bool> service::has_superuser(std::string_view role_name, const role_set& roles) const {

				    for (const auto& role : roles) {

				        if (co_await _role_manager->is_superuser(role)) {

				            co_return true;

				        }

				    }

				    co_return false;

				}

				future<bool> service::has_superuser(std::string_view role_name) const {

				    return this->get_roles(std::move(role_name)).then([this](role_set roles) {

				        return do_with(std::move(roles), [this](const role_set& roles) {

				            return do_with(false, roles.begin(), [this, &roles](bool& any_super, auto& iter) {

				                return do_until(

				                        [&roles, &any_super, &iter] { return any_super || (iter == roles.end()); },

				                        [this, &any_super, &iter] {

				                    return _role_manager->is_superuser(*iter++).then([&any_super](bool super) {

				                        any_super = super;

				                    });

				                }).then([&any_super] {

				                    return any_super;

				                });

				            });

				        });

				    });

				    auto roles = co_await get_roles(role_name);

				    co_return co_await has_superuser(role_name, roles);

				}

				static void validate_authentication_options_are_supported(

				        const authentication_options& options,

				        const authentication_option_set& supported) {

				    const auto check = [&supported](authentication_option k) {

				        if (!supported.contains(k)) {

				            throw unsupported_authentication_option(k);

				        }

				    };

				    if (options.password) {

				        check(authentication_option::password);

				    }

				    if (options.options) {

				        check(authentication_option::options);

				    }

				}

				future<> service::create_role(std::string_view name,

				        const role_config& config,

				        const authentication_options& options,

				        ::service::group0_batch& mc) const {

				    co_await underlying_role_manager().create(name, config, mc);

				    if (!auth::any_authentication_options(options)) {

				        co_return;

				    }

				    std::exception_ptr ep;

				    try {

				        validate_authentication_options_are_supported(options,

				                underlying_authenticator().supported_options());

				        co_await underlying_authenticator().create(name, options, mc);

				    } catch (...) {

				        ep = std::current_exception();

				    }

				    if (ep) {

				        // Rollback only in legacy mode as normally mutations won't be

				        // applied in case exception is raised

				        if (legacy_mode(_qp)) {

				            co_await underlying_role_manager().drop(name, mc);

				        }

				        std::rethrow_exception(std::move(ep));

				    }

				}
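`service::create_role` stores the exception in a `std::exception_ptr` and performs the rollback after the `catch` block, since C++ does not allow `co_await` inside an exception handler. The same control flow can be sketched without coroutines. In this sketch `create_with_rollback` and its callback parameters are hypothetical, and the `legacy_mode` flag stands in for the `legacy_mode(_qp)` check.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>

// Hypothetical two-step creation: the role record first, then its credentials.
// If the second step throws, the first step is undone only when legacy_mode is set;
// otherwise the caller is expected to discard the uncommitted batch.
inline void create_with_rollback(const std::function<void()>& create_role,
                                 const std::function<void()>& create_credentials,
                                 const std::function<void()>& drop_role,
                                 bool legacy_mode) {
    create_role();
    std::exception_ptr ep;
    try {
        create_credentials();
    } catch (...) {
        // Capture only; the cleanup runs after the handler has finished.
        ep = std::current_exception();
    }
    if (ep) {
        if (legacy_mode) {
            drop_role();
        }
        std::rethrow_exception(ep);
    }
}
```

The caller still sees the original exception in both modes; only the cleanup differs.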

				future<role_set> service::get_roles(std::string_view role_name) const {

				@@ -402,7 +410,7 @@ future<bool> service::exists(const resource& r) const {

				                return make_ready_future<bool>(db.has_keyspace(sstring(*keyspace)));

				            }

				            auto [name, function_args] = auth::decode_signature(*function_signature);

				-            return make_ready_future<bool>(cql3::functions::functions::find(db::functions::function_name{sstring(*keyspace), name}, function_args));

				+            return make_ready_future<bool>(cql3::functions::instance().find(db::functions::function_name{sstring(*keyspace), name}, function_args));

				        }

				    }

				@@ -436,15 +444,6 @@ future<permission_set> get_permissions(const service& ser, const authenticated_u

				    });

				}

				bool is_enforcing(const service& ser)  {

				    const bool enforcing_authorizer = ser.underlying_authorizer().qualified_java_name() != allow_all_authorizer_name;

				    const bool enforcing_authenticator = ser.underlying_authenticator().qualified_java_name()

				            != allow_all_authenticator_name;

				    return enforcing_authorizer || enforcing_authenticator;

				}

				bool is_protected(const service& ser, command_desc cmd) noexcept {

				    if (cmd.type_ == command_desc::type::ALTER_WITH_OPTS) {

				        return false; // Table attributes are OK to modify; see #7057.

				@@ -454,84 +453,45 @@ bool is_protected(const service& ser, command_desc cmd) noexcept {

				            || ser.underlying_authorizer().protected_resources().contains(cmd.resource);

				}

				static void validate_authentication_options_are_supported(

				        const authentication_options& options,

				        const authentication_option_set& supported) {

				    const auto check = [&supported](authentication_option k) {

				        if (!supported.contains(k)) {

				            throw unsupported_authentication_option(k);

				        }

				    };

				    if (options.password) {

				        check(authentication_option::password);

				    }

				    if (options.options) {

				        check(authentication_option::options);

				    }

				}

				future<> create_role(

				        const service& ser,

				        std::string_view name,

				        const role_config& config,

				        const authentication_options& options) {

				    return ser.underlying_role_manager().create(name, config).then([&ser, name, &options] {

				        if (!auth::any_authentication_options(options)) {

				            return make_ready_future<>();

				        }

				        return futurize_invoke(

				                &validate_authentication_options_are_supported,

				                options,

				                ser.underlying_authenticator().supported_options()).then([&ser, name, &options] {

				            return ser.underlying_authenticator().create(name, options);

				        }).handle_exception([&ser, name](std::exception_ptr ep) {

				            // Roll-back.

				            return ser.underlying_role_manager().drop(name).then([ep = std::move(ep)] {

				                std::rethrow_exception(ep);

				            });

				        });

				    });

				        const authentication_options& options,

				        ::service::group0_batch& mc) {

				    return ser.create_role(name, config, options, mc);

				}

				future<> alter_role(

				        const service& ser,

				        std::string_view name,

				        const role_config_update& config_update,

				        const authentication_options& options) {

				    return ser.underlying_role_manager().alter(name, config_update).then([&ser, name, &options] {

				        if (!any_authentication_options(options)) {

				            return make_ready_future<>();

				        }

				        return futurize_invoke(

				                &validate_authentication_options_are_supported,

				                options,

				                ser.underlying_authenticator().supported_options()).then([&ser, name, &options] {

				            return ser.underlying_authenticator().alter(name, options);

				        });

				    });

				        const authentication_options& options,

				        ::service::group0_batch& mc) {

				    co_await ser.underlying_role_manager().alter(name, config_update, mc);

				    if (!any_authentication_options(options)) {

				        co_return;

				    }

				    validate_authentication_options_are_supported(options,

				            ser.underlying_authenticator().supported_options());

				    co_await ser.underlying_authenticator().alter(name, options, mc);

				}

				future<> drop_role(const service& ser, std::string_view name) {

				    return do_with(make_role_resource(name), [&ser, name](const resource& r) {

				        auto& a = ser.underlying_authorizer();

				future<> drop_role(const service& ser, std::string_view name, ::service::group0_batch& mc) {

				    auto& a = ser.underlying_authorizer();

				    auto r = make_role_resource(name);

				    co_await a.revoke_all(name, mc);

				    co_await a.revoke_all(r, mc);

				    co_await ser.underlying_authenticator().drop(name, mc);

				    co_await ser.underlying_role_manager().drop(name, mc);

				}

				        return when_all_succeed(

				                a.revoke_all(name),

				                a.revoke_all(r))

				                    .discard_result()

				                    .handle_exception_type([](const unsupported_authorization_operation&) {

				            // Nothing.

				        });

				    }).then([&ser, name] {

				        return ser.underlying_authenticator().drop(name);

				    }).then([&ser, name] {

				        return ser.underlying_role_manager().drop(name);

				    });

				future<> grant_role(const service& ser, std::string_view grantee_name, std::string_view role_name, ::service::group0_batch& mc) {

				    return ser.underlying_role_manager().grant(grantee_name, role_name, mc);

				}

				future<> revoke_role(const service& ser, std::string_view revokee_name, std::string_view role_name, ::service::group0_batch& mc) {

				    return ser.underlying_role_manager().revoke(revokee_name, role_name, mc);

				}

				future<bool> has_role(const service& ser, std::string_view grantee, std::string_view name) {

				@@ -549,35 +509,47 @@ future<bool> has_role(const service& ser, const authenticated_user& u, std::stri

				    return has_role(ser, *u.name, name);

				}

				future<> set_attribute(const service& ser, std::string_view role_name, std::string_view attribute_name, std::string_view attribute_value, ::service::group0_batch& mc) {

				    return ser.underlying_role_manager().set_attribute(role_name, attribute_name, attribute_value, mc);

				}

				future<> remove_attribute(const service& ser, std::string_view role_name, std::string_view attribute_name, ::service::group0_batch& mc) {

				    return ser.underlying_role_manager().remove_attribute(role_name, attribute_name, mc);

				}

				future<> grant_permissions(

				        const service& ser,

				        std::string_view role_name,

				        permission_set perms,

				        const resource& r) {

				    return validate_role_exists(ser, role_name).then([&ser, role_name, perms, &r] {

				        return ser.underlying_authorizer().grant(role_name, perms, r);

				    });

				        const resource& r,

				        ::service::group0_batch& mc) {

				    co_await validate_role_exists(ser, role_name);

				    co_await ser.underlying_authorizer().grant(role_name, perms, r, mc);

				}

				future<> grant_applicable_permissions(const service& ser, std::string_view role_name, const resource& r) {

				    return grant_permissions(ser, role_name, r.applicable_permissions(), r);

				future<> grant_applicable_permissions(const service& ser, std::string_view role_name, const resource& r, ::service::group0_batch& mc) {

				    return grant_permissions(ser, role_name, r.applicable_permissions(), r, mc);

				}

				future<> grant_applicable_permissions(const service& ser, const authenticated_user& u, const resource& r) {

				future<> grant_applicable_permissions(const service& ser, const authenticated_user& u, const resource& r, ::service::group0_batch& mc) {

				    if (is_anonymous(u)) {

				        return make_ready_future<>();

				    }

				    return grant_applicable_permissions(ser, *u.name, r);

				    return grant_applicable_permissions(ser, *u.name, r, mc);

				}

				future<> revoke_permissions(

				        const service& ser,

				        std::string_view role_name,

				        permission_set perms,

				        const resource& r) {

				    return validate_role_exists(ser, role_name).then([&ser, role_name, perms, &r] {

				        return ser.underlying_authorizer().revoke(role_name, perms, r);

				    });

				        const resource& r,

				        ::service::group0_batch& mc) {

				    co_await validate_role_exists(ser, role_name);

				    co_await ser.underlying_authorizer().revoke(role_name, perms, r, mc);

				}

				future<> revoke_all(const service& ser, const resource& r, ::service::group0_batch& mc) {

				    return ser.underlying_authorizer().revoke_all(r, mc);

				}

				future<std::vector<permission_details>> list_filtered_permissions(

				@@ -634,10 +606,14 @@ future<std::vector<permission_details>> list_filtered_permissions(

				    });

				}

				+future<> commit_mutations(service& ser, ::service::group0_batch&& mc) {

				+    return ser.commit_mutations(std::move(mc));

				+}

				future<> migrate_to_auth_v2(db::system_keyspace& sys_ks, ::service::raft_group0_client& g0, start_operation_func_t start_operation_func, abort_source& as) {

				    // FIXME: if this function fails it may leave partial data in the new tables

				    // that should be cleared

				-    auto gen = [&sys_ks] (api::timestamp_type& ts) -> mutations_generator {

				+    auto gen = [&sys_ks] (api::timestamp_type ts) -> ::service::mutations_generator {

				        auto& qp = sys_ks.query_processor();

				        for (const auto& cf_name : std::vector<sstring>{

				                "roles", "role_members", "role_attributes", "role_permissions"}) {

				@@ -685,7 +661,7 @@ future<> migrate_to_auth_v2(db::system_keyspace& sys_ks, ::service::raft_group0_

				                }

				                auto muts = co_await qp.get_mutations_internal(

				                        format("INSERT INTO {}.{} ({}) VALUES ({})",

				-                                db::system_auth_keyspace::NAME,

				+                                db::system_keyspace::NAME,

				                                cf_name,

				                                col_names_str,

				                                val_binders_str),

				@@ -700,12 +676,12 @@ future<> migrate_to_auth_v2(db::system_keyspace& sys_ks, ::service::raft_group0_

				            }

				        }

				        co_yield co_await sys_ks.make_auth_version_mutation(ts,

				-                db::system_auth_keyspace::version_t::v2);

				+                db::system_keyspace::auth_version_t::v2);

				    };

				    co_await announce_mutations_with_batching(g0,

				            start_operation_func,

				            std::move(gen),

				-            &as,

				+            as,

				            std::nullopt);

				}

Compare commits

1602 Commits annastuchl ... next-6.1

1 .gitattributes vendored
20 .github/clang-include-cleaner.json vendored Normal file
18 .github/clang-matcher.json vendored Normal file
25 .github/mergify.yml vendored
1 .github/pull_request_template.md vendored Normal file
186 .github/scripts/auto-backport.py vendored Executable file
82 .github/scripts/label_promoted_commits.py vendored
55 .github/workflows/add-label-when-promoted.yaml vendored
35 .github/workflows/build-scylla.yaml vendored Normal file
65 .github/workflows/clang-nightly.yaml vendored Normal file
67 .github/workflows/clang-tidy.yaml vendored Normal file
2 .github/workflows/codespell.yaml vendored
80 .github/workflows/iwyu.yaml vendored Normal file
23 .github/workflows/read-toolchain.yaml vendored Normal file
34 .github/workflows/reproducible-build.yaml vendored Normal file
50 .github/workflows/seastar.yaml vendored Normal file
6 .github/workflows/sync-labels.yaml vendored
4 .gitignore vendored
5 .gitmodules vendored
55 CMakeLists.txt
2 HACKING.md
6 README.md
9 SCYLLA-VERSION-GEN
1 abseil Submodule
3 alternator/CMakeLists.txt
17 alternator/auth.cc
14 alternator/controller.cc
3 alternator/controller.hh
12 alternator/executor.cc
1 alternator/executor.hh
4 alternator/expressions.cc
3 alternator/expressions_types.hh
24 alternator/server.cc
1 alternator/stats.hh
18 alternator/streams.cc
106 alternator/ttl.cc
3 api/CMakeLists.txt
4 api/api-doc/collectd.json
56 api/api-doc/error_injection.json
32 api/api-doc/raft.json
68 api/api-doc/storage_service.json
15 api/api-doc/system.json
2 api/api-doc/utils.json
52 api/api.cc
16 api/api_init.hh
15 api/authorization_cache.hh
7 api/cache_service.cc
7 api/cache_service.hh
4 api/collectd.cc
327 api/column_family.cc
1 api/column_family.hh
37 api/compaction_manager.cc
102 api/config.cc
1 api/config.hh
36 api/error_injection.cc
17 api/failure_detector.cc
1 api/failure_detector.hh
22 api/gossiper.cc
1 api/gossiper.hh
2 api/messaging_service.cc
108 api/raft.cc
2 api/scrub_status.hh
92 api/storage_proxy.cc
327 api/storage_service.cc
13 api/storage_service.hh
4 api/stream_manager.cc
14 api/system.cc
7 api/system.hh
87 api/task_manager.cc
9 api/task_manager_test.cc
11 api/task_manager_test.hh
5 api/tasks.cc
13 api/tasks.hh
2 api/token_metadata.cc
11 api/token_metadata.hh
1 auth/CMakeLists.txt
6 auth/allow_all_authenticator.hh
15 auth/allow_all_authorizer.hh

2 auth/authenticated_user.hh