If the coordinator node is killed, restarted, or becomes inoperable
during a topology operation, a new coordinator should be elected,
the operation should be aborted, and the cluster should be rolled back.
Error injection will be used to kill the coordinator before streaming
starts.
Closes scylladb/scylladb#16197
Current code uses the non-raft path to pull the schema, which violates
group0 linearizability because the node will have the latest schema but
miss group0 updates of other system tables, in particular
system.tablets. This manifests as repair errors due to a missing
tablet_map for a given table when trying to access it. The tablet map is
always created together with the table in the same group0 command.
When a node is bootstrapping, repair calls sync_schema() to make
sure the local schema is up to date. This races with group0 catch-up,
and if sync_schema() wins, repair may fail on a missing tablet map.
Fix by making sync_schema() do a group0 read barrier when in raft
mode.
Fixes #18002
Closes scylladb/scylladb#18175
Just like all the other commands already have it. These commands didn't have documentation at the point where they were implemented, hence the missing doc link.
The links don't work yet, but they will work once we release 6.0 and the current master documentation is promoted to stable.
Closes scylladb/scylladb#18147
* github.com:scylladb/scylladb:
tools/scylla-nodetool: fix typo: Fore -> For
tools/scylla-nodetool: add doc link for getsstables and sstableinfo commands
Currently, we use the sum of the estimated_partitions from each
participant node as the estimated_partitions for the sstable produced by
repair. This way, the estimated_partitions is the biggest possible
number of partitions repair would write.
Since repair will write only the difference between repair participant
nodes, using the biggest possible estimation will overestimate the
partitions written by repair, most of the time.
The problem is that overestimated partitions make the bloom filter
consume more memory. It has been observed to cause OOM in the field.
This patch changes the estimation to use a fraction of the average
partitions per node instead of the sum. It is still not a perfect
estimation, but it already improves memory usage significantly.
Fixes #18140
Closes scylladb/scylladb#18141
instead of using `operator<<`, use `fmt::print()` to
format and print, so we can ditch the `operator<<`-based formatters.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18259
since Boost.Test relies on operator<< or `boost_test_print_type()`
to print the value of variables being compared, instead of defining
the fallback formatter of `boost_test_print_type()` for each
individual test, let's define it in `test/lib/test_utils.hh`, so
that it can be shared across tests.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18260
there is a chance that code using `utils/small_vector.hh` does not
include `using namespace seastar`, and even if it does, we should not
rely on it. but if it does not, checkhh would fail. so let's include
"seastarx.hh" in this header, so it is self-contained.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18265
The problem this series solves is correctly ignoring DOWN nodes' state
when replacing a node.
When a node is replaced and there are other nodes that are down, the
replacing node is told to ignore those DOWN nodes using the
`ignore_dead_nodes_for_replace` option.
Since the replacing node is bootstrapping it starts with an empty
system.peers table so it has no notion about any node state and it
learns about all other nodes via gossip shadow round done in
`storage_service::prepare_replacement_info`.
Normally, since the DOWN nodes to ignore already joined the ring, the
remaining node will have their endpoint state already in gossip, but if
the whole cluster was restarted while those DOWN nodes did not start,
the remaining nodes will only have a partial endpoint state from them,
which is loaded from system.peers.
Currently, the partial endpoint state contains only `HOST_ID` and
`TOKENS`, and in particular it lacks `STATUS`, `DC`, and `RACK`.
The first part of this series loads also `DC` and `RACK` from
system.peers to make them available to the replacing node as they are
crucial for building a correct replication map with network topology
replication strategy.
But still, without a `STATUS` those nodes are not considered normal
token owners yet, and they do not go through handle_state_normal which
adds them to the topology and token_metadata.
The second part of this series uses the endpoint state retrieved in the
gossip shadow round to explicitly add the ignored nodes' state to
topology (including dc and rack) and token_metadata (tokens) in
`prepare_replacement_info`. If there are more DOWN nodes that are not
explicitly ignored, replace will fail (as it should).
Fixes scylladb/scylladb#15787
Closes scylladb/scylladb#15788
* github.com:scylladb/scylladb:
storage_service: join_token_ring: load ignored nodes state if replacing
storage_service: replacement_info: return ignore_nodes state
locator: host_id_or_endpoint: keep value as variant
gms: endpoint_state: add getters for host_id, dc_rack, and tokens
storage_service: topology_state_load: set local STATUS state using add_saved_endpoint
gossiper: add_saved_endpoint: set dc and rack
gossiper: add_saved_endpoint: fixup indentation
gossiper: add_saved_endpoint: make host_id mandatory
gossiper: add load_endpoint_state
gossiper: start_gossiping: log local state
It tries to call container().invoke_on_all() the hard way.
Calling it directly is not possible, because there's no
sharded::invoke_on_all() const overload.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18202
The cql-pytest framework allows running tests also against Cassandra,
but developers need to install Cassandra on their own because modern
distributions such as Fedora no longer carry a Cassandra package.
This patch adds clear and easy to follow (I think) instructions on how
to download a pre-compiled Cassandra, or alternatively how to download
and build Cassandra from source - and how either can be used with the
test/cql-pytest/run-cassandra script.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#18138
For view builder draining there's a dedicated deferred action in main, while all other services that need to be drained do it via storage_service. The latter unifies shutdown for services and makes `nodetool drain` drain everything, not just some part of it. This PR makes view builder drain look the same. As a side effect it also moves `mark_existing_views_as_built` from storage service to view builder and generalizes this marking code inside view builder itself.
refs: #2737
refs: #2795
Closes scylladb/scylladb#16558
* github.com:scylladb/scylladb:
storage_service: Drain view builder on drain too
view_builder: Generalize mark_as_built(view_ptr) method
view_builder: Move mark_existing_views_as_built from storage service
storage_service: Add view_builder& reference
main,cql_test_env: Move view_builder start up (and make unconditional)
The partitions_bigger_than_threshold counter is incremented only if the
previous check detects that the partition exceeds a threshold by its size.
It's done with an extra if, but it can be done without an (explicit)
condition, as the bool type is guaranteed by the standard to convert to
integers as true = 1 and false = 0.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18217
The current code counts the number of keys just to see if this number
is non-zero. The .contains() method is a better fit here.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18219
The sharded<database> is used as an invoke_on_all() method provider;
there's no real need for the database itself. A simple smp::invoke_on_all()
would work just as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18221
Some tests want to ignore the out_of_range exception in a continuation
and go the longer route for that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18216
When constructing a vector with partition key data, the size of that
vector is known beforehand.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18239
There's a helper map-reducer that accepts a function to call on the
commitlog. All callers accumulate statistics with it, so the commitlog
argument can be a const pointer.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18238
In testing, we've observed multiple cases where nodes would fail to
observe updated application states of other nodes in gossiper.
For example:
- in scylladb/scylladb#16902, a node would finish bootstrapping and enter
NORMAL state, propagating this information through gossiper. However,
other nodes would never observe that the node entered NORMAL state,
still thinking that it is in joining state. This would lead to further
bad consequences down the line.
- in scylladb/scylladb#15393, a node got stuck in bootstrap, waiting for
schema versions to converge. Convergence would never be achieved and the
test eventually timed out. The node was observing outdated schema state
of some existing node in gossip.
I created a test that would bootstrap 3 nodes, then wait until they all
observe each other as NORMAL, with timeout. Unfortunately, thousands of
runs of this test on different machines failed to reproduce the problem.
After banging my head against the wall failing to reproduce, I decided
to sprinkle randomized sleeps across multiple places in gossiper code
and finally: the test started catching the problem in about 1 in 1000
runs.
With additional logging and additional head-banging, I determined
the root cause.
The following scenario can happen, 2 nodes are sufficient, let's call
them A and B:
- Node B calls `add_local_application_state` to update its gossiper
state, for example, to propagate its new NORMAL status.
- `add_local_application_state` takes a copy of the endpoint_state, and
updates the copy:
```
auto local_state = *ep_state_before;
for (auto& p : states) {
auto& state = p.first;
auto& value = p.second;
value = versioned_value::clone_with_higher_version(value);
local_state.add_application_state(state, value);
}
```
`clone_with_higher_version` bumps `version` inside
gms/version_generator.cc.
- `add_local_application_state` calls `gossiper.replicate(...)`
- `replicate` works in 2 phases to achieve exception safety: in the first
phase it copies the updated `local_state` to all shards into a
separate map. In the second phase the values from the separate map are
used to overwrite the endpoint_state map used for gossiping.
Due to the cross-shard calls of the first phase, there is a yield before
the second phase. *During this yield* the following happens:
- `gossiper::run()` loop on B executes and bumps node B's `heart_beat`.
This uses the monotonic version_generator, so it uses a higher version
than the ones we used for the states added above. Let's call this new
version X. Note that X is larger than the versions used by the
application_states added above.
- now node B handles a SYN or ACK message from node A, creating
an ACK or ACK2 message in response. This message contains:
- old application states (NOT including the update described above,
because `replicate` is still sleeping before phase 2),
- but bumped heart_beat == X from `gossiper::run()` loop,
and sends the message.
- node A receives the message and remembers that the max
version across all states (including heart_beat) of node B is X.
This means that it will no longer request or apply states from node B
with versions smaller than X.
- `gossiper.replicate(...)` on B wakes up, and overwrites
endpoint_state with the ones it saved in phase 1. In particular it
reverts heart_beat back to a smaller value, but the larger problem is that it
saves updated application_states that use versions smaller than X.
- now when node B sends the updated application_states in ACK or ACK2
message to node A, node A will ignore them, because their versions are
smaller than X. Or node B will never send them, because whenever node
A requests states from node B, it only requests states with versions >
X. Either way, node A will fail to observe new states of node B.
If I understand correctly, this is a regression introduced in
38c2347a3c, which introduced a yield in
`replicate`. Before that, the updated state would be saved atomically on
shard 0, there could be no `heart_beat` bump in-between making a copy of
the local state, updating it, and then saving it.
With the description above, it's easy to make a consistent
reproducer for the problem -- introduce a longer sleep in
`add_local_application_state` before second phase of replicate, to
increase the chance that gossiper loop will execute and bump heart_beat
version during the yield. Further commit adds a test based on that.
The fix is to bump the heart_beat under local endpoint lock, which is
also taken by `replicate`.
The PR also adds a regression test.
Fixes: scylladb/scylladb#15393
Fixes: scylladb/scylladb#15602
Fixes: scylladb/scylladb#16668
Fixes: scylladb/scylladb#16902
Fixes: scylladb/scylladb#17493
Fixes: scylladb/scylladb#18118
Ref: scylladb/scylla-enterprise#3720
Closes scylladb/scylladb#18184
* github.com:scylladb/scylladb:
test: reproducer for missing gossiper updates
gossiper: lock local endpoint when updating heart_beat
By default the suite name in the junit files generated by pytest
is `pytest` for all suites instead of the real suite name, e.g.
`topology_experimental_raft`.
With this change, the junit files will use the real suite name.
This change doesn't affect the Test Report in Jenkins, but it
unblocks part of another task, publishing the test results to
elasticsearch (https://github.com/scylladb/scylla-pkg/pull/3950),
where we parse the XMLs and need the correct suite name.
Closes scylladb/scylladb#18172
When altering rf for a keyspace, all tablets in this ks may have fewer replicas. Part of this process is removing replicas from some node(s). This PR extends the tablets rebuild transition to handle this case by making pending_replica optional.
fixes: #18176
Closes scylladb/scylladb#18203
* github.com:scylladb/scylladb:
test: Tune up tablet-transition test to check del_replica
api: Add method to delete replica from tablet
tablet: Make pending replica optional
For that the test case is modified to have 3 nodes and 2 replicas on
start. Existing test cases are changed slightly in the way "from" host
is detected.
Also, the final check for data presence is modified to check that hosts
in "replicas" have data and other hosts don't have it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Copied from the add_replica counterpart
TODO: Generalize common parts of move_tablet and add_|del_tablet_replica
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Just like the leaving replica can be optional when adding a replica to a
tablet, the pending replica can be optional too if we're removing a
replica from a tablet.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Fixes scylladb/scylladb#18083
* seastar cd8a9133...f3058414 (18):
> src/core/metrics.cc: rewrite set_metric_family_configs
> include/seastar/core/metrics_api.hh: Revert d2929c2ade5bd0125a73d53280c82ae5da86218e
> sstring: include <fmt/format.h> instead of <fmt/ostream.h>
> seastar.cc: include used header
> tls: include used header of <unordered_set>
> docs: remove unused parameter from handle_connection function of echo-HTTP-server tutorial example
> stall-analyser: use 0 for the default value of --width
> http: Move parsed params and urls
> scripts: use raw string to avoid invalid escape sequences
> timed_out_error: add fmt::formatter for timed_out_error
> scripts/stall-analyser: change default branch-threshold to 3%
> scripts/stall-analyser: resolve string escape sequence warning
> io_queue: Use static vector for fair groups too
> io_queue: Use static vector to store fair queues
> stall-analyser: add space around '=' in param list
> stall-analyser: add a space between 'var: Type' in type annotation
> stall-analyser: move variables closer to where they are used
> memory: drop support for compilers that don't support aligned new
Closes scylladb/scylladb#18235
When a node bootstraps or replaces a node after full cluster
shutdown and restart, some nodes may be down.
Existing nodes in the cluster load the down nodes' TOKENS
(and recently, in this series, also DC and RACK) from system.peers
and then populate locator::topology and token_metadata
accordingly with the down nodes' tokens in storage_service::join_cluster.
However, a bootstrapping/replacing node has no persistent knowledge
of the down nodes, and it learns about their existence only from gossip.
But since the down nodes have unknown status, they never go
through `handle_state_normal` (in gossiper mode) and therefore
they are not accounted as normal token owners.
This is handled by `topology_state_load`, but not with
gossip-based node operations.
This patch updates the ignored nodes (for replace) state in topology
and token_metadata as if they were loaded from system tables,
after calling `prepare_replacement_info` when raft topology changes are
disabled, based on the endpoint_state retrieved in the shadow round
initiated in prepare_replacement_info.
Fixes scylladb/scylladb#15787
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Instead of `parse_node_list` resolving host ids to inet_address,
let `prepare_replacement_info` get host_id_or_endpoint from
parse_node_list and prepare `loaded_endpoint_state` for
the ignored nodes so it can be used later by the callers.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than allowing keeping both
host_id and endpoint, keep only one of them
and provide resolve functions that use the
token_metadata to resolve the host_id into
an inet_address or vice versa.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Allow getting metadata from the endpoint_state based
on the respective application states instead of going
through the gossiper.
To be used by the next patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When loading this node's endpoint state, if it has
tokens in token_metadata, its status can already be set
to normal.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When loading endpoint_state from system.peers,
pass the loaded nodes' dc/rack info from
storage_service::join_token_ring to gossiper::add_saved_endpoint.
Load the endpoint DC/RACK information into the endpoint_state,
if available, so it can propagate to bootstrapping nodes
via gossip, even if those nodes are DOWN after a full cluster restart.
Note that this change makes the host_id presence
mandatory following https://github.com/scylladb/scylladb/pull/16376.
The reason to do so is that the other states, tokens, dc, and rack,
are useless without the host_id.
This change is backward compatible since the HOST_ID application state
has been written to system.peers since Scylla's inception,
and it would be missing only due to a potential exception
in older versions that failed to write it.
In this case, manual intervention is needed and
the correct HOST_ID needs to be manually updated in system.peers.
Refs #15787
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Pack the topology-related data loaded from system.peers
in `gms::load_endpoint_state`, to be used in a following
patch for `add_saved_endpoint`.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
this header was previously brought in by seastar's sstring.hh. but
since sstring.hh does not include <fmt/ostream.h> anymore,
`gms/application_state.cc` does not have access to this header.
also, `gms/application_state.cc` should `#include` the used header
by itself.
so, in this change, let's include <fmt/ostream.h> in `gms/application_state.cc`.
this change addresses the FTBFS with the latest seastar.
the same applies to other places changed in this commit.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18193
This gets rid of a dangling deferred drain on stop and makes nodetool
drain more "consistent" by stopping one more unneeded background activity.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Storage service will need to drain the view builder on its drain. Also,
on cluster join it marks existing views as built, while it's the view
builder's job to do it. Both will be fixed by the next patches; this is
a prerequisite.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Just starting sharded<view_builder> is lightweight; its constructor does
nothing but initialize member variables. The real work happens in
view_builder::start(), which is not moved.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The current test in boost/cql_query_large_test::test_large_data only checks whether notifications for large rows and cells are written into the system keyspace. It doesn't check this for partitions.
This change adds this check for partitions.
Closes scylladb/scylladb#18189
* github.com:scylladb/scylladb:
test/boost: added test for large row count warning
test/boost: add test for writing large partition notifications
The repair memory limit includes only the size of frozen mutation
fragments in a repair row. The size of other members of a repair
row may grow uncontrollably and cause out-of-memory.
Modify what's counted toward the repair memory limit.
Fixes: #16710.
Closes scylladb/scylladb#17785
* github.com:scylladb/scylladb:
test: add test for repair_row::size()
repair: fix memory accounting in repair_row
When altering rf for a keyspace, all tablets in this ks will get more replicas. Part of this process is rebuilding tablets onto new node(s). This PR extends the tablets transition code to support rebuilding a tablet on a new replica.
fixes: #18030
Closes scylladb/scylladb#18082
* github.com:scylladb/scylladb:
test: Check data presence as well
test: Test how tablets are copied between nodes
test: Add sanity test for tablet migration
api: Add method to add replica to a tablet
tablet: Make leaving replica optional
The formatted_sstables_list is an auxiliary class that collects a bunch
of sstables::to_string(shared_sstable)-generated strings. One of the bad
side effects of this helper is that it allocates memory for the vector
of strings.
This patch achieves the same goal with the help of fmt::join() combined
with the transformed boost adaptor.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18160
Other than making sure that system.tablets is updated with the correct
replica set, it's also good to check that the data is present on the
respective nodes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In repair, only the size of frozen mutation fragments of a repair row
is counted toward the memory limit. So, huge keys of repair rows may
lead to OOM.
Include other repair_row members' memory size in the repair memory
limit.
A few months ago, in merge d3c1be9107,
we decided that if Scylla has the experimental "tablets" feature enabled,
new Alternator tables should use this feature by default - exactly like
this is the default for new CQL tables.
Sadly, it was now decided to reverse this decision: We do not yet trust
LWT on tablets enough, and since Alternator often (if not always) relies
on LWT, we want Alternator tables to continue to use vnodes - not tablets.
The fix is trivial - just changing the default. No tests needed to
change, because all Alternator tests work correctly on Scylla with the
tablets experimental feature disabled. I added a new test to enshrine
the fact that Alternator does not use tablets.
An unfortunate result of this patch will be that Alternator tables
created on versions with this patch (e.g., Scylla 6.0) will not use
tablets and will continue to not use tablets even if Scylla is upgraded
(currently, the use of tablets is decided at table creation time, and
there is no way to "upgrade" a vnode-based table to be tablet based).
This patch should be reverted as soon as LWT support matures on tablets.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#18157
Loader wants to print a set of sstables' names. For that it collects the
names into a dedicated vector, then prints it using the fmt/ranges
facility.
There's a way to achieve the same goal without allocating an extra
vector of names -- use fmt::format() and pass it a range converting
sstables into their names.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18159
Currently, Scylla logs a warning when it writes a cell, row or partition which is larger than certain configured sizes. These warnings contain the partition key, and in the case of rows and cells also the clustering key, which allows the large row or partition to be identified. However, these keys can contain user-private, sensitive information. The information which identifies the partition/row/cell is also inserted into the tables system.large_partitions, system.large_rows and system.large_cells respectively.
This change removes the partition and cluster keys from the log messages, but still inserts them into the system tables.
The logged data will look like this:
Large cells:
WARN 2024-04-02 16:49:48,602 [shard 3: mt] large_data - Writing large cell ks_name/tbl_name: cell_name (SIZE bytes) to sstable.db
Large rows:
WARN 2024-04-02 16:49:48,602 [shard 3: mt] large_data - Writing large row ks_name/tbl_name: (SIZE bytes) to sstable.db
Large partitions:
WARN 2024-04-02 16:49:48,602 [shard 3: mt] large_data - Writing large partition ks_name/tbl_name: (SIZE bytes) to sstable.db
Fixes #18041
Closes scylladb/scylladb#18166
in seastar's b28342fa5a301de3facf5e83dc691524a6b20604, we switched
* `io_queue::_streams` from
`boost::container::small_vector<fair_queue, 2>` to
`boost::container::static_vector<fair_queue, 2>`
* `io_queue::_fgs` from
`std::vector<std::unique_ptr<fair_group>>` to
`boost::container::static_vector<fair_group, 2>`
so we need to update the gdb script accordingly to reflect this
change, and to avoid the nested try-except blocks, we switch to
a `while` statement to simplify the code structure.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18165
This commit adds image information for the latest patch release
to the GCP and Azure deployment page.
The information now replaces the reference to the Download Center
so that the user doesn't have to jump to another website.
Fixes https://github.com/scylladb/scylladb/issues/18144
Closes scylladb/scylladb#18168
`database::find_column_family()` throws no_such_column_family
if an unknown ks.cf is fed to it. and we call into this function
without checking for the existence of ks.cf first. since
"/storage_service/tablets/move" is a public interface, we should
translate this error to a better http error.
in this change, we check for the existence of the given ks.cf, and
throw an exception so that it can be caught by seastar::httpd::routers,
and converted to an HTTP error.
Fixes #17198
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17217
This patches the previously introduced test by introducing the 'action'
test parameter and tweaking the final checking assertions around tablet
replicas read from system.tablets.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It just checks that after an API call to move_tablet the resulting
replica is in the expected state. This test will later be expanded to
check for the rebuild transition.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The new API submits a rebuild transition with the new replicas set to
the old (current) replicas plus the provided one. It looks and acts like
the move_tablet API call with several changes:
- lacks the "source" replica argument
- submits "rebuild" transition kind
- cross racks checks are not performed
The 'force' argument is inherited from move_tablet, but is unused now
and is left for the future.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When getting the leaving replica from tablet info and transition info,
the getter code assumes that this replica always exists. That's not
going to be the case soon, so make the return value optional.
There are four places that mess with leaving replica:
- stream tablet handler: this place checks that the leaving replica is
_not_ current host. If leaving replica is missing, the check should
pass
- cleanup tablet handler: this place checks that the leaving replica
_is_ current host. If leaving replica is missing, the check should
fail as well
- topology coordinator: it gets leaving replica to call cleanup on. If
leaving replica is missing, the cleanup call is short-circuited to
succeed immediately
- load-stats calculator: it checks if the leaving replica is self. This
check is not patched as it's automatically satisfied by std::optional
comparison operator overload for wrapped type
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In test_exception_safety_of_update_from_memtable, we have a potential
throw from external_updater.
external_updater is supposed to be infallible.
Scylla currently aborts when an external_updater throws, so a throw from
there just fails the test.
This isn't intended. We aren't testing external_updater in this test.
Fixes #18163
Closes scylladb/scylladb#18171
Before this patch, the selection of the auth version depended on the
consistent topology feature, but during the raft recovery procedure
this feature is disabled, so we need to persist the version somewhere
to avoid switching back to v1, which is not supported.
During recovery auth works in read-only mode; writes will fail.
Fixes https://github.com/scylladb/scylladb/issues/17736
Closes scylladb/scylladb#18039
* github.com:scylladb/scylladb:
auth: keep auth version in scylla_local
auth: coroutinize service::start
We can cache the tablet map in the erm to avoid looking it up on every
write when getting write replicas. We do that in tablet_sharder, but not
in the tablet erm. The tablet map is immutable in the context of a given
erm, so the address of the map is stable during the erm's lifetime.
This caught my attention when looking at perf diff output
(comparing tablet and vnode modes).
It also helps when erm is called again on write completion for
checking locality, used for forwarding info to the driver if needed.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#18158
They result in poor distribution and poor cardinality, interfering with
tests which want to generate N partitions or rows.
Fixes: #17821
Closes scylladb/scylladb#17856
Maintainers are also allowed to commit their own backport PR. They are
allowed to backport their own code; opening a PR to get a CI run for a
backport doesn't change this.
Closes scylladb/scylladb#17727
Just like all the other commands already have it. These commands didn't
have documentation at the point where they were implemented, hence the
missing doc link.
The links don't work yet, but they will work once we release 6.0 and the
current master documentation is promoted to stable.
Upgrading raft topology is an important API call
that should be logged.
When it fails, it is also important to log the
exception to get better visibility into why
the call failed.
Closes scylladb/scylladb#18143
* github.com:scylladb/scylladb:
api: storage_service: upgrade_to_raft_topology: fixup indentation
api: storage_service: upgrade_to_raft_topology: add logging
The vector in question is populated from the content of another map, so
its size is known in advance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18155
Before this patch, the selection of the auth version depended on the
consistent topology feature, but during the raft recovery procedure
this feature is disabled, so we need to persist the version somewhere
to avoid switching back to v1, which is not supported.
During recovery auth works in read-only mode; writes will fail.
Upgrading raft topology is an important API call
that should be logged.
When it fails, it is also important to log the
exception to get better visibility into why
the call failed.
Indentation will be fixed in the next patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
we should include the headers we use, to avoid compilation failures like:
```
cql3/statements/select_statement.cc:229:79: error: no member named 'filter' in namespace 'std::ranges::views'
for (const auto& used_function : used_functions | std::ranges::views::filter(not_native)) {
~~~~~~~~~~~~~~~~~~~~^
1 error generated.
```
if one of the included headers drops its own `#include <optional>`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18145
It now happens in initialize_virtual_tables(), but this function is split into sub-calls and iterates over the virtual tables map several times to do its work. This PR squashes it into straightforward code which is shorter and, hopefully, easier to read.
Closes scylladb/scylladb#18133
* github.com:scylladb/scylladb:
virtual_tables: Open-code install_virtual_readers_and_writers()
virtual_tables: Move readers setup loop into add_table()
virtual_tables: Move tables creation loop into add_table()
virtual_tables: Make add_tablet() a coroutine
virtual_tables: Open-code register_virtual_tables()
Currently, we load the repair history during boot up. If the number of
repair history entries is high, it might take a while to load them.
In my test, to load 10M entries, it took around 60 seconds.
It is not a must to load the entries during boot up. It is better to
load them in the background to speed up the boot time.
Fixes #17993
Closes scylladb/scylladb#17994
* github.com:scylladb/scylladb:
repair: Load repair history in background
repair: Abort load_history process in shutdown
We currently support the sync-label only in OSS. Since Scylla-enterprise
gets all the commits from the OSS repo, the sync-label workflow runs and
fails during checkout (since it's a private repo and should have a
different configuration).
For now, let's limit the workflows to the OSS repo.
Closes scylladb/scylladb#18142
Added support to track and limit the memory usage by sstable components. A reclaimable component of an SSTable is one from which memory can be reclaimed. SSTables and their managers now track such reclaimable memory and limit the component memory usage accordingly. A new configuration variable defines the memory reclaim threshold. If the total memory of the reclaimable components exceeds this limit, memory will be reclaimed to keep the usage under the limit. This PR considers only the bloom filters as reclaimable and adds support to track and limit them as required.
The feature can be manually verified by doing the following :
1. run a single-node single-shard 1GB cluster
2. create a table with bloom-filter-false-positive-chance of 0.001 (to intentionally cause large bloom filter)
3. populate with tiny partitions
4. watch the bloom filter metrics get capped at 100MB
The default value of the `components_memory_reclaim_threshold` config variable, which controls the reclamation process, is `0.1`. This can be reduced further during manual tests to hit the threshold more easily and verify the feature.
Fixes #17747
Closes scylladb/scylladb#17771
* github.com:scylladb/scylladb:
test_bloom_filter.py: disable reclaiming memory from components
sstable_datafile_test: add tests to verify auto reclamation of components
test/lib: allow overriding available memory via test_env_config
sstables_manager: support reclaiming memory from components
sstables_manager: store available memory size
sstables_manager: add variable to track component memory usage
db/config: add a new variable to limit memory used by table components
sstable_datafile_test: add testcase to verify reclamation from sstables
sstables: support reclaiming memory from components
This patch makes the get_description.py script easier to use by the
documentation automation:
1. The script is now a library.
2. You can choose the output format of the script; currently pipe
and yml are supported.
You can still call the script from the command line, like before, but
you can also call it from another Python script.
For example, the following Python script would generate the documentation
for the metrics description of the ./alternator/ttl.cc file.
```
import get_description
metrics = get_description.get_metrics_from_file("./alternator/ttl.cc", "scylla", get_description.get_metrics_information("metrics-config.yml"))
get_description.write_metrics_to_file("out.yaml", metrics, "yml")
```
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Closes scylladb/scylladb#18136
Coredumps coming from CI are produced by a commit, which is not
available in the scylla.git repository, as CI runs on a merge commit
between the main branch (master or enterprise) and the tested PR branch.
Currently the script will attempt to checkout this commit and will fail
as the commit hash is unrecognized.
To work around this, add a --ci flag, which when used, will force the
main branch to be checked out, instead of the commit hash.
Closes scylladb/scylladb#18023
This reverts commit 97b203b1af.
since Seastar provides the formatter, it's not necessary to vendor it in
scylladb anymore.
Refs #13245
Closes scylladb/scylladb#18114
storage_group_id_for_token() was only needed from within
tablet_storage_group_manager, so we can kill
table::storage_group_id_for_token().
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#18134
Currently, we load the repair history during boot up. If the number of
repair history entries is high, it might take a while to load them.
In my test, to load 10M entries, it took around 60 seconds.
It is not a must to load the entries during boot up. It is better to
load them in the background to speed up the boot time.
Fixes #17993
Disabled reclaiming memory from sstable components in the testcase as it
interferes with the false positive calculation.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Reclaim memory from the SSTable that has the most reclaimable memory if
the total reclaimable memory has crossed the threshold. Only the bloom
filter memory is considered reclaimable for now.
Fixes #17747
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
The available memory size is required to calculate the reclaim memory
threshold, so store that within the sstables manager.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
sstables_manager::_total_reclaimable_memory variable tracks the total
memory that is reclaimable from all the SSTables managed by it.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
A new configuration variable, components_memory_reclaim_threshold, has
been added to configure the maximum allowed percentage of available
memory for all SSTable components in a shard. If the total memory usage
exceeds this threshold, it will be reclaimed from the components to
bring it back under the limit. Currently, only the memory used by the
bloom filters will be restricted.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Added support to track total memory from components that are reclaimable
and to reclaim memory from them if and when required. Right now only the
bloom filters are considered as reclaimable components but this can be
extended to any component in the future.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
It's pretty short already and is naturally a "part" of
initialize_virtual_tables(). It no longer installs writers either.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Similarly to the previous patch, after virtual tables are registered the
registry is iterated over to install virtual readers onto each entry.
Again, this can happen at registration time; no dedicated loop is needed
for that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Once the virtual_tables map is populated, it's iterated over to create
replica::table entries for each virtual table. This can be done in the
same place where the virtual table is created; no dedicated loop is
needed for it nowadays.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's naturally a "part" of initialize_virtual_tables(). Further patching
becomes possible with it being open-coded.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
if a pull request's cover letter is empty, `pr.body` is None. in that
case we should not try to pass it to `re.findall()` as the "string"
parameter. otherwise, we'd get
```
TypeError: expected string or bytes-like object, got 'NoneType'
```
so, in this change, we just return an empty list if the PR in question
has an empty cover letter.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18125
The cluster manager library doesn't set the asan/ubsan options
to abort on error and create core dumps; this makes debugging much
harder.
Fix by preparing the environment correctly.
Fixes scylladb/scylladb#17510
Closes scylladb/scylladb#17511
what we need is but a script, so instead of checking out the whole repo,
with all history for all tags and branches, let's just check out
a single file. faster this way.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18126
when adding a label to a pull request we keep getting the following error
message:
```
Traceback (most recent call last):
File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 93, in <module>
main()
File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 89, in main
sync_labels(repo, args.number, args.label, args.action, args.is_issue)
File "/home/runner/work/scylladb/scylladb/.github/scripts/sync_labels.py", line 74, in sync_labels
target.add_to_labels(label)
File "/usr/lib/python3/dist-packages/github/Issue.py", line 321, in add_to_labels
headers, data = self._requester.requestJsonAndCheck(
File "/usr/lib/python3/dist-packages/github/Requester.py", line 353, in requestJsonAndCheck
return self.__check(
File "/usr/lib/python3/dist-packages/github/Requester.py", line 378, in __check
raise self.__createException(status, responseHeaders, output)
github.GithubException.GithubException: 403 {"message": "Resource not accessible by integration", "documentation_url": "https://docs.github.com/rest/issues/labels#add-labels-to-an-issue"}
```
Based on
https://docs.github.com/en/actions/security-guides/automatic-token-authentication#permissions-for-the-github_token.
The maximum access for pull requests from public forked repositories is
set to `read`.
Switching to `pull_request_target` solves it.
Fixes: https://github.com/scylladb/scylladb/issues/18102
Closes scylladb/scylladb#18052
The read_field is std::optional<View>. The raw_value::make_value()
accepts managed_bytes_opt, which is std::optional<managed_bytes>.
Finally, there's the std::optional<T>::optional(std::optional<U>&&)
converting move constructor (and its copy-constructor peer).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18128
The schema_builder::build() method creates a copy of the raw schema
internally, in the hope that the builder will be updated and asked to
build the resulting schema again (e.g. alternator uses this).
However, there are places that build a schema using a temporary object
once, in a `return schema_builder().with_...().build()` manner. For those
invocations, copying the raw schema is just a waste of cycles.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18094
currently, our homebrew formatter formats `std::map` like
```
{{k1, v1}, {k2, v2}}
```
while {fmt} formats a map like:
```
{k1: v1, k2: v2}
```
and if the type of key/value is string, {fmt} quotes it, so a
compaction strategy option is formatted like
```
{"max_threshold": "1"}
```
before switching the formatter to the ones supported by {fmt},
let's update the test to match with the new format. this should
reduce the overhead of reviewing the change of switching the
formatter. we can revert this change, and use a simpler approach
after the change of formatter lands.
Closes scylladb/scylladb#18058
* github.com:scylladb/scylladb:
test/cql-pytest: match error message formated using {fmt}
test/cql-pytest: extract scylla_error() for not allowed options test
before this change, `reclaim_timer::report()` calls
```c++
fmt::format(", at {}", current_backtrace())
```
which allocates a `std::string` on heap, so it can fail and throw. in
that case, `std::terminate()` is called. but at that moment, the reason
why `reclaim_timer::report()` gets called is that we fail to reclaim
memory for the caller. so we are more likely to run into this issue. anyway,
we should not allocate memory in this path.
in this change, a dedicated printer is created so that we don't format
to a temporary `std::string`, and instead write directly to the buffer
of logger. this avoids the memory allocation.
Fixes #18099
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18100
Currently, the error message on a failed RAPIDJSON_ASSERT() is this:
rjson::error (JSON error: condition not met: false)
This is printed e.g. when the code processing a json expects an object
but the JSON has a different type. Or if a JSON object is missing an
expected member. This message, however, is completely inadequate for
determining what went wrong. Change this to include a task-local
backtrace, like a real assert failure would. The new error looks like
this:
rjson::error (JSON assertion failed on condition '{}' at: libseastar.so+0x56dede 0x2bde95e 0x2cc18f3 0x2cf092d 0x2d2316b libseastar.so+0x46b623)
Closes scylladb/scylladb#18101
It's more applicable in this case.
Also, built tablets mutations are cast to canonical_mutations, but
when emplaced the compiler can pick up the canonical_mutation(const mutation&)
constructor, so the cast is not required.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18090
Test.py uses `ring_delay_ms = 0` by default. CDC creates generation's timestamp by adding `ring_delay_ms` to it.
In this test, nodes are learning about new generations (introduced by upgrade procedure and then by node bootstrap) concurrently with doing writes that should go to these generations.
Because of `ring_delay_ms = 0`, the generation could have been committed when it should have already been in use.
This can be seen in the following logs from a node:
```
ERROR 2024-03-22 12:29:55,431 [shard 0:strm] cdc - just learned about a CDC generation newer than the one used the last time streams were retrieved. This generation, or some newer one, should have been used instead (new generation's timestamp: 2024/03/22 12:29:54, last time streams were retrieved: 2024/03/22 12:29:55). The new generation probably arrived too late due to a network partition and we've made a write using the wrong set streams.
```
Creating writes during such a generation can result in assigning them a wrong generation or a failure. A failure may occur if it hits a short time window when `generation_service::handle_cdc_generation(cdc::generation_id_v2)` has executed
`svc._cdc_metadata.prepare(...)` but `_cdc_metadata.insert(...)` has not yet been executed. With a nonzero ring_delay_ms it's not a problem, because during this time window, the generation should not be in use.
Write can fail with the following response from a node:
```
cdc: attempted to get a stream from a generation that we know about, but weren't able to retrieve (generation timestamp: 2024/03/22 12:29:54, write timestamp: 2024/03/22 12:29:55). Make sure that the replicas which contain this generation's data are alive and reachable from this node.
```
Set ring_delay_ms to 15000 for the debug mode and 5000 in other modes. Wait for the last generation to be in use and sleep one second to make sure there are writes to the CDC table in this generation.
Fixes scylladb/scylladb#17977
Reapply b4144d14c6.
Closes scylladb/scylladb#17998
* github.com:scylladb/scylladb:
test.py: test_topology_upgrade_basic: make ring_delay_ms nonzero
Reapply "test.py: adjust the test for topology upgrade to write to and read from CDC tables"
Setting the data accessor implicitly depends on the node joining the
cluster with a raft leader elected, as only then is the service level
mutation put into the scylla_local table. Calling it after join_cluster
avoids starting a new cluster with the older version only to immediately
migrate it to the latest one in the background.
Closes scylladb/scylladb#18040
* github.com:scylladb/scylladb:
main: reload service levels data accessor after join_cluster
service: qos: create separate function for reloading data accessor
currently, our homebrew formatter formats `std::map` like
{{k1, v1}, {k2, v2}}
while {fmt} formats a map like:
{k1: v1, k2: v2}
and if the type of key/value is string, {fmt} quotes it, so a
compaction strategy option is formatted like
{"max_threshold": "1"}
before switching the formatter to the ones supported by {fmt},
let's update the test to match with the new format. this should
reduce the overhead of reviewing the change of switching the
formatter. we can revert this change, and use a simpler approach
after the change of formatter lands.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
currently, our homebrew formatter formats `std::map` like
{{k1, v1}, {k2, v2}}
while {fmt} formats a map like:
{k1: v1, k2: v2}
and if the type of key/value is string, {fmt} quotes it, so a
compaction strategy option is formatted like
{"max_threshold": "1"}
as we are switching to the formatters provided by {fmt}, it would be
better to support its convention directly.
so, in this change, to prepare the change, before migrating to
{fmt}, let's refactor the test to support both formats by
extracting a helper to format the error message, so that we can
change it to emit both formats.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Test.py uses `ring_delay_ms = 0` by default. CDC creates generation's timestamp
by adding `ring_delay_ms` to it.
In this test, nodes are learning about new generations (introduced by upgrade
procedure and then by node bootstrap) concurrently with doing writes that
should go to these generations.
Because of `ring_delay_ms = 0`, the generation could have been committed when
it should have already been in use.
This can be seen in the following logs from a node:
```
ERROR 2024-03-22 12:29:55,431 [shard 0:strm] cdc - just learned about a CDC generation newer than the one used the last time streams were retrieved. This generation, or some newer one, should have been used instead (new generation's timestamp: 2024/03/22 12:29:54, last time streams were retrieved: 2024/03/22 12:29:55). The new generation probably arrived too late due to a network partition and we've made a write using the wrong set streams.
```
Creating writes during such a generation can result in assigning them a wrong
generation or a failure. A failure may occur if it hits a short time window when
`generation_service::handle_cdc_generation(cdc::generation_id_v2)` has executed
`svc._cdc_metadata.prepare(...)` but `_cdc_metadata.insert(...)` has not yet
been executed. With a nonzero ring_delay_ms it's not a problem, because during
this time window, the generation should not be in use.
Write can fail with the following response from a node:
```
cdc: attempted to get a stream from a generation that we know about, but weren't able to retrieve (generation timestamp: 2024/03/22 12:29:54, write timestamp: 2024/03/22 12:29:55). Make sure that the replicas which contain this generation's data are alive and reachable from this node.
```
Set ring_delay_ms to 15000 for the debug mode and 5000 in other modes.
Wait for the last generation to be in use and sleep one second to make sure
there are writes to the CDC table in this generation.
Fixes#17977
this series includes test related changes to enable us to drop `FMT_DEPRECATED_OSTREAM` deprecated in {fmt} v10.
Refs #13245
Closes scylladb/scylladb#18054
* github.com:scylladb/scylladb:
test: unit: add fmt::formatter for test_data in tests
test/lib: do not print with fmt::to_string()
test/boost: print runtime_error using e.what()
* 'gleb/raft_snapshot_rpc-v3' of github.com:scylladb/scylla-dev:
raft topology: drop RAFT_PULL_TOPOLOGY_SNAPSHOT RPC
Use correct limit for raft commands throughout the code.
When repairing multiple keyspaces, bail out on the first failed keyspace repair, instead of continuing and reporting all failures at the end. This is what Origin does as well.
To be able to test this, a bit of refactoring was needed, to be able to assert that `scylla-nodetool` doesn't make repair requests, beyond the expected ones.
Refs: https://github.com/scylladb/scylla-cluster-tests/issues/7226
Closes scylladb/scylladb#17678
* github.com:scylladb/scylladb:
tools/scylla-nodetool: repair: abort on first failed repair
test/nodetool: nodetool(): add check_return_code param
test/nodetool: nodetool(): return res object instead of just stdout
test/nodetool: count unexpected requests
This series provides a reallocate_tablets function, that's initially called by allocate_tablets_for_new_table.
The new allocation implementation is independent of vnodes/token ownership.
Rather than using the natural_endpoints_tracker, it implements its own tracking
based on dc/rack load (== number of replicas in rack), with the additional benefit
that tablet allocation will balance the allocation across racks, using a heap structure,
similar to the one we use to balance tablet allocation across shards in each node.
reallocate_tablets may also be called with an optional parameter pointing to the current tablet_map.
In this case the function either allocates more tablet replicas in datacenters for which the replication factor was increased,
or it will deallocate tablet replicas from datacenters for which replication factor was decreased.
The NetworkTopologyStrategy_tablets_test unit test was extended to cover replication factor changes.
Closes scylladb/scylladb#17846
* github.com:scylladb/scylladb:
network_topology_strategy: reallocate_tablets: consider new_racks before existing racks
network_topology_startegy_test: add NetworkTopologyStrategy_tablet_allocation_balancing_test
network_topology_strategy: reallocate_tablets: support deallocation via rf change
network_topology_startegy_test: tablets_test: randomize cases
network_topology_strategy: allocate_tablets_for_new_table: do not rely on token ownership
network_topology_startegy_test: add NetworkTopologyStrategy_tablets_negative_test
network_topology_strategy_test: endpoints_check: use particular BOOST_CHECK_* functions
network_topology_strategy_test: endpoints_check: verify that replicas are placed on unique nodes
network_topology_strategy_test: endpoints_check: strictly check rf for tablets
network_topology_strategy_test: full_ring_check for tablets: drop unused options param
GCC-14 rightly points out that the constructor of `atomic_cell_view`
is marked private, and cannot be called from its formatter:
```
/usr/bin/g++-14 -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/var/ssd/scylladb -I/var/ssd/scylladb/build/gen -I/var/ssd/scylladb/seastar/include -I/var/ssd/scylladb/build/seastar/gen/include -I/var/ssd/scylladb/build/seastar/gen/src -g -Og -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unused-parameter -ffile-prefix-map=/var/ssd/scylladb=. -march=westmere -Wstack-usage=40960 -U_FORTIFY_SOURCE -Wno-maybe-uninitialized -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT mutation/CMakeFiles/mutation.dir/Debug/atomic_cell.cc.o -MF mutation/CMakeFiles/mutation.dir/Debug/atomic_cell.cc.o.d -o mutation/CMakeFiles/mutation.dir/Debug/atomic_cell.cc.o -c /var/ssd/scylladb/mutation/atomic_cell.cc
In file included from /var/ssd/scylladb/mutation/atomic_cell.cc:9:
/var/ssd/scylladb/mutation/atomic_cell.hh: In member function ‘auto fmt::v10::formatter<atomic_cell>::format(const atomic_cell&, fmt::v10::format_context&) const’:
/var/ssd/scylladb/mutation/atomic_cell.hh:413:67: error: ‘atomic_cell_view::atomic_cell_view(basic_atomic_cell_view<is_mutable>) [with mutable_view is_mutable = mutable_view::yes]’ is private within this context
413 | return fmt::format_to(ctx.out(), "{}", atomic_cell_view(ac));
| ^
/var/ssd/scylladb/mutation/atomic_cell.hh:275:5: note: declared private here
275 | atomic_cell_view(basic_atomic_cell_view<is_mutable> view)
| ^~~~~~~~~~~~~~~~
```
so, in this change, we make the formatter a friend of
`atomic_cell_view`.
since the operator<< was dropped, there is no need to keep its friend
declaration around, so it is dropped in this change.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#18081
our homebrew formatter for std::vector<string> formats like
```
{hello, world}
```
while {fmt}'s formatter for sequence-like container formats like
```
["hello", "world"]
```
since we are moving to {fmt} formatters, and in this context
quoting the verbatim text makes more sense to the user, let's
support the format used by {fmt} as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18057
according to {fmt}'s documentation at
https://fmt.dev/latest/api.html#formatting-user-defined-types,
```
// the range will contain "f} continued". The formatter should parse
// specifiers until '}' or the end of the range. In this example the
// formatter should parse the 'f' specifier and return an iterator
// pointing to '}'.
```
so we should check for _both_ '}' and end of the range. when building
scylla with {fmt} 10.2.1, it fails to build code like
```c++
fmt::format_to(out, "{}", fmt_hex(frag))
```
as {fmt}'s compile-time checker fails to parse this format string
along with given argument, as at compile time,
```c++
throw format_error("invalid group_size")
```
is executed.
so, in this change, we check both '}' and the end of range.
the change which introduced this formatter was
2f9dfba800
Refs 2f9dfba800
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18080
correctness when constructing range_streamer depended on the compiler's
evaluation order of parameters.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#18079
Raft uses the schema commitlog, so all its limits should be derived from
the schema commitlog segment size, but many places used the regular
commitlog size to calculate the limits and did not do what they were
really supposed to be doing.
The storage_service::track_upgrade_progress_to_topology_coordinator
function is supposed to wait on the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
cluster feature (among other things) before starting the
raft_state_monitor_fiber. The wait is realized by passing a callback to
feature::when_enabled which sets a shared_promise that is waited on by
the tracking fiber. If the feature is already enabled, when_enabled will
call the callback immediately. However, if it's not, then it will return
a non-null listener_registration object - as long as it is alive, the
callback is registered. The listener_registration object was not
assigned to a variable which caused it to be destroyed shortly after the
when_enabled function returns.
Due to that, if upgrade was requested but the current group0 leader
didn't have the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES feature enabled
right after boot, the upgrade would not start until the leader is
changed to a node which has that cluster feature already enabled on
boot. Moreover, the topology coordinator would not start on such a node
until the node was rebooted.
Fix the issue by assigning the subscription to a variable.
Fixes: scylladb/scylladb#18049
Closes scylladb/scylladb#18051
* github.com:scylladb/scylladb:
gms: feature: mark when_enabled(func) with nodiscard
storage_service: keep subscription to raft topology feature alive
Before this series, Alternator's Query and Scan operations convert an
entire result page to JSON without yielding. For a page of maximum
size (1MB) and tiny rows, this can cause a significant stall - the
test included in this PR reported stalls of 14-26ms on my laptop.
The problem is the describe_items() function, which does this conversion
immediately, without yielding. This patch changes this function to
return a future, and use a new result_set::visit_gently() method
that does what visit() does, but with yields when needed.
This PR improves #17995 but does not completely fix it, as the stalls
in the test are not completely eliminated. But on my laptop it usually
reduces the stalls to around 5ms. It appears that the remaining stalls
stem from other places not fixed in this PR, such as perhaps
query_page::handle_result(), and will need to be fixed by additional
patches.
Closes scylladb/scylladb#18036
* github.com:scylladb/scylladb:
alternator: reduce stall for Query and Scan with large pages
result_set: introduce visit_gently()
alternator: coroutinize do_query() function
Memtables are fickle: they can be flushed when there is memory pressure,
if there is too much commitlog, or if there is too much data in them. The
tests in test_select_from_mutation_fragments.py currently assume the data
written is in the memtable. This is true most of the time, but we have
seen some odd test failures that couldn't be understood. To make the
tests more robust, flush the data to the disk and read it from the
sstables. This means that some range scans need to filter to read from
just a single mutation source, but this does not influence the tests.
Also fix a use-after-return found when modifying the tests.
This PR tentatively fixes the below issues, based on our best guesses on why they failed (each was seen just once):
Fixes: scylladb/scylladb#16795
Fixes: scylladb/scylladb#17031
Closes scylladb/scylladb#17562
* github.com:scylladb/scylladb:
test/cql-pytest: test_select_from_mutation_fragments.py: move away from memtables
cql3: select_statement: mutation_fragments_select_statement: fix use-after-return
This change adds the missing Cassandra compaction option unchecked_tombstone_compaction.
Setting this option to true causes compaction to ignore tombstone_threshold, and decide whether to do a compaction based only on the value of tombstone_compaction_interval.
Fixes #1487
Closes scylladb/scylladb#17976
* github.com:scylladb/scylladb:
removed forward declaration of resharding_descriptor
compaction options and troubleshooting docs
cql-pytest/test_compaction_strategy_validation.py
test/boost/sstable_compaction_test.cc
compaction: implement unchecked_tombstone_compaction
before this change, we rely on the default-generated fmt::formatter
created from operator<<. but this depends on the
`FMT_DEPRECATED_OSTREAM` macro which is not respected in {fmt} v10.
this change addresses the formatting with fmtlib < 10, and without
`FMT_DEPRECATED_OSTREAM` defined. please note, in {fmt} v10 and up,
it defines formatter for classes derived from `std::exception`, so
our formatter is only added when compiled with {fmt} < 10.
in this change, `fmt::formatter<invalid_mutation_fragment_stream>`
is added for backward compatibility with {fmt} < 10.
Refs scylladb#13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18053
in order to help developers understand the transitions
of `node_state` and the `transition_state` within each `node_state`,
in this change, the nested state machine diagram is added to the
node state diagram.
please note, instead of trying to merge similar states like
bootstrapping and replacing into a single state, we keep them as
separate ones, and replicate the nested state machine diagram in each
of them as well, for clarity.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18025
Calling `_next_row.get_iterator_in_latest()` is illegal when `_next_row` is not
pointing at a row. In particular, the iterator returned by such call might be
dangling.
We have observed this to cause a use-after-free in the field, when a reverse
read called `maybe_add_to_cache` after `_latest_it` was left dangling after
a dead row removal in `copy_from_cache_to_buffer`.
To fix this, we ensure that `_next_row.get_iterator_in_latest()` is
only called when `_next_row` is pointing at a row.
Only the occurrences of this problem in `maybe_add_to_cache` are truly dangerous.
As far as I can see, other occurrences can't break anything as of now.
But we apply fixes to them anyway.
Closes scylladb/scylladb#18046
Fixes scylladb/scylladb#17893
* 'gleb/initial-token-v1' of github.com:scylladb/scylla-dev:
dht: drop unused parameter from get_random_bootstrap_tokens() function
test: add test for initial_token parameter
topology coordinator: use provided initial_token parameter to choose bootstrap tokens
topology coordinator: propagate initial_token option to the coordinator
this change is created in the same spirit as d1c35f943d.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for test_data in
radix_tree_stress_test.cc, and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
we should not format a variable unless we want to print it. in this
case, we format `first_row` using `fmt::to_string()` to a string,
and then insert that string into another string. even though this is
in a cold path, this is still an anti-pattern -- both convoluted
and not performant.
so let's just pass `first_row` to `format()`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter. but fortunately, fmt v10 brings the builtin
formatter for classes derived from `std::exception`. but before
switching to {fmt} v10, and after dropping `FMT_DEPRECATED_OSTREAM`
macro, we need to print out `std::runtime_error`. so far, we don't
have a shared place for formatter for `std::runtime_error`. so we
are addressing the needs on a case-by-case basis.
in this change, we just print it using `e.what()`. its behavior
is identical to what we have now.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Allocate first from new (unpopulated) racks before
allocating from racks that are already populated
with replicas.
Still, rotate both new and existing racks by tablet id
to ensure fairness.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Add support for deallocating tablet replicas when the
datacenter replication factor is decreased.
We deallocate replicas in back-to-front order to maintain
replica pairing between the base table and
its materialized views.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Instead of deterministically testing a very small set of cases,
randomize the shard_count per node, the cluster topology
and the NetworkTopologyStrategy options.
The next patch will extend the test to also test
`reallocate_tablets` with randomized options.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Base initial tablets allocation for new table
on the dc/rack topology, rather than on the token ring,
to remove the dependency on token ownership.
We keep the rack ordinal order in each dc
to facilitate in-rack pairing of base/view
replica pairing, and we apply load-balancing
principles by sorting the nodes in each rack
by their load (number of tablets allocated to
the node), and attempting to fill least-loaded
nodes first.
This method is more efficient than circling
the token ring and attempting to insert the endpoints
to the natural_endpoint_tracker until the replication
factor per dc is fulfilled, and it allows an easier
way to incrementally allocate more replicas after
rf is increased.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When repairing multiple keyspaces, bail out on the first failed keyspace
repair, instead of continuing and reporting all failures at the end.
This is what Origin does as well.
Test that attempting to allocate tablets
throws an error when there are not enough nodes
for the configured replication factor.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Using e.g. `BOOST_CHECK_EQUAL(endpoints.size(), total_rf)`
rather than `BOOST_CHECK(endpoints.size() == total_rf)`
prints a more detailed error message that includes the
runtime values, if it fails.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
With tablets we want to verify that the number of
replicas allocated per tablet per dc exactly matches
the replication strategy per-dc replication factor options.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When set to false, the returncode is not checked; this is left to the
caller. This in turn allows for checking the expected and unexpected
requests which is not checked when the nodetool process fails.
This is used by utils._do_check_nodetool_fails_with(), so that expected
and unexpected requests are checked even for failed invocations.
Some tests need adjusting to the stricter checks.
The feature::when_enabled function takes a callback and returns a
listener_registration object. Unless the feature were enabled right from
the start, the listener_registration will be non-null and will keep the
callback registered until the registration is destroyed. If the
registration is destroyed before the feature is enabled, the callback
will not be called. It's easy to make a mistake and forget to keep the
returned registration alive - especially when, in tests, the feature is
enabled early in boot, because in that case when_enabled calls the
callback immediately and returns a null object instead.
In order to prevent issues with prematurely dropped
listener_registration in the future, mark feature::when_enabled with the
[[nodiscard]] attribute.
The storage_service::track_upgrade_progress_to_topology_coordinator
function is supposed to wait on the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
cluster feature (among other things) before starting the
raft_state_monitor_fiber. The wait is realized by passing a callback to
feature::when_enabled which sets a shared_promise that is waited on by
the tracking fiber. If the feature is already enabled, when_enabled will
call the callback immediately. However, if it's not, then it will return
a non-null listener_registration object - as long as it is alive, the
callback is registered. The listener_registration object was not
assigned to a variable which caused it to be destroyed shortly after the
when_enabled function returns.
Due to that, if upgrade was requested but the current group0 leader
didn't have the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES feature enabled
right after boot, the upgrade would not start until the leader is
changed to a node which has that cluster feature already enabled on
boot. Moreover, the topology coordinator would not start on such a node
until the node was rebooted.
Fix the issue by assigning the subscription to a variable.
these two subcommands are provided by cassandra, and are also implemented natively in scylla. so let's document them.
Closesscylladb/scylladb#17982
* github.com:scylladb/scylladb:
docs/operating-scylla: document nodetool sstableinfo
docs/operating-scylla: document nodetool getsstables
We currently check at the end of each test that all expected requests
set by the test were consumed. This patch adds a mechanism to count
unexpected requests -- requests which didn't match any of the expected
ones set by the test. This can be used to assert that nodetool didn't
make any request to the server, beyond what the test expected it to do.
Before this patch, requests like this would only be noticed by the test,
if the response of 404/500 caused nodetool to fail, which is not always
the case.
There are several places in the code that calculate replica sets
associated with a specific tablet transition. Having a helper to
subtract two sets improves code readability.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18033
The CDC feature is not supported on a table that uses tablets
(Refs https://github.com/scylladb/scylladb/issues/16317), so if a user creates a keyspace with tablets enabled
they may be surprised later (perhaps much later) when they try to enable
CDC on the table and can't.
The LWT feature has always had issue https://github.com/scylladb/scylladb/issues/5251, but it has become potentially
more common with tablets.
So it was proposed that as long as we have missing features (like CDC or
LWT), every time a keyspace is created with tablets it should output a
warning (a bona-fide CQL warning, not a log message) that some features
are missing, and if you need them you should consider re-creating the
keyspace without tablets.
This PR does this.
The warning text which will be produced is the following (obviously, it can
be improved later, as we perhaps find more missing features):
> "Tables in this keyspace will be replicated using tablets, and will
> not support the CDC feature (issue https://github.com/scylladb/scylladb/issues/16317) and LWT may suffer from
> issue https://github.com/scylladb/scylladb/issues/5251 more often. If you want to use CDC or LWT, please drop
> this keyspace and re-create it without tablets, by adding AND TABLETS
> = {'enabled': false} to the CREATE KEYSPACE statement."
This PR also includes a test - that checks that this warning is
indeed generated when a keyspace is created with tablets (either by default
or explicitly), and not generated if the keyspace is created without
tablets. It also fixes existing tests which didn't like the new warning.
Fixes https://github.com/scylladb/scylladb/issues/16807
Closes scylladb/scylladb#17318
* github.com:scylladb/scylladb:
tablets: add warning on CREATE KEYSPACE
test/cql-pytest: fix guardrail tests to not be sensitive to more warnings
Setting the data accessor implicitly depends on the node joining the
cluster with a raft leader elected, as only then is the service level
mutation put into the scylla_local table. Calling it after join_cluster
avoids starting a new cluster with an older version only to immediately
migrate it to the latest one in the background.
Before this patch, Alternator's Query and Scan operations convert an
entire result page to JSON without yielding. For a page of maximum
size (1MB) and tiny rows, this can cause a significant stall - the
test included in this patch reported stalls of 14-26ms on my laptop.
The problem is the describe_items() function, which does this conversion
immediately, without yielding. This patch changes this function to
return a future, and use the result_set::visit_gently() method
instead of visit(); it yields when needed.
This patch does not completely eliminate stalls in the test, but
on my laptop usually reduces them to around 5ms. It appears that
the remaining stalls stem from other places not fixed in this PR,
such as perhaps query_page::handle_result(), and will need to be
fixed by additional patches.
The test included in this patch is useful for manually reproducing
the stall, but not useful as a regression test: It is slow (requiring
a couple of seconds to set up the large partition) and doesn't
check anything, and can't even report the stall without modifying the
test runner. So the test is skipped by default (using the "veryslow"
marker) and can be enabled and run manually by developers who want
to continue working on #17995.
Refs #17995.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Whereas result_set::visit() passes all the rows to the visitor and
returns void, this patch introduces a method visit_gently() that returns
a future, and may yield before visiting each row.
This method will be used in the next patch to allow Alternator, which
used visit() to convert a result_set into JSON format, to potentially
yield between rows and avoid large stalls when converting a large
result set.
Note that I decided to add the yield points in the new visit_gently()
between rows - not between each cell. Many places in our code (including
the memtable) already work on a per-row basis and do not yield in the
middle of a row, so it won't really be helpful to do this either.
But if we'll want, we will still be able to modify visit_gently() later
to be even more gentle, and yield between individual cells. The callers
shouldn't know or care.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
feature_service.hh is a high-level header that integrates much
of the system functionality, so including it in lower-level headers
causes unnecessary rebuilds, specifically when retiring features.
Fix by removing feature_service.hh from headers, and supply forward
declarations and includes in .cc where needed.
Closes scylladb/scylladb#18005
This patch changes the do_query() function, used to implement Alternator's
Query and Scan operations, from using continuations to be a coroutine.
There are no functional changes in this patch, it's just the necessary
changes to convert the function to a coroutine.
The new code is easier to read and less indented, but more importantly,
will be easier to extend in the next patch to add additional awaits
in the middle of the function.
In addition to the obvious changes, I also had to rename one local
variable (as the same name was used in two scopes), and to convert
pass-by-rvalue-reference to pass-by-value (these parameters are *moved*
by the caller, and moreover the old code had to move them again to a
continuation, so there is no performance penalty in this change).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
When loading topology state, nodes are checked for whether the
"tokens" field is set. The check is based on node state and is
spread across the loading method.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17957
These commands manage to avoid detection because they are not documented on https://opensource.docs.scylladb.com/stable/operating-scylla/nodetool.html.
They were discovered when running dtests, with ccm tuned to use the native nodetool directly. See https://github.com/scylladb/scylla-ccm/pull/565.
The commands come with tests, which pass with both the native and Java nodetools. I also checked that the relevant dtests pass with the native implementation.
Closes scylladb/scylladb#17979
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the sstableinfo command
tools/scylla-nodetool: implement the getsstables command
tools/scylla-nodetool: move get_ks_cfs() to the top of the file
test/nodetool: rest_api_mock.py: add expected_requests context manager
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
also, since it's impossible to partially specialize a template for a
nested type of a template class, we cannot specialize the
`fmt::formatter` for `stop_crash<M>::result_type`; as a workaround,
a new type is added.
in this change,
* define a new type named `stop_crash_result`
* add fmt::formatter for `stop_crash_result`
* define stop_crash::result_type as an alias of `stop_crash_result`
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18018
`UUID::to_sstring()` relies on `FMT_DEPRECATED_OSTREAM` to generate `fmt::formatter` for `UUID`; this feature is deprecated in {fmt} v9, and dropped in {fmt} v10.
in this series, all callers of `UUID::to_sstring()` are switched to `fmt::to_string()`, and this function is dropped.
Closes scylladb/scylladb#18020
* github.com:scylladb/scylladb:
utils: UUID: drop UUID::to_sstring()
treewide: use fmt::to_string() to transform a UUID to std::string
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change:
* add `format_as()` for `segment` so we can use it as a fallback
after upgrading to {fmt} v10
* use fmt::streamed() when formatting `segment`; this will be used as
  the intermediate solution before {fmt} v10, after dropping the
  `FMT_DEPRECATED_OSTREAM` macro
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18019
To create the list of tests to run there's a loop that first collects all tests from suites, then filters the list in two ways -- it excludes opted-out tests (disabled or matching the skip pattern) or leaves only opted-in ones (those specified as positional arguments).
This patch keeps both pieces of list-checking code close to each other so that the intent is explicitly clear.
Closes scylladb/scylladb#17981
* github.com:scylladb/scylladb:
test.py: Give local variable meaningful name
test.py: Sanitize test list creation
We move the mode check so that the raft-based decommission also uses
it. Without this check, it hung after the drain operation instead
of instantly failing. `test_decommission_after_drain_is_invalid` was
failing because of it with the raft-based topology enabled.
Fixes scylladb/scylladb#16761
Closes scylladb/scylladb#18000
There are skip_in_<mode> lists in the suite yaml that tell test.py not to run the tests listed in them. This PR sanitizes these lists in two ways.
First, to skip pytests the skip-decorators are much more convenient, e.g. because they show the reason why the test is skipped.
Also, if a test should only run in some mode, there's a run_in_<mode> list in the suite for that, instead of opting the test out in all the other modes' lists.
Closes scylladb/scylladb#17964
* github.com:scylladb/scylladb:
test: Do not duplicate test name in several skip-lists
test: Mark tests with skip_mode instead of suite skip-list
this function is not used anymore, and it relies on
`FMT_DEPRECATED_OSTREAM` to generate `fmt::formatter` for
`UUID`; this feature is deprecated in {fmt} v9, and
dropped in {fmt} v10.
in this change, let's drop this member function.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
without the `FMT_DEPRECATED_OSTREAM` macro, `UUID::to_sstring()` is
implemented using its `fmt::formatter`, which is not available
at the end of this header file where `UUID` is defined. at this moment,
we still use `FMT_DEPRECATED_OSTREAM` and {fmt} v9, so we can
still use `UUID::to_sstring()`, but in {fmt} v10, we cannot.
so, in this change, we change all callers of `UUID::to_sstring()`
to `fmt::to_string()`, so that we don't depend on
`FMT_DEPRECATED_OSTREAM` and {fmt} v9 anymore.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
When a view update has both a local and remote target endpoint,
it extends the lifetime of its memory tracking semaphore units
only until the end of the local update, while the resources are
actually used until the remote update finishes.
This patch changes the semaphore transferring so that in case
of both local and remote endpoints, both view updates share the
units, causing them to be released only after the update that
takes longer finishes.
Fixes #17890
Closes scylladb/scylladb#17891
So tests and fixtures can use `with expected_requests():` and have
cleanup taken care of for them. I just discovered that some tests do not
clean up after themselves and when running all tests in a certain order,
this causes unrelated tests to fail.
Fix by using the context everywhere, getting guaranteed cleanup after
each test.
The test creates ut4 with a lot of fields; this may take a while in
debug builds, so to avoid a raft operation timeout we set the
threshold to some big value.
The error injector is disabled in release builds,
so this setting won't be applied to them.
This shouldn't be a problem since release builds
are fast enough, even on arm.
Fixes scylladb/scylladb#17987
Closes scylladb/scylladb#17997
Currently, the tests in test/cql-pytest can be run against both ScyllaDB and Cassandra.
Running the test for either will first output the test results, and subsequently
print the stdout output of the process under test. Using the command line
option --omit-scylla-output it is possible to disable this print for Scylla,
but it is not possible for tests run against Cassandra.
This change adds the option to suppress output for Cassandra tests, too. By default,
the stdout of the Cassandra run will still be printed after the test results, but
this can now be disabled with --omit-scylla-output
Closes scylladb/scylladb#17996
Some tests are only run in dev mode for some reason. For such tests
there's a run_in_dev list; no need to put them in all the non-dev
skip_in_... ones.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are many tests that are skipped in release mode because they rely
on error-injection machinery which doesn't work in release mode. Most of
those tests are listed in suite's skip_in_release, but it's not very
handy, mainly because it's not clear why the test is there. The
skip_mode decoration is much more convenient.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
To create the list of tests to run there's a loop that first collects
all tests from suites, then filters the list in two ways -- it
excludes opted-out tests (disabled or matching the skip pattern) or
leaves only opted-in ones (those specified as positional arguments).
This patch keeps both list-checking code close to each other so that the
intent is explicitly clear.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Fixes scylladb/scylladb#17513
* 'gleb/raft-snitch-change-v3' of github.com:scylladb/scylla-dev:
doc: amend snitch changing procedure to work with raft
test: add test to check that snitch change takes effect.
raft topology: update rack/dc info in topology state on reboot if changed
To change the snitch with raft, all nodes need to be started
simultaneously, since each node will try to update its state in raft,
and for that a quorum is required.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, instead of printing the `unique_ptr` instance, we
print its pointee. since `server_impl` uses the pimpl paradigm and
`_fsm` is always valid after `server_impl::start()`, we can always
dereference it without checking for null.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17953
The test creates a two-node cluster with the default snitch
(SimpleSnitch) and checks that the dc and rack names are as expected.
Then it changes the config to use GossipingPropertyFileSnitch with
different names, restarts the nodes and checks that the peers table
now has the new names.
before this change, we rely on the default-generated fmt::formatter
created from operator<<. but this depends on the
`FMT_DEPRECATED_OSTREAM` macro which is not respected in {fmt} v10.
this change addresses the formatting with fmtlib < 10, and without
`FMT_DEPRECATED_OSTREAM` defined. please note, in {fmt} v10 and up,
it defines formatter for classes derived from `std::exception`, so
our formatter is only added when compiled with {fmt} < 10.
in this change, `fmt::formatter<service::wait_for_ip_timeout>` is
added for backward compatibility with {fmt} < 10.
Refs scylladb#13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17955
before this change, SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE is generated
at the first run of `configure.py`. once these files are around, they
are never updated, because we don't rerun `SCYLLA_VERSION_GEN` at all
(and even when it runs, it does not regenerate them as long as the
release string derived from the git sha1 is identical to the one
stored in `SCYLLA-RELEASE-FILE`).
but the pain is that, when performing an incremental build, these
generated files stay in the build directory like other build
artifacts, so even if the sha1 of the workspace changes,
SCYLLA-RELEASE-FILE stays the same -- it still contains the original
git sha1 from when it was created. this could lead to confusion if a
developer or even our CI performs an incremental build using the same
workspace and build directory, as the built scylla executables always
report the same version number.
in this change, we always rebuild the said
SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE files, and instruct ninja
to re-stat the output files (see
https://ninja-build.org/manual.html#ref_rule) in order to avoid
unnecessary rebuilds. the downside is that `SCYLLA_VERSION_GEN`
is executed every time we run `ninja`, even if all targets are
updated, but the upside is that the release number reported by scylla
is accurate even when we perform an incremental build.
also, since we encode the product, version and release stored
in the above files in the generated `build.ninja` file, in this change,
these three files are added as dependencies of `build.ninja`,
so that this file is regenerated if any of them is newer than
`build.ninja`.
Fixes#8255
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17974
before this change, SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE is generated
when CMake generates `build.ninja` for the first time. once these
files are around, they are never updated, because we don't rerun
`SCYLLA_VERSION_GEN` at all (and even when it runs, it does not
regenerate them as long as the release string derived from the git
sha1 is identical to the one stored in `SCYLLA-RELEASE-FILE`).
but the pain is that, when performing an incremental build, these
generated files stay in the build directory like other build
artifacts, so even if the sha1 of the workspace changes,
SCYLLA-RELEASE-FILE stays the same -- it still contains the original
git sha1 from when it was created. this could lead to confusion if a
developer or even our CI performs an incremental build using the same
workspace and build directory, as the built scylla executables always
report the same version number.
in this change, we always rebuild the said
SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE files, and instruct CMake
to regenerate `build.ninja` if any of these files is updated.
Fixes#17975
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17983
the codespell workflow checks for common misspellings. it considers
"raison" in "raison d'etre" (the accent mark over "e" is removed, so
the commit message can be encoded in ASCII) a misspelling of "reason"
or "raisin". apparently, the dictionary it uses does not include the
most commonly used French words.
so, in this change, let's ignore "raison" for this very use case,
before we start the l10n support of the document.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17985
since we already have `format_as()` for `cql3_type::raw`, there is no
need to provide a specialized `fmt::formatter` for `cql3_type::raw` if
the tree is compiled with {fmt} >= 10; otherwise the compiler is not
able to figure out which one to match, see the error at the end of
this commit message. so, in this change, we only provide the
specialized `fmt::formatter` for `cql3_type::raw` when {fmt} < 10.
this should address the FTBFS with {fmt} >= 10.
```
/usr/lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/type_traits:1040:25: error: ambiguous partial specializations of 'formatter<cql3::cql3_type::raw>'
1040 | = __bool_constant<__is_constructible(_Tp, _Args...)>;
| ^
/usr/lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/type_traits:1046:16: note: in instantiation of template type alias '__is_constructible_impl' requested here
1046 | : public __is_constructible_impl<_Tp, _Args...>
| ^
/usr/include/fmt/core.h:1420:13: note: in instantiation of template class 'std::is_constructible<fmt::formatter<cql3::cql3_type::raw>>' requested here
1420 | !has_formatter<T, Context>::value))>
| ^
/usr/include/fmt/core.h:1421:22: note: while substituting prior template arguments into non-type template parameter [with T = cql3::cql3_type::raw]
1421 | FMT_CONSTEXPR auto map(const T&) -> unformattable_pointer {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1422 | return {};
| ~~~~~~~~~~
1423 | }
| ~
```
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17986
When opening a backport PR, add a reference to the original PR.
This will be used later for updating the original PR/issue (with a
different label) once the backport is done.
Closes scylladb/scylladb#17973
The loader writes to the pending replica even when the write selector
is set to previous. If the migration is reverted, the writes won't be
rolled back, as revert assumes pending replicas weren't written to
yet. That can cause data resurrection if the tablet is later migrated
back into the same replica.
NOTE: write selector is handled correctly when set to next, because
get_natural_endpoints() will return the next replica set, and none
of the replicas will be considered leaving. And of course, selector
set to both is also handled correctly.
Fixes #17892.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#17902
To cause the stale topology exception, the test reads
the version from the last bootstrapped host and assigns its
decremented value to the version and fence_version fields of
system.topology. The test assumes that version == fence_version here;
if version is greater than fence_version, we won't get the stale
topology exception in this setup. The tablet balancer can break
this assumption -- it may increment the version after the last node
is bootstrapped.
Fix this by disabling the tablet balancer earlier.
Fixes scylladb/scylladb#17807
Closes scylladb/scylladb#17940
This patch introduces raft-based service levels.
The differences from the current implementation are:
- service levels are stored in `system.service_levels_v2`
- reads are executed with `LOCAL_ONE`
- writes are done via a raft group0 operation
Service levels are migrated to v2 during topology upgrade.
After the service levels are migrated, `key: service_level_v2_status;
value: data_migrated` is written to the `system.scylla_local` table.
If this row is present, the raft data accessor is created from the
beginning, and it handles the recovery mode procedure (service levels
will then be read from the v2 table even if consistent topology is
disabled).
Fixes #17926
Closes scylladb/scylladb#16585
* github.com:scylladb/scylladb:
test: test service levels v2 works in recovery mode
test: add test for service levels migration
test: add test for service levels snapshot
test:topology: extract `trigger_snapshot` to utils
main: create raft dda if sl data was migrated
service:qos: store information about sl data migration
service:qos: service levels migration
main: assign standard service level DDA before starting group0
service:qos: fix `is_v2()` method
service:qos: add a method to upgrade data accessor
test: add unit_test_raft_service_levels_accessor
service:storage_service: add support for service levels raft snapshot
service:qos: add abort_source for group0 operations
service:qos: raft service level distributed data accessor
service:qos: use group0_guard in data accessor
cql3:statements: run service level statements on shard0 with raft guard
test: fix overrides in unit_test_service_levels_accessor
service:qos: fix indentation
service:qos: coroutinize some of the methods
db:system_keyspace: add `SERVICE_LEVELS_V2` table
service:qos: extract common service levels' table functions
In this PR, we ensure unpublished CDC generation's data is
never removed, which was theoretically possible. If it happened,
it could cause problems. CDC generation publisher would then try
to publish the generation with its data removed. In particular, the
precondition of calling `_sys_ks.read_cdc_generation` wouldn't be
satisfied.
We also add a test that passes only after the fix. However, this test
needs to block execution of the CDC generation publisher's loop
twice. Currently, error injections with handlers do not allow it
because handlers always share received messages. Apart from the
first created handler, all handlers would be instantly unblocked by
a message from the past that has already unblocked the first
handler. This seems like a general limitation that could cause
problems in the future, so in this PR, we extend injections with
handlers to solve it once and for all. We add the `share_messages`
parameter to the `inject` (with handler) function. Depending on its
value, handlers will share messages (as before) or not.
Fixes scylladb/scylladb#17497
Closes scylladb/scylladb#17934
* github.com:scylladb/scylladb:
topology_coordinator: clean_obsolete_cdc_generations: fix log
topology_coordinator: do not clear unpublished CDC generation's data
topology_coordinator: cdc_generation_publisher_fiber injection: make handlers share messages
error_injection: allow injection handlers to not share messages
This change adds the missing Cassandra compaction option
unchecked_tombstone_compaction. Setting this option to true causes
the compaction to ignore tombstone_threshold and to decide whether to
do a compaction based only on the value of
tombstone_compaction_interval.
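As described by the commit message, the decision could be modeled like this. This is a simplified sketch, not Scylla's actual compaction code; the parameter names are assumptions mirroring the option names.

```python
def should_compact_for_tombstones(droppable_ratio: float,
                                  tombstone_threshold: float,
                                  seconds_since_creation: float,
                                  tombstone_compaction_interval: float,
                                  unchecked_tombstone_compaction: bool) -> bool:
    # The interval gate always applies: an sstable is not reconsidered
    # for a tombstone compaction more often than the interval.
    if seconds_since_creation < tombstone_compaction_interval:
        return False
    # With unchecked_tombstone_compaction=true, tombstone_threshold
    # is ignored and the interval alone decides.
    if unchecked_tombstone_compaction:
        return True
    return droppable_ratio >= tombstone_threshold
```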
In this PR we add timeouts support to the raft groups registry. We
introduce the `raft_server_with_timeouts` class, which wraps the
`raft::server` and exposes its interface with an additional
`raft_timeout` parameter. If it's set, the wrapper cancels the
`abort_source` after a certain amount of time. The value of the
timeout can be specified either in the `raft_timeout` parameter, or
the default value can be set in the `raft_server_with_timeouts` class
constructor.
The `raft_group_registry` interface is extended with the
`group0_with_timeouts()` method. It returns an instance of
`raft_server_with_timeouts` for the group0 raft server. The timeout
value for it is configured in `create_server_for_group0`. It's one
minute by default and can be overridden for tests with the
`group0-raft-op-timeout-in-ms` parameter.
The new api allows the client to decide whether to use timeouts or
not. In this PR we review all the group0 call sites and add
`raft_timeout` where it makes sense. The general principle is that if
the code is handling a client request and the client expects a
potential error, we use timeouts. We don't use timeouts for
background fibers (such as the topology coordinator), since they
wouldn't add much value. The only thing a background fiber can do
with a timeout is to retry, and this has the same end effect as not
having a timeout at all.
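The wrapper idea can be sketched in miniature. This is a hypothetical Python model of the `raft_server_with_timeouts` concept; the real implementation is C++ and uses Seastar abort sources, so all names here are illustrative.

```python
import threading

class AbortSource:
    """Minimal stand-in for an abort source: a flag that the wrapped
    operation is expected to poll."""
    def __init__(self):
        self._event = threading.Event()
    def request_abort(self):
        self._event.set()
    def abort_requested(self):
        return self._event.is_set()

class ServerWithTimeouts:
    """Toy model of raft_server_with_timeouts: each call optionally
    arms a timer that triggers the abort source."""
    def __init__(self, default_timeout):
        self._default_timeout = default_timeout
    def call(self, op, timeout=None):
        abort = AbortSource()
        effective = timeout if timeout is not None else self._default_timeout
        timer = threading.Timer(effective, abort.request_abort)
        timer.start()
        try:
            return op(abort)   # the operation must observe `abort`
        finally:
            timer.cancel()     # fast operations never see the abort
```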
Fixes scylladb/scylladb#16604
Closes scylladb/scylladb#17590
* github.com:scylladb/scylladb:
migration_manager: use raft_timeout{}
storage_service::join_node_response_handler: use raft_timeout{}
storage_service::start_upgrade_to_raft_topology: use raft_timeout{}
storage_service::set_tablet_balancing_enabled: use raft_timeout{}
storage_service::move_tablet: use raft_timeout{}
raft_check_and_repair_cdc_streams: use raft_timeout{}
raft_timeout: test that node operations fail properly
raft_rebuild: use raft_timeout{}
do_cluster_cleanup: use raft_timeout{}
raft_initialize_discovery_leader: use raft_timeout{}
update_topology_with_local_metadata: use with_timeout{}
raft_decommission: use raft_timeout{}
raft_removenode: use raft_timeout{}
join_node_request_handler: add raft_timeout to make_nonvoters and add_entry
raft_group0: make_raft_config_nonvoter: add raft_timeout parameter
raft_group0: make_raft_config_nonvoter: add abort_source parameter
manager_client: server_add with start=false shouldn't call driver_connect
scylla_cluster: add seeds parameter to the add_server and servers_add
raft_server_with_timeouts: report the lost quorum
join_node_request_handler: add raft_timeout{} for start_operation
skip_mode: add platform_key
auth: use raft_timeout{}
raft_group0_client: add raft_timeout parameter
raft_group_registry: add group0_with_timeouts
utils: add composite_abort_source.hh
error_injection: move api registration to set_server_init
error_injection: add inject_parameter method
error_injection: move injection_name string into injection_shared_data
error_injection: pass injection parameters at startup
Reduce the sprawl of sstables::test_env in .cc and .hh files, to ease
maintenance and reduce recompilations.
Closes scylladb/scylladb#17965
* github.com:scylladb/scylladb:
test: sstables::test_env: complete pimplification
test/lib: test_env: move test_env::reusable_sst() to test_services.cc
* rename `sync_labels.yaml` to `sync-labels.yaml`
* use a more descriptive name for the workflow
Closes scylladb/scylladb#17971
* github.com:scylladb/scylladb:
github: sync-labels: use more descriptive name for workflow
github: sync_labels: rename sync_labels to sync-labels
When a keyspace uses tablets, effective ownership
can be obtained per table. If the user passes only a
keyspace, then /storage_service/ownership/{keyspace}
returns an error.
This change:
- adds an additional positional parameter to 'status'
command that allows a user to query status for table
in a keyspace
- makes usage of /storage_service/ownership/{keyspace}
optional to avoid errors when user tries to obtain
effective ownership of a keyspace that uses tablets
- implements new frontend tests in 'test_status.py'
that verify the new logic
Refs: scylladb#17405
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#17827
"label-sync" is not very helpful for developers to understand what
this workflow is for.
the "name" field of a job shows up on the github page of the
pull request against which the job is performed, so if the author
or reviewer checks the status of the pull request, they would
notice these names alongside the workflow's name. for this very
job, what we have now is:
```
Sync labels / label-sync
```
after this change it will be:
```
Sync labels / Synchronize labels between PR and the issue(s) fixed by it
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Create `raft_service_levels_distributed_data_accessor` if service levels
were migrated to v2 table.
This supports raft recovery mode, as service levels will be read from v2
table in the mode.
Save information whether service levels data was migrated to v2 table.
The information is stored in `system.scylla_local` table. It's
written with raft command and included in raft snapshot.
Migrate data from `system_distributed.service_levels` to
`system.service_levels_v2` during raft topology upgrade.
Migration process reads data from old table with CL ALL
and inserts the data to the new table via raft.
`raft_service_level_distributed_data_accessor` works this way:
- on read path it reads service levels from `SYSTEM.SERVICE_LEVELS_V2`
table with CL = LOCAL_ONE
- on write path it starts group0 operation and it makes the change
using raft command
Adjust service_level_controller and
service_level_controller::service_level_distributed_data_accessor
interfaces to take `group0_guard` while adding/altering/dropping a
service level.
To migrate service levels to be raft managed, obtain `group0_guard` to
be able to pass it to service_level_controller's methods.
Using this mechanism also automatically provides retries in case of
concurrent group0 operation.
The table has the same schema as `system_distributed.service_levels`.
However, it's created entirely at once (unlike the old table, which
creates the base table first and then adds other columns) because
`system` tables are local to the node.
Getting a service level (or all of them) is done the same way in
raft-based service levels as in standard service levels, so those
functions are extracted to be reused.
sstables::test_env uses the pimpl idiom, but incompletely. This
prevents reaping some of the benefits.
Complete the pimplification:
- the `impl` nested struct is moved out-of-line
- all non-template member functions are moved out-of-line
- a destructor is declared and defined out-of-line
- the move constructor is also defined (necessary after the destructor is
defined)
After this, we can forward-declare more components.
The test_env implementation is scattered across two .cc files;
concentrate it in test_services.cc, which happens to be the file that
doesn't cause link errors.
Move toc_filename with it, since the moved code contains its only
caller and it is static.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter. fortunately, fmt v10 brings a builtin
formatter for classes derived from `std::exception`. but before
switching to {fmt} v10, and after dropping the `FMT_DEPRECATED_OSTREAM`
macro, we need to print out `std::runtime_error`. so far, we don't
have a shared place for a formatter for `std::runtime_error`, so we
are addressing the need on a case-by-case basis.
in this change, we just print it using `e.what()`. its behavior
is identical to what we have now.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17954
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, `fmt::formatter` is added for following types for backward compatibility with {fmt} < 10:
* `utils::bad_exception_container_access`
* `cdc::no_generation_data_exception`
* classes derived from `sstables::malformed_sstable_exception`
* classes derived from `cassandra_exception`
Refs https://github.com/scylladb/scylladb/issues/13245
Closes scylladb/scylladb#17944
* github.com:scylladb/scylladb:
cdc: add fmt::formatter for exception types in data_dictionary.hh
utils: add fmt::formatter for utils::bad_exception_container_access
sstables: add fmt::formatter for classes derived from sstables::malformed_sstable_exception
exceptions: add fmt::formatter for classes derived from cassandra_exception
cdc: add fmt::formatter for cdc::no_generation_data_exception
Fixes #16912
By default, ScyllaDB stores the maintenance socket in the workdir.
test.py by default uses testlog/{mode}/scylla-# as the location for
the ScyllaDB workdir, and the usual location for cloning the repo is
the user's home folder. In some cases this can make the socket path
too long, and the test starts to fail. The simple fix is to move the
maintenance socket to the /tmp folder to eliminate this possibility.
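The underlying constraint is the AF_UNIX path limit: on Linux, `sun_path` in `sockaddr_un` is 108 bytes including the terminating NUL. A sketch of the fallback logic follows; the helper name and socket file name are hypothetical, not the actual test.py code.

```python
import os
import tempfile

# On Linux, sun_path in sockaddr_un is 108 bytes including the NUL.
MAX_UNIX_PATH = 107

def pick_maintenance_socket_dir(workdir: str, socket_name: str = "cql.m") -> str:
    """Prefer the workdir; fall back to a short /tmp-based directory
    when the resulting socket path would be too long to bind."""
    candidate = os.path.join(workdir, socket_name)
    if len(candidate.encode()) <= MAX_UNIX_PATH:
        return workdir
    return tempfile.mkdtemp(prefix="scylla-sock-")
```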
Closes scylladb/scylladb#17941
In this commit, we ensure unpublished CDC generation's data is
never removed, which was theoretically possible. If it happened,
it could cause problems. CDC generation publisher would then try
to publish the generation with its data removed. In particular, the
precondition of calling `_sys_ks.read_cdc_generation` wouldn't be
satisfied.
We also add a test that passes only after the fix.
In the following commit, we add a test that needs to block the CDC
generation publisher's loop twice. We allow it in this commit by
making handlers of the `cdc_generation_publisher_fiber` injection
share messages. From now on, unblocking every step of the loop will
require sending a new message from the test.
This change breaks the test already using the
`cdc_generation_publisher_fiber` injection, so we adjust the test.
For a single injection, all created injection handlers share all
received messages. In particular, it means that one received message
unblocks all handlers waiting for the first message. This behavior
is often desired, for example, if multiple fibers execute the
injected code and we want to unblock them all with a single message.
However, there is a problem if we want to block every execution
of the injected code. Apart from the first created handler, all
handlers will be instantly unblocked by messages from the past that
have already unblocked the first handler.
In one of the following commits, we add a test that needs to block
the CDC generation publisher's loop twice. Since it looks like there
are no good workarounds for this arguably general problem, we extend
injections with handlers in a way that solves it. We introduce the
new `share_messages` parameter. Depending on its value, handlers
will share messages or not. The details are described in the new
comments in `error_injection.hh`.
We also add some basic unit tests for the new functionality.
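The two modes could be modeled like this. This is a toy sketch of the `share_messages` semantics, not the actual `error_injection.hh` implementation; the class and method names are illustrative.

```python
class Injection:
    """Toy model of injection handler wake-up semantics.  With
    share_messages=True every handler observes all messages ever
    received; with False, each message unblocks exactly one handler."""
    def __init__(self, share_messages):
        self._share = share_messages
        self._shared_log = []   # messages visible to every handler
        self._queue = []        # messages consumed one per handler
    def receive_message(self, msg):
        (self._shared_log if self._share else self._queue).append(msg)
    def handler_wait(self):
        """Return a message if this handler would be unblocked, else None."""
        if self._share:
            return self._shared_log[0] if self._shared_log else None
        return self._queue.pop(0) if self._queue else None
```

With sharing enabled, a handler created after a message was received is instantly unblocked by that message from the past, which is exactly the behavior the test needed to avoid.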
Checking all the call sites of the migration manager shows
that all of them are initiated by user requests,
not background activities. Therefore, we add a global
raft_timeout{} here.
This function is called as part of a node startup
procedure, so a timeout may be useful.
As outlined in the comment, there is no valid way
we can lose quorum here, but some subsystems may
just become unreasonably slow for various reasons,
so we nonetheless use raft_timeout{} here.
We also add a specific test_quorum_lost_during_node_join. It
exercises the case when the quorum is lost after start_operation
but before these methods are called.
If the server is not started there is no point
in starting the driver; it would fail because there
are no nodes to connect to. On the other hand, we
should connect the driver in server_start()
if it's not connected yet.
If this parameter is set, we use its value for
the scylla.yaml of the new node, otherwise we
use IPs of all running nodes as before.
We'll need this parameter in subsequent commits to
restrict the communication between nodes.
We remove default values for _create_server_add_data parameters
since they are redundant - in the two call sites we pass all
of them.
In this commit we extend the timeout error message with
additional context - if we see that there is no quorum of
available nodes, we report this as the most likely
cause of the error.
We adjust the test by adding this new information to the
expected_error. We need raft-group-registry-fd-threshold-in-ms
to make _direct_fd threshold less than
group0-raft-op-timeout-in-ms.
In the test, we use the group0-raft-op-timeout-in-ms parameter to
reduce the timeout to one second so as not to waste time.
The join_node_request_handler method contains other group0 calls
which should have timeouts (make_nonvoters and add_entry). They
will be handled in a separate commit.
In subsequent commits we are going to add test.py
tests for raft_timeout{} feature. The problem is that
aarch/debug configuration is infamously slow. Timeout
settings used in tests work for all platforms but aarch/debug.
In this commit we extend the skip_mode attribute with the
platform_key property. We'll use @skip_mode('debug', platform_key='aarch64')
to skip the tests for this specific configuration.
The tests will still be run for aarch64/release.
The only place where we don't need raft_timeout{}
is migrate_to_auth_v2 since it's called from
topology_coordinator fiber. All other places are
called from user context, so raft_timeout{} is used.
In this commit we add raft_timeout parameter to
start_operation and add_entry method.
We fix compilation in default_authorizer.cc,
bind_front doesn't account for default parameter
values. We should use raft_timeout{} here, but this
is for another commit.
In this commit we add timeouts support to the raft groups
registry. We introduce the raft_server_with_timeouts
class, which wraps the raft::server and exposes its
interface with an additional raft_timeout parameter.
If it's set, the wrapper cancels the abort_source
after a certain amount of time. The value of the timeout
can be specified in the raft_timeout parameter,
or the default value can be set in the raft_server_with_timeouts
class constructor.
The raft_group_registry interface is extended with
get_server_with_timeouts(group_id) and group0_with_timeouts()
methods. They return an instance of raft_server_with_timeouts for
a specified group id or for group0. The timeout value for it is
configured in create_server_for_group0. It's one minute by default
and can be overridden for tests with the group0-raft-op-timeout-in-ms
parameter.
The new api allows the client to decide whether to use timeouts or not.
In subsequent commits we are going to review all group0 call sites
and add raft_timeout if that makes sense. The general principle is that
if the code is handling a client request and the client expects
a potential error, we use timeouts. We don't use timeouts for
background fibers (such as topology coordinator), since they won't
add much value. The only thing the background fiber can do
with a timeout is to retry, and this will have the same effect
as not having a timeout at all.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, `fmt::formatter<>` is added for following classes:
* `cql3::cql3_type`
* `cql3::cql3_type::raw`
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17945
There were two more things missing:
* Allow global options to be positioned before the operation/command option (https://github.com/scylladb/scylladb/issues/16695)
* Ignore JVM args (https://github.com/scylladb/scylladb/issues/16696)
This PR fixes both. With this, hopefully we are fully compatible with nodetool as far as command line parsing is concerned.
After this PR goes in, we will need another fix to tools/java/bin/nodetool-wrapper, to allow users to benefit from this fix. Namely, after this PR, we can just try to invoke scylla-nodetool first with all the command-line args as-is. If it returns with exit-code 100, we fall back to nodetool. We will not need the current trick with `--help $1`. In fact, this trick doesn't work currently, because `$1` is not guaranteed to be the command in the first place.
In addition to the above, this PR also introduces a new option to help us in the switching process. This is `--rest-api-port`, which can also be provided as `-Dcom.scylladb.apiPort`. When provided, this option takes precedence over `--port|-p`. This is intended as a bridge for `scylla-ccm`, which currently provides the JMX port as `--port`. With this change, it can also provide the REST API port as `-Dcom.scylladb.apiPort`. The legacy nodetool will ignore this, while the native nodetool will use it to connect to the correct REST API address. After the switch we can ditch these options.
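The described precedence can be sketched as follows. This is a hypothetical parser written for illustration, not scylla-nodetool's actual option handling.

```python
def effective_rest_api_port(args):
    """Return the port the native nodetool would connect to:
    --rest-api-port (or -Dcom.scylladb.apiPort=N) wins over --port/-p."""
    rest_port = None
    port = None
    i = 0
    while i < len(args):
        a = args[i]
        if a == "--rest-api-port":
            rest_port = int(args[i + 1]); i += 2
        elif a.startswith("-Dcom.scylladb.apiPort="):
            rest_port = int(a.split("=", 1)[1]); i += 1
        elif a in ("--port", "-p"):
            port = int(args[i + 1]); i += 2
        else:
            i += 1
    return rest_port if rest_port is not None else port
```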
Fixes: https://github.com/scylladb/scylladb/issues/16695
Fixes: https://github.com/scylladb/scylladb/issues/16696
Refs: https://github.com/scylladb/scylladb/issues/16679
Refs: https://github.com/scylladb/scylladb/issues/15588
Closes scylladb/scylladb#17168
* github.com:scylladb/scylladb:
tools/scylla-nodetool: add --rest-api-port option
tools/scylla-nodetool: ignore JVM args
tools/utils: make finding the operation command line option more flexible
tools/utils: get_selected_operation(): remove alias param
tools: add constant with current help command-line arguments
The recently added test_tablets_migration dominates with its run-time
(10 minutes). Also update other tests: e.g. test_read_repair is not in
the top-7 for any mode, test_replace and test_raft_recovery_majority_loss
are both not notably slower than most other tests (~40 sec each), while
test_raft_recovery_basic and test_group0_schema_versioning are both
1+ minute.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17927
It is allowed to change a snitch after the cluster is already running.
Changing a snitch may cause dc and/or rack names to change, and the
gossiper handles it by gossiping the new names on restart. The patch
changes raft mode to update the names on restart as well.
The initial version used a redundant method, and it did not cover all
cases, which led to the flakiness of the test that used this method.
Switching to the cluster_con() method removes the flakiness since it's
written more robustly.
Fixes scylladb/scylladb#17914
Closes scylladb/scylladb#17932
The group0 state machine calls `merge_topology_snapshot` from
`transfer_snapshot`. It feeds it with `raft_topology_snapshot` returned
from `raft_pull_topology_snapshot`. This snapshot includes the entire
`system.cdc_generations_v3` table. It can be huge and break the
commitlog `max_record_size` limit.
The `system.cdc_generations_v3` is a single-partition table, so all the
data is contained in one mutation object. To fit the commitlog limit we
split this mutation into many smaller ones and apply them in separate
`database::apply` calls. That means we give up the atomicity guarantee,
but we actually don't need it for `system.cdc_generations_v3` and
`system.topology_requests`.
This PR fixes the dtest
`update_cluster_layout_tests.py::TestLargeScaleCluster::test_add_many_nodes_under_load`
Fixes scylladb/scylladb#17545
Closes scylladb/scylladb#17632
* github.com:scylladb/scylladb:
test_cdc_generation_data: test snapshot transfer
storage_service::merge_topology_snapshot: handle big cdc_generations_v3 mutations
mutation: add split_mutation function
storage_service::merge_topology_snapshot: fix indentation
sstables::test_env is intended for sstable unit tests, but to satisfy
its dependency on an sstables_registry we instantiate an entire
database. Remove the dependency by having a mock implementation of
sstables_registry and using that instead.
Closes scylladb/scylladb#17895
If there is a bug in the tablet scheduler which makes it never
converge for a given state of topology, rebalance_tablets() will never
complete and will generate a huge amount of logs. This patch adds a
sanity limit so that we fail earlier.
This was observed in one of the test_load_balancing_with_random_load
runs in CI.
Fixes scylladb/scylladb#17894.
Closes scylladb/scylladb#17916
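The sanity limit amounts to bounding the balancer loop. A schematic sketch, not the actual rebalance_tablets() code; the callback names and the default bound are illustrative.

```python
def rebalance_tablets(make_migration_plan, apply_plan, max_iterations=1000):
    """Iterate the balancer until it produces an empty plan, but fail
    loudly instead of looping (and logging) forever when the scheduler
    never converges."""
    for _ in range(max_iterations):
        plan = make_migration_plan()
        if not plan:
            return   # converged
        apply_plan(plan)
    raise RuntimeError("rebalance_tablets: scheduler did not converge")
```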
The series marks nodes as non-expiring in the address map earlier,
when they are placed in the topology.
Fixes: scylladb/scylladb#16849
* 'gleb/16849-fix-v2' of github.com:scylladb/scylla-dev:
test: add test to check that address cannot expire between join request placement and its processing
topology_coordinator: set address map entry to nonexpiring when a node is added to the topology
raft_group0: add modifiable_address_map() function
There is no need to map this node's inet_address to host_id. The
storage_service can easily just pass the local host_id. While at it, get
the other node's host_id directly from their endpoint_state instead of
looking it up yet again in the gossiper, using the nodes' address.
Refs #12283
Closes scylladb/scylladb#17919
* github.com:scylladb/scylladb:
cdc: should_propose_first_generation: get my_host_id from caller
storage_service: add my_host_id
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* raft_call
* raft_read
* network_majority_grudge
* reconfiguration
* stop_crash
* operation::thread_id
* append_seq
* AppendReg::append
* AppendReg::ret
* operation::either_of<Ops...>
* operation::exceptional_result<Op>
* operation::completion<Op>
* operation::invocable<Op>
and drop their operator<<:s.
in which,
* `operator<<` for append_entry is never used. so it is removed.
* `operator<<` for `std::monostate` and `std::variant` are dropped. as we are now using their counterparts in {fmt}.
* stop_crash::result_type 's `fmt::formatter` is not added, as we cannot define a partial specialization of `fmt::formatter` for a nested class for a template class. we will tackle this struct in another change.
Refs #13245
Closes scylladb/scylladb#17884
* github.com:scylladb/scylladb:
test: raft: generator: add fmt::formatter:s
test: randomized_nemesis_test: add fmt::formatter for some types
test: randomized_nemesis_test: add fmt::formatter for seastar::timed_out_error
raft: add fmt::formatter for error classes
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, `fmt::formatter<>` is added for following classes for
backward compatibility with {fmt} < 10:
* `data_dictionary::no_such_keyspace`
* `data_dictionary::no_such_column_family`
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, `fmt::formatter<utils::bad_exception_container_access>` is
added for backward compatibility with {fmt} < 10.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, `fmt::formatter<T>` is added for classes derived from
`malformed_sstable_exception`, where `T` is the class type derived from
`malformed_sstable_exception`.
this change is implemented to be backward compatible with {fmt} < 10.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, `fmt::formatter<T>` is added for classes derived from
`cassandra_exception`, where `T` is the class type derived from
`cassandra_exception`.
this change is implemented to be backward compatible with {fmt} < 10.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, `fmt::formatter<cdc::no_generation_data_exception>` is
added for backward compatibility with {fmt} < 10.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Previously, the test only looked at the initial CDC generation.
It made the changes big enough to go past the raft max_command_size
limit, and then made sure this large mutation set was saved in
several raft commands.
In this commit we enhance the test to check that the
mutations are properly handled during snapshot transfer.
The problem is that the entire system.cdc_generations_v3
table is read into the topology_snapshot and its total
size can exceed the commitlog max_record_size limit.
We need a separate injection since the compaction
could nullify the effects of the previous injection.
The test fails without the fix from the previous commit.
The group0 state machine calls merge_topology_snapshot
from transfer_snapshot. It feeds it with raft_topology_snapshot
returned from raft_pull_topology_snapshot. This snapshot
includes the entire system.cdc_generations_v3 table.
It can be huge and break the commitlog max_record_size limit.
The system.cdc_generations_v3 is a single-partition table,
so all the data is contained in one mutation object. To
fit the commitlog limit we split this mutation into several
smaller ones and apply them in separate database::apply calls.
That means we give up the atomicity guarantee, but we
actually don't need it for system.cdc_generations_v3.
The cdc_generations_v3 data is not used in any way until
it's referenced from the topology table. By applying the
cdc_generations_v3 mutations before topology mutations
we ensure that the lack of atomicity isn't a problem here.
The database::apply method takes frozen_mutation parameter by
const reference, so we need to keep them alive until
all the futures are complete.
Fixes #17545
The function splits the source mutation into multiple
mutations so that their size does not exceed the
max_size limit. The size of a mutation is calculated
as the sum of the memory_usage() of its constituent
mutation_fragments.
The implementation is taken from view_updating_consumer.
We use mutation_rebuilder_v2 to reconstruct mutations from
a stream of mutation fragments and recreate the output
mutation whenever we reach the limit.
We'll need this function in the next commit.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* operation::either_of<Ops...>
* operation::exceptional_result<Op>
* operation::completion<Op>
* operation::invocable<Op>
and drop their operator<< overloads.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* raft_call
* raft_read
* network_majority_grudge
* reconfiguration
* stop_crash
* operation::thread_id
* append_seq
* append_entry
* AppendReg::append
* AppendReg::ret
and drop their operator<< overloads.
in which,
* `operator<<` for `std::monostate` and `std::variant` are dropped.
as we are now using their counterparts in {fmt}.
* stop_crash::result_type's `fmt::formatter` is not added, as we
cannot define a partial specialization of `fmt::formatter` for
a class nested in a template class. we will tackle this struct
in another change.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatter for `seastar::timed_out_error`,
which will be used by the `fmt::formatter` for `std::variant<...>`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatter for classes derived from
`raft::error`. since {fmt} v10 defines the formatter for all classes
derived from `std::exception`, the definition is provided only when
the tree is compiled with {fmt} < 10.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The token ring table is a virtual table (`system.token_ring`), which contains the ring information for all keyspaces in the system. This is essentially an alternative to `nodetool describering`, but since it is a virtual table, it allows for all the usual filtering/aggregation/etc. that CQL supports.
Up until now, this table only supported keyspaces which use vnodes. This PR adds support for tablet keyspaces. To accommodate these keyspaces a new `table_name` column is added, which is set to `ALL` for vnodes keyspaces. For tablet keyspaces, this contains the name of the table.
Simple sanity tests are added for this virtual table (it had none).
Fixes: #16850
Closes scylladb/scylladb#17351
* github.com:scylladb/scylladb:
test/cql-pytest: test_virtual_tables: add test for token_ring table
db/virtual_tables: token_ring_table: add tablet support
db/virtual_tables: token_ring_table: add table_name column
db/virtual_tables: token_ring_table: extract ring emit
service/storage_service: describe_ring_for_table(): use topology to map hostid to ip
There is no need to map this node's inet_address to host_id.
The storage_service can easily just pass the local host_id.
While at it, get the other nodes' host_ids directly
from their endpoint_state instead of looking them up
yet again in the gossiper, using the nodes' addresses.
Refs #12283
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Shorthand for getting this node's host_id
from token_metadata.topology, similar to the
`get_broadcast_address` helper.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
In topology on raft, management of CDC generations is moved to the topology coordinator.
We need to verify that CDC keeps working correctly during the upgrade to topology on raft.
A similar change will be made in the topology recovery test. It will reuse
the `start_writes_to_cdc_table` function.
Ref #17409
Closes scylladb/scylladb#17828
This PR contains a few fixes and improvements identified during the
label addition for https://github.com/scylladb/scylladb/issues/15902.
When we add a label to an issue, we go through all PRs.
1) Set the PR base to `master` (release PRs are not relevant)
2) Since each issue has only one PR, end the search after a
match is found
3) Make sure to skip PRs with an empty body (mainly debug ones)
4) Set the backport label prefix to `backport/`
Closes scylladb/scylladb#17912
Introduces relative link support for individual properties listed on the configuration properties page. For instance, to link to a property from a different document, use the syntax :ref:`memtable_flush_static_shares <confprop_memtable_flush_static_shares>`.
Additionally, it also adds support for linking groups. For example, :ref:`Ungrouped properties <confgroup_ungrouped_properties>`.
Closes scylladb/scylladb#17753
> Revert "build: do not provide zlib as an ingredient"
> Fix reference to sstring type in tutorial about concurrency in coroutines
> Merge 'Adding a Metrics tester app' from Amnon Heiman
> cooking.sh: do not quote backtick in here document
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17887
Affects load-and-stream for tablets only.
The intention is that only this loop is responsible for detecting
exhausted sstables and then discarding them for next iterations:
while (sstable_it != _sstables.rend() && exhausted(*sstable_it)) {
sstable_it++;
}
But the loop which consumes non-exhausted sstables, on behalf of
each tablet, was incorrectly advancing the iterator, even though
the sstable wasn't considered exhausted.
Fixes #17733.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#17899
When we open a backport PR, we should make sure the patch contains a reference to the issue it is supposed to fix, in order to have more accurate backport information.
This action will only be triggered when the base branch is `branch-*`.
If `Fixes` references are missing, this action will fail and notify the author.
Ref: https://github.com/scylladb/scylla-pkg/issues/3539
Closes scylladb/scylladb#17897
The test dtest materialized_views_test.py::TestMaterializedViews::
test_mv_populating_from_existing_data_during_truncate reproduces an
assertion failure, and crash, while doing a CREATE MATERIALIZED VIEW
during a TRUNCATE operation.
This patch fixes the crash by removing the assert() call for a view
(replacing it by a warning message) - we'll explain below why this is fine.
Also, for base tables, we change the assertion to an on_internal_error
(Refs #7871).
This makes the test stop crashing Scylla, but it still fails due to
issue #17635.
Let's explain the crash, and the fix:
The test starts TRUNCATE on a table that doesn't yet have a view.
truncate_table_on_all_shards() begins by disabling compaction on
the table and all its views (of which there are none at this
point). At this point, the test creates a new view on this table.
The new view has, by default, compaction enabled. Later, TRUNCATE
calls discard_sstables() on this new view and asserts that it has
compaction disabled - and this assertion fails.
The fix in this patch is to not do the assert() for views. In other words,
we acknowledge that in this use case, the view *will* have compactions
enabled while being truncated. I claim that this is "good enough", if we
remember *why* we disable compaction in the first place: It's important
to disable compaction while truncating because truncating during compaction
can lead us to data resurrection when the old sstable is deleted during
truncation but the result of the compaction is written back. True,
this can now happen in a new view (a view created *DURING* the
truncation). But I claim that worse things can happen for this
new view: Notably, we may truncate a view and then the ongoing
view building (which happens in a new view) might copy data from
the base to the view and only then truncate the base - ending up
with an empty base and non-empty view. This problem - issue #17635 -
is more likely, and more serious, than the compaction problem, so
will need to be solved in a separate patch.
Fixes #17543.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17634
When an sstable set is cloned, we don't want a change in the cloned set
propagating to the original one.
It happens today with partitioned_sstable_set::_all_runs, because
sets are sharing ownership of runs, which is wrong.
Let's not violate clone semantics by copying all_runs when cloning.
Doesn't affect data correctness as readers work directly with
sstables, which are properly cloned. Can result in a crash in ICS
when it is estimating pending tasks, but should be very rare in
practice.
Fixes #17878.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#17879
This option is an alternative to --port|-p and takes precedence over it.
This is meant to aid the switch from the legacy nodetool to the native
one. Users of the legacy nodetool pass the port of JMX to --port. We
need a way to provide both the JMX port (via --port) and also the REST
API port, which only the native nodetool will interpret. So we add this
new --rest-api-port, which, when provided, overrides the --port|-p
option. To ensure the legacy nodetool doesn't try to interpret this,
this option can also be provided as -Dcom.scylladb.apiPort (which is
translated to --rest-api-port behind the scenes).
Legacy scripts and tests for nodetool might pass JVM args like
-Dcom.sun.jndi.rmiURLParsing=legacy. Ignore these by dropping anything
that starts with -D from the command line args.
Currently all scylla-tools assume that the operation/command is in
argv[1]. This is not very flexible, because most programs allow global
options (that are not dependent on the current operation/command) to be
passed before the operation name on the command line. Notably C*'s
nodetool is one such program and indeed scripts and tests using nodetool
do utilize this.
This patch makes this more flexible. Instead of looking at argv[1], do
an initial option parsing with boost::program_options to locate the
operation parameter. This initial parser knows about the global options,
and the operation positional argument. It allows for unrecognized
positional and non-positional arguments, but only after the command.
With this, any combination of global options + operation is allowed, in
any order.
Unfortunately, we have code in scylla-nodetool.cc which needs to know
which help options are currently available. Soon, there will be more
code like this in tools/utils.cc, so centralize this list in a const
static tool_app_template member.
The set_server_done function is called only
when a node is fully initialized. To allow error
injection to be used during initialization we
move the handler registration to set_server_init,
which is called as soon as the api http server
is started.
In this commit we extend the error_injector
with a new method, inject_parameter. It allows
tests to pass parameters to scylla, e.g. to
lower timeouts or limits. A typical use case is
described in scylladb/scylladb#15571.
It's logically the same as inject_with_handler,
whose lambda reads the parameter named 'value'.
The only difference is that inject_parameter
doesn't return a future; it just reads the
parameter from the injection shared_data.
In a subsequent commit we'll need the injection_name from inside
injection_shared_data, so in this commit we move it there.
Additionally, we address the TODO about switching the injections
dictionary away from std::map: the unordered_map now contains
string_views pointing to the injection_name inside
injection_shared_data.
Injection parameters can be used in the lambda passed to the
inject_with_handler method to take some values from
the test. However, there was no way to set values for these
parameters on node startup, only through
the error injection REST API. Therefore, we couldn't rely
on this when inject_with_handler is used during
node startup: it could trigger before we call the API
from the test.
In this commit we solve this problem by allowing these
parameters to be assigned through the scylla.yaml config.
The defer.hh header was added to error_injection.hh to fix
compilation after adding error_injection.hh to config.hh;
the defer function is used in error_injection.hh.
Fixes #17569
Tests do not close their file descriptors after finishing. This eventually prevents further tests from running, since the default limit on open files in Linux is 1024. The issue is easy to reproduce with the following command:
```
$ ./test.py --mode debug test_native_transport --repeat 1500
```
With the fix applied, all tests pass with the following command:
```
$ ./test.py --mode debug test_native_transport --repeat 10000
```
Closes scylladb/scylladb#17798
Document the manual upgrade procedure that is required to enable
consistent cluster management in clusters that were upgraded from an
older version to ScyllaDB Open Source 6.0. This instruction is placed in
previously placeholder "Enable Raft-based Topology" page which is a part
of the upgrade instructions to ScyllaDB Open Source 6.0.
Add references to the new description in the "Raft Consensus Algorithm
in ScyllaDB" document in relevant places.
Extend the "Handling Node Failures" document so that it mentions steps
required during recovery of a ScyllaDB cluster running version 6.0.
Fixes: scylladb/scylladb#17341
Closes scylladb/scylladb#17624
Currently a node's address is set to nonexpiring in the address map when
the node is added to group0, but the node is added to the topology earlier
(during the join request) and the cluster must be able to communicate
with it (potentially) much later when the request will be processed.
The patch marks nodes that are in the topology, but not yet in group0,
as non-expiring, so they will not be dropped from the address map until
their join request is processed.
Fixes: scylladb/scylladb#16849
After merging https://github.com/scylladb/scylladb/pull/17365, all backport labels should be added to PRs (before, we used to add backport labels to the issues).
Adding a GitHub action which will be triggered in the following conditions only:
1) The base branch is `master` or `next`
2) Pull request events:
- opened: For every new PR that someone opens, we will sync all labels from the linked issue (if available)
- labeled: This rule only applies to labels with the `backport/` prefix. When we add a new backport label, we will update the relevant issue or PR to keep them both in sync
- unlabeled: Same as `labeled`, only applies to the `backport/` prefix. When we remove a backport label, we will update the relevant issue or PR
Closes scylladb/scylladb#17715
sstables_manager now depends on system_keyspace for access to the
system.sstables table, needed by object storage. This violates
modularity, since sstables_manager is a relatively low-level leaf
module while system_keyspace integrates large parts of the system
(including, indirectly, sstables_manager).
One area where this is grating is sstables::test_env, which has
to include the much higher level cql_test_env to accommodate it.
Fix this by having sstables_manager expose its dependency on
system_keyspace as an interface, sstables_registry, and have
system_keyspace implement the glue logic in
system_keyspace_sstables_manager.
Closes scylladb/scylladb#17868
This commit updates the Upgrade ScyllaDB Image page.
- It removes the incorrect information that updating underlying OS packages is mandatory.
- It adds information about the extended procedure for non-official images.
Closes scylladb/scylladb#17867
group0 operations are valid on shard 0 only. Assert that. We already do
that in the version of the function that takes an abort source.
Message-ID: <ZeCti70vrd7UFNim@scylladb.com>
Lots of BOOST_REQUIREs in this test require some integers to be in
eq/gt/le relations to each other. And one place compares rack names
as strings. Using the more verbose boost checkers is preferred in such cases.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17866
this change is a follow-up of ca7f7bf8e2, which changed the output path to build/$<CONFIG>/debian. but what dist/docker/debian/build_docker.sh expects is `build/dist/$config/debian/*.deb`, where `$config` is the normalized mode; when the debian packages are built using CMake-generated rules, `$mode` is the CMake configuration name, i.e., `$<CONFIG>`. so ca7f7bf8e2 made a mistake, as it does not match the expectation of `build_docker.sh`.
in this change, this issue is addressed: we use the same path in both `dist/CMakeLists.txt` and `dist/docker/debian/build_docker.sh`.
Closes scylladb/scylladb#17848
* github.com:scylladb/scylladb:
build: cmake: add dist-* targets to the default build target
build: cmake: put server deb packages under build/dist/$<CONFIG>/debian
This PR fixes a problem with replacing a node with tablets when
RF=N. Currently, this will fail because tablet replica allocation for
rebuild will not be able to find a viable destination, as the replacing node
is not considered to be a candidate. It cannot be a candidate because
replace rolls back on failure and we cannot roll back after tablets
were migrated.
The solution taken here is to not drain tablet replicas from the
replaced node during the topology request, but to leave it to happen
later, after the replaced node is in the left state and the replacing
node is in the normal state.
The replacing node waits for this draining to be complete on boot
before the node is considered booted.
Fixes https://github.com/scylladb/scylladb/issues/17025
Nodes in the left state will be kept in tablet replica sets for a while after node
replace is done, until the new replica is rebuilt. So we need to know
about those nodes' location (dc, rack) for two reasons:
1) algorithms which work with replica sets filter nodes based on their location. For example materialized views code which pairs base replicas with view replicas filters by datacenter first.
2) tablet scheduler needs to identify each node's location in order to make decisions about new replica placement.
It's ok to not know the IP, and we don't keep it. Those nodes will not
be present in the IP-based replica sets, e.g. those returned by
get_natural_endpoints(), only in host_id-based replica
sets. storage_proxy request coordination is not affected.
Nodes in the left state are still not present in token ring, and not
considered to be members of the ring (datacenter endpoints exclude them).
In the future we could make the change even more transparent by only
loading locator::node* for those nodes and keeping node* in tablet replica sets.
Currently left nodes are never removed from topology, so will
accumulate in memory. We could garbage-collect them from topology
coordinator if a left node is absent in any replica set. That means we
need a new state - left_for_real.
Closes scylladb/scylladb#17388
* github.com:scylladb/scylladb:
test: py: Add test for view replica pairing after replace
raft, api: Add RESTful API to query current leader of a raft group
test: test_tablets_removenode: Verify replacing when there is no spare node
doc: topology-on-raft: Document replace behavior with tablets
tablets, raft topology: Rebuild tablets after replacing node is normal
tablets: load_balancer: Access node attributes via node struct
tablets: load_balancer: Extract ensure_node()
mv: Switch to using host_id-based replica set
effective_replication_map: Introduce host_id-based get_replicas()
raft topology: Keep nodes in the left state to topology
tablets: Introduce read_required_hosts()
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* test_data in two different tests
* row_cache_stress_test::reader_id
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17861
Clang > 12 starts to complain like
```
warning: '-fuse-ld=' taking a path is deprecated; use '--ld-path=' instead [-Wfuse-ld-path]'
```
this option is not supported by GCC yet. also instead of using
the generic driver's name, use the specific name. otherwise ld
fails like
```
lld is a generic driver.
Invoke ld.lld (Unix), ld64.lld (macOS), lld-link (Windows), wasm-ld (WebAssembly) instead
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17825
For now the test is incomplete in several ways:
1. It xfails, until #17116
2. It doesn't rebuild/repair tablets
3. It doesn't check that tablet data actually exists on replicas
refs: #17575
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17808
When Mergify opens a backport PR and identifies conflicts, it adds the
`conflicts` label. Since GitHub can't identify conflicts in a PR, we set
a rule to move the PR to draft; this way we will not trigger CI.
Once the conflicts are resolved, the developer should make the PR `ready for
review` (i.e. not draft), and then CI will be triggered.
The `conflicts` label can also be removed.
Closes scylladb/scylladb#17834
Currently, when dividing memory tracked for a batch of updates
we do not take into account the overhead that we have for processing
every update. This patch adds the overhead for single updates
and joins the memory calculation path for batches and their parts
so that both use the same overhead.
Fixes #17854
Closes scylladb/scylladb#17855
also, add a target of `dist-server`, which mirrors the structure
of the targets created by `configure.py`, and is consistent
with the ones defined by `build_submodule()`.
this way they are built when our CI runs `ninja -C $build`. CI
expects all these rpm and deb packages to be built when
`ninja -C $build` finishes, so that it can continue with
building the container image. let's make it happen, so that
the CMake-based rules can work better with CI.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
It was observed that some use cases might constantly append old data
to the memtable, blocking GC of expired tombstones.
That's because the timestamp of the memtable is unconditionally used for
calculating max purgeable, even when the memtable doesn't contain the
key of the tombstone we're trying to GC.
The idea is to treat memtable as we treat L0 sstables, i.e. it will
only prevent GC if it contains data that is possibly shadowed by the
expired tombstone (after checking for key presence and timestamp).
Memtable will usually have a small subset of keys in largest tier,
so after this change, a large fraction of keys containing expired
tombstones can be GCed when memtable contains old data.
Fixes #17599.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#17835
There is no point in checking `sst->filter_has_key(*hk)`
if the sstable contains no data older than the running
minimum timestamp, since even if it matches, it won't change
the minimum.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#17839
Fix writing cassandra-rackdc.properties in the correct properties format instead of YAML.
Add a parameter to overwrite the RF for a specific DC.
Add the possibility to connect cql to a specific node.
In this PR, 4 tests were added to test multi-DC functionality. One comes from the initial commit where the multi-DC capability was introduced; however, that test was never committed. Three of them are migrations from dtest that will later be deleted. To be able to execute the migrated tests, additional functionality is added: the ability to connect cql to a specific node in the cluster instead of a pooled connection, and the possibility to overwrite the replication factor for a specific DC. To be able to use multi-DC in test.py, the issue with the incorrect format of the properties file is fixed in this PR.
Closes scylladb/scylladb#17503
With large schemas, unfreezing can stall, especially as it requires
a lot of memory. Switch to a gentle version that will not stall.
As a preparation step, we add unfreeze_gently() for a span of mutations.
Fixes #17841
Closes scylladb/scylladb#17842
* github.com:scylladb/scylladb:
schema_tables: unfreeze frozen_mutation:s gently
frozen_mutation: add unfreeze_gently(span<frozen_mutation>)
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `perf_result_with_aio_writes`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17849
Newly joining nodes may not have a host id yet. Handle this and print a
"?" for these nodes, instead of the host-id.
Extend the existing test for the joining-node case (also rename it
and add a comment).
Closes scylladb/scylladb#17853
this change is a follow-up of ca7f7bf8e2, which changed the output path
to build/$<CONFIG>/debian. but what dist/docker/debian/build_docker.sh
expects is `build/dist/$config/debian/*.deb`, where `$config` is the
normalized mode; when the debian packages are built using CMake-generated
rules, `$mode` is the CMake configuration name, i.e., `$<CONFIG>`.
so ca7f7bf8e2 made a mistake, as it does not match the expectation of
`build_docker.sh`.
in this change, this issue is addressed: we use the same path
in both `dist/CMakeLists.txt` and `dist/docker/debian/build_docker.sh`.
apply the same change to `dist-server-rpm`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
test/raft/replication.cc defines a symbol named `tlogger`, while
test/raft/randomized_nemesis_test.cc also defines a symbol with
the same name. when linking the test with mold, it identified the ODR
violation.
in this change, we extract test-raft-helper out, so that
randomized_nemesis_test can selectively link against only this library.
this also matches the behavior of the rules generated by `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17836
in gossiping_property_file_snitch_test, we use
`BOOST_REQUIRE_EQUAL(dc_racks[i], dc_racks[0])` to check the equality
of two instances of `pair<sstring, sstring>`, like:
```c++
BOOST_REQUIRE_EQUAL(dc_racks[i], dc_racks[0])
```
since the standard library does not provide a formatter for printing
`std::pair<>`, we rely on the homebrew generic formatter to
print `std::pair<>`, which in turn uses operator<< to format the
elements in the `pair`, but we intend to remove this formatter
in future, as the last step of #13245 .
so in order to enable Boost.test to print out lhs and rhs when
`BOOST_REQUIRE_EQUAL` check fails, we are adding
`boost_test_print_type()` for `pair<sstring,sstring>`. the helper
function uses {fmt} to print the `pair<>`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17831
this change prepares for the fmt::formatter-based formatters used by
tests, which will use {fmt} to print the elements in a container,
so we need to define the formatter using fmt::formatter for these
elements. the operator<< for service_level_options::workload_type is
preserved, as the tests are still using it.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17837
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
row_level_diff_detect_algorithm. please note, we already have
`format_as()` overload for this type, but we cannot use it as a
fallback of the proper `fmt::formatter<>` specialization before
{fmt} v10. so before we update our CI to a distro with {fmt} v10,
`fmt::formatter<row_level_diff_detect_algorithm>` is still
needed.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17824
The assert_that_failed(future) pair of helpers are templates over
variadic futures, but since variadic futures are gone in seastar,
these helpers should be gone from test/lib too.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17830
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `db::functions::function`.
please note, because `function::print()` is a virtual method taking
`std::ostream` as its parameter, without an intrusive change we have
to use `fmt::ostream_formatter`, or at least a similar technique, to
format the `function` instance into an instance of `ostream` first.
so instead of implementing a "native" `fmt::formatter`, in this
change we just use `fmt::ostream_formatter`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17832
Empty histograms are missing some of the members that non-empty
histograms have. The code handling these histograms assumed all required
members are always present and thus errored out when receiving an empty
histogram.
Add tests for empty histograms and fix the code handling them to check
for the potentially missing members, instead of making assumptions.
Closes scylladb/scylladb#17816
The test is changed to be more strict. It verifies the case of replacing
when RF=N, in which case tablet replicas have to be rebuilt using the
replacing node.
This would fail if tablets are drained as part of replace operation,
since replacing node is not yet a viable target for tablet migration.
This fixes a problem with replacing a node with tablets when
RF=N. Currently, this will fail because new tablet replica allocation
will not be able to find a viable destination, as the replacing node
is not considered a candidate. It cannot be a candidate because
replace rolls back on failure and we cannot roll back after tablets
were migrated.
The solution taken here is to not drain tablet replicas from the
replaced node during the topology request, but to leave it to happen
later, after the replaced node is left and the replacing node is normal.
The replacing node waits for this draining to be complete on boot
before the node is considered booted.
Fixes #17025
Fix an aiohttp usage issue in python 3.12:
"Timeout context manager should be used inside a task"
This occurs because the UnixRESTClient is created in one event loop
(created inside pytest) but used in another (created by the rewritten
event_loop fixture); it is now fixed by recreating the UnixRESTClient
object for every new loop.
Closes scylladb/scylladb#17760
This is necessary to not break replica pairing between base and
view. After replacing a node, tablet replica set contains for a while
the replaced node which is in the left state. This node is not
returned by the IP-based get_natural_endpoints() so the replica
indexes would shift, changing the pairing with the view.
The host_id-based replica set always has stable indexes for replicas.
Those nodes will be kept in tablet replica sets for a while after node
replace is done, until the new replica is rebuilt. So we need to know
about those nodes' location (dc, rack) for two reasons:
1) algorithms which work with replica sets filter nodes based on
their location. For example materialized views code which pairs base
replicas with view replicas filters by datacenter first.
2) tablet scheduler needs to identify each node's location in order
to make decisions about new replica placement.
It's ok to not know the IP, and we don't keep it. Those nodes will not
be present in the IP-based replica sets, e.g. those returned by
get_natural_endpoints(), only in host_id-based replica
sets. storage_proxy request coordination is not affected.
Nodes in the left state are still not present in token ring, and not
considered to be members of the ring (datacenter endpoints exclude them).
In the future we could make the change even more transparent by only
loading locator::node* for those nodes and keeping node* in tablet
replica sets.
We load topology information only for left nodes which are actually
referenced by some tablet. To achieve that, the topology loading code
queries system.tablets for the set of hosts. This set is then passed to
the system.topology loading method, which decides whether to load
replica_state for a left node or not.
Will be used by topology loading code to determine which hosts are
needed in topology, even if they're in the left state. We want to load
only left nodes if they are referenced by any tablet, which may happen
temporarily until the replacement replica is rebuilt.
As the first clustering column. For vnode keyspaces, this will always be
"ALL", for tablet keyspaces, this will contain the name of the described
table.
Into a separate method. For vnodes there is a single ring per keyspace,
but for tablets, there is a separate ring for each table in the
keyspace. To accommodate both, we move the code emitting the ring into a
separate method, so execute() can just call it once per keyspace or once
per table, whichever appropriate.
Do not use the internal host2ip() method. This relies on `_group0`, which
is only set on shard 0. Consequently, any call to this method, coming
from a shard other than shard 0, would crash ScyllaDB, as it
dereferences a nullptr.
Remove an unused function from test/cql-pytest/test_using_timeout.py.
Some linters can complain that this function used re.compile(), but
the "re" package was never imported. Since this function isn't used,
the right fix is to remove it - and not add the missing import.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17801
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `partition_entry::printer`,
and drop its operator<< .
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17812
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `schema_mutations`,
and drop its operator<< .
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17815
before this change, we rely on the homebrew generic formatter to
print unordered_set<>, which in turn uses operator<< to format the
elements in the `unordered_set`, but we intend to remove this formatter
in the future, as the last step of #13245.
so, to enable Boost.Test to print out lhs and rhs when a
`BOOST_REQUIRE_EQUAL` check fails, we are adding
`boost_test_print_type()` for `unordered_set<fruit>`. the helper
function uses {fmt} to print the `unordered_set<>`, so we are adding a
fmt::formatter for `fruit`; the operator<< for this type is dropped,
as it is not used anymore.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17813
This series adds notification before dropping views and indices so that the
tablet_allocator can generate mutations to respectively drop all tablets associated with them from system.tablets.
Additional unit tests were added for these cases.
Note that one case is not yet tested: where a table is allowed to be dropped while having views that depend on it, when it is dropped from the alternator path.
This is tested indirectly by testing dropping a table with live secondary index as it follows the same notification path as views in this series.
Fixes #17627
Closes scylladb/scylladb#17773
* github.com:scylladb/scylladb:
migration_manager: notify before_drop_column_family when dropping indices
schema_tables: make_update_indices_mutations: use find_schema to lookup the view of dropped indices
migration_manager: notify before_drop_column_family before dropping views
cql-pytest: test_tablets: add test_tablets_are_dropped_when_dropping_table
tablet_allocator: on_before_drop_column_family: remove unused result variable
Loader was changed to quickly determine ownership after consuming
sharding metadata only. If it's not available, it falls back to
reading first and last keys from summary. The fallback is only there
for backward compatibility and it costs a lot more as we don't
skip to the end where keys are located in summary.
With tablets, sharding metadata is only the first and last keys, so
we can determine ownership without the sharder. The loader will thus
be able to use it instead of looking up keys in the summary.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#17805
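The decision described above can be sketched as follows; this is an illustrative model, not the actual loader code:

```python
def ownership_range(sharding_metadata, summary_keys):
    """Illustrative sketch: determine the sstable's owned key range
    cheaply from sharding metadata when it is available; otherwise fall
    back to the summary, which costs much more because the keys live at
    its end and we cannot skip straight to them."""
    if sharding_metadata is not None:
        # With tablets, sharding metadata is just the first and last keys.
        return sharding_metadata["first_key"], sharding_metadata["last_key"]
    # Backward-compatibility fallback: read first and last keys from
    # the summary (modeled here as a plain list of keys).
    return summary_keys[0], summary_keys[-1]
```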
When jumping from streaming stage into cleanup_target, session must also
be cleared as pending replica may still process some incoming mutations
blocked in the pipeline. Deleting session prior to executing barrier
makes sure those mutations will not be applied.
fixes: #17682
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17800
When dropping indices, we don't need to go through
`create_view_for_index` in order to drop the index.
That actually creates a new schema for this view
which is used just for its metadata for generating mutations
dropping it.
Instead, use `find_schema` to lookup the current schema
for the dropped index.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Call the before_drop_column_family notifications
before dropping the views to allow the tablet_allocator
to delete the view's tablets.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Tablet transition would get stuck anyway for such nodes, so it's not worth trying.
Refs: #16372 (not "fixes", because repair transitions have the same problem)
Closes scylladb/scylladb#17796
* github.com:scylladb/scylladb:
topology_coordinator: Skip dead nodes when balancing tablets
test: Add test for load_balancer skiplist
tablet_allocator: Add skiplist to load_balancer
The status command makes an extensive number of requests to the server. To be able to handle this more easily, the rest api mock server is refactored extensively to be more flexible, accepting expected requests out-of-order. While at it, the rest api mock server also moves away from a deprecated `aiohttp` feature: providing a custom router argument to the `aiohttp` app. This forces us to pre-register all API endpoints that any test currently uses, although thanks to some templating support, this is not as bad as it sounds. Still, this is an annoyance, but at this point we have implemented almost all commands, so this won't be much of a problem going forward.
Refs: https://github.com/scylladb/scylladb/issues/15588
Closes scylladb/scylladb#17547
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the status command
test/nodetool: rest_api_mock.py: match requests out-of-order
test/nodetool: rest_api_mock.py: remove trailing / from request paths
test/nodetool: rest_api_mock.py: use static routes
test/nodetool: check only non-exhausted requests
tools/scylla-nodetool: repair: set the jobThreads request parameter
It's too late to call `remove_rpc_client_with_ignored_topology` on the messaging service when a node becomes normal. Data plane requests can be routed to the node much earlier, at least when the topology switches to `write_both_read_new`. The `remove_rpc_client_with_ignored_topology` function shuts down sockets and causes such requests to time out.
In this PR we move the `remove_rpc_client_with_ignored_topology` call to the earliest point possible when a node first appears in `token_metadata.topology`.
From the topology coordinator perspective this happens when a joining node moves to `node_state::bootstrapping` and the topology moves to `transition_state::join_group0`. In `sync_raft_topology_nodes` the node should be contained in transition_nodes. The successful `wait_for_ip` before entering `transition_state::join_group0` ensures that update_topology should find a node's IP and put it into the topology. The barrier in `commit_cdc_generation` will ensure that all nodes in the cluster are using the proper connection parameters.
Only outgoing connections are tracked by `remove_rpc_client_with_ignored_topology`, those created by the current node. This means we need to call `remove_rpc_client_with_ignored_topology` on each node of the cluster.
Fixes scylladb/scylladb#17445
Closes scylladb/scylladb#17757
* github.com:scylladb/scylladb:
test_remove_rpc_client_with_pending_requests: add a regression test
remove_rpc_client_with_ignored_topology: call it earlier
storage_service: decouple remove_rpc_client_with_ignored_topology from notify_joined
Since 4c767c379c we can reach a situation
where we know that we have admitted too many expensive view update
operations and the mechanism of dropping the following view updates
can be triggered in a wider range of scenarios. Ideally, we would
want to fail whole requests on the coordinator level, but for now, we
change the behavior to failing just the base writes. This allows us
to avoid creating inconsistencies between base replicas and views
at the cost of introducing inconsistencies between different base
replicas. This, however, can be fixed by repair, in contrast to
base-view inconsistencies which we don't have a good method of fixing.
Fixes #17795
Closes scylladb/scylladb#17777
Since 6b87778 regular compaction tasks are removed from task manager
immediately after they are finished.
test_regular_compaction_task lists compaction tasks and then requests
their statuses. Only one regular compaction task is guaranteed to still
be running at that time, the rest of them may finish before their status
is requested and so it will no longer be in task manager, causing the test
to fail.
Fix statuses check to consider the possibility of a regular compaction
task being removed from task manager.
Fixes: #17776.
Closes scylladb/scylladb#17784
Commit 0665d9c346 changed the gossiper
failure detector in the following way: when live endpoints change
and per-node failure detectors finish their loops, the main failure
detector calls gossiper::convict for those nodes which were alive when
the current iteration of the main FD started but now are not. This was
changed in order to make sure that nodes are marked as down, because
some other code in gossiper could concurrently remove nodes from
the live node lists without marking them properly.
This was committed around 3 years ago and the situation changed:
- After 75d1dd3a76
the `endpoint_state::_is_alive` field was removed and liveness
of a node is solely determined by its presence
in the `gossiper::_live_endpoints` field.
- Currently, all gossiper code which modifies `_live_endpoints`
takes care to trigger relevant callback. The only function which
modifies the field but does not trigger notifications
is `gossiper::evict_from_membership`, but it is either called
after `gossiper::remove_endpoint` which triggers callbacks
by itself, or when a node is already dead and there is no need
to trigger callbacks.
So, it looks like the reasons it was introduced for are not relevant
anymore. What's more important though is that it is involved in a bug
described in scylladb/scylladb#17515. In short, the following sequence
of events may happen:
1. Failure detector for some remote node X decides that it was dead
long enough and `convict`s it, causing live endpoints to be updated.
2. The gossiper main loop sends a successful echo to X and *decides*
to mark it as alive.
3. At the same time, failure detector for all nodes other than X finish
and main failure detector continues; it notices that node X is
not alive (because it was convicted in point 1.) and *decides*
to convict it.
4. Actions planned in 2 and 3 run one after another, i.e. node is first
marked as alive and then immediately as dead.
This causes `on_alive` callbacks to run first and then `on_dead`. The
second one is problematic as it closes RPC connections to node X - in
particular, if X is in the process of replacing another node with the
same IP then it may cause the replace operation to fail.
In order to simplify the code and fix the bug - remove the piece
of logic in question.
Fixes: scylladb/scylladb#17515
Closes scylladb/scylladb#17754
Nodetool currently assumes that positional arguments are only keyspaces.
ks.tbl pairs are only provided when --kt-list or friends are used. This
is not the case however. So check positional args too, and if they look
like ks.tbl, handle them accordingly.
While at it, also make sure that alternator keyspace and tables names
are handled correctly.
Closes scylladb/scylladb#17480
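The check described above can be sketched roughly like this. This is a hypothetical illustration, not the actual nodetool code; in particular, the real code must handle alternator names, whose table part may itself contain dots, while here we naively split on the last '.':

```python
def parse_table_args(args):
    """Hypothetical sketch: split positional nodetool arguments into
    plain keyspaces and (keyspace, table) pairs, so that 'ks.tbl'
    arguments are handled even without --kt-list."""
    keyspaces, tables = [], []
    for arg in args:
        if "." in arg:
            # Looks like a ks.tbl pair: split on the last dot.
            ks, _, tbl = arg.rpartition(".")
            tables.append((ks, tbl))
        else:
            keyspaces.append(arg)
    return keyspaces, tables
```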
The method in question can have a shorter name that matches all the other injections in this class, and can be made non-template.
Closes scylladb/scylladb#17734
* github.com:scylladb/scylladb:
error_injection: De-template inject() with handler
error_injection: Overload inject() instead of inject_with_handler()
This test reproduces the problem from scylladb/scylladb#17445.
It fails quite reliably without the fix from the previous
commit.
The test just bootstraps a new node while bombarding the cluster
with read requests.
In this commit we move the remove_rpc_client_with_ignored_topology
call to the earliest point possible - when a node first appears
in token_metadata.topology.
From the topology coordinator perspective this happens when a joining
node moves to node_state::bootstrapping and the topology moves to
transition_state::join_group0. In sync_raft_topology_nodes
the node should be contained in transition_nodes. The successful
wait_for_ip before entering transition_state::join_group0 ensures
that update_topology should find a node's IP and put it into the topology.
The barrier in commit_cdc_generation will ensure that all nodes
in the cluster are using the proper connection parameters.
Only outgoing connections are tracked by remove_rpc_client_with_ignored_topology,
those created by the current node. This means we need to call
remove_rpc_client_with_ignored_topology on each node of the cluster.
Fixes scylladb/scylladb#17445
storage_service: decouple remove_rpc_client_with_ignored_topology from notify_joined
It's too late to call remove_rpc_client_with_ignored_topology on
messaging service when a node becomes normal. Data
plane requests can be routed to the node much earlier,
at least when topology switches to write_both_read_new.
The remove_rpc_client_with_ignored_topology function
shuts down sockets and causes such requests to time out.
We intend to call remove_rpc_client_with_ignored_topology
as soon as a node becomes part of token_metadata topology.
In this preparatory commit we refactor
storage_service::notify_joined. We remove the
remove_rpc_client_with_ignored_topology call from it and
call it separately from the two call sites of notify_joined.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, since boost::program_options::options_description is
defined by the boost.program_options library, which only provides the
operator<< overload, we are inclined not to specialize `fmt::formatter`
for it at this moment, because
* this class is not defined by the scylla project. we would have to
find a home for this formatter.
* we are not likely to reuse the formatter in multiple places
so, in this change we just print it using `fmt::streamed`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17791
The coordinator can find out which nodes are marked as DOWN, thus when
calling tablets balancer it can feed it a skiplist
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test is inspired by the test_load_balancing_with_empty_node one and
verifies that when a node is skiplisted, balancer doesn't put load on it
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently the load balancer skips nodes only based on their "administrative"
state, i.e. whether they are drained/decommissioned/removed/etc. There's no
way to exclude a node from balancing decisions based on anything else.
This patch adds this ability by adding a skiplist argument to the
balance_tablets() method. When a node is in it, it will not be
considered, as if it had been removed with removenode.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
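The skiplist idea can be illustrated with a toy target-selection function. This is a sketch under assumed names, not the real balance_tablets():

```python
def pick_target(load_by_node, skiplist=frozenset()):
    """Illustrative sketch: choose the least-loaded node as a migration
    target, never considering skiplisted (e.g. DOWN) nodes -- as if
    they had been removed with removenode."""
    candidates = {node: load for node, load in load_by_node.items()
                  if node not in skiplist}
    if not candidates:
        return None  # no eligible node to balance to
    return min(candidates, key=candidates.get)
```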
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* db::commitlog::segment::cf_mark
* db::commitlog::segment_manager::named_file
* db::commitlog::segment_manager::dispose_mode
* db::commitlog::segment_manager::byte_flow<T>
please note, the formatter of `db::commitlog::segment` is not
included in this commit, as we are formatting it in the inline
definition of this class. so we cannot define the specialization
of `fmt::formatter` for this class before its callers -- we'd
either use `format_as()` provided by {fmt} v10, or use `fmt::streamed`.
either way, it's different from the theme of this commit, and we
will handle it in a separate commit.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17792
Contrary to Origin, the single-token case is not discriminated in the
native implementation, for two reasons:
* ScyllaDB doesn't ever run with a single token, it is even moving away
from vnodes.
* Origin implemented the logic to detect single-token with a mistake: it
compares the number of tokens to the number of DCs, not the number of
nodes.
Another difference is that the native implementation doesn't request
ownership information when a keyspace argument was not provided -- it is
not printed anyway.
In the previous patch, we made matching requests to different endpoints
be matched out-of-order. In this patch we go one step further and make
matching requests to the same endpoint match out-of-order too.
With this, tests can register the expected requests in any order, not in
the same order as the nodetool-under-test is expected to send them. This
makes testing more flexible. Also, how requests are ordered is not
interesting from a correctness POV anyway.
The legacy nodetool likes to append an "/" to the requests paths every
now and then, but not consistently. Unfortunately, request path matching
in the mock rest server and in aiohttp is quite sensitive to this
currently. Reduce friction by removing trailing "/" from paths in the
mock api, allowing paths to match each other even if one has a trailing
"/" but the other doesn't.
Unfortunately there is nothing we can do about the aiohttp part, so some
API endpoints have to be registered with a trailing "/".
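The path cleanup on the mock side can be sketched as a small normalization helper (illustrative, not the actual rest_api_mock.py code):

```python
def normalize_path(path):
    """Illustrative sketch: strip one trailing '/' so that
    '/storage_service/keyspaces' and '/storage_service/keyspaces/'
    compare equal.  A bare '/' root is kept as-is."""
    if len(path) > 1 and path.endswith("/"):
        return path[:-1]
    return path
```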
The mock server currently provides its own router to the aiohttp.web
app. The ability to provide custom routers however is deprecated and
can be removed at any point. So refactor the mock server to use the
built-in router. This requires some changes, because the built-in router
does not allow adding/removing routes once the server starts. However
the mock server only learns of the used routes when the tests run.
This unfortunately means that we have to statically register all
possible routes the tests will use. Fortunately, aiohttp has variable
route support (templated routes) and with this, we can get away with
just 9 statically registered routes, which is not too bad.
A (desired) side-effect of this refactoring is that now requests to
different routes do not have to arrive in order. This constraint of the
previous implementation proved to be not useful, and even made writing
certain tests awkward.
Refactor how the tests check for expected requests which were never
invoked. At the end of every test, the nodetool fixture requests all
unconsumed expected requests from the rest_api_mock.py and checks that
there is none. This mechanism has some interaction with requests which
have a "multiple" set: rest_api_mock.py allows registering requests with
different "multiple" requirements -- how many times a request is
expected to be invoked:
* ANY: [0, +inf)
* ONE: 1
* MULTIPLE: [1, +inf)
Requests are stored in a stack. When a request arrives, we pop off
requests from the top until we find a perfect match. We pop off
requests, iff: multiple == ANY || multiple == MULTIPLE and was hit at
least once.
This works as long as we don't have a multiple=ANY request at the
bottom of the stack which is never invoked. Or a multiple=MULTIPLE one.
This will get worse once we refactor requests to be not stored in a
stack.
So in this patch, we filter requests when collecting unexhausted ones,
dropping those which would be qualified to be popped from the stack.
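The filtering of unexhausted requests can be sketched as follows, with hypothetical field names; the real rest_api_mock.py bookkeeping differs in detail:

```python
from enum import Enum

class Multiple(Enum):
    ANY = "any"            # expected [0, +inf) invocations
    ONE = "one"            # expected exactly 1 invocation
    MULTIPLE = "multiple"  # expected [1, +inf) invocations

def unexhausted(expected_requests):
    """Illustrative sketch: keep only the expected requests that a test
    must still flag at teardown.  ANY requests never count; MULTIPLE
    requests count only when they were never hit -- i.e. we drop those
    which would be qualified to be popped from the stack."""
    return [r for r in expected_requests
            if not (r["multiple"] is Multiple.ANY
                    or (r["multiple"] is Multiple.MULTIPLE
                        and r["hits"] > 0))]
```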
Although ScyllaDB ignores this request parameter, the Java nodetool
sets it, so it is better to have the native one do the same for
symmetry. It makes testing easier.
Discovered with the more strict request matching introduced in the next
patches.
The hosts and ignore_nodes options are not supported for tablet repair
currently. If a user passes either option, the request will be rejected
with:
The hosts option is not supported for tablet repair
The ignore_nodes option is not supported for tablet repair
These options are used to select nodes to repair.
Fixes: #17742
Tests: repair_additional_test.py::TestRepairAdditional::test_repair_ignore_nodes
repair_additional_test.py::TestRepairAdditional::test_repair_ignore_nodes_errors
repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_dc_host
Closes scylladb/scylladb#17767
The new MX-native validator, which validates the index in tandem with the data file, was discovered to print false-positive errors, related to range-tombstones and promoted-index positions.
This series fixes that. But first, it refactors the scrub-related tests. These are currently dominated by boiler-plate code. They are hard to read and hard to write. In the first half of the series, a new `scrub_test` is introduced, which moves all the boiler-plate to a central place, allowing the tests to focus on just the aspect of scrub that is tested.
Then, all the found bugs in validate are fixed and finally a new test, checking validate with valid sstable is introduced.
Fixes: #16326
Closes scylladb/scylladb#16327
* github.com:scylladb/scylladb:
test/boost/sstable_compaction_test: add validation test with valid sstable
sstables/mx/reader: validate(): print trace message when finishing the PI block
sstables/mx/reader: validate(): make index-data PI position check message consistent
sstables/mx/reader: validate(): only load the next PI block if current is exhausted
sstables/mx/reader: validate(): reset the current PI block on partition-start
sstables/mx/reader: validate(): consume_range_tombstone(): check for finished clustering block
sstables/mx/reader: validate(): fix validator for range tombstone end bounds
test/boost/sstable_compaction_test: drop write_corrupt_sstable() helper
test/boost/sstable_compaction_test: fix indentation
test/boost/sstable_compaction_test: use scrub_test_framework in test_scrub_quarantine_mode_test
test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_segregate_mode_test
test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_skip_mode_test
test/boost/sstable_compaction_test: use scrub_test_framework in sstable_scrub_validate_mode_test
test/boost/sstable_compaction_test: introduce scrub_test_framework
test/lib/random_schema: add uncompatible_timestamp_generator()
This PR implements the following new nodetool commands:
* netstats
* tablehistograms/cfhistograms
* proxyhistograms
All commands come with tests and all tests pass with both the new and the current nodetool implementations.
Refs: https://github.com/scylladb/scylladb/issues/15588
Closes scylladb/scylladb#17651
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the proxyhistograms command
tools/scylla-nodetool: implement the tablehistograms command
tools/scylla-nodetool: introduce buffer_samples
utils/estimated_histogram: estimated_histogram: add constructor taking buckets
tools/scylla-nodetool: implement the netstats command
tools/scylla-nodetool: add correct units to file_size_printer
When the partition_index_cache is evicted, we yield for preemption between
pages, but not within a page.
Commit 3b2890e1db ("sstables: Switch index_list to chunked_vector
to avoid large allocations") recognized that index pages can be large enough
to overflow a 128k alignment block (this was before the index cache and
index entries were not stored in LSA then). However, it did not go as far as
to gently free individual entries; either the problem was not recognized
or wasn't as bad.
As the referenced issue shows, a fairly large stall can happen when freeing
the page. The workload had a large number of tombstones, so index selectivity
was poor.
Fix by evicting individual rows gently.
The fix ignores the case where rows are still referenced: it is unlikely
that all index pages will be referenced, and in any case skipping over
a referenced page takes an insignificant amount of time, compared to freeing
a page.
Fixes #17605
Closes scylladb/scylladb#17606
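The gentle eviction can be sketched as below; names and the yield mechanism are illustrative stand-ins for the actual LSA/cache code:

```python
def evict_page_gently(page, is_referenced, maybe_yield):
    """Illustrative sketch of the fix: instead of freeing a whole index
    page at once (a potentially large stall), free entries one by one
    with a preemption point between them.  Referenced pages are skipped:
    skipping is cheap compared to freeing."""
    if is_referenced(page):
        return False  # leave referenced pages alone; revisit them later
    while page:
        page.pop()      # free a single index entry (a small allocation)
        maybe_yield()   # preemption point between entries: no long stall
    return True
```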
This is a speculative fix as the problem is observed only on CI.
When run_async is called right after driver_connect and get_cql
it fails with ConnectionException('Host has been marked down or
removed').
If the approach proves to be successful we can start to deprecate
base get_cql in favor of get_ready_cql. It's better to have robust
testing helper libraries than try to take care of it in every test
case separately.
Fixes #17713
Closes scylladb/scylladb#17772
Two repair test cases verify that repair generated enough rows in the
history table. Both use identical code for that, worth generalizing
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17761
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* mutation_partition_v2::printer
* frozen_mutation::printer
* mutation
their operator<<:s are dropped.
Refs #13245
Closes scylladb/scylladb#17769
* github.com:scylladb/scylladb:
mutation: add fmt::formatter for mutation
mutation: add fmt::formatter for frozen_mutation::printer
mutation: add fmt::formatter for mutation_partition_v2::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* column_definition
* column_mapping
* ordinal_column_id
* raw_view_info
* schema
* view_ptr
their operator<<:s are dropped. but operator<< for schema is preserved,
as we are still printing `seastar::lw_shared_ptr<const schema>` with
our homebrew generic formatter for `seastar::lw_shared_ptr<>`, which
uses operator<< to print the pointee.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17768
codespell reports that "Nees" should be "Needs", but "Nees" is the last
name of Georg Nees, so it is not a misspelling and should not be
fixed.
the purpose of lolwut.cc is to display the Redis version and print a
piece of generative computer art; the one included by our version was
created by Georg Nees. since the LOLWUT command does not contain
business logic connected with scylladb, we don't lose a lot if we skip
it when scanning for spelling errors. so, in this change, let's skip
it; this should silence one more warning from the github codespell
workflow.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17770
downloads.scylladb.com recently started redirecting from http to https
(via `301 Moved Permanently`).
This broke package downloading in open-coredump.sh.
To fix this, we have to instruct curl to follow redirects.
Closes scylladb/scylladb#17759
When printing human-readable file-sizes, the Java nodetool always uses
base-2 steps (1024) to arrive at the human-readable size, but it uses
the base-10 units (MB) and base-2 units (MiB) interchangeably.
Adapt file_size_printer to support both. Add a flag to control which is
used.
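The described behavior can be sketched as follows; this is an illustrative model of the flag, not the actual file_size_printer code:

```python
def format_size(n, base2_units=False):
    """Illustrative sketch: always divide by 1024 (matching the Java
    nodetool), but label with base-10 units (KB, MB) or base-2 units
    (KiB, MiB) depending on a flag."""
    units = ["bytes", "KiB", "MiB", "GiB", "TiB"] if base2_units \
        else ["bytes", "KB", "MB", "GB", "TB"]
    value = float(n)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            # Bytes are printed as an integer, larger units with 2 decimals.
            return f"{int(value)} {unit}" if unit == "bytes" \
                else f"{value:.2f} {unit}"
        value /= 1024
```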
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for mutation. but its operator<<
is preserved, as we are still using our homebrew generic formatter
for printing `std::vector<mutation>`, and this formatter is using
operator<< for printing the elements in vector.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for frozen_mutation::printer,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for mutation_partition_v2::printer, and
drop its operator<<
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This patch adds the dc option support for table repair. The management
tool can use this option to select nodes in specific data centers to run
repair.
Fixes: #17550
Tests: repair_additional_test.py::TestRepairAdditional::test_repair_option_dc
Closes scylladb/scylladb#17571
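The node selection the dc option performs can be sketched like this; it is an illustration with assumed names, not the actual repair code:

```python
def select_repair_nodes(nodes_by_dc, dcs=None):
    """Illustrative sketch of the dc option: restrict repair
    participants to nodes in the requested data centers; when no dcs
    are given, all nodes participate."""
    if not dcs:
        return [n for dc_nodes in nodes_by_dc.values() for n in dc_nodes]
    unknown = set(dcs) - nodes_by_dc.keys()
    if unknown:
        # Reject requests naming data centers the cluster doesn't have.
        raise ValueError(f"unknown data centers: {sorted(unknown)}")
    return [n for dc in dcs for n in nodes_by_dc[dc]]
```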
Calling scylla-nodetool with the describering operation and omitting the
keyspace name argument results in a boost exception with the following error message:
error running operation: boost::wrapexcept<boost::bad_any_cast> (boost::bad_any_cast: failed conversion using boost::any_cast)
This change checks for the missing keyspace and outputs a more sensible
error message:
error processing arguments: keyspace must be specified
Closes scylladb/scylladb#17741
Just a cleanup -- replace do_with_cql_env + async with do_with_cql_env_thread
Closes scylladb/scylladb#17758
* github.com:scylladb/scylladb:
test/storage_proxy: Restore indentation after previous patch
test/storage_proxy: Use do_with_cql_env_thread()
One of the test cases explicitly wraps itself into async, but there's a
convenience helper for that already.
Indentation is deliberately left broken
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The message says "index-data" but when printing the position, the data
position is printed first, causing confusion. Fix this and while at it,
also print the position of the partition start.
validate() consumes the content of partitions in a consume-loop.
Every time the consumer asks for a "break", the next PI block is loaded
and set on the validator, so it can validate that further clustering
elements are indeed from this block.
This loop assumed the consumer would only request interruption when the
current clustering block is finished. This is wrong: the consumer can
also request interruption when yielding is needed. In that case the
next PI block must not be loaded yet, because the current one is not
exhausted. Check this condition before loading the next PI block, to
prevent false-positive errors due to a mismatch between the PI block
and the clustering elements from the sstable.
It is possible that the next partition has no PI and thus there won't be
a new PI block to overwrite the old one. This will result in
false-positive messages about rows being outside of the finished PI
block.
Promoted index entries can be written on any clustering elements,
including range tombstones. So the validating consumer also has to check
whether the current expected clustering block is finished when
consuming a range tombstone. If it is, consumption has to be
interrupted, so that the outer loop can load the next promoted index
block before moving on to the next clustering element.
For range tombstone end-bounds, the validate_fragment_order() should be
passed a null tombstone, not a disengaged optional. The latter means no
change in the current tombstone. This caused the end bound of range
tombstones to not make it to the validator and the latter complained
later on partition-end that the partition has unclosed range tombstone.
The test becomes a lot shorter and it now uses random schema and random
data. The test is also split in two: one test for abort mode and one for
skip mode.
Indentation is left broken, to be fixed in a future patch.
Scrub tests require a lot of boilerplate code to work. This has a lot of
disadvantages:
* Tests are long.
* The "meat" of the test is lost between all the boilerplate; it is
  hard to glean what a test actually does.
* Tests are hard to write, so we have only a few of them and they test
  multiple things.
* The boilerplate differs slightly from test to test.
To solve this, this patch introduces a new class, `scrub_test_framework`,
which is a central place for all the boilerplate code needed to write
scrub-related tests. In the next patches, we will migrate scrub-related
tests to this class.
The builder works in "steps". Each step runs for a given base table;
when a new view is created it either initiates a step or appends to a
currently running step.
Running a step means reading mutations from a local sstables reader and
applying them to all views that have jumped into this step so far. When
a view is added to the step, it remembers the current token value the
step is on. When the step receives end-of-stream it rewinds to the
minimal token. Rewinding is done by closing the current reader and
creating a new one. Each time the token is advanced, all the views that
meet the new token value for the second time (i.e. -- completed a full
round) are marked as built and are removed from the step. When no views
are left on the step, it finishes.
The above machinery can break when rewinding the end-of-stream reader.
The trick is that a running step silently assumes that if the reader
once produced some token (and there can be a view that remembered this
token as its starting one), then after rewinding the reader would
generate the same token or a greater one. With tablets, however, that's
not the case. When a node is decommissioned, tablets are cleaned up and
all sstables are removed. Rewinding a reader after that yields an empty
reader that produces no tokens from then on. Respectively, any build
steps that had captured tokens prior to cleanup would get stuck forever.
The fix is to check whether the mutation consumer stepped at least one
token forward after the rewind, and if not, complete all the attached
views.
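A heavily simplified sketch of the fixed rewind logic (all names hypothetical, not the builder's real interfaces):

```python
# Hedged sketch: after end-of-stream the step rewinds its reader; if the
# rewound reader makes no forward progress (e.g. tablet cleanup removed all
# sstables), every view still attached to the step is marked built instead of
# waiting forever for its start token to come around again.
def finish_round(tokens_after_rewind, view_start_tokens):
    if not tokens_after_rewind:
        # reader stepped zero tokens forward after rewind: complete all views
        return set(view_start_tokens)
    reached = max(tokens_after_rewind)
    # otherwise a view is built once the scan reaches its start token again
    return {v for v, start in view_start_tokens.items() if reached >= start}

assert finish_round([], {"v1": 5, "v2": 9}) == {"v1", "v2"}  # stuck-step fix
assert finish_round([1, 2, 6], {"v1": 5, "v2": 9}) == {"v1"}
```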
fixes: #17293
Similar thing should happen if the base table is truncated with views
being built from it. Testing it steps on compaction assertion elsewhere
and needs more research.
refs: #17543
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17548
summarize_tests() is only used to summarize boost tests, so reflect
this fact in its name. we will need to summarize the tests which
generate JUnit XML as well, so this change also prepares for a
follow-up change to implement a new summarize helper.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17746
Here are three endpoints in the api/cache_service that report "metrics"
for the row cache and the values they return
- entries: number of partitions
- size: number of partitions
- capacity: used space
The size and capacity seem very inaccurate.
A comment says that in C* the size should be weighted, but scylla
doesn't support weighting cache entries. Also, capacity is configurable
via the row_cache_size_in_mb config option or the
set_row_cache_capacity_in_mb API call, but Scylla supports neither.
This patch suggests changing the return values of the size and capacity
endpoints. Even though the row cache doesn't support weights, it's
natural to return used space in bytes as the value, which is closer to
what "size" means than the number of entries.
The capacity may return back total memory size, because this is what
Scylla really does -- row cache growth is only limited by other memory
consumers, not by configured limits.
fixes: #9418
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17724
The test carries const std::string_view& around, but the type is a
lightweight class that can be copied at the same cost as passing its
reference.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17735
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for `view_info`, and its
operator<< is dropped.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17745
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* utils::human_readable_value
* std::strong_ordering
* std::weak_ordering
* std::partial_ordering
* utils::exception_container
Refs https://github.com/scylladb/scylladb/issues/13245
Closes scylladb/scylladb#17710
* github.com:scylladb/scylladb:
utils/exception_container: add fmt::formatter for exception_container
utils/human_readable: add fmt::formatter for human_readable_value
utils: add fmt::formatter for std::strong_ordering and friends
This PR fixes comments left from #17481 , namely
- adds case selection to boost suite
- describes the case selection in documentation
Closes scylladb/scylladb#17721
* github.com:scylladb/scylladb:
docs: Add info about the ability to run specific test case
test.py: Support case selection for boost tests
the corresponding implementation of operator<< was dropped in
a40d3fc25b, so there is no need to
keep this friend declaration anymore.
also, drop `#include <ostream>`, as this header no longer references
any of the ostream types after the change above.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17743
* seastar 5d3ee980...a71bd96d (51):
> util: add formatter for optimized_optional<>
> build: search protobuf using package config
> reactor: Move pieces of scollectd to scollectd
> reactor: Remove write-only task_queue._current
> Add missing include in tests/unit/rpc_test.cc
> doc/io_tester.md: include request_type::unlink in the docs
> doc/io-tester.md: update obsolete information in io_tester docs
> io_tester/conf.yaml: include an example of request_type::unlink job
> io_tester: implement request_type::unlink
> reactor: Print correct errno on io_submit failure
> src/core/reactor.cc: qualify metric function calls with "sm::"
> build: add shard_id.hh to seastar library
> thread: speed up thread creation in debug mode
> include: add missing modules.hh import to shard_id.hh
> prometheus: avoid ambiguity when calling MetricFamily.set_name()
> util/log: add formatter for log_level
> util/log: use string_view for log_level_names
> perf: Calculate length of name column in perf tests
> rpc_test: add a test for inter-compressor communication
> rpc: in multi_algo_compressor_factory, propagate send_empty_frame
> rpc: give compressors a way to send something over the connection
> rpc: allow (and skip) empty compressed frames
> metrics: change value_vector type to std::deque
> HACKING.md: remove doc related to test_dist
> test/unit: do not check if __cplusplus > 201703L
> json_elements: s/foramted/formatted/
> iostream: Refactor input_stream::read_exactly_part
> add unit test to verify str.starts_with(str), str.ends_with(str) return true.
> str.starts_with(str) and str.ends_with(str) should return true, just like std::string
> rpc: Remove FrameType::header_and_buffer_type
> rpc: Defuturize FrameType::return_type
> rpc: Kill FrameType::get_size()
> treewide: put std::invocable<> constraints in template param list
> include: do not include unuser headers
> rpc: fix a deadlock in connection::send()
> iostream: Replace recursion by iteration in input_stream::read_exactly_part
> core/bitops.hh: use std::integral when appropriate
> treewide: include <concepts> instead of seastar/util/concepts.hh
> abortable_fifo: fix the indent
> treewide: expand `SEASTAR_CONCEPT` macro
> util/concepts: always define SEASTAR_CONCEPT
> file: Remove unused thread-pool arg from directory lister
> seastar-json2code: collect required_query_params using a list
> seastar-json2code: reduce the indent level
> seastar-json2code: indent the enum and array elements
> seastar-json2code: generate code for enum type using Template
> seastar-json2code: extract add_operation() out
> reactor: Re-ifdef SIGSEGV sigaction installing
> reactor: Re-ifdef reactor::enable_timer()
> reactor: Re-ifdef task_histogram_add_task()
> reactor: Re-ifdef install_signal_handler_stack()
Closes scylladb/scylladb#17714
This small series improves the Alternator tests for metrics:
1. Improves some comments in the test.
2. Restores a test that was previously hidden by two tests having the same name.
3. Adds tests for latency histogram metrics.
Closes scylladb/scylladb#17623
* github.com:scylladb/scylladb:
test/alternator: tests for latency metrics
test/alternator: improve comments and unhide hidden test
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `exception_container<..>`
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for `utils::human_readable_value`,
and drop its operator<<
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* std::strong_ordering
* std::weak_ordering
* std::partial_ordering
and their operator<<:s are moved to test/lib/test_utils.{hh,cc}, as they
are only used by Boost.test.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
There are four stages left to handle: cleanup, cleanup_target, end_migration and revert_migration. All are handling removed nodes already, so the PR just extends the test.
fixes: #16527
Closes scylladb/scylladb#17684
* github.com:scylladb/scylladb:
test/tablets_migration: Test revert_migration failure handling
test/tablets_migration: Test end_migration failure handling
test/tablets_migration: Test cleanup_target failure handling
test/tablets_migration: Test cleanup failure handling
test/tablets_migration: Prepare for do_... stages
test/tablets_migration: Add ability to removenode via any other node
test/tablets_migration: Wrap migration stages failing code into a helper class
storage_service: Add failure injection to crash cleanup_tablet
Register the member variable directly instead of a functor, for those
metrics that just return the value of an existing member variable. This
is ever so slightly more efficient than a functor.
Closes scylladb/scylladb#17726
In test/alternator/test_metrics.py we had tests for the operation-count
metrics for different Alternator API operations, but not for the latency
histograms for these same operations. So this patch adds the missing
tests (and removes a TODO asking to do that).
Note that only a subset of the operations - PutItem, GetItem, DeleteItem,
UpdateItem, and GetRecords - currently have a latency histogram, and this
test verifies this. We have an issue (Refs #17616) about adding latency
histograms for more operations - at which point we will be able to expand
this test for the additional operations.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The original goal of this patch was to improve comments in
test/alternator/test_metrics.py, but while doing that I discovered
that one of the test functions was hidden by a second test with
the same name! So this patch also renames the second test.
The test continues to work after this patch - the hidden test
was successful.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The recently renamed inject_with_handler() was a template, but it can be
made symmetrical to its peer that accepts a void function as a callback,
and take std::function as its argument.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The inject_with_handler() method accepts a coroutine that can be called
with an injection_handler. With such a function as an argument, there's
no need for the distinctive inject_with_handler() name; the method can
be an overload of the existing inject()-s.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Before this change, when a user tried to use the
'storage_service/ownership/{keyspace}' API with
a keyspace that uses tablets, an internal
error was thrown. The code was calling a function
that is intended for vnodes: get_vnode_effective_replication_map().
This commit introduces graceful handling of this scenario and
extends the API to allow passing a 'cf' parameter that denotes
the table name.
Now, when the keyspace uses tablets and the cf parameter is not passed,
a descriptive error message is returned via BAD_REQUEST.
Users cannot query ownership for a keyspace that uses tablets,
but they can query ownership for a table in a given keyspace that uses tablets.
Also, new tests have been added to test/rest_api/test_storage_service.py and
to test/topology_experimental_raft/test_tablets.py in order to verify the behavior
with and without tablets enabled.
Fixes: https://github.com/scylladb/scylladb/issues/17342
Closes scylladb/scylladb#17405
* github.com:scylladb/scylladb:
storage_service/ownership: discard get_ownership() requests when tablets enabled
storage_service/ownership/{keyspace}: handle requests when tablets are enabled
locator/effective_replication_map: make 'get_ranges(inet_address ep)' virtual
locator/tablets: add tablet_map::get_sorted_tokens()
pylib/rest_client.py: add ownership API to ScyllaRESTAPIClient
rest_api/test_storage_service: add simplistic tests of ownership API for vnodes
Seastar removed `task_queue::_current` in
258b11220d343d8c7ae1a2ab056fb5e202723cc8. let's adapt scylla-gdb.py
accordingly. despite `current_scheduling_group_ptr()` being an internal
API, it's been around for a while and is relatively stable, so let's use
it instead.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17720
The short series allows do_status_check to handle down nodes that don't have HOST_ID application state.
Fixes #16936
Closes scylladb/scylladb#17024
* github.com:scylladb/scylladb:
gossiper: do_status_check: fixup indentation
gossiper: do_status_check: allow evicting dead nodes from membership with no host_id
gossiper: print the host_id when endpoint state goes UP/DOWN
gossiper: get_host_id: differentiate between no endpoint_state and no application_state
gms: endpoint_state: add get_host_id
gossiper: do_status_check: continue loop after evicting FatClient
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for internal types in service/storage_proxy.cc.
please note, `service::storage_proxy::remote::read_verb` is extracted
out of the outer class, because the outer class's implementation formats
`read_verb`. so we have to put the formatter at a place where its
callers can see it. that's why it is moved up and out of
`service::storage_proxy::remote`.
some of the operator<<:s are preserved, as they are still being used by
the existing formatters, for instance, the one for
`seastar::shared_ptr<>`, which is used to print
`seastar::shared_ptr<service::paxos_response_handler>`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17708
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `bound_kind` and `bound_view`,
and drop the latter's operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17706
Shard-level latencies generate a lot of metrics. This patch reduces the
number of latencies reported by Alternator while keeping the same
functionality.
On the shard level, summaries will be reported instead of histograms.
On the instance level, an aggregated histogram will be reported.
Summaries, histograms, and counters are marked with skip_when_empty.
Fixes #12230
Closes scylladb/scylladb#17581
This change introduces logic that is responsible
for checking whether tablets are enabled for any of the
keyspaces when get_ownership() is invoked.
Without it, the result would be calculated
based solely on sorted_tokens(), which was
invalid.
Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Before this change, when a user tried to use the
'storage_service/ownership/{keyspace}' API with
a keyspace that uses tablets, an internal
error was thrown. The code was calling a function
that is intended for vnodes: get_vnode_effective_replication_map().
This commit introduces graceful handling of this scenario and
extends the API to allow passing a 'cf' parameter that denotes
the table name.
Now, when the keyspace uses tablets and the cf parameter is not passed,
a descriptive error message is returned via BAD_REQUEST.
Users cannot query ownership for a keyspace that uses tablets,
but they can query ownership for a table in a given keyspace that uses tablets.
Also, new tests have been added to test/rest_api/test_storage_service.py and
to test/topology_experimental_raft/test_tablets.py in order to verify the behavior
with and without tablets enabled.
Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Before this patch, the mentioned function was a member
specific to the vnode_effective_replication_strategy class.
To allow its usage also when tablets are enabled, it was
moved to the base class, effective_replication_strategy,
and made pure virtual to force the derived classes to
implement it.
It is used by 'storage_service::get_ranges_for_endpoint()'
that is used in calculation of effective ownership. Such
calculation needs to be performed also when tablets are
enabled.
Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change introduces a new member function that
returns a vector of sorted tokens where each pair of adjacent
elements depicts a range of tokens that belong to a tablet.
It will be used to produce the equivalent of sorted_tokens() of
vnodes when trying to use dht::describe_ownership() for tablets.
Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change adds a member function that can be used
to access 'storage_service/ownership' API.
It will be used by tests that need to access this API.
Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change is intended to introduce tests for vnodes for
the following API paths:
- 'storage_service/ownership'
- 'storage_service/ownership/{keyspace}'
In the next patches, the logic being tested will be adjusted
to work correctly when tablets are enabled. This is a safety
net that ensures the logic is not broken.
Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* reader_permit::state
* reader_resources
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17707
instead of using fmt::runtime format string, use compile-time
format string, so that we can have compile-time format check provided
by {fmt}.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17709
The test.py usage is documented; the ability to run a specific test by
its name is described in the doc. Extend it with the new ability to run
a specific test case as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
test.py supports case-by-case execution of boost tests and always turns
it on -- when run, a boost test is split into parallel-running
sub-tests, each with a specific case name.
This patch tunes this so that when a test is run like
test.py boost/testname::casename
no parallel execution happens; instead, just the needed casename is
run. Example of selection:
test.py --mode=${mode} boost/bptree_test::test_cookie_find
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This stage is also the error path that starts from write_both_read_old,
so check this failure in two steps -- first fail the latter stage on one
of the nodes, then fail the former on another.
For that, one more node in the cluster is needed.
Also, to avoid name conflicts, the do_revert_migration pseudo stage name
is used.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This stage is a pure barrier. Barriers already take ignored nodes into
account, and so does the fail-injector, so just wire the stage name into
the test.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This stage is an error path, so in order to fail it we need to fail some
other stage prior to it. This leads to the testing sequence of
1. fail streaming via the source node
2. stop and remove the source node to let the state machine proceed
3. fail cleanup_target on the destination node
4. stop and remove the destination node
The first thing to note here is that the test doesn't fail the source
node for the cleanup_target stage, symmetrically to how it does for the
cleanup stage.
Next, since we're removing two nodes, the cluster is equipped with more
nodes to keep raft quorum.
Finally, since the removal of the source node doesn't finish until the
tablet migration finishes, it's impossible to remove the destination
node via the same node-0, so the 2nd removenode happens via node-3.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The handling itself is already there -- if the leaving node is excluded,
the cleanup stage resolves immediately. So just add code that
validates that.
Also, skip testing of pending-replica failure during the cleanup stage,
as it doesn't really participate in it any longer.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The tablets migration test is parametrized with the stage name to inject
a failure in. The internal class node_failer uses this parameter as-is
when injecting a failure into the scylla barrier handler.
The next patch will need to extend the test with a revert_migration
value and add handling of this name to the node_failer class. The
node_failer class, in turn, will want to instantiate two other instances
of the same class -- one to fail the write_both_read_old stage, and the
other one to fail the revert_migration barrier. So internally the class
will need to tell the revert_migration value as a full test parameter
apart from revert_migration as a barrier-only parameter.
This patch adds the ability to add a do_ prefix to the node_failer
parameter to tell the full test from the barrier-only one. When
injecting a failure into scylla, the do_ prefix needs to be cut off,
since scylla still needs to fail the barrier named revert_migration,
not do_revert_migration.
Also split the long line while at it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently the test calls removenode via node-0 in the cluster, which is
always alive. The next test case will need to call removenode via some
other node (more details in that patch later).
refs: #17681
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
One of the next stages will need to use two of them at the same time,
and it's going to be easier if the failing code is encapsulated.
No functional changes here; just large portions of code and local
variables are moved into a class and its methods.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Will be needed by the test that verifies how failures in tablets
migration stages are handled by the state machine.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Be more permissive about a missing host_id
application state for dead and expired nodes in release mode:
do not throw runtime_error in this case, but
rather consider them as non-normal token owners.
Instead, call on_internal_error_noexcept, which will
log the internal error and a backtrace, and will abort
if abort-on-internal-error is set.
This was seen when replacing dead nodes,
without https://github.com/scylladb/scylladb/pull/15788
Fixes #16936
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The host_id is now used in token_metadata
and in raft topology changes so print it
when the gossiper marks the node as UP/DOWN.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, we throw the same runtime_error:
`Host {} does not have HOST_ID application_state`
in both cases: when there is no endpoint_state
and when the endpoint_state has no HOST_ID
application state.
The latter case is unexpected, especially
after 8ba0decda5
(and also from the add_saved_endpoint path
after https://github.com/scylladb/scylladb/pull/15788
is merged), so throw a different error in each case
so we can tell them apart in the logs.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
A simpler getter to get the HOST_ID application state
from the endpoint_state.
Return a null host_id if the application state is not found.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We're seeing cases like #16936:
```
INFO 2024-01-23 02:14:19,915 [shard 0:strm] gossip - failure_detector_loop: Mark node 127.0.23.4 as DOWN
INFO 2024-01-23 02:14:19,915 [shard 0:strm] gossip - InetAddress 127.0.23.4 is now DOWN, status = BOOT
INFO 2024-01-23 02:14:27,913 [shard 0: gms] gossip - FatClient 127.0.23.4 has been silent for 30000ms, removing from gossip
INFO 2024-01-23 02:14:27,915 [shard 0: gms] gossip - Removed endpoint 127.0.23.4
WARN 2024-01-23 02:14:27,916 [shard 0: gms] gossip - === Gossip round FAIL: std::runtime_error (Host 127.0.23.4 does not have HOST_ID application_state)
```
Since the FatClient timeout handling already evicts the endpoint
from membership, there is no need to check further whether the
node is dead and expired, so just co_return.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* repair_hash
* read_strategy
* streaming::stream_summary
and drop their operator<<:s
Refs #13245
Closes scylladb/scylladb#17711
* github.com:scylladb/scylladb:
repair: add fmt::formatter for streaming::stream_summary
repair: add fmt::formatter for read_strategy
repair: add fmt::formatter for repair_hash
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for streaming::stream_summary, and
drop its operator<<
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for read_strategy, and drop its
operator<<
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter created
from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for repair_hash.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
While measuring #17149 with this test, some changes were applied; here they are:
- keep initial_tablets number in output json's parameters section
- disable auto compaction
- add control over the amount of sstables generated for --bypass-cache case
Closes scylladb/scylladb#17473
* github.com:scylladb/scylladb:
perf_simple_query: Add --memtable-partitions option
perf_simple_query: Disable auto compaction
perf_simple_query: Keep number of initial tablets in output json
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* raft::election_tracker
* raft::votes
* raft::vote_result
and drop their operator<<:s.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17670
Before the change, when a test failed because of some error
in the `cql_test_env.cc`, we were getting:
```
error: boost/virtual_table_test: failed to parse XML output '/home/piotrs/src/scylla2/testlog/debug/xml/boost.virtual_table_test.test_system_config_table_read.1.xunit.xml': no element found: line 1, column 0
```
After the change we're getting:
```
error: boost/virtual_table_test: Empty testcase XML output, possibly caused by a crash in the cql_test_env.cc, details: '/home/piotrs/src/scylla2/testlog/debug/xml/boost.virtual_table_test.test_system_config_table_read.1.xunit.xml': no element found: line 1, column 0
```
Closes scylladb/scylladb#17679
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`partition_snapshot_row_cursor`, and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17669
before this change, the "ring" subcommand has two issues:
1. the `--resolve-ip` option accepts a boolean argument, but this option
   should be a switch, which does not accept any argument at all
2. it always prints the endpoint no matter whether `--resolve-ip` is
   specified or not, but it should print the resolved name instead
   of an IP address if `--resolve-ip` is specified
in this change, both issues are addressed, and the test is updated
accordingly to exercise the case where `--resolve-ip` is used.
Closes scylladb/scylladb#17553
* github.com:scylladb/scylladb:
tools/scylla-nodetool: print hostname if --resolve-ip is passed to "ring"
test/nodetool: calc max_width from all_hosts
test/nodetool: keep tokens as Host's member
test/nodetool: remove unused import
* tools/java 5e11ed17...e4878ae7 (2):
> nodetool: fix a typo in error message
> bin/cassandra-stress: Add extended version info
Closes scylladb/scylladb#17680
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for `clustering_interval_set`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17593
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `tombstone_gc_mode`, and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17673
in `ScyllaServer::add_server()`, `self.create_server()` is called to
create a server, but if it raises, we would reference the local variable
`server`, which is not bound to any value, as `server` has not been
assigned at that moment. if `ScyllaServer` is used by
`ScyllaClusterManager`, we would not be able to see the real exception,
only an error like
```
cannot access local variable 'server' where it is not associated with a
value
```
which is just an error from the Python runtime.
in this change, `server` is always initialized, and we check for None
before dereferencing it.
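A minimal sketch of the pattern (synchronous and heavily simplified; the real code is async and lives in the test library):

```python
# Hedged sketch: bind `server` before the call that may raise, so error
# handling sees None instead of hitting UnboundLocalError and masking the
# real exception with a confusing Python runtime error.
def add_server(create_server):
    server = None
    try:
        server = create_server()
    except Exception as exc:
        # `server` is guaranteed to be bound here, so the real error is visible
        raise RuntimeError(f"failed to create server: {exc}") from exc
    if server is None:
        raise RuntimeError("create_server() returned no server")
    return server

assert add_server(lambda: "srv-1") == "srv-1"
```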
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17693
before this change, if `buildah` is not available in $PATH, this script
fails like:
```console
$ tools/toolchain/prepare --help
tools/toolchain/prepare: line 3: buildah: command not found
```
the error message never gets a chance to show up, as `set -e` in the
shebang line just lets bash quit.
after this change, we check for the existence of buildah, and bail out
if it is not available. so, on a machine without buildah around, we now
have:
```console
$ tools/toolchain/prepare --help
install buildah 1.19.3 or later
```
the same applies to "reg".
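The guard can be sketched roughly like this (a simplified stand-in for the actual script, not its real code):

```shell
# Hedged sketch: verify that a required tool exists before the rest of the
# script runs under `set -e`, so the user sees a helpful message instead of
# "command not found".
require_tool() {
    if ! command -v "$1" > /dev/null 2>&1; then
        echo "install $1 or later" >&2
        return 1
    fi
}

require_tool sh   # sh is always in PATH, so this succeeds
```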
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17697
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* service::fencing_token
* service::topology::transition_state
* service::node_state
* service::topology_request
* service::global_topology_request
* service::raft_topology_cmd::command
* service::paxos::proposal
* service::paxos::promise
Refs https://github.com/scylladb/scylladb/issues/13245
Closes scylladb/scylladb#17692
* github.com:scylladb/scylladb:
service/paxos: add fmt::formatter for paxos::promise
service/paxos: add fmt::formatter for paxos::proposal
service: add fmt::formatter for topology_state_machine types
The test cases in this suite need to start scylla with custom config options, restart it and call API on it. By the time the suite was created all this wasn't possible with any library facility, so the suite carries its own version of a managed_cluster class that piggy-backs on cql-pytest scylla starting. Now test.py has a pretty flexible manager that provides all the scylla cluster management the object_store suite needs. This PR makes the suite use the manager client instead of the home-brew managed_cluster thing
refs: #16006
fixes: #16268
Closes scylladb/scylladb#17292
* github.com:scylladb/scylladb:
test/object_store: Remove unused managed_cluster (and other stuff)
test/object_store: Use tmpdir fixture in flush-retry case
test/object_store: Turn flush-retry case to use ManagerClient
test/object_store: Turn "misconfigured" case to use ManagerClient
test/object_store: Turn garbage-collect case to use ManagerClient
test/object_store: Turn basic case to use ManagerClient
test/object_store: Prepare to work with ManagerClient
Today's test.py allows filtering tests to run with the `test.py --options name` syntax. The "name" argument is then considered to be some prefix, and when iterating tests only those whose name starts with that prefix are collected and executed. This has two troubles.
Minor: since it is prefix filtering, running e.g. topology_custom/test_tablets will run test_tablets _and_ test_tablets_migration from it. There's no way to exclude the latter from this selection. It's not common, but careful file name selection is welcome for better ~~user~~ testing experience.
Major: most of test files in topology and python suites contain many cases, some are extremely long. When the intent is to run a single, potentially fast, test case one needs to either wait or patch the test .py file by hand to somehow exclude unwanted test cases.
This PR adds the ability to run individual test case with test.py. The new syntax is `test.py --options name::case`. If the "::case" part is present two changes apply.
First, the test file selection is done by name match, not by prefix match. So running topology_custom/test_tablets will _not_ select test_tablets_migration from it.
Second, the "::case" part is appended to the pytest execution so that it collects and runs only the specified test case.
Closes scylladb/scylladb#17481
* github.com:scylladb/scylladb:
test.py: Add test-case splitting in 'name' selection
test.py: Add casename argument to PythonTest
These tests are inserting data into RF=3 tables, but used the default
consistency level which is taken from the default execution profile
which is set to LOCAL_QUORUM. The tests would then read with CL=ONE, so
we cannot give a guarantee that some of the data won't be missed. Fix
this by inserting the data with CL=ALL. (Do it for all RF cases for
simplicity.)
Fixes scylladb/scylladb#17695
Closes scylladb/scylladb#17700
Use co_await unfreeze_gently in the loop body
unfreezing each partition mutation to prevent
reactor stalls when building group0 snapshot
with lots of tablets.
Fixes #15303
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#17688
This PR contains 2 fixes for mergify config file:
1) When opening a backport PR, the base branch should be `branch-x.y`
2) Once a commit is promoted, we should add the label
`promoted-to-master`; in the 5.4 configuration we were using the wrong
label. Fixing it.
Closes scylladb/scylladb#17698
The test is booting nodes, and then immediately starts shutting down
nodes and removing them from the cluster. The shutting down and
removing may happen before driver manages to connect to all nodes in the
cluster. In particular, the driver didn't yet connect to the last
bootstrapped node. Or it can even happen that the driver has connected,
but the control connection is established to the first node, and the
driver fetched topology from the first node when the first node didn't
yet consider the last node to be normal. So the driver decides to close
connection to the last node like this:
```
22:34:03.159 DEBUG> [control connection] Removing host not found in
peers metadata: <Host: 127.42.90.14:9042 datacenter1>
```
Eventually, at the end of the test, only the last node remains, all
other nodes have been removed or stopped. But the driver does not have a
connection to that last node.
Fix this problem by ensuring that:
- all nodes see each other as NORMAL,
- the driver has connected to all nodes
at the beginning of the test, before we start shutting down and removing
nodes.
Fixes scylladb/scylladb#16373
Closes scylladb/scylladb#17676
Scylla-ccm uses function `wait_for_binary_interface` that waits for
scylla logs to print "Starting listening for CQL clients". If this log
is printed far before the regular cql_controller is initialized,
scylla-ccm assumes too early that the node is initialized.
It can result in timeouts that throw errors, for example in the function
`watch_rest_for_alive`.
Closes scylladb/scylladb#17496
The following incompatibilities were identified by `listsnapshots_test.py` in dtests:
* Command doesn't bail out when there are no snapshots, instead it prints a meaningless empty report
* Formatting is incompatible
Both are fixed in this mini-series.
Closes scylladb/scylladb#17541
* github.com:scylladb/scylladb:
tools/scylla-nodetool: listsnapshots: make the formatting compatible with origin's
tools/scylla-nodetool: listsnapshots: bail out if there are no snapshots
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `service::paxos::promise`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `service::paxos::proposal`,
but its operator<< is preserved, as it is still used by our generic
formatter for std::tuple<> which uses operator<< for printing the
elements in it, so operator<< of this class is indirectly used.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* service::fencing_token
* service::topology::transition_state
* service::node_state
* service::topology_request
* service::global_topology_request
* service::raft_topology_cmd::command
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, the "ring" subcommand has two issues:
1. `--resolve-ip` option accepts a boolean argument, but this option
should be a switch, which does not accept any argument at all
2. it always prints the endpoint no matter if `--resolve-ip` is
specified or not. but it should print the resolved name, instead
of an IP address if `--resolve-ip` is specified.
in this change, both issues are addressed. and the test is updated
accordingly to exercise the case where `--resolve-ip` is used.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
It might happen that multiple tablets co-habit the same shard, so we want load-and-stream to jump into a new streaming session for every tablet, such that the receiver will have the data properly segregated. That's a similar treatment we gave to repair. Today, load-and-stream fails due to sstables spanning more than 1 tablet in the receiver.
Synchronization with migration is done by taking replication map, so migrations cannot advance while streaming new data. A bug was fixed too, where data must be streamed to pending replicas too, to handle case where migration is ongoing and new data must reach both old and new replica set. A test was added stressing this synchronization path.
Another bug was fixed in sstable loading, which expected sharder to not be invalidated throughout the operation, but that breaks during migrations.
Fixes #17315.
Closes scylladb/scylladb#17449
* github.com:scylladb/scylladb:
test: test_tablets: Add load-and-stream test
sstables_loader: Stream to pending tablet replica if needed
sstables_loader: Implement tablet based load-and-stream
sstables_loader: Virtualize sstable_streamer for tablet
sstables_loader: Avoid reallocations in vector
sstable_loader: Decouple sstable streaming from selection
sstables_loader: Introduce sstable_streamer
Fix online SSTable loading with concurrent tablet migration
This one-line patch fixes a failure in the dtest
lwt_schema_modification_test.py::TestLWTSchemaModification
::test_table_alter_delete
where an update sometimes failed due to an internal server error, and the
log had the mysterious warning message:
"std::logic_error (Empty materialized view updated)"
We've also seen this log-message in the past in another user's log, and
never understood what it meant.
It turns out that the error message was generated (and warning printed)
while building view updates for a base-table mutation, and noticing that
the base mutation contains an *empty* row - a row with no cells or
tombstone or anything whatsoever. This case was deemed (8 years ago,
in d5a61a8c48) unexpected and nonsensical,
and we threw an exception. But this case actually *can* happen - here is
how it happened in test_table_alter_delete - which is a test involving
a strange combination of materialized views, LWT and schema changes:
1. A table has a materialized view, and also a regular column "int_col".
2. A background thread repeatedly drops and re-creates this column
int_col.
3. Another thread deletes rows with LWT ("IF EXISTS").
4. These LWT operations each reads the existing row, and because of
repeated drop-and-recreate of the "int_col" column, sometimes this
read notices that one node has a value for int_col and the other
doesn't, and creates a read-repair mutation setting int_col (the
difference between the two reads is just in this column).
5. The node missing "int_col" receives this mutation which sets only
int_col. It upgrade()s this mutation to its most recent schema,
which doesn't have int_col, so it removes this column from the
mutation row - and is left with a completely empty mutation row.
This completely empty row is not useful, but upgrade() doesn't
remove it.
6. The view-update generation code sees this empty base-mutation row
and fails it with this std::logic_error.
7. The node which sent the read-repair mutation sees that the read
repair failed, so it fails the read and therefore fails the LWT
delete operation.
It is this LWT operation which failed in the test, and caused
the whole test to fail.
The fix is trivial: an empty base-table row mutation should simply be
*ignored* when generating view updates - it shouldn't cause any error.
Before this patch, test_table_alter_delete used to fail in roughly
20% of the runs on my laptop. After this patch, I ran it 100 times
without a single failure.
Fixes #15228
Fixes #17549
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17607
When no keyspace is provided, request all keyspaces from the server,
then scrub all of them. This is what the legacy nodetool does, for some
reason this was missed when re-implementing scrub.
Closes scylladb/scylladb#17495
There are 4 barrier-only stages when migrating a tablet and the test needs to fail pending/leaving replica that handles it in order to validate how coordinator handles dead node. Failing the barrier is done by suspending it with injection code and stopping the node without waking it up. The main difficulty here is how to tell one barrier RPC call from another, because they don't have anything onboard that could tell which stage the barrier is run for. This PR suggests that barrier injection code looks directly into the system.tablets table for the transition stage, the stage is already there by the time barrier is about to ack itself over RPC.
refs: #16527
Closes scylladb/scylladb#17450
* github.com:scylladb/scylladb:
topology.tablets_migration: Handle failed use_new
topology.tablets_migration: Handle failed write_both_read_new
topology.tablets_migration: Handle failed write_both_read_old
topology.tablets_migration: Handle failed allow_write_both_read_old
test/tablets_migration: Add conditional break-point into barrier handler
replica: Add helper to read tablet transition stage
topology_coordinator: Add action_failed() helper
The author (me) tried to be clever and fix the formatting, but then he
realized this just means a lot of unnecessary fighting with tests. So
this patch makes the formatting compatible with that of the legacy
nodetool:
* Use compatible rounding and precision formatting
* Use incorrect unit (KB instead of KiB)
* Align numbers to the left
* Add trailing white-space to "Snapshot Details: "
These two parameters are not used by the native nodetool, because
ScyllaDB itself doesn't support them. These should be just ignored and
indeed there was a unit test checking that this is the case. However,
due to a mistake in the unit test, this was not actually tested and
nodetool complained when seeing these params.
This patch fixes both the test and the native nodetool.
Closes scylladb/scylladb#17477
this changeset addresses some warnings raised by flake8, in the hope of improving the readability of this script in general.
Closes scylladb/scylladb#17668
* github.com:scylladb/scylladb:
scylla-gdb: s/if not foo is None/if foo is not None/
scylla-gdb.py: add space after keyword
scylla-gdb.py: remove extraneous spaces
scylla-gdb.py: use 2 empty lines between top-level funcs/classes
scylla-gdb.py: replace <tab> with 4 spaces
scylla-gdb: fix the indent
Currently, the github docs-pages workflow is triggered only when changes are merged to the master/enterprise branches, which means that in the case of changes to a release branch, for example, a fix to branch-5.4, or a branch-5.4>branch-2024.1 merge, the docs-pages workflow is not triggered and therefore the documentation is not updated with the new change.
In this change, I added the `branch-**` pattern, so changes to release branches will trigger the workflow
Closes scylladb/scylladb#17281
* github.com:scylladb/scylladb:
docs: always build from the default branch
docs: trigger the docs-pages workflow on release branches
* tools/cqlsh b8d86b76...e5f5eafd (2):
> dist/debian: fix the trailer line format
> `COPY TO STDOUT` shouldn't put None where a function is expected
Fixes: scylladb/scylladb#17451
Closes scylladb/scylladb#17447
key_view::explode() contains a blatant use-after-free:
unless the input is already linearized, it returns a view to a local temporary buffer.
This is rare, because partition keys are usually not large enough to be fragmented.
But for a sufficiently large key, this bug causes a corrupted partition_key down
the line.
Fixes #17625
Closes scylladb/scylladb#17626
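The bug class can be sketched in isolation (the function and the linearization detail below are illustrative stand-ins, not ScyllaDB's actual key_view code):

```cpp
#include <cassert>
#include <string>
#include <string_view>

// The buggy pattern looked like:
//   std::string_view explode(...) {
//       std::string tmp = linearize(fragments); // local buffer
//       return tmp;                             // view dangles on return
//   }
// The fix is to return owning storage (or a view into storage that
// outlives the call):
std::string explode(std::string_view fragment_a, std::string_view fragment_b) {
    std::string linearized;
    linearized.append(fragment_a).append(fragment_b); // join the fragments
    return linearized; // safe: the caller owns the buffer
}
```

The dangling-view variant typically goes unnoticed as long as inputs are already contiguous (the common case for small partition keys), which matches why the bug only surfaced for large, fragmented keys.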
instead of using the hand-crafted operator==, use the default-generated
one, which is equivalent to the former.
regarding the difference between global operator== and member operator==,
the default-generated operator in C++20 is now symmetric. so we don't
need to worry about the problem of `max_result_size` being lhs or rhs.
but neither do we need to worry about the implicit conversion, because
all constructors of `max_result_size` are marked explicit. so we don't
gain any advantage by making the operator== global instead of a member
operator.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17536
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* tablet_id
* tablet_replica
* tablet_metadata
* tablet_map
their operator<<:s are dropped
Refs scylladb/scylladb#13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17504
in 5202bb9d, we introduced repair/table_check.cc, but we didn't
update repair/CMakeLists.txt accordingly. but the symbols defined
by this compilation unit are referenced by other source files when
building scylla.
so, in this change, we add this table_check.cc to the "repair"
target.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17517
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* position_range
* mutation_fragment
* range_tombstone_stream
* mutation_fragment_v2::printer
Refs #13245
Closes scylladb/scylladb#17521
* github.com:scylladb/scylladb:
mutation: add fmt::formatter for position_range
mutation: add fmt::formatter for mutation_fragment and range_tombstone_stream
mutation: add fmt::formatter for mutation_fragment_v2::printer
This stage doesn't need any special treatment, because we cannot revert
to old replicas and should proceed normally. The barrier itself won't
get stuck, because it already handles excluded/ignored nodes.
Just make the test validate it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Two options here -- revert to old replicas by jumping into the
cleanup_target stage, or proceed normally. The choice depends on which
replica set has fewer dead nodes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
At this stage it can happen that target replica got some writes, so its
tablet needs to be cleaned up, so jump to cleanup_target stage.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are several transition stages that are executed by the topology
coordinator with the help of barrier-and-drain raft commands. For the
test to stop and remove a node while handling this stage it must inject
a break-point into barrier handler, wait for it to happen and then stop
the node without resuming the break-point. Then removenode from the
cluster.
The break-point suspends barrier handling when a specific tablet is in
specific transition stage. Tablet ID and desired stage are configured
via injector parameters.
With today's error-injection facilities the way to suspend code
execution is with injecting a lambda that waits for a message from the
injection engine.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
it'd be more pythonic to just put an expression after `assert`,
instead of wrapping it in a pair of parentheses. and there is no need
to add `;` after `break`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Make a specialized sstable_set for tablets
via tablet_storage_group_manager::make_sstable_set.
This sstable set takes a snapshot of the storage_groups
(compound) sstable_sets and maps the selected tokens
directly into the tablet compound_sstable_set.
This sstable_set provides much more efficient access
to the table's sstable sets as it takes advantage of the disjointness
of sstable sets between tablets/storage_groups, making it cheaper
than rebuilding a complete partitioned_sstable_set from all sstables in the table.
Fixes #16876
Cassandra-stress setup:
```
$ sudo cpupower frequency-set -g userspace
$ build/release/scylla (developer-mode options) --smp=16 --memory=8G --experimental-features=consistent-topology-changes --experimental-features=tablets
cqlsh> CREATE KEYSPACE keyspace1 WITH replication={'class':'NetworkTopologyStrategy', 'replication_factor':1} AND tablets={'initial':2048};
$ ./tools/java/tools/bin/cassandra-stress write no-warmup n=10000000 -pop 'seq=1...10000000' -rate threads=128
$ scylla-api-client system drop_sstable_caches POST
$ ./tools/java/tools/bin/cassandra-stress read no-warmup duration=60s -pop 'dist=uniform(1..10000000)' -rate threads=128
$ scylla-api-client system drop_sstable_caches POST
$ ./tools/java/tools/bin/cassandra-stress mixed no-warmup duration=60s -pop 'dist=uniform(1..10000000)' -rate threads=128
```
Baseline (0a7854ea4d) vs. fix (0c2c00f01b)
Throughput (op/s):
workload | baseline | fix
---------|----------|----------
write | 76,806 | 100,787
read | 34,330 | 106,099
mixed | 32,195 | 79,246
Closes scylladb/scylladb#17149
* github.com:scylladb/scylladb:
table: tablet_storage_group_manager: make tablet_sstable_set
storage_group_manager: add make_sstable_set
tablet_storage_group_manager: handle_tablet_split_completion: pre-calc new_tablet_count
table: tablet_storage_group_manager: storage_group_of: do not validate in release build mode
table: move compaction_group_list and storage_group_vector to storage_group_manager
compaction_group::table_state: get_group_id: become self-sufficient
compaction_group, table: make_compound_sstable_set: declare as const
tablet_storage_group_manager: precalculate my_host_id and _tablet_map
table: coroutinize update_effective_replication_map
instead of passing fmt string as a plain `const char*`, pass it as
a consteval type, so that `fmt::format()` can perform compile-time
format check against it and the formatted params.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17656
before this change, we used the lower-case CMake build configuration
name for the rust profile names. but this was wrong, because the
profiles are named with the scylla build mode.
in this change, we translate the `$<CONFIG>` to scylla build mode,
and use it for the profile name and for the output directory of
the built library.
Closes scylladb/scylladb#17648
* github.com:scylladb/scylladb:
build: cmake: use scylla build mode for rust profile name
build: cmake: define per-config build mode
Repairs have to obtain a permit to the reader concurrency semaphore on
each shard they have a presence on. This is prone to deadlocks:
node1 node2
repair1_master (takes permit) repair1_follower (waits on permit)
repair2_master (waits for permit) repair2_follower (takes permit)
In lieu of strong central coordination, we solved this by making permits
evictable: repair2 can evict repair1's permit so it can obtain one
and make progress. This is not efficient as evicting a permit usually
means discarding already done work, but it prevents the deadlocks.
We recently discovered that there is a window when deadlocks can still
happen. The permit is made evictable when the disk reader is created.
This reader is an evictable one, which effectively makes the permit
evictable. But the permit is obtained when the repair control
structure -- repair meta -- is created. Between creating the repair meta
and reading the first row from disk, the deadlock is still possible. And
we know that what is possible, will happen (and did happen). Fix by
making the permit evictable as soon as the repair meta is created. This
is very clunky and we should have a better API for this (refs #17644),
but for now we go with this simple patch, to make it easy to backport.
Refs: #17644
Fixes: #17591
Closes scylladb/scylladb#17646
This patch series makes all auth writes serialized via raft. Reads stay
eventually consistent for performance reasons. To make transition to new
code easier data is stored in a newly created keyspace: system_auth_v2.
Internally the difference is that instead of executing CQL directly for
writes we generate mutations and then announce them via raft group0. Per
commit descriptions provide more implementation details.
Refs https://github.com/scylladb/scylladb/issues/16970
Fixes https://github.com/scylladb/scylladb/issues/11157
Closes scylladb/scylladb#16578
* github.com:scylladb/scylladb:
test: extend auth-v2 migration test to catch stale static
test: add auth-v2 migration test
test: add auth-v2 snapshot transfer test
test: auth: add tests for lost quorum and command splitting
test: pylib: disconnect driver before re-connection
test: adjust tests for auth-v2
auth: implement auth-v2 migration
auth: remove static from queries on auth-v2 path
auth: coroutinize functions in password_authenticator
auth: coroutinize functions in standard_role_manager
auth: coroutinize functions in default_authorizer
storage_service: add support for auth-v2 raft snapshots
storage_service: extract getting mutations in raft snapshot to a common function
auth: service: capture string_view by value
alternator: add support for auth-v2
auth: add auth-v2 write paths
auth: add raft_group0_client as dependency
cql3: auth: add a way to create mutations without executing
cql3: run auth DML writes on shard 0 and with raft guard
service: don't loose service_level_controller when bouncing client_state
auth: put system_auth and users consts in legacy namespace
cql3: parametrize keyspace name in auth related statements
auth: parametrize keyspace name in roles metadata helpers
auth: parametrize keyspace name in password_authenticator
auth: parametrize keyspace name in standard_role_manager
auth: remove redundant consts auth::meta::*::qualified_name
auth: parametrize keyspace name in default_authorizer
db: make all system_auth_v2 tables use schema commitlog
db: add system_auth_v2 tables
db: add system_auth_v2 keyspace
When a tool application is invoked with an unknown operation, an error
message is printed, which includes all the known operations, with all
their aliases. This is collected in `std::vector<std::string_view>`. The
problem is that the vector containing alias names is returned by
value, so the code ends up creating views into temporaries.
Fix this by returning the alias vector by const reference.
Fixes: #17584
Closes scylladb/scylladb#17586
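A minimal sketch of the fix (class and member names are hypothetical, not ScyllaDB's actual ones):

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Returning the alias vector by value lets callers accidentally keep
// string_views into a temporary vector that dies after the expression.
class operation {
    std::vector<std::string> _aliases;
public:
    explicit operation(std::vector<std::string> aliases)
        : _aliases(std::move(aliases)) {}

    // before: std::vector<std::string> aliases() const { return _aliases; }
    // a caller collecting std::string_view from the returned copy would
    // then hold views into freed storage.
    const std::vector<std::string>& aliases() const { return _aliases; } // fix
};
```

With the reference-returning accessor, views taken into the elements stay valid for as long as the `operation` object lives.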
The helper in question is supposed to spawn a background fiber with
tablet migration stage action and repeat it in case action fails (until
operator intervention, but that's another story). In case the action fails,
a message about the failure is logged at ERROR level.
This error confuses some tests that scan scylla log messages for
ERROR-s at the end, treat most of them (if not all) as critical and
fail. But this particular message is not in fact an error -- topology
coordinator would re-execute this action anyway, so let's demote the
message to be WARN instead.
refs: #17027
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17568
This series adds a Python script that searches the code for metrics definition and their description.
Because part of the code uses a nonstandard way of definition, it uses a configuration file to resolve parameter values.
The script supports the code that uses string format and string concatenation with variables.
The documentation team will use the results to both document the existing metrics and to get the metrics changes between releases.
Replaces #16328
Closes scylladb/scylladb#17479
* github.com:scylladb/scylladb:
Adding scripts/metrics-config.yml
Adding scripts/get_description.py to fetch metrics description
before this change, we failed to apply the filtering of the tablestats
command in the right way:
1. `table_filter` failed to check if the delimiter is npos before
extracting the cf component from the specified table name.
2. the stats should not include keyspaces which are not
included by the filter.
3. the total number of tables in the stats report should contain
all tables, no matter whether they are filtered or not.
in this change, all the problems above are addressed, and the tests
are updated to cover these use cases.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17468
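The npos issue from point 1 can be sketched like this (the function name and return shape are illustrative, not the actual tablestats code):

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Parse a "ks.cf" filter argument. The delimiter must be checked against
// npos before extracting the table component; without the check, a bare
// keyspace filter like "ks1" would call substr with npos + 1.
std::pair<std::string, std::string> parse_table_filter(std::string_view arg) {
    auto dot = arg.find('.');
    if (dot == std::string_view::npos) {
        return {std::string(arg), {}}; // keyspace-only filter
    }
    return {std::string(arg.substr(0, dot)),
            std::string(arg.substr(dot + 1))};
}
```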
Make a specialized sstable_set for tablets
via tablet_storage_group_manager::make_sstable_set.
This sstable set takes a snapshot of the storage_groups
(compound) sstable_sets and maps the selected tokens
directly into the tablet compound_sstable_set.
Refs #16876
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Move the responsibility for preparing the table_set
covering all sstables in the table to the storage_group_manager
so it can specialize the sstable_set.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Mini-cleanup of `new_tablet_count`, similar
to pre-calculating `old_tablet_count` once.
While at it, add some missing coding-style related spaces.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
No validation is really required in release build.
Add `#ifndef SCYLLA_BUILD_MODE_RELEASE` before
adding another term to the logic in the next patch
that adds support for sparse allocation in a cloned
tablet_storage_group_manager.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently in shard_repair_task_impl::repair_range table name is
retrieved with database::find_column_family and in case of exception,
we return from the function.
But the table name is already kept in table_info passed to repair_range
as an argument. Let's reuse it. If a table is dropped, we will find it
out almost immediately after calling repair_cf_range_row_level and
handle it more adequately.
Closes scylladb/scylladb#17245
For tables using tablet based replication strategies, the sstables should be reshaped only within the compaction groups they belong to. The shard_reshaping_compaction_task_impl now groups the sstables based on their compaction groups before reshaping them.
Fixes https://github.com/scylladb/scylladb/issues/16966
Closes scylladb/scylladb#17395
* github.com:scylladb/scylladb:
test/topology_custom: add testcase to verify reshape with tablets
test/pylib/rest_client: add get_sstable_info, enable/disable_autocompaction
replica/distributed_loader: enable reshape for sstables
compaction: reshape sstables within compaction groups
replica/table : add method to get compaction group id for an sstable
compaction: reshape: update total reshaped size only on success
compaction: simplify exception handling in shard_reshaping_compaction_task_impl::run
couple minor formatting fixes.
Closes scylladb/scylladb#17518
* github.com:scylladb/scylladb:
docs: remove leading space in table element
docs: remove space in words
Introduces collapsible dropdowns for images reference docs. With this update, only the latest version's details will be displayed open by default. Information about previous versions will be hidden under dropdowns, which users can expand as needed. This enhancement aims to make pages shorter and easier to navigate.
Closes scylladb/scylladb#17492
- use API endpoint of /storage_service/toppartition/
- only print out the specified samplings.
- print "\n" separator between samplings
Closes scylladb/scylladb#17574
* github.com:scylladb/scylladb:
tools/scylla-nodetool: print separator between samplings
tools/scylla-nodetool: only print the specified sampling
tools/scylla-nodetool: use /storage_service/toppartition/
This pull request adds dynamic substitutions for the following variables:
* `.. |CURRENT_VERSION| replace:: {current_version}`
* `.. |UBUNTU_SCYLLADB_LIST| replace:: scylla-{current_version}.list`
* `.. |CENTOS_SCYLLADB_REPO| replace:: scylla-{current_version}.repo`
As a result, it is no longer needed to update the "Installation on Linux" page manually after every new release.
Closes scylladb/scylladb#17544
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* node_ops_cmd
* node_ops_cmd_request
their operator<<:s are dropped
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17505
Printing the compaction_group group_id as "i/size"
where size is the total number of compaction_groups in
the table is convenient but it comes with a price
of a circular dependency on the table, as noted by
Aleksandra Martyniuk in c25827feb3 (r1511341251),
which can be triggered when hitting an error when adding the
compaction_group::table_state to the table's compaction_manager
within the table's constructor.
This patch just prints the _group_id member
resolving the dependency on the table.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
before this change, we assume that the debian packaging directory is
always located under `build/debian/debian`, which is hardwired by
`configure.py`. but this might not hold anymore if we want to
have a self-contained build, in the sense that different builds do
not share the same build directory. this could be a waste for the
non-multi-config build, but `configure.py` uses the multi-config generator
when building with CMake. so in that case, all builds still share the
same $build_dir/debian/ directory.
in order to work with the out-of-source build, where the build
directory is not necessarily "build", a new option is added to
`create-relocatable-package.py`, which allows us to specify the
directory where the "debian" artifacts are located.
Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17558
in 02de9f1833, we enabled building Seastar's testing library so that
its testing facilities can be used in scylla's own tests. but this
also brings in Seastar's tests.
since scylladb's CI builds the "all" targets, and we are not
interested in running Seastar's tests when building scylladb,
let's exclude Seastar's tests from the "all" target.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17554
The node host_id never changes, so get it once,
when the object is constructed.
A pointer to the tablet_map is taken at construction time
using the effective_replication_map, and it is
updated whenever the e_r_m changes, using a newly added
`update_effective_replication_map` method.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It's better to wait on deregistering the
old main compaction_groups in handle_tablet_split_completion
rather than leaving the work in the background,
especially since their respective storage_groups
are being destroyed by handle_tablet_split_completion.
handle_tablet_split_completion keeps a continuation chain
for all non-ready compaction_group stop fibers
and returns it so that update_effective_replication_map
can await it, leaving no cleanup work in the background.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Print process id to the log at start.
It aids debugging/administering the instance if you have multiple
instances running on the same machine.
Closes scylladb/scylladb#17582
CMake generates debian packages under build/$<CONFIG>/debian instead of
build/$mode/debian. so let's translate $mode to $<CONFIG> if
build.ninja is found under the build/ directory; configure.py puts
build.ninja under $top_srcdir, while CMake puts it under build/ .
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17592
- changes to use build/$<CONFIG> for build directory
- add ${CMAKE_BINARY_DIR}/debian as a dep
- generate deb packages under build/$<CONFIG>/debian
Closes scylladb/scylladb#17560
* github.com:scylladb/scylladb:
build: cmake: generate deb packages under build/$<CONFIG>/debian
build: cmake: add ${CMAKE_BINARY_DIR}/debian as a dep
build: cmake: use build/$<CONFIG>/ instead of build
build: cmake: always pass absolute path for add_stripped()
This commit removes the redundant
"Cluster membership changes and LWT consistency" page.
The page is no longer useful because the Raft algorithm
serializes topology operations, which results in
consistent topology updates.
Closes scylladb/scylladb#17523
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for cache_entry, and drop its
operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17594
This PR updates the procedures that changed as a result of introducing Raft-based topology.
Refs https://github.com/scylladb/scylladb/issues/15934
Applied the updates from https://docs.google.com/document/d/1BgZaYtKHs2GZKAxudBZv4G7uwaXcRt2jM6TK9dctRQg/edit
In addition, it adds a placeholder for the 5.4-to-6.0 upgrade guide, as a file included in that guide, Enable Raft topology, is referenced from other places in the docs.
Closes scylladb/scylladb#17500
* github.com:scylladb/scylladb:
doc: replace "Raft Topology" with "Consistent Topology"
doc: (Raft topology) update Removenode
doc: (Raft topology) update Upscale a Cluster
doc:(Raft topology)update Membership Change Failures
doc: (Raft topology) update Replace Dead Node
doc: (Raft topology) update Remove a Node
doc: (Raft topology) update Add a New DC
doc: (Raft topology) update Add a New Node
doc: (Raft topology) update Create Cluster (EC2)
doc: (Raft topology) update Create Cluster (n-DC)
doc: (Raft topology) update Create Cluster (1DC)
doc: include the quorum requirement file
doc: add the quorum requirement file
doc: add placeholder for Enable Raft topology page
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* std::vector<data_type>
* column_identifier
* column_identifier_raw
* untyped_constant::type_class
and drop their operator<< overloads
Refs #13245
Closes scylladb/scylladb#17538
* github.com:scylladb/scylladb:
cql3: add fmt::formatter for expression::printer
cql3: add fmt::formatter for raw_value{,_view}
cql3: add fmt::formatter for std::vector<data_type>
cql3: add fmt::formatter for untyped_constant::type_class
cql3: add fmt::formatter for column_identifier{,_row}
this "misspelling" was identified by codespell. actually, it's not
quite a misspelling, as "UPDATE" and "INSERT" are keywords in CQL.
so we intended to emaphasis them, so to make codespell more useful,
and to preserve the intention, let's quote the keywords with backticks.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17391
before this change, we used the lower-case CMake build configuration
name for the rust profile names. but this was wrong, because the
profiles are named with the scylla build mode.
in this change, we translate the $<CONFIG> to scylla build mode,
and use it for the profile name and for the output directory of
the built library.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
so that scylla_build_mode_$<CONFIG> can be referenced when necessary.
we use it to refer to the build mode in the build system instead
of the CMake configuration name.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This PR includes 3 commits:
- **[actions] Add a check for backport labels**: As part of the Automation of ScyllaDB backports project, each PR should get either a `backport/none` or `backport/X.Y` label. Based on this label we will automatically open a backport PR for the relevant OSS release.
In this commit, I am adding a GitHub action to verify that such a label was added. This only applies to PRs with a base branch of `master` or `next`. For releases, we don't need this check.
- **Add Mergify (https://mergify.com/) configuration file**: In this PR we introduce the `.mergify.yml` configuration file, which
includes a set of rules that we will use for automating our backport
process.
For each supported OSS release (currently 5.2 and 5.4) we have an almost
identical configuration section which includes the four conditions that
must hold before we open a backport PR:
* PR should be closed
* PR should have the proper label, for example backport/5.4 (we can
have multiple labels)
* Base branch should be `master`
* PR should be set with a `promoted` label - this label will be set
automatically once the commits are promoted to the `master` branch (passed
gating)
Once all conditions are met, the bot will open a backport PR and
assign it to the author of the original PR; then CI will start
running, and only after it passes do we merge.
- **[action] Add promoted label when commits are in master**: In Scylla, we don't merge our PRs but use `./script/pull_github_pr.sh` to close the pull request, adding a `closes scylladb/scylladb <PR number>` remark, and push the changes to the `next` branch.
One of the conditions for opening a backport PR is that all relevant commits are in `master` (passed gating). In this GitHub action, we go through the list of commits once a push is made to `master`, identify the relevant PR, and add the `promoted` label to it. This will allow Mergify to start the backporting process.
Closes scylladb/scylladb#17365
* github.com:scylladb/scylladb:
[action] Add promoted label when commits are in master
Add mergify (https://mergify.com/) configuration file
[actions] Add a check for backport labels
The semantics of the function were accidentally
modified in 6e79d64. The consequence of the change
was that we didn't limit memory consumption:
the function always returned false for any node
different from the local node. The returned value
is used by storage_proxy to decide whether it
is able to store a hint or not.
This commit fixes the problem by taking other
nodes into consideration again.
Fixes #17636
Closes scylladb/scylladb#17639
We need MAIN_BRANCH calculated earlier so we can use it
to checkout the right branch when cloning the src repo
(either `master` or `enterprise`, based on the detected `PRODUCT`)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#17647
The `buildah commit` command doesn't remove the working container. These
accumulate in ~/.local/container/storage until something bad happens.
Fix by adding the `--rm` flag to remove the container and volume.
Closes scylladb/scylladb#17546
before this change, we already have a `fmt::formatter` specialized for
`expression::printer`. but the formatter was implemented by
1. formatting the `printer` instance to an `ostringstream`, and
2. extracting a `std::string` from this `ostringstream`
3. formatting the `std::string` instance to the fmt context
this is convoluted and is not an optimal implementation. so,
in this change, it is reimplemented by formatting directly to
the context. its operator<< is also dropped in this change.
please note, to avoid adding the large chunk of code into the
.hh file, the implementation is put in the .cc file. but in order
to preserve the usage of `transformed(fmt::to_string<expression::printer>)`,
the `format()` function is defined as a template, and instantiated
explicitly for two use cases:
1. to format to `fmt::context`
2. to format using `fmt::to_string()`
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* raw_value
* raw_value_view
`raw_value_view` 's operator<< is still being used by the generic
homebrew printer for vector<>, so it is preserved.
`raw_value` 's operator<< is still being used by the generic
homebrew printer for optional<>, so it's preserved as well.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
We decrease the server's request timeouts in topology tests so that
they are lower than the driver's timeout. Before, the driver could
time out its request before the server handled it successfully.
This problem caused scylladb/scylladb#15924.
Since scylladb/scylladb#15924 is the last issue mentioned in
scylladb/scylladb#15962, this PR also reenables background
writes in `test_topology_ops` with tablets disabled. The test
doesn't pass with tablets and background writes because of
scylladb/scylladb#17025. We will reenable background writes
with tablets after fixing that issue.
Fixes scylladb/scylladb#15924
Fixes scylladb/scylladb#15962
Closes scylladb/scylladb#17585
* github.com:scylladb/scylladb:
test: test_topology_ops: reenable background writes without tablets
test: test_topology_ops: run with and without tablets
test: topology: decrease the server's request timeouts
Tests that verify upgrading to the raft-based topology
(`test_topology_upgrade`, `test_topology_recovery_basic`,
`test_topology_recovery_majority_loss`) have flaky
`check_system_topology_and_cdc_generations_v3_consistency` calls.
`assert topo_results[0] == topo_res` can fail because of different
`unpublished_cdc_generations` on different nodes.
The upgrade procedure creates a new CDC generation, which is later
published by the CDC generation publisher. However, this can happen
after the upgrade procedure finishes. In tests, if publishing
happens just before querying `system.topology` in
`check_system_topology_and_cdc_generations_v3_consistency`, we can
observe different `unpublished_cdc_generations` on different nodes.
It is an expected and temporary inconsistency.
For the same reasons,
`check_system_topology_and_cdc_generations_v3_consistency` can
fail after adding a new node.
To make the tests not flaky, we wait until the CDC generation
publisher finishes its job. Then, all nodes should always have
equal (and empty) `unpublished_cdc_generations`.
Fixes scylladb/scylladb#17587
Fixes scylladb/scylladb#17600
Fixes scylladb/scylladb#17621
Closes scylladb/scylladb#17622
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for std::vector<data_type>,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for untyped_constant::type_class,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* column_identifier
* column_identifier_raw
and their operator<< overloads are dropped.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
With auth-v2 we can log in even if quorum is lost. So the test
which checks that an error occurs in such a situation is deleted,
and the opposite test, which checks that logging in works, is
added.
During the raft topology upgrade procedure, data from the
system_auth keyspace will be migrated to system_auth_v2.
The migration works mostly on top of the CQL layer to minimize
the amount of new code introduced: it mostly executes SELECTs
on the old tables and then INSERTs on the new tables. The writes
are not executed as usual but rather announced via raft.
Because the keyspace is part of the query, when we
migrate from v1 to v2 the query should change; otherwise
the code would operate on the old keyspace if those statics
were already initialized.
Likewise, the keyspace name can no longer be a class
field initialized in the constructor, as it can change
during the class's lifetime.
Alternator doesn't do any writes to the auth
tables, so it's simply a change of the keyspace
name.
Docs will be updated later, when auth-v2
is enabled as default.
All auth modifications will now go via group0.
This is achieved by acquiring the group0 guard,
creating mutations without executing them, and
then announcing them.
Actually, the first guard is taken by the query processor;
it serves as a read barrier for query validations
(such as standard_role_manager::exists), otherwise
we could read older data. In principle this single
guard should be used for the entire query, but that's impossible
to achieve with the current code without a major refactor.
For read-before-write cases it's good to do the write with
the guard acquired before the read, so that no
modify operation is allowed in between.
Not doing so, however, doesn't make the implementation
worse than it currently is, so the most complex cases
were left with a FIXME.
To make table modifications go via raft we need to publish
mutations. Currently many system tables (especially auth) use
CQL to generate table modifications. The added function is the
missing link that will allow a seamless transition of certain
system tables to raft.
Because we'll be doing group0 operations, we need to run on shard 0. An additional benefit
is that with needs_guard set, the query_processor will also do automatic retries in case of
concurrent group0 operations.
When bounce_to_shard happens we need to fill client_state with
the sl_controller appropriate for the destination shard.
Before the patch, sl_controller was set to null after the bounce.
That was fine because, apparently, it was never used in such a scenario.
With auth-v2 we need to bounce attach/detach service level statements
because they modify things via the auth subsystem, which needs to be
called on shard 0.
It's the same approach as the one used for default_authorizer in
an earlier commit.
Note that only non-legacy paths were changed; in particular,
legacy migrations and table creations will never be executed
in the new keyspace, as they will be managed by the
system_auth_keyspace implementation.
For now we add the keyspace name as a class member because it's a static
value anyway. But the statics will be removed in future commits, because
migration can occur and auth needs to switch the keyspace name at runtime.
Just follow the same pattern as in default_authorizer so
it's easy to track where system_auth keyspace is actually
used. It will also allow for easier parametrization.
When adding group0 replication for auth we will change only
the write path and plan to reuse the read path. To avoid copying
the code or making the class hierarchy more complicated,
default_authorizer's read code will remain unchanged except for
this parametrization; it is needed because the group0 implementation
uses a separate keyspace (replication is defined at the keyspace level).
In subsequent commits the legacy write path code will be separated
and the new implementation placed in default_authorizer.
For now we add the keyspace name as a class member because it's a static
value anyway. But the statics will be removed in future commits, because
migration can occur and auth needs to switch the keyspace name at runtime.
Changing the config under the guard can cause a deadlock.
The guard holds _read_apply_mutex. The same lock is held by the group0
apply() function. It means that no entry can be applied while the guard
is held, and the raft apply fiber may even be sleeping, waiting for this
lock to be released. A configuration change, OTOH, waits for the config
change command to be committed before returning, but the way raft is
implemented, commit notifications are triggered from the apply fiber,
which may be stuck. Deadlock.
Drop and re-take the guard around configuration changes.
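As an illustration (a toy Python model, not the actual C++ raft code; all names here are hypothetical stand-ins), the fix can be sketched as releasing the mutex around the wait so the apply fiber can deliver the commit notification:

```python
import threading

# Models _read_apply_mutex: held by the group0 guard, and required by the
# raft apply fiber before it can apply (and thus commit) any entry.
read_apply_mutex = threading.Lock()
config_committed = threading.Event()

def apply_fiber():
    # The apply fiber can only make progress once the mutex is free;
    # the commit notification for the config change fires from here.
    with read_apply_mutex:
        config_committed.set()

def change_config_fixed() -> bool:
    read_apply_mutex.acquire()    # guard taken earlier in the operation
    read_apply_mutex.release()    # fix: drop the guard before waiting
    threading.Thread(target=apply_fiber).start()
    ok = config_committed.wait(timeout=5)  # wait for the commit notification
    read_apply_mutex.acquire()    # re-take the guard afterwards
    read_apply_mutex.release()
    return ok
```

Waiting on `config_committed` while still holding the mutex would reproduce the deadlock: the apply fiber could never set the event.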
Fixes scylladb/scylladb#17186
The new keyspace is added similarly to the system_schema keyspace:
it's registered via system_keyspace::make, which calls
all_tables to build its schema.
A dummy table 'roles' is added because keyspaces are currently
registered by walking through their tables. Full table schemas
will be added in subsequent commits.
Change can be observed via cqlsh:
cassandra@cqlsh> describe keyspaces;
system_auth_v2 system_schema system system_distributed_everywhere
system_auth system_distributed system_traces
cassandra@cqlsh> describe keyspace system_auth_v2;
CREATE KEYSPACE system_auth_v2 WITH replication = {'class': 'LocalStrategy'} AND durable_writes = true;
CREATE TABLE system_auth_v2.roles (
role text PRIMARY KEY
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
AND comment = 'comment'
AND compaction = {'class': 'SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 604800
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
After fixing scylladb/scylladb#15924 in one of the previous
patches, we reenable background writes in `test_topology_ops`.
We also start background writes a bit later, after adding all the nodes.
Without this change and with tablets, the test fails with:
```
> await cql.run_async(f"CREATE TABLE tbl (pk int PRIMARY KEY, v int)")
E cassandra.protocol.ConfigurationException: <Error from server: code=2300
[Query invalid because of configuration issue] message="Datacenter
datacenter1 doesn't have enough nodes for replication_factor=3">
```
The change above makes the test a bit weaker, but we don't have to
worry about it. If adding nodes is bugged, other tests should
detect it.
Unfortunately, the test still doesn't pass with tablets and
background writes because of scylladb/scylladb#17025, so we keep
background writes disabled with tablets and leave a FIXME.
Fixes scylladb/scylladb#15962
We decrease the server's request timeouts in topology tests so that
they are lower than the driver's timeout. Before, the driver could
time out its request before the server handled it successfully.
This problem caused scylladb/scylladb#15924.
A high server's request timeout can slow down the topology tests
(see the new comment in `make_scylla_conf`). We make the timeout
dependent on the testing mode to not slow down tests for no reason.
We don't touch the driver's request timeout. Decreasing it in some
modes would require too much effort for almost no improvement.
Fixes scylladb/scylladb#15924
The node.rs pointer can be freed while the guard is released, so it
cannot be accessed during error processing. Save the state locally.
Fixes #17577
Message-ID: <Zd9keSwiIC4v_EiF@scylladb.com>
The RPC is now used by group0, which is available only on shard 0.
Fixes scylladb/scylladb#17565
* 'gleb/migration-request-shard0' of github.com:scylladb/scylla-dev:
raft_group0_client: assert that hold_read_apply_mutex is called on shard 0
migration_manager: fix indentation after the previous patch.
messaging_service: process migration_request rpc on shard 0
This commit updates the Nodetool Removenode page
with reference to the Raft-related topology.
Specifically, it removes outdated warnings, and
adds the information about banning removed and ignored
nodes from the cluster.
This commit updates the Handling Cluster Membership Change Failures page
with reference to the Raft-related topology.
Specifically, it adds a note that the page only applies when
Raft-based topology is not enabled.
In addition, it removes the Raft-enabled option.
This commit updates the Replace a Dead Node page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to replace the nodes one by one and the requirement to ensure
that the replaced node will never come back to the cluster.
In addition, a warning is added to indicate the limitations
when Raft-based topology is not enabled upon upgrade from 5.4.
This commit updates the Remove a Node page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to remove the nodes one by one and the requirement to ensure
that the removed node will never come back to the cluster.
In addition, a warning is added to indicate the limitations
when Raft-based topology is not enabled upon upgrade from 5.4.
This commit updates the Add a New DC page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to bootstrap the nodes one by one.
In addition, a warning is added to indicate the limitations
when Raft-based topology is not enabled upon upgrade from 5.4.
This commit updates the Add a New Node (Scale Out) page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to bootstrap the nodes one by one.
In addition, a warning is added to indicate the limitations
when Raft-based topology is not enabled upon upgrade from 5.4.
This commit updates the Create Cluster (EC2) page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to bootstrap the nodes one by one.
In addition, it updates the concept of the seed node.
This commit updates the Create Cluster (Multi DC) page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to bootstrap the nodes one by one.
In addition, it updates the concept of the seed node.
This commit updates the Create Cluster (Single DC) page
with reference to the Raft-related topology.
Specifically, it removes the previous pre-Raft limitation
to bootstrap the nodes one by one.
In addition, it updates the concept of the seed node.
Commit 0c376043eb added access to a group0
semaphore, which can be done on shard 0 only. Unlike all other group0
rpcs (which are already always forwarded to shard 0), migration_request
is not, since it is an rpc that was reused from non-raft days. The patch
adds the missing jump to shard 0 before executing the rpc.
Calling notify_left for the old ip on topology change in raft mode
was a regression; in gossiper mode it didn't occur. In gossiper
mode, the function handle_state_normal was responsible for spotting
IP addresses that weren't managing any parts of the data, and
it would then initiate their removal by calling remove_endpoint.
This removal process did not include calling notify_left.
Actually, notify_left was only supposed to be called (via excise) by
a 'real' removal procedures - removenode and decommission.
The redundant notify_left caused trouble in the scylla python driver.
The driver could receive REMOVED_NODE and NEW_NODE notifications
at the same time, and their handling routines could race with each other.
In this commit we fix the problem by not calling notify_left if
the remove_ip lambda was called from the ip change code path.
Also, we add a test which verifies that the driver log doesn't
mention the REMOVED_NODE notification.
Fixes scylladb/scylladb#17444
Closes scylladb/scylladb#17561
instead of printing it out after the samplings, we should print it
in between them, as toppartitions_test.py in dtest splits the
samplings using "\n\n". without this change, dtest would consider
the empty line as another sampling and then fail the test, as
the empty sampling does not match the expected regular expressions.
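The intended output shape can be sketched like this (an illustrative model, not the actual nodetool source; `render_samplings` is a hypothetical name):

```python
def render_samplings(samplings: dict) -> str:
    # Join the sampling reports with a blank line *between* them, rather
    # than appending a separator after each one, so that dtest's split on
    # "\n\n" never yields an empty trailing entry.
    return "\n\n".join(samplings[name] for name in samplings)

out = render_samplings({"READS": "reads-report", "WRITES": "writes-report"})
```

Splitting `out` on `"\n\n"` yields exactly the two reports, with no empty sampling at the end.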
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we printed all samplings returned by the API,
but this is not cassandra nodetool's behavior, which only
prints out the specified one. and the toppartitions_test.py
in dtest actually expects the number of samplings to
match the one specified on the command line.
so, in this change, we only print out the specified samplings.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
instead of using the endpoint /storage_service/toppartition,
use /storage_service/toppartition/. otherwise the API server refuses
to return the expected result, as it does not match any API endpoint.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Memtables are fickle: they can be flushed when there is memory pressure,
if there is too much commitlog, or if there is too much data in them.
The tests in test_select_from_mutation_fragments.py currently assume
the data written is in the memtable. This is true most of the time, but
we have seen some odd test failures that couldn't be understood.
To make the tests more robust, flush the data to the disk and read it
from the sstables. This means that some range scans need to filter to
read from just a single mutation source, but this does not influence
the tests.
create-relocatable-package.py packages the debian packaging as well,
so we have to add it as a dependency for the targets which
use `create-relocatable-package.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
with a multi-config generator, the generated artifacts are located
under ${CMAKE_BINARY_DIR}/$<CONFIG>/ instead of ${CMAKE_BINARY_DIR}.
so update the paths referencing the built executables, and update
the `--build-dir` option of `create-relocatable-package.py` accordingly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we assumed that $<TARGET_FILE:${name}>
is the path of the parameter passed to this function, but this was
wrong. it actually refers to the `TARGET` keyword argument
of this function. also, the generated files should
be located under a path like "build/Debug" instead of "build" if
a multi-config generator is used, as multi-config builds share
the same `${CMAKE_BINARY_DIR}`.
in this change, instead of accepting a CMake target, we always
accept an absolute path, and use "${CMAKE_BINARY_DIR}/$<CONFIG>"
for the directory of the executable. this should work for
the multi-config generator, which is used by `configure.py` when
CMake is used to build the tree.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Even though taking the erm blocks migration, it cannot prevent
load-and-stream from starting while a migration is going on; the
erm only prevents migration from advancing.
With tablets, new data will be streamed to pending replica too if
the write replica selector, in transition metadata, is set to both.
If migration is at a later stage where only new replica is written
to, then data is streamed only to new replica as selector is set
to next (== new replica set).
primary_replica_only flag is handled by only streaming to pending
if the primary replica is the one leaving through migration.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This test needed a lot of data to ensure multiple pages when doing the read repair. This change tweaks two key configuration items, allowing for a drastic reduction of the data size and consequently a large reduction in run-time.
* Changes query-tombstone-page-limit 1000 -> 10. Before f068d1a6fa, reducing this to too small a value would start killing internal queries. Now, after said commit, this is no longer a concern, as this limit no longer affects unpaged queries.
* Sets (the new) query-page-size-in-bytes 1MB (default) -> 1KB.
The latter configuration is a new one, added by the first patches of this series. It allows configuring the page size in bytes, after which pages are cut. Previously this was a hard-coded constant: 1MB. This forced any tests which wanted to check paging, with pages cut on size, to work with large datasets. This was especially pronounced in the tests fixed in this PR, because they work with tombstones, which are tiny, so a lot of them were needed to trigger paging based on size.
With these two changes, we can reduce the data size:
* total_rows: 20000 -> 100
* max_live_rows: 32 -> 8
The runtime of the test consequently drops from 62 seconds to 13.5 seconds (dev mode, on my build machine).
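A back-of-the-envelope calculation shows why the hard-coded 1 MB page size forced large datasets (the ~50-byte tombstone row size below is an assumption for illustration, not a figure from the source):

```python
def rows_to_cut_page(page_size_bytes: int, row_bytes: int = 50) -> int:
    # Number of tiny tombstone rows needed before a page is cut on size
    # (ceiling division).
    return -(-page_size_bytes // row_bytes)

# Old hard-coded limit vs. the new configurable setting used by the test:
rows_at_1mb = rows_to_cut_page(1_000_000)  # 20000 rows per page
rows_at_1kb = rows_to_cut_page(1_000)      # 20 rows per page
```

With size-based page cuts now reachable after tens of rows instead of tens of thousands, the test's dataset can shrink by orders of magnitude.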
Fixes: https://github.com/scylladb/scylladb/issues/15425
Fixes: https://github.com/scylladb/scylladb/issues/16899
Closes scylladb/scylladb#17529
* github.com:scylladb/scylladb:
test/topology_custom: test_read_repair.py: reduce run-time
replica/database: get_query_max_result_size(): use query_page_size_in_bytes
replica/database: use include page-size in max-result-size
query-request: max_result_size: add without_page_limit()
db/config: introduce query_page_size_in_bytes
Fix test_storage_service.py to work with tablets.
- test_describe_ring was failing because in storage_service/describe_ring
a table must be specified for keyspaces with tablets.
Do not check the status if tablets are enabled. Add checks for
the specified table;
- test_storage_service_keyspace_cleanup_with_no_owned_ranges
was failing because cleanup is disabled on keyspaces with tablets.
Use the test_keyspace_vnodes fixture to use a keyspace with tablets
disabled;
- test_storage_service_get_natural_endpoints required
some minor type-related fixes.
The injection set in test_repair_task_progress didn't consider the case
when repair::shard_repair_task_impl::ranges_size() == 1, which is
true for tablets.
Move the injection so that it is triggered before the number of
completed ranges is increased.
Fix test_compaction_task.py to work with tablets.
Currently the tests fail because cleanup on keyspaces with tablets is
disabled, and reshape and reshard of keyspaces with tablets use
load_and_stream, which isn't covered by tasks.
Use test_keyspace_vnodes for these tests to have a keyspace with
tablets disabled.
Load-and-stream is given similar treatment to repair:
it jumps into a new streaming session for every tablet, so we guarantee
data will be segregated into tablets co-habiting the same shard.
Fixes #17315.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This patch fixes a UBSAN-reported integer overflow during one of our
existing tests,
test_native_functions.py::test_mintimeuuid_extreme_from_totimestamp
when attempting to convert an extreme "date" value, millions of years
in the past, into a "timestamp" value. When UBSAN crashing is enabled,
this test crashes before this patch, and succeeds after this patch.
The "date" CQL type is 32-bit count of *days* from the epoch, which can
span 2^31 days (5 million years) before or after the epoch. Meanwhile,
the "timestamp" type measures the number of milliseconds from the same
epoch, in 64 bits. Luckily (or intentionally), every "date", however
extreme, can be converted into a "timestamp": This is because 2^31 days
is 1.85e17 milliseconds, well below timestamp's limit of 2^63 milliseconds
(9.2e18).
But it turns out that our conversion function, date_to_time_point(),
used some boost::gregorian library code, which carried out these
calculations in **microsecond** resolution. The extra conversion to
microseconds wasn't just wasteful, it also caused an integer overflow
in the extreme case: 2^31 days is 1.85e20 microseconds, which does NOT
fit in a 64-bit integer. UBSAN notices this overflow, and complains
(plus, the conversion is incorrect).
The fix is to do the trivial conversion on our own (a day is, by
convention, exactly 86400 seconds - no fancy library is needed),
without the grace of Boost. The result is simpler, faster, correct
for the Pliocene-age dates, and fixes the UBSAN crash in the test.
Fixes #17516
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17527
load-and-stream is currently the only method -- for tablets -- that
can load SSTables while the node is online.
Today, sstable_directory relies on the effective replication map (erm)
not being invalidated during loading; this assumption is broken by
concurrent tablet migration,
causing load-and-stream to segfault.
The sstable loader needs the sharder from erm in order to compute
the owning shard.
To fix, let's use auto_refreshing_sharder, which refreshes the sharder
every time the table's replication map is updated. This guarantees any
user of the sharder will find it alive throughout the lifetime of
sstable_directory.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Group0 state machine access atomicity is guaranteed by a mutex in the
group0 client. Code that reads or writes the state needs to hold the
lock. To transfer the schema part of the snapshot we used the existing
"migration request" verb, which did not follow this rule. Fix the code
to take the group0 lock before accessing the schema in case the verb is
called as part of a group0 snapshot transfer.
Fixes scylladb/scylladb#16821
This test needed a lot of data to ensure multiple pages when doing the
read repair. This change tweaks two key configuration items, allowing
for a drastic reduction of the data size and consequently a large
reduction in run-time.
* Changes query-tombstone-page-limit 1000 -> 10. Before f068d1a6fa,
reducing this to a too small value would start killing internal
queries. Now, after said commit, this is no longer a concern, as this
limit no longer affects unpaged queries.
* Sets (the new) query-page-size-in-bytes 1MB (default) -> 1KB.
With these two changes, we can reduce the data size:
* total_rows: 20000 -> 100
* max_live_rows: 32 -> 8
The runtime of the test consequently drops from 62 seconds to 13.5
seconds (dev mode, on my build machine).
This patch changes get_unlimited_query_max_result_size():
* Also set the page-size field, not just the soft/hard limits
* Renames it to get_query_max_result_size()
* Update callers, specifically storage_proxy::get_max_result_size(),
which now has a much simpler common return path and has to drop the
page size on one rare return path.
This is a purely mechanical change, no behaviour is changed.
Returns an instance with the page_limit reset to 0. This converts a
max_result_size which is usable only with the
"page_size_and_safety_limit" feature into one which can be used before
this feature.
To be used in the next patch.
Regulates the page size in bytes via config, instead of the currently
used hard-coded constant. Allows tests to configure lower limits so they
can work with smaller data-sets when testing paging related
functionality.
Not wired yet.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `position_range`, and the
helpers for printing related types are dropped.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* mutation_fragment
* range_tombstone_stream
their operator<<:s are dropped
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
In order to publish the docs-pages from release branches (see the other
commit), we need to make sure that docs is always built from the default
branch which contains the updated conf.py
Ref https://github.com/scylladb/scylladb/pull/17281
Currently, the github docs-pages workflow is triggered only when changes
are merged to the master/enterprise branches, which means that in the
case of changes to a release branch, for example, a fix to branch-5.4,
or a branch-5.4 > branch-2024.1 merge, the docs-pages workflow is not
triggered and therefore the documentation is not updated with the new change.
In this change, I added the `branch-**` pattern, so changes to release
branches will trigger the workflow.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for mutation_fragment_v2::printer
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
otherwise sphinx would consider "Within which Data Center the"
as the "term" part of an entry in a definition list, and
"node is located" as the definition part of this entry.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
* remove space in "Exceptions", otherwise it renders like "Excep"
"tions", which does not look right.
* remove space in "applicable".
* remove space in "Transport".
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The function `gms::version_generator::get_next_version()` can only be called from shard 0 as it uses a global, unsynchronized counter to issue versions. Notably, the function is used as a default argument for the constructor of `gms::versioned_value` which is used from shorthand constructors such as `versioned_value::cache_hitrates`, `versioned_value::schema` etc.
The `cache_hitrate_calculator` service runs a periodic job which updates the `CACHE_HITRATES` application state in the local gossiper state. Each time the job is scheduled, it runs on the next shard (it goes through shards in a round-robin fashion). The job uses the `versioned_value::cache_hitrates` shorthand to create a `versioned_value`, therefore risking a data race if it is not currently executing on shard 0.
The PR fixes the race by moving the call to `versioned_value::cache_hitrates` to shard 0. Additionally, in order to help detect similar issues in the future, a check is introduced to `get_next_version` which aborts the process if the function is called on a shard other than 0.
There is a possibility that this is also a fix for #17493. Because `get_next_version` uses a simple increment to advance the global counter, a data race can occur if two shards call it concurrently, and it may result in shard 0 returning the same or a smaller value when called twice in a row. The following sequence of events is suspected to occur on node A:
1. Shard 1 calls `get_next_version()`, loads version `v - 1` from the global counter and stores in a register; the thread then is preempted,
2. Shard 0 executes `add_local_application_state()` which internally calls `get_next_version()`, loads `v - 1` then stores `v` and uses version `v` to update the application state,
3. Shard 0 executes `add_local_application_state()` again, increments version to `v + 1` and uses it to update the application state,
4. Gossip message handler runs, exchanging application states with node B. It sends its application state to B. Note that the max version of any of the local application states is `v + 1`,
5. Shard 1 resumes and stores version `v` in the global counter,
6. Shard 0 executes `add_local_application_state()` and updates the application state - again - with version `v + 1`.
7. After that, node B will never learn about the application state introduced in point 6. as gossip exchange only sends endpoint states with version larger than the previous observed max version, which was `v + 1` in point 4.
Note that the above scenario was _not_ reproduced. However, I managed to observe a race condition by:
1. modifying Scylla to run update of `CACHE_HITRATES` much more frequently than usual,
2. putting an assertion in `add_local_application_state` which fails if the version returned by `get_next_version` was not larger than the previous returned value,
3. running a test which performs schema changes in a loop.
The assertion from the second point was triggered. While it's hard to tell how likely it is to occur without making updates of cache hitrates more frequent - not to mention the full theorized scenario - for now this is the best lead that we have, and the data race being fixed here is a real bug anyway.
Refs: #17493
Closes scylladb/scylladb#17499
* github.com:scylladb/scylladb:
version_generator: check that get_next_version is called on shard 0
misc_services: fix data race from bad usage of get_next_version
Since commit f1bbf70, many compaction types can do cleanup work, but it
turns out we forgot to invalidate the cache on their completion.
So if a node regains ownership of a token that had a partition deleted
by its previous owner (and the tombstone is already gone), data can be
resurrected.
Tablets are not affected, as the cache is explicitly invalidated during
the migration cleanup stage.
Scylla 5.4 is affected.
Fixes #17501.
Fixes #17452.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#17502
In Scylla, we don't merge our PRs; instead we use `./script/pull_github_pr.sh` to close the pull request, adding a `closes scylladb/scylladb` remark, and push the changes to the `next` branch.
One of the conditions for opening a backport PR is that all relevant commits are in master (passed gating). In this GitHub action, we go through the list of commits once a push is made to master, identify the relevant PR, and add the promoted label to it. This allows Mergify to start the backporting process.
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define formatters for
* canonical_mutation
* atomic_cell_view
* atomic_cell
* atomic_cell_or_collection::printer
Refs #13245
Closes scylladb/scylladb#17506
* github.com:scylladb/scylladb:
mutation: add fmt::formatter for canonical_mutation
mutation: add fmt::formatter for atomic_cell_view and atomic_cell
mutation: add fmt::formatter for atomic_cell_or_collection::printer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for canonical_mutation
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* atomic_cell_view
* atomic_cell
and drop their operator<<:s.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`atomic_cell_or_collection::printer`, and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
For tables using tablet based replication strategies, the sstables
should be reshaped only within the compaction groups they belong to.
Updated shard_reshaping_compaction_task_impl to group the sstables based
on their compaction groups before reshaping them within the groups.
Fixes#16966
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
The get_next_version function can only be safely called from shard 0,
but this constraint is not enforced in any way. As evidenced in the
previous commit, it is easy to accidentally call it from a non-zero
shard.
Introduce a runtime check to get_next_version which calls
on_fatal_internal_error if it detects that the function was called from
the wrong shard. This will let us detect cross-shard use issues at
runtime.
The function `gms::version_generator::get_next_version()` can only be
called from shard 0 as it uses a global, unsynchronized counter to
issue versions. Notably, the function is used as a default argument for
the constructor of `gms::versioned_value` which is used from shorthand
constructors such as `versioned_value::cache_hitrates`,
`versioned_value::schema` etc.
The `cache_hitrate_calculator` service runs a periodic job which
updates the `CACHE_HITRATES` application state in the local gossiper
state. Each time the job is scheduled, it runs on the next shard (it
goes through shards in a round-robin fashion). The job uses the
`versioned_value::cache_hitrates` shorthand to create a
`versioned_value`, therefore risking a data race if it is not currently
executing on shard 0.
Fix the race by constructing the versioned value on shard 0.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* wrapping_interval
* interval
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17488
Unfortunately, fmt v10 dropped support for operator<< formatters,
forcing us to replace the huge number of operator<< implementations
in our code by uglier and templated fmt::formatter implementations
to get Scylla to compile on modern distros (such as Fedora 39) :-(
Kefu has already started doing this migration, here is my small
contribution - the formatter for mutation_fragment_v2::kind.
This patch is needed to compile, for example,
build/dev/mutation/mutation_fragment_stream_validator.o.
I can't remove the old operator<< because it's still used by
the implementation of other operator<< functions. We can remove
all of them when we're done with this conversion. In the meantime,
I replaced the original implementation of operator<< with a trivial
implementation just passing the work to the new fmt::print support.
Refs #13245
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17432
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* bound_kind_m
* sstable_state
* indexable_element
* deletion_time
and drop their operator<<:s
Refs #13245
Closes scylladb/scylladb#17490
* github.com:scylladb/scylladb:
sstables: add fmt::formatter for deletion_time
sstable: add fmt::formatter for indexable_element
sstables: add fmt::formatter for sstable_state
sstables: add fmt::formatter for sstables::bound_kind_m
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for some types used in testing.
Refs #13245
Closes scylladb/scylladb#17485
* github.com:scylladb/scylladb:
test/unit: add fmt::formatter for tree_test_key_base
test: add printer for type for BOOST_REQUIRE_EQUAL
test: add fmt::formatters
test/perf: add fmt::formatters for scheduling_latency_measurer and perf_result
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* managed_bytes
* managed_bytes_view
* managed_bytes_opt
* occupancy_stats
and drop their operator<<:s
Refs https://github.com/scylladb/scylladb/issues/13245
Closes scylladb/scylladb#17462
* github.com:scylladb/scylladb:
utils/managed_bytes: add fmt::formatters for managed_bytes and friends
utils/logalloc: add fmt::formatter for occupancy_stats
If the index_reader isn't closed before it is destroyed, then ongoing
sstable reads won't be awaited and an assertion will be triggered.
Close index_reader in has_partition_key before destroying it.
Fixes: #17232.
Closes scylladb/scylladb#17355
* github.com:scylladb/scylladb:
test: add test to check if reader is closed
sstables: close index_reader in has_partition_key
the "keyspace" argument of the "ring" command is optional. but before
this change, we considered it a mandatory option, which was wrong.
so, in this change, we make it optional, and print a warning
message if the keyspace is not specified.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17472
* tighten the param check for toppartitions
* add an extra empty line in between reports
Closes scylladb/scylladb#17486
* github.com:scylladb/scylladb:
tools/scylla-nodetool: add an extra empty line in between reports
tools/scylla-nodetool: tighten the param check for toppartitions
RPC calls lose information about the type of the returned exception.
Thus, if a table is dropped on the receiver node, but it still exists
on the sender node and the sender node streams the table's data, then
the whole operation fails.
To prevent that, add a method which synchronizes the schema and then
checks if the exception was caused by a table drop. If so,
the exception is swallowed.
Use the method in streaming and repair to continue them when
the table is dropped in the meantime.
Fixes: #17028.
Fixes: #15370.
Fixes: #15598.
Closes scylladb/scylladb#17231
* github.com:scylladb/scylladb:
repair: handle no_such_column_family from remote node gracefully
test: test drop table on receiver side during streaming
streaming: fix indentation
streaming: handle no_such_column_family from remote node gracefully
repair: add methods to skip dropped table
it would be more helpful if the matcher could print out the unmatched
output on test failure. so, in this change, both stdout and stderr
are printed if they fail to match the expected error.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17489
simpler this way.
Closes scylladb/scylladb#17437
* github.com:scylladb/scylladb:
tools/scylla-nodetool: use {yaml,json}_writers in compactionhistory_operation
tools/scylla-nodetool: add {json,yaml}_writer
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `sstables::deletion_time`,
drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `sstables::indexable_element`,
drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `sstables::sstable_state`,
drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `sstables::bound_kind_m`,
drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, `toppartitions` does not print an empty line
after an empty sampling warning message. but
dtest/toppartitions_test.py actually splits sampling reports on
two newlines, so let's appease it. the output also looks better
this way, as the samplings for READS and WRITES are always visually
separated by an empty line.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
the test cases of `test_any_of_required_parameters_is_missing`
consider that we should either pass all positional arguments or
pass none of them, otherwise nodetool should fail. but `scylla nodetool`
supported partial positional arguments.
to be more consistent with the expected behavior, in this change,
we enforce a sanity check so that we only accept either all
positional args or none of them. the corresponding test is added.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* managed_bytes
* managed_bytes_view
* managed_bytes_opt
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `occupancy_stats`, and
drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for the classes derived from `tree_test_key_base`
(this change was extracted from a larger change at #15599)
Refs #13245
after dropping the operator<< for vector, we would not be able to
use BOOST_REQUIRE_EQUAL to compare vector<>. to be prepared for this,
let's define the printer for Boost.Test.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
the operator<< for `cql3::expr::test_utils::mutation_column_value` is
preserved, as it is used by test/lib/expr_test_utils.cc, which prints
std::map<sstring, cql3::expr::test_utils::mutation_column_value> using
the homebrew generic formatter for std::map<>, and that formatter uses
operator<< for printing the elements in the map.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* scheduling_latency_measurer
* perf_result
and drop their operator<<:s
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The total reshaped size should only be updated on reshape success and
not after reshape has failed due to an exception.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Catch and handle the exceptions directly instead of rethrowing and
catching again.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
When filtering a test by 'name', consider that the name can be in a
'test::case' format. If so, the left part becomes the filter and the
right part the case name to be passed down to the test itself.
Later, when pytest starts, the runner appends the case name (if not
None) to the pytest invocation, thus making it run only the specified
test case, not the whole test file.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
And propagate it from the add_test() helper. For now keep it None; the
next patch will bring more sense to this place.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The scripts/metrics-config.yml is a configuration file used by
get_description.py. It covers the places in the code that use a
non-standard way of defining metrics.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
The get_description script parses a C++ file and searches for metric
declarations and their descriptions.
It creates a pipe-delimited file with the metric name, metric family
name, description, and location in the file.
To find all description in all files:
find . -name "*.cc" -exec grep -l '::description' {} \; | xargs -i ./get_description.py {}
While many of the metrics are defined in the form of
_metrics.add_group("hints_manager", {
sm::make_gauge("size_of_hints_in_progress", _stats.size_of_hints_in_progress,
sm::description("Size of hinted mutations that are scheduled to be written.")),
some metric declarations use variables and string formatting.
The script uses a configuration file to translate parameters and
concatenations to the actual names.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This commit adds a placeholder for the Enable Raft-based Topology page
in the 5.4-to-6.0 upgrade guide.
This page needs to be referenced from other pages in the docs.
Now all test cases use pylib manager client to manipulate cluster
While at it -- drop more unused bits from suite .py files
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `alternator::parsed::path`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17458
Now that the test case in question is not using ManagerCluster, there's
no point in using test_tempdir either, and the temporary object-store
config can be generated in a generic temporary directory.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In the middle, this test case needs to force the scylla server to reload
its configs. Currently the manager API requires that some existing config
option is provided as an argument, but in this test case scylla.yaml
remains intact. So it satisfies the API with a non-changing option.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This case is a bit tricky, as it needs to know where scylla's workdir
is, so it replaces the use of test_tempdir with the call to manager to
get server's workdir.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* row_tombstone
* row_marker
* deletable_row::printer
* row::printer
* clustering_row::printer
* static_row::printer
* partition_start
* partition_end
* mutation_fragment::printer
and drop their operator<<:s
Refs #13245
Closes scylladb/scylladb#17461
* github.com:scylladb/scylladb:
mutation: add fmt::formatter for clustering_row and friends
mutation: add fmt::formatter for row_tombstone and friends
This includes
- marking the suite as Topology
- import needed fixtures and options from topology conftest
- configuring the zero initial cluster size and anonymous auth
- marking all test cases as skipped, as they no longer work after above
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
If the index_reader isn't closed before it is destroyed, then ongoing
sstable reads won't be awaited and an assertion will be triggered.
Close index_reader in has_partition_key before destroying it.
In this PR we introduce the .mergify.yml configuration file, which
includes a set of rules that we will use for automating our backport
process.
For each supported OSS release (currently 5.2 and 5.4) we have an almost
identical configuration section which includes the four conditions for
opening a backport PR:
* PR should be closed
* PR should have the proper label. for example: backport/5.4 (we can
have multiple labels)
* Base branch should be master
* PR should be set with a promoted label - this condition will be set
automatically once the commits are promoted to the master branch (passed
gating)
Once all conditions are met, the verify bot will open a backport PR and
assign it to the author of the original PR; then CI will start
running, and only after it passes do we merge.
Our interval template started life as `range`, and supported wrapping to follow Cassandra's convention of wrapping around the maximum token.
We later recognized that an interval type should usually be non-wrapping and split it into wrapping_range and nonwrapping_range, with `range` aliasing wrapping_range to preserve compatibility.
Even later, we realized the name was already taken by C++ ranges and so renamed it to `interval`. Given that intervals are usually non-wrapping, the default `interval` type is non-wrapping.
We can now simplify it further, recognizing that everyone assumes that an interval is non-wrapping and so it doesn't need the `nonwrapping_` designation. We just rename nonwrapping_interval to `interval` and remove the type alias.
Closes scylladb/scylladb#17455
* github.com:scylladb/scylladb:
interval: rename nonwrapping_interval to interval
interval: rename interval_test to wrapping_interval_test
in af2553e8, we added formatters for cdc::image_mode and
cdc::delta_mode. but in that change, we failed to qualify `string_view`
with the `std::` prefix. even if it compiles, it depends on a `using
std::string_view` or a more error-prone `using namespace std`,
neither of which should be relied on. so, in this change, we
add the `std::` prefix to `string_view`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17459
There's the --partitions option that specifies how many partitions the
test generates before measuring. When the --bypass-cache option is in
use, thus making the test always engage sstable readers, it makes sense
to add some control over sstable granularity. The new option requests
that during the population phase, the memtable gets flushed every
$this-number partitions, not just once at the end (and an unknown number
of times in the middle because of the dirty memory limit).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Usually a perf test doesn't expect some activity to run in the
background without controls. Compaction is one such activity, so it
makes sense to keep it off while running the measurement.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When producing the output json file, keep how many initial tablets were
requested (if at all) next to other workload parameters
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* `streaming::stream_request`,
* `stream_session_state`
and drop their operator<<:s
Refs #13245
Closes scylladb/scylladb#17464
* github.com:scylladb/scylladb:
streaming: add fmt::formatter for streaming::stream_request
streaming: add fmt::formatter for stream_session_state
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* `sstables::compaction_type`
* `sstables::compaction_type_options::scrub::mode`
* `sstables::compaction_type_options::scrub::quarantine_mode`
and drop their operator<<:s.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17441
The cluster_status_table virtual table has a status field for each node.
In gossiper mode the status is taken from the gossiper, but with raft
the states are different and are stored in the topology state machine.
This series fixes the code to check the current mode and take the status
from the correct place.
Refs scylladb/scylladb#16984
* 'gleb/cluster_status_table-v1' of github.com:scylladb/scylla-dev:
gossiper: remove unused REMOVAL_COORDINATOR state
virtual_tables: take node state from raft for cluster_status_table table if topology over raft is enabled
virtual_tables: create result for cluster_status_table read on shard 0
When we create a CDC generation and ring-delay is non-zero, the
timestamp of the new generation is in the future. Hence, we can
have multiple generations that can be written to. However, if we
add a new node to the cluster with the Raft-based topology, it
receives only the last committed generation. So, this node will
be rejecting writes considered correct by the other nodes until
the last committed generation starts operating.
In scylladb/scylladb#17134, we have allowed sending writes to the
previous CDC generations. So, the situation became even more
complicated. This PR adjusts the Raft-based topology
to ensure all required generations are loaded into memory and their
data isn't cleared too early.
To load all required generations into memory, we replace
`current_cdc_generation_{uuid, timestamp}` with the set containing
IDs of all committed generations - `committed_cdc_generations`.
To ensure this set doesn't grow endlessly, we remove an entry from
this set together with the data in CDC_GENERATIONS_V3.
Currently, we may clear a CDC generation's data from
CDC_GENERATIONS_V3 if it is not the last committed generation
and it is at least 24 hours old (according to the topology
coordinator's clock). However, after allowing writes to the
previous CDC generations, this condition became incorrect. We
might clear data of a generation that could still be written to.
The new solution introduced in this PR is to clear data of the
generations that finished operating more than 24 hours ago.
Apart from the changes mentioned above, this PR hardens
`test_cdc_generation_clearing.py`.
Fixes scylladb/scylladb#16916
Fixes scylladb/scylladb#17184
Fixes scylladb/scylladb#17288
Closes scylladb/scylladb#17374
* github.com:scylladb/scylladb:
test: harden test_cdc_generation_clearing
test: test clean-up of committed_cdc_generations
raft topology: clean committed_cdc_generations
raft topology: clean only obsolete CDC generations' data
storage_service: topology_state_load: load all committed CDC generations
system_keyspace: load_topology_state: fix indentation
raft topology: store committed CDC generations' IDs in the topology
removenode --force is an unsafe operation and does not even make sense with
topology over raft. This patch disables it if raft is enabled and prints
a deprecation note otherwise. We already have a PR to remove it
(https://github.com/scylladb/scylladb/pull/15834), but it was decided
there that a deprecation period is needed for the legacy use case.
Fixes: scylladb/scylladb#16293
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* clustering_row::printer
* static_row::printer
* partition_start
* partition_end
* mutation_fragment::printer
and drop their operator<<:s
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
when '\' does not start an escape sequence, Python complains upon
seeing it, but continues anyway, treating '\' as a literal character.
the warning message is still annoying:
```
scylla-gdb.py: 2417: SyntaxWarning: invalid escape sequence '\-'
branches = (r" |-- ", " \-- ")
```
when sourcing this script.
so, let's mark these strings as raw strings.
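the fix is simply the raw-string prefix; a minimal illustration (the tuple mirrors the `branches` literal cited above):

```python
# Without the r prefix, "\-" is not a valid escape sequence, so newer
# Python versions emit a SyntaxWarning when compiling the file; a raw
# string keeps the backslash literal and compiles silently.
branches = (r" |-- ", r" \-- ")

# The raw string spells exactly the same five characters as the
# explicitly escaped form:
assert branches[1] == " \\-- "
```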
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17466
we already have a generic operator<<-based formatter for sequence-like
ranges defined in `utils/to_string.hh`, but as a part of the effort to
address #13245, we will eventually drop the formatter.
to prepare for this change, we should create/find the alternatives
where the operator<< for printing the ranges is still used.
Boost::program_options is one of them. it prints the options' default
values using operator<< in its error message or usage. so in order
to keep it working, we define operator<< for `vector<sstring>` here.
if more types are required, we will need to generalize this
formatter. if there is more need from different compilation
units, we might need to extract this helper into, for instance,
`utils/to_string.hh`. but we should do this only after removing the
generic operator<<-based formatter from there.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17413
instead of materializing the `managed_bytes_view` to a string and
printing it, print it directly to stdout. this change helps to deprecate
the `to_hex()` helpers; we should materialize a string only when necessary.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17463
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `streaming::stream_request`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`streaming::stream_session_state`, and drop its operator<<
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* row_tombstone
* row_marker
* deletable_row::printer
* row::printer
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Our interval template started life as `range`, and supported
wrapping to follow Cassandra's convention of wrapping around the
maximum token.
We later recognized that an interval type should usually be non-wrapping
and split it into wrapping_range and nonwrapping_range, with `range`
aliasing wrapping_range to preserve compatibility.
Even later, we realized the name was already taken by C++ ranges and
so renamed it to `interval`. Given that intervals are usually non-wrapping,
the default `interval` type is non-wrapping.
We can now simplify it further, recognizing that everyone assumes
that an interval is non-wrapping and so doesn't need the
`nonwrapping_` designation. We just rename nonwrapping_interval
to `interval` and remove the type alias.
Those that collect vectors with ks/cf names can reserve the vectors in advance. Also, one of them can use a range loop for shorter code.
Closes scylladb/scylladb#17433
* github.com:scylladb/scylladb:
api: Reserve vectors in advance
api: Use range-loop to iterate keyspaces
When the topology barrier is blocked for longer than the configured
threshold (2s), stale versions are marked as stalled, and when they get
released they report a backtrace to the logs. This should help to
identify what was holding the token metadata pointer for too long.
Example log:
token_metadata - topology version 30 held for 299.159 [s] past expiry, released at: 0x2397ae1 0x23a36b6 ...
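A minimal Python sketch of the expiry-tracking idea (class and helper names are hypothetical, not the actual C++ implementation):

```python
import time

STALL_THRESHOLD = 2.0  # seconds, mirroring the configured 2s threshold

class VersionTracker:
    """Marks a topology version as stalled once it outlives the threshold,
    and reports how long past expiry it was held when finally released."""
    def __init__(self, version, now=time.monotonic):
        self._version = version
        self._now = now
        self._expiry = now() + STALL_THRESHOLD
        self.stalled = False

    def check_stalled(self):
        if self._now() > self._expiry:
            self.stalled = True
        return self.stalled

    def release(self):
        # Returns seconds held past expiry (0.0 if released in time);
        # the real code would also log the recorded backtrace here.
        return max(0.0, self._now() - self._expiry)
```

With an injected fake clock, a version released 3 seconds past its 2-second budget reports 3.0, matching the "held for ... past expiry" log line above.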
Closes scylladb/scylladb#17427
When reading a list of ranges with tablets, we don't need a multishard reader. Instead, we intersect the range list with the local node's tablet ranges, then read each range from the respective shard.
The individual ranges are read sequentially, with database::query[_mutations](), merging the results into a single
instance. This makes the code simple. For tablets, multishard_mutation_query.cc is no longer on the hot paths; range scans
on tables with tablets fork off to a different code-path in the coordinator. The only code using multishard_mutation_query.cc is forced, replica-local scans, like those used by SELECT * FROM MUTATION_FRAGMENTS(). These are mainly used for diagnostics and tests, so we optimize for simplicity, not performance.
Fixes: #16484
Closes scylladb/scylladb#16802
* github.com:scylladb/scylladb:
test/cql-pytest: remove skip_with_tablets fixture
test/cql-pytest: test_select_from_mutation_fragments.py parameterize tests
test/cql-pytest: test_select_from_mutation_fragments.py: remove skip_with_tablets
multishard_mutation_query: add tablets support
multishard_mutation_query: remove compaction-state from result-builder factory
multishard_mutation_query: do_query(): return foreign_ptr<lw_shared_ptr<result>>
mutation_query: reconcilable_result: add merge_disjoint()
locator: introduce tablet_range_spliter
dht/i_partitioner: to_partition_range(): don't assume input is fully inclusive
interval: add before() overload which takes another interval
The default AIO backend requires AIO blocks. On production systems, all
available AIO blocks could have been already taken by ScyllaDB. Even
though the tools only require a single unit, we have seen cases where
not even that is available, ScyllaDB having siphoned all of the available
blocks.
We could try to ensure all deployments have some spare blocks, but it is
just less friction to not have to deal with this problem at all, by just
using the epoll backend. We don't care about performance in the case of
the tools anyway, so long as they are not unreasonably slow. And since
these tools are replacing legacy tools written in Java, the bar is low.
Closes scylladb/scylladb#17438
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for following types
* query::result::printer
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for following types
* query::result_set
* query::result_set_row
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for following types
* query::specific_ranges
* query::partition_slice
* query::read_command
* query::forward_request
* query::forward_request::reduction_type
* query::forward_request::aggregation_info
* query::forward_result::printer
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `formatted_sstables_list`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
* `sstables::compaction_type`
* `sstables::compaction_type_options::scrub::mode`
* `sstables::compaction_type_options::scrub::quarantine_mode`
and drop their operator<<:s.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
so that we have less repetition when dumping the metrics. the repeated
code is error-prone and hard to maintain. also move the helpers out
into a separate header, to keep this source file fit -- it's now 3000
LOC. also, by moving them out, we can reuse them in other subcommands
without moving them to the top of this source file.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
To run with both vnodes and tablets. For this functionality, both
replication methods should be covered with tests, because it uses
different ways to produce partition lists, depending on the replication
method.
Also add scylla_only to those tests that were missing this fixture
before. All tests in this suite are scylla-only and with the
parameterization, this is even more apparent.
When reading a list of ranges with tablets, we don't need a multishard
reader. Instead, we intersect the range list with the local node's
tablet ranges, then read each range from the respective shard.
The individual ranges are read sequentially, with
database::query[_mutations](), merging the results into a single
instance. This makes the code simple. For tablets,
multishard_mutation_query.cc is no longer on the hot paths, range scans
on tables with tablets fork off to a different code-path in the
coordinator. The only code using multishard_mutation_query.cc is
forced, replica-local scans, like those used by SELECT * FROM
MUTATION_FRAGMENTS(). These are mainly used for diagnostics and tests,
so we optimize for simplicity, not performance.
This param was used by the query-result builder, to set the
last-position on end-of-stream. Instead, do this via a new ResultBuilder
method, maybe_set_last_position(), which is called from read_page(),
which has access to the compaction-state.
With this, the ResultBuilder can be created without a compaction-state
at hand. This will be important in the next patch.
Given a list of partition-ranges, yields the intersection of this
range-list, with that of that tablet-ranges, for tablets located on the
given host.
This will be used in multishard_mutation_query.cc, to obtain the ranges
to read from the local node: given the read ranges, obtain the ranges
belonging to tablets that have replicas on the local node.
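The intersection the splitter performs can be sketched over plain half-open integer token intervals (a simplification; the real tablet_range_spliter works on dht token ranges with inclusive/exclusive bounds):

```python
def intersect_ranges(read_ranges, local_tablet_ranges):
    """Yield the parts of read_ranges covered by local tablet ranges.
    Ranges are (start, end) half-open token intervals, both lists
    sorted and non-overlapping."""
    out = []
    for rs, re_ in read_ranges:
        for ts, te in local_tablet_ranges:
            s, e = max(rs, ts), min(re_, te)
            if s < e:                 # keep only non-empty overlaps
                out.append((s, e))
    return out
```

Each output range falls entirely within one tablet, so it can be read from that tablet's shard.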
Consider the inclusiveness of the token-range's start and end bounds and
copy the flag to the output bounds, instead of assuming they are always
inclusive.
The current point variant cannot take inclusiveness into account, when
said point comes from another interval bound.
This method had no tests at all, so add tests covering both overloads.
range.hh was deprecated in bd794629f9 (2020) since its names
conflict with the C++ library concept of an iterator range. The name
::range also mapped to the dangerous wrapping_interval rather than
nonwrapping_interval.
Complete the deprecation by removing range.hh and replacing all the
aliases by the names they point to from the interval library. Note
this now exposes uses of wrapping intervals as they are now explicit.
The unit tests are renamed and range.hh is deleted.
Closes scylladb/scylladb#17428
In order to avoid running out of memory, we can't
underestimate the memory used when processing a view
update. Particularly, we need to handle the remote
view updates well, because we may create many of them
at the same time in contrast to local updates which
are processed synchronously.
After investigating a coredump generated in a crash
caused by running out of memory due to these remote
view updates, we found that the current estimation
is much lower than what we observed in practice; we
identified overhead of up to 2288 bytes for each
remote view update. The overhead consists of:
- 512 bytes - a write_response_handler
- less than 512 bytes - excessive memory allocation
for the mutation in bytes_ostream
- 448 bytes - the apply_to_remote_endpoints coroutine
started in mutate_MV()
- 192 bytes - a continuation to the coroutine above
- 320 bytes - the coroutine in result_parallel_for_each
started in mutate_begin()
- 112 bytes - a continuation to the coroutine above
- 192 bytes - 5 unspecified allocations of 32, 32, 32,
48 and 48 bytes
This patch changes the previous overhead estimate
of 256 bytes to 2288 bytes, which should take into
account all allocations in the current version of the
code. It's worth noting that changes in the related
pieces of code may result in a different overhead.
The allocations seem to be mostly captures for the
background tasks. Coroutines seem to allocate extra,
however testing shows that replacing a coroutine with
continuations may result in generating a few smaller
futures/continuations with a larger total size.
Besides that, considering that we're waiting for
a response for each remote view update, we need the
relatively large write_response_handler, which also
includes the mutation in case we needed to reuse it.
The change should not majorly affect workloads with many
local updates because we don't keep many of them at
the same time anyway, and an added benefit of correct
memory utilization estimation is avoiding evictions
of other memory that would be otherwise necessary
to handle the excessive memory used by view updates.
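As a sanity check, the itemized allocations above do add up to the new estimate:

```python
# Per-update overhead components from the coredump analysis (bytes);
# the bytes_ostream item is an upper bound ("less than 512 bytes").
overhead = [
    ("write_response_handler", 512),
    ("bytes_ostream over-allocation", 512),
    ("apply_to_remote_endpoints coroutine", 448),
    ("continuation to that coroutine", 192),
    ("result_parallel_for_each coroutine", 320),
    ("continuation to that coroutine", 112),
    ("5 unspecified allocations: 32+32+32+48+48", 192),
]
total = sum(bytes_ for _, bytes_ in overhead)
assert total == 2288   # the new estimate, up from the previous 256
```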
Fixes #17364
Closes scylladb/scylladb#17420
It can happen that a node is lost during a tablet migration involving that node. The migration will be stuck, blocking the topology state machine. To recover from this, the current procedure is for the admin to execute nodetool removenode or to replace the node. This marks the node as "ignored" and the tablet state machine can pick this up and abort the migration.
This PR implements the handling for streaming stage only and adds a test for it. Checking other stages needs more work with failure injection to inject failures into specific barrier.
To handle streaming failure two new stages are introduced -- cleanup_target and revert_migration. The former cleans the pending replica that could have received some data by the time streaming stopped working; the latter is like end_migration, but doesn't commit the new_replicas into the replicas field.
refs: #16527
Closes scylladb/scylladb#17360
* github.com:scylladb/scylladb:
test/topology: Add checking error paths for failed migration
topology.tablets_migration: Handle failed streaming
topology.tablets_migration: Add cleanup_target transition stage
topology.tablets_migration: Add revert_migration transition stage
storage_service: Rewrap cleanup stage checking in cleanup_tablet()
test/topology: Move helpers to get tablet replicas to pylib
This PR removes information about outdated versions, including disclaimers and information when a given feature was added.
Now that the documentation is versioned, information about outdated versions is unnecessary (and makes the docs harder to read).
Fixes https://github.com/scylladb/scylladb/issues/12110
Closes scylladb/scylladb#17430
Some endpoints in api/column_family fill vectors with data obtained from
database and return them back. Since the amount of data is known in
advance, it's good to reserve the vector.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Set filesystem permissions for the maintenance socket to 660 (previously it was 755) to allow members of the scyllaadm group to connect.
Split the logic of creating sockets into two separate functions, one for each case: when it is a regular cql controller or used by maintenance_socket.
Fixes https://github.com/scylladb/scylladb/issues/16487.
Closes scylladb/scylladb#17113
* github.com:scylladb/scylladb:
maintenance_socket: add option to set owning group
transport/controller: get rid of magic number for socket path's maximal length
transport/controller: set unix_domain_socket_permissions for maintenance_socket
transport/controller: pass unix_domain_socket_permissions to generic_server::listen
transport/controller: split configuring sockets into separate functions
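The permission change itself can be sketched as follows (a regular file stands in for the unix socket; connecting to a unix-domain socket requires write permission on it, which mode 755 did not grant to the group):

```python
import os, stat, tempfile

# 0o660: owner and group may read/write, others get nothing. The group
# write bit is what lets members of the owning group (e.g. scyllaadm)
# connect; the previous 0o755 gave the group only read+execute.
MAINTENANCE_SOCKET_PERMS = 0o660

path = os.path.join(tempfile.mkdtemp(), "cql.m")   # hypothetical socket path
open(path, "w").close()                            # regular file as a stand-in
os.chmod(path, MAINTENANCE_SOCKET_PERMS)
mode = stat.S_IMODE(os.stat(path).st_mode)
assert mode == 0o660
```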
Use `parallel_for_each_table` instead of `for_each_table_gently` in
`repair_service::load_history`, to reduce bootstrap time.
Use uuid_xor_to_uint32 in the repair load_history dispatch to shard.
Ref: https://github.com/scylladb/scylladb/issues/16774
Closes scylladb/scylladb#16927
* github.com:scylladb/scylladb:
repair: resolve load_history shard load skew
repair: accelerate repair load_history time
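A sketch of the dispatch idea (the exact fold inside uuid_xor_to_uint32 is an assumption here; the point is that mixing the whole 128-bit table id down to 32 bits spreads tables evenly across shards instead of skewing):

```python
def uuid_xor_to_uint32(uuid_int):
    """Fold a 128-bit UUID into 32 bits by xoring its four 32-bit words.
    Sketch only: the real helper's exact fold may differ."""
    folded = 0
    for i in range(4):
        folded ^= (uuid_int >> (32 * i)) & 0xFFFFFFFF
    return folded

def shard_for(uuid_int, shard_count):
    # Dispatch load_history work by the folded value, so tables are
    # distributed across shards rather than piling onto a few of them.
    return uuid_xor_to_uint32(uuid_int) % shard_count
```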
in da53854b66, we added a formatter for printing a `node*`, and switched
to this formatter when printing `node*`. but we failed to update some
call sites when migrating to the new formatter, where a
`unique_ptr<node>` is printed instead. this is not the behavior before
the change, and is not expected.
so, in this change, we explicitly instantiate `node_printer` instances
with the pointer held by `unique_ptr<node>`, to restore the behavior
before da53854b66.
this issue was identified when compiling the tree using {fmt} v10 with
the compile-time format-string check enabled, which is not yet
upstreamed to Seastar.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17418
In one of the previous patches, we fixed scylladb/scylladb#16916 as
a side effect. We removed
`system_keyspace::get_cdc_generations_cleanup_candidate`, which
contained the bug causing the issue.
Even though we didn't have to fix this issue directly, it showed us
that `test_cdc_generation_clearing` was too weak. If something went
wrong during/after the only clearing, the test could still pass
because the clearing was the last action in the test. In
scylladb/scylladb#16916, the CDC generation publisher was stuck
after the clearing because of a recurring error. The test wouldn't
detect it. Therefore, we harden the test by expecting two clearings
instead of one. If something goes wrong during the first clearing,
there is a high chance that the second clearing will fail. The new
test version wouldn't pass with the old bug in the code.
We extend `test_cdc_generation_clearing`. Now, it also tests the
clean-up of `TOPOLOGY.committed_cdc_generations` added in the
previous patch.
In the implementation, we harden the already existing
`check_system_topology_and_cdc_generations_v3_consistency`. After
the previous patch, data of every generation present in
`committed_cdc_generations` should be present in CDC_GENERATIONS_V3.
In other words, `committed_cdc_generations` should always be a
subset of a set containing generations in CDC_GENERATIONS_V3.
Before the previous patch, this wasn't true after the clearing, so
the new version of `test_cdc_generation_clearing` wouldn't pass
back then.
We clean `TOPOLOGY.committed_cdc_generations` from obsolete
generations to ensure this set doesn't grow endlessly. After this
patch, the following invariant will be true: if a generation is in
`committed_cdc_generations`, its data is in CDC_GENERATIONS_V3.
Currently, we may clear a CDC generation's data from
CDC_GENERATIONS_V3 if it is not the last committed generation
and it is at least 24 hours old (according to the topology
coordinator's clock). However, after allowing writes to the
previous CDC generations, this condition became incorrect. We
might clear data of a generation that could still be written to.
The new solution is to clear data of the generations that
finished operating more than 24 hours ago. The rationale behind
it is in the new comment in
`topology_coordinator::clean_obsolete_cdc_generations`.
The previous solution used the clean-up candidate. After
introducing `committed_cdc_generations`, it became unneeded.
The last obsolete generation can be computed in
`topology_coordinator::clean_obsolete_cdc_generations`. Therefore,
we remove all the code that handles the clean-up candidate.
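The new clearing rule can be sketched like this (generation representation and the 24h constant follow the description above; this is not the coordinator's actual code):

```python
from datetime import datetime, timedelta

TTL = timedelta(hours=24)

def obsolete_generations(committed, now):
    """committed: list of (gen_id, start_ts) sorted by start_ts.
    A generation operates until its successor's start_ts is reached,
    so it finished operating more than 24h ago exactly when that
    successor started more than 24h ago. The last committed
    generation is never obsolete."""
    out = []
    for gen, successor in zip(committed, committed[1:]):
        if successor[1] + TTL < now:
            out.append(gen[0])
    return out
```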
After changing how we clear CDC generations' data,
`test_current_cdc_generation_is_not_removed` became obsolete.
The tested feature is not present in the code anymore.
`test_dependency_on_timestamps` became the only test case covering
the CDC generation's data clearing. We adjust it after the changes.
We load all committed CDC generations into `cdc::metadata`. Since
we have allowed sending writes to the previous generations in
scylladb/scylladb#17134, the committed generations may be necessary
to handle a correct request.
When we create a CDC generation and ring-delay is non-zero, the
timestamp of the new generation is in the future. Hence, we can
have multiple generations that can be written to. However, if we
add a new node to the cluster with the Raft-based topology, it
receives only the last committed generation. So, this node will
be rejecting writes considered correct by the other nodes until
the last committed generation starts operating.
In scylladb/scylladb#17134, we have allowed sending writes to the
previous CDC generations. So, the situation became even more
complicated. We need to adjust the Raft-based topology to ensure
all required generations are loaded into memory and their data
isn't cleared too early.
This patch is the first step of the adjustment. We replace
`current_cdc_generation_{uuid, timestamp}` with the set containing
IDs of all committed generations - `committed_cdc_generations`.
This set is sorted by timestamps, just like
`unpublished_cdc_generations`.
This patch is mostly refactoring. The last generation in
`committed_cdc_generations` is the equivalent of the previous
`current_cdc_generation_{uuid, timestamp}`. The other generations
are irrelevant for now. They will be used in the following patches.
After introducing `committed_cdc_generations`, a newly committed
generation is also unpublished (it was current and unpublished
before the patch). We introduce `add_new_committed_cdc_generation`,
which updates both sets of generations so that we don't have to
call `add_committed_cdc_generation` and
`add_unpublished_cdc_generation` together. It's easy to forget
that both of them are necessary. Before this patch, there was
no call to `add_unpublished_cdc_generation` in
`topology_coordinator::build_coordinator_state`. It was a bug
reported in scylladb/scylladb#17288. This patch fixes it.
This patch also removes "the current generation" notion from the
Raft-based topology. For the Raft-based topology, the current
generation was the last committed generation. However, for the
`cdc::metadata`, it was the generation operating now. These two
generations could be different, which was confusing. For the
`cdc::metadata`, the current generation is relevant as it is
handled differently, but for the Raft-based topology, it isn't.
Therefore, we change only the Raft-based topology. The generation
called "current" is called "the last committed" from now on.
To allow filtering the returned keyspaces by the replication they
use: tablets or vnodes.
The filter can be disabled by omitting the parameter or passing "all".
The default is "all".
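A sketch of the filter semantics described above (the keyspace representation is hypothetical):

```python
def filter_keyspaces(keyspaces, replication="all"):
    """keyspaces: mapping of name -> "tablets" or "vnodes".
    replication: "tablets", "vnodes", or "all"; omitting the
    parameter (None) behaves like "all"."""
    if replication in (None, "all"):
        return sorted(keyspaces)
    if replication not in ("tablets", "vnodes"):
        raise ValueError(f"unknown replication filter: {replication}")
    return sorted(k for k, r in keyspaces.items() if r == replication)
```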
Fixes: #16509
Closes scylladb/scylladb#17319
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `raft::fsm`, and drop its
operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17414
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `mutation_partition::printer`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17419
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`cached_promoted_index::promoted_index_block`, and drop its
operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17415
so we exercise the cases where state and status are not "normal" and "up".
turns out the MBean is able to cache some objects. so the requests retrieving datacenter and rack are now marked `ANY`.
* filter out the requests whose `multiple` is `ANY`
* include the unconsumed requests in the raised `AssertionError`. this
should help with debugging.
Fixes #17401
Closes scylladb/scylladb#17417
* github.com:scylladb/scylladb:
test/nodetool: parameterize test_ring
test/nodetool: fail a test only with leftover expected requests
For now only fail streaming stage and check that migration doesn't get
stuck and doesn't make tablet appear on dead node.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In case the pending or leaving replica is marked as ignored by the
operator, streaming cannot be retried and should jump to the
"cleanup_target" stage after a barrier.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The new stage will be used to revert a migration that fails at some
stage. The goal is to clean up the pending replica, which may have
already received some writes, by doing the cleanup RPC to the pending
replica, then jumping to the "revert_migration" stage introduced earlier.
If the pending node is dead, the cleanup RPC call is skipped.
Coordinators use old replicas.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's like end_migration, but leaves the old replicas intact, just
removing the transition (including the new replicas).
Coordinators use old replicas.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
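The failure path added by these two stages can be sketched as a tiny transition function (stage names are taken from the commits; the conditions are simplified assumptions, not the real tablet state machine):

```python
def next_stage(stage, streaming_failed, pending_replica_ignored):
    """Simplified dispatch for the streaming stage and its failure path."""
    if stage == "streaming":
        if not streaming_failed:
            return "end_migration"      # happy path: commit new_replicas
        if pending_replica_ignored:
            return "cleanup_target"     # cannot retry: clean the pending replica
        return "streaming"              # otherwise streaming is retried
    if stage == "cleanup_target":
        # cleanup RPC to the pending replica (skipped if the node is
        # dead), then drop the transition without committing new_replicas
        return "revert_migration"
    raise ValueError(stage)
```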
The next patch will need to teach this code to handle the new
cleanup_target stage; this change prepares the place for smoother patching.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
so we exercise the cases where state and status are not "normal" and "up".
turns out the MBean is able to cache some objects. so the requests
retrieving datacenter and rack are now marked `ANY`.
Fixes #17401
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
if there are unconsumed requests whose `multiple` is -1, we should
not consider them required; the test can consume them or not. but if
it does not, we should not consider the test a failure just because
these requests are sitting at the end of the queue.
so, in this change, we
* filter out the requests whose `multiple` is `ANY`
* include the unconsumed requests in the raised `AssertionError`. this
should help with debugging.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
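The fixture change can be sketched like this (the request representation and paths are hypothetical):

```python
ANY = -1   # a request marked ANY may be consumed zero or more times

def check_unconsumed(unconsumed):
    """Fail only if required (non-ANY) expected requests were left over,
    and include them in the error to help debugging."""
    required = [r for r in unconsumed if r["multiple"] != ANY]
    if required:
        raise AssertionError(f"there are expected requests left: {required}")
```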
The mentioned test failed on CI. It sets up two nodes and performs
operations related to creation and dropping of tables as well as
moving tablets. Locally, the issue was not visible - also, the test
was passing on CI in the majority of cases.
One of the steps in the test case is intended to select the shard that
has some tablets on host_0 and then move them to (host_1, shard_3).
It also contains a precondition that requires the tablets count to
be greater than zero - to ensure that the move_tablets operation really
moves tablets.
The error message in the failed CI run comes from the precondition
related to the tablets count on (host0, src_shard) - it was zero.
This indicated that there were no tablets on the entire host_0.
The following commit removes the assumption about the existence of
tablets on host_0. In case there are no tablets there, the
procedure is rerun for host_1.
Now the logic is as follows:
- find shard that has some tablets on host_0
- if such shard does not exist, then find such shard on host_1
- depending on the result of the search, set src/dest nodes
- verify that reported tablet count metric is changed when
move_tablet operation finishes
Refs: scylladb#17386
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#17398
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`attribute_path_map_node<update_expression::action>`, and drop its
operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17270
Option `maintenance-socket-group` sets the owning group of the maintenance socket.
If not set, the group defaults to that of the user running the scylla node.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`read_context::dismantle_buffer_stats`, and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17389
`-Wno-unused-command-line-argument` is used to disable
`-Wunused-command-line-argument`, which in turn emits a warning
if any of the command line arguments passed to the compiler
driver is not used. see
https://clang.llvm.org/docs/DiagnosticsReference.html#wunused-command-line-argument
but it seems we are not passing unused command line arguments to
the compiler anymore. so let's drop this option.
this change helps to
* reduce the discrepancies between the compiling options used by
CMake-generated rules and those generated directly using
`configure.py`
* reenable the warning so we are aware if any of the options
is not used by the compiler. this could be a sign that the option
fails to serve its purpose.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17195
There's a bunch of debug- and trace-level logging of locator::node-s that also includes current_backtrace(). Printing a node is done via a debug_format() helper that generates and returns an sstring to print. Backtrace printing is not very lightweight on its own because of backtrace collecting. Not to slow things down at info log level, which is the default, all such prints are wrapped with explicit if-s checking whether the log-level is enabled.
This PR removes those level checks by introducing lazy_backtrace() helper and by providing a formatter for nodes that also results in lazy node format string calculation.
Closes scylladb/scylladb#17235
* github.com:scylladb/scylladb:
topology: Restore indentation after previous patch
topology: Drop if_enabled checks for logging
topology: Add lazy_backtrace() helper
topology: Add printer wrapper for node* and formatter for it
topology: Expand formatter<locator::node>
With this commit:
- The information about ScyllaDB Enterprise OS support
is removed from the Open Source documentation.
- The information about ScyllaDB Open Source OS support
is moved to the os-support-info file in the _common folder.
- The os-support-info file is included in the os-support page
using the scylladb_include_flag directive.
This update employs the solution we added with
https://github.com/scylladb/scylladb/pull/16753.
It allows adding content to a page dynamically
depending on the opensource/enterprise flag.
Refs https://github.com/scylladb/scylladb/issues/15484
Closes scylladb/scylladb#17310
Before the patch, if a decommissioned node tries
to restart, it calls _group0->discover_group0 first
in join_cluster, which hangs since decommissioned
nodes are banned and other nodes don't respond
to their discovery requests.
We fix the problem by checking the was_decommissioned()
flag before calling discover_group0.
Fixes scylladb/scylladb#17282
Closes scylladb/scylladb#17358
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `gms::versioned_value`. its
operator<< is preserved, as it's still being used by the homebrew
generic formatter for std::unordered_map<gms::application_state,
gms::versioned_value>, which is in turn used in gms/gossiper.cc.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17366
This patch wires up tombstone_gc repair with tablet repair. The flush
hints logic from the vnode table repair is reused. The way to mark the
finish of the repair is also adjusted for tablet repair because it only
has one shard per tablet token range instead of smp::count shards.
Fixes: #17046
Tests: test_tablet_repair_history
Closes scylladb/scylladb#17047
* github.com:scylladb/scylladb:
repair: Update repair history for tablet repair
repair: Extract flush hints code
ldflags are passed to ld (the linker), while cxxflags are passed to the
C++ compiler. the compiler does not understand the ldflags. if we
pass ldflags to it, it complains if `-Wunused-command-line-argument` is
enabled.
in this change, we do not include the ldflags in cxxflags. this helps
us enable the warning option of `-Wunused-command-line-argument`,
so we don't need to disable it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17328
As usual, the new command is covered with tests, which pass with both the legacy and the new native implementation.
Refs: #15588
Closes scylladb/scylladb#17368
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the repair command
test/nodetool: utils: add check_nodetool_fails_with_error_contains()
test/nodetool: util: replace flags with custom matcher
This is mostly a refactoring commit to make the test
more readable, as a byproduct of
scylladb/scylladb#17369 investigation.
We add a check for the specific type of exception that
can be thrown (bad_property_file_error).
We also fix a potential race - the test could write
to res from multiple cores with no locks.
Closes scylladb/scylladb#17371
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
- gms::gossip_digest
- gms::gossip_digest_ack
- gms::gossip_digest_syn
and drop their operator<<:s
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17379
This API endpoint was failing when tablets were enabled
because of usage of get_vnode_effective_replication_map().
Moreover, it was providing an error message that was not
user-friendly.
This change extends the handler to properly service the incoming requests.
Furthermore, it introduces two new test cases that verify the behavior of
storage_service/range_to_endpoint_map API. It also adjusts the test case
of this endpoint for vnodes to succeed when tablets are enabled by default.
The new logic is as follows:
- when tablets are disabled then users may query endpoints
for a keyspace or for a given table in a keyspace
- when tablets are enabled then users have to provide
table name, because effective replication map is per-table
When the user does not provide a table name while tablets are enabled
for a given keyspace, BAD_REQUEST is returned with a
meaningful error message.
Fixes: scylladb#17343
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#17372
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
cdc::image_mode and cdc::delta_mode, and drop their operator<<:s.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17381
Before this PR, writes to the previous CDC generations would
always be rejected. After this PR, they will be accepted if the
write's timestamp is greater than `now - generation_leeway`.
This change was proposed around 3 years ago. The motivation was
to improve user experience. If a client generates timestamps by
itself and its clock is desynchronized with the clock of the node
the client is connected to, there could be a period during
generation switching when writes fail. We didn't consider this
problem critical because the client could simply retry a failed
write with a higher timestamp. Eventually, it would succeed. This
approach is safe because these failed writes cannot have any side
effects. However, it can be inconvenient. Writing to previous
generations was proposed to improve it.
The idea was rejected 3 years ago. Recently, it turned out that
there is a case when the client cannot retry a write with the
increased timestamp. It happens when a table uses CDC and LWT,
which makes timestamps permanent. Once Paxos commits an entry
with a given timestamp, Scylla will keep trying to apply that entry
until it succeeds, with the same timestamp. Applying the entry
involves writing to the CDC log table. If it fails, we get stuck.
It's a major bug with an unknown perfect solution.
Allowing writes to previous generations for `generation_leeway` is
a probabilistic fix that should solve the problem in practice.
Apart from this change, this PR adds tests for it and updates
the documentation.
This PR is sufficient to enable writes to the previous generations
only in the gossiper-based topology. The Raft-based topology
needs some adjustments in loading and cleaning CDC generations.
These changes won't interfere with the changes introduced in this
PR, so they are left for a follow-up.
Fixes scylladb/scylladb#7251
Fixes scylladb/scylladb#15260
Closes scylladb/scylladb#17134
* github.com:scylladb/scylladb:
docs: using-scylla: cdc: remove info about failing writes to old generations
docs: dev: cdc: document writing to previous CDC generations
test: add test_writes_to_previous_cdc_generations
cdc: generation: allow increasing generation_leeway through error injection
cdc: metadata: allow sending writes to the previous generations
This patch wires up tombstone_gc repair with tablet repair. The flush
hints logic from the vnode table repair is reused. The way to mark the
finish of the repair is also adjusted for tablet repair because it only
has one shard per tablet token range instead of smp::count shards.
Fixes: #17046
Tests: test_tablet_repair_history
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `hints::host_filter`. its
operator<< is preserved as it's still used by the homebrew generic
formatter for vector<>, which is in turn used by db/config.cc.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17347
This commit adds the missing redirections
to the pages whose source files were
previously stored in the install-scylla folder
and were moved to another location.
Closes scylladb/scylladb#17367
this change introduces a new exception which carries the status code
so that an operation can return a non-zero exit code without printing
any errors. this mimics the behavior of "viewbuildstatus" command of
C* nodetool.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17359
_do_check_nodetool_fails_with() currently has a `match_all` flag to
control how the match is checked. Now we need yet another way to control
how matching is done. Instead of adding yet another flag (and who knows
how many more), just replace the flag and the errors input with a matcher
functor, which gets the stdout and stderr and is delegated to do any
checks it wants. This method will scale much better going forward.
As part of the Automation of ScyllaDB backports project, each PR should get either a backport/none or backport/X.Y label.
Based on this label we will automatically open a backport PR for the relevant OSS release.
In this commit, I am adding a GitHub action to verify if such a label was added.
This only applies to PRs whose base branch is master or next. For releases, we don't need this check.
Tables in keyspaces governed by a replication strategy that uses tablets have separate effective_replication_maps. Update the upgrade compaction task to handle this when getting owned key ranges for a keyspace.
Fixes #16848
Closes scylladb/scylladb#17335
* github.com:scylladb/scylladb:
compaction: upgrade: handle keyspaces that use tablets
replica/database: add an optional variant to get_keyspace_local_ranges
-Wunused-parameter, -Wmissing-field-initializers and -Wdeprecated-copy
warning options are enabled by -Wextra. the tree fails to build with
these options enabled. until we determine whether the warnings are genuine
problems and address them, let's disable them.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17352
When a node changes IP address we need to remove its old IP from `system.peers` and gossiper.
We do this in `sync_raft_topology_nodes` when the new IP is saved into `system.peers` to avoid losing the mapping if the node crashes between deleting and saving the new IP. We also handle the possible duplicates in this case by dropping them on the read path when the node is restarted.
The PR also fixes the problem with old IPs getting resurrected when a node changes its IP address.
The following scenario is possible: a node `A` changes its IP from `ip1` to `ip2` with a restart; other nodes are not yet aware of `ip2`, so they keep gossiping `ip1`. After the restart, `A` receives `ip1` in a gossip message and calls `handle_major_state_change`, since it considers it a new node. Then the `on_join` event is fired on the gossiper notification handlers; `raft_ip_address_updater` receives this event and reverts the IP of node `A` back to `ip1`.
To fix this we ensure that the new gossiper generation number is used when a node registers its IP address in `raft_address_map` at startup.
The `test_change_ip` is adjusted to ensure that the old IPs are properly removed in all cases, even if the node crashes.
Fixes #16886
Fixes #16691
Fixes #17199
Closes scylladb/scylladb#17162
* github.com:scylladb/scylladb:
test_change_ip: improve the test
raft_ip_address_updater: remove stale IPs from gossiper
raft_address_map: add my ip with the new generation
system_keyspace::update_peer_info: check ep and host_id are not empty
system_keyspace::update_peer_info: make host_id an explicit parameter
system_keyspace::update_peer_info: remove any_set flag optimisation
system_keyspace: remove duplicate ips for host_id
system_keyspace: peers table: use coroutines
storage_service::raft_ip_address_updater: log gossiper event name
raft topology: ip change: purge old IP
on_endpoint_change: coroutinize the lambda around sync_raft_topology_nodes
The CDC feature is not supported on a table that uses tablets
(Refs #16317), so if a user creates a keyspace with tablets enabled
they may be surprised later (perhaps much later) when they try to enable
CDC on the table and can't.
The LWT feature has always had the issue described in #5251, but it has
become potentially more common with tablets.
So it was proposed that as long as we have missing features (like CDC or
LWT), every time a keyspace is created with tablets it should output a
warning (a bona-fide CQL warning, not a log message) that some features
are missing, and if you need them you should consider re-creating the
keyspace without tablets.
This patch does this. It was surprisingly hard and ugly to find a place
in the code that can check the tablet-ness of a keyspace while it is
still being created, but I think I found a reasonable solution.
The warning text in this patch is the following (obviously, it can
be improved later, as we perhaps find more missing features):
"Tables in this keyspace will be replicated using tablets, and will
not support the CDC feature (issue #16317) and LWT may suffer from
issue #5251 more often. If you want to use CDC or LWT, please drop
this keyspace and re-create it without tablets, by adding AND TABLETS
= {'enabled': false} to the CREATE KEYSPACE statement."
This patch also includes a test that checks that this warning is
indeed generated when a keyspace is created with tablets (either by default
or explicitly), and not generated when the keyspace is created without
tablets.
Obviously, this entire patch - the warning and its test - can be reverted
as soon as we support CDC (and all other features) on tablets.
Fixes #16807
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The guardrail tests check that certain guardrails enable and disable
certain warnings.
These tests currently check for the *number* of warnings returned by a
request, assuming that without the guardrail there would be no warning.
But in the following patch we plan to add an additional warning on
keyspace creation (that warns about tablets missing some features).
So the tests should check for whether or not a *specific* warning is
returned - not the count.
I only modified tests which the change in the next patch will break.
Tests which use SimpleStrategy and will not get the extra warning,
are unmodified and continue to use the old approach of counting
warnings.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Tables in keyspaces governed by a replication strategy that uses tablets have
separate effective_replication_maps. Update the upgrade compaction task to
handle this when getting owned key ranges for a keyspace.
Fixes #16848
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Add a new method database::maybe_get_keyspace_local_ranges that
optionally returns the owned ranges for the given keyspace if it has an
effective_replication_map for the entire keyspace.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
I wrote these scripts to identify sstables with too large keys for a
recent investigation. I think they could be useful in the future,
certainly as further examples on how to write lua scripts for
scylla-sstable script.
Closes scylladb/scylladb#17000
This API endpoint currently returns with status 500 if attempted to be called for a table which uses tablets. This series adds tablet support. No change in usage semantics is required, the endpoint already has a table parameter.
This endpoint is the backend of `nodetool getendpoints` which should now work, after this PR.
Fixes: #17313
Closes scylladb/scylladb#17316
* github.com:scylladb/scylladb:
service/storage_service: get_natural_endpoints(): add tablets support
replica/database: keyspace: add uses_tablets()
service/storage_service: remove token overload of get_natural_endpoints()
this change addresses the regression introduced by 5e0b3671, which
falls back to local cleanup in cleanup_all. but 5e0b3671 failed to
pass the keyspace to `shard_cleanup_keyspace_compaction_task_impl`
as its constructor parameter, which is why the test fails like
```
error executing POST request to http://localhost:10000/storage_service/cleanup_all with parameters {}: remote replied with status code 400 Bad Request:
Can't find a keyspace
```
where the string after "Can't find a keyspace" is empty.
in this change, the keyspace name of the keyspace to be cleaned is passed to
`shard_cleanup_keyspace_compaction_task_impl`.
we always enable the topology coordinator when testing,
which is why this issue did not pop up until the longevity test.
Fixes #17302
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17320
If no_such_column_family is thrown on remote node, then repair
operation fails as the type of exception cannot be determined.
Use repair::with_table_drop_silenced in repair to continue operation
if a table was dropped.
If no_such_column_family is thrown on remote node, then streaming
operation fails as the type of exception cannot be determined.
Use repair::with_table_drop_silenced in streaming to continue
operation if a table was dropped.
Schema propagation is async so one node can see the table while on
the other node it is already dropped. So, if the nodes stream
the table data, the latter node throws no_such_column_family.
The exception is propagated to the other node, but its type is lost,
so the operation fails on the other node.
Add method which waits until all raft changes are applied and then
checks whether given table exists.
Add a function which uses the above to determine whether an operation
failed because of a dropped table (e.g. on the remote node, where the exact
exception type is unknown). If so, the exception isn't rethrown.
In this commit we refactor test_change_ip to improve
it in several ways:
* We inject failure before old IP is removed and verify
that after restart the node sees the proper peers - the
new IP for node2 and old IP for node3, which is not restarted
yet.
* We introduce the lambda wait_proper_ips, which checks not only the
system.peers table, but also gossiper and token_metadata.
* We call this lambda for all nodes, not only the first node;
this allows validating that the node that has changed its
IP has its own proper IP in the data structures above.
Note that we need to inject an additional delay, ip-change-raft-sync-delay,
before the old IP is removed. Otherwise the problem stops reproducing - other
nodes remove the old IP before it is sent back to the just-restarted node.
In the scenario described in the previous commit,
on_endpoint_change could be called with our previous IP.
We can easily detect this case - after add_or_update_entry
the IP for a given id in the address_map hasn't changed. We
remove such an IP from the gossiper since it's not needed;
this makes the test in the next commit more natural - all old
IPs are removed from all subsystems.
The following scenario is possible: a node A changes its IP
from ip1 to ip2 with a restart; other nodes are not yet aware of ip2,
so they keep gossiping ip1. After the restart, A receives
ip1 in a gossip message and calls handle_major_state_change
since it considers it a new node. Then the on_join event is
fired on the gossiper notification handlers; raft_ip_address_updater
receives this event and reverts the IP
of node A back to ip1.
The essence of the problem is that we don't pass the proper
generation when we add ip2 as a local IP during initialization
when node A restarts, so the zero generation is used
in raft_address_map::add_or_update_entry and the gossiper
message overwrites ip2 to ip1.
In this commit we fix this problem by passing the new generation.
To do that we move the increment_and_get_generation call
from join_token_ring to scylla_main, so that we have a new generation
value before init_address_map is called.
Also we remove the load_initial_raft_address_map function from
raft_group0 since it's redundant. The comment above its call site
says that it's needed to not miss gossiper updates, but
the function storage_service::init_address_map, where raft_address_map
is now initialized, is called before the gossiper is started. This
function does both - it loads the previously persisted host_id<->IP
mappings from system.local and subscribes to gossiper notifications,
so there is no room for races.
Note that this problem is less likely to reproduce with the
'raft topology: ip change: purge old IP' commit - other
nodes remove the old IP before it is sent back to the
just-restarted node. This is also the reason why this
problem doesn't occur in gossiper mode.
Fixes scylladb/scylladb#17199
The host_id field should always be set, so it's more
appropriate to pass it as a separate parameter.
The function storage_service::get_peer_info_for_update
is updated. It shouldn't look for the host_id app
state in the passed map; instead, the callers should
get the host_id on their own.
This optimization never worked -- there were four usages of
the update_peer_info function and in all of them some of
the peer_info fields were set or should be set:
* sync_raft_topology_nodes/process_normal_node: e.g. tokens is set
* sync_raft_topology_nodes/process_transition_node: host_id is set
* handle_state_normal: tokens is set
* storage_service::on_change: get_peer_info_for_update could potentially
return a peer_info with all fields set to empty, but this shouldn't
be possible, host_id should always be set.
Moreover, there is a bug here: we extract host_id from the
states_ parameter, which represents the gossiper application
states that have been changed. This parameter contains host_id
only if a node changes its IP address, in all other cases host_id
is unset. This means we could end up with a record with empty
host_id, if it wasn't previously set by some other means.
We are going to fix this bug in the next commit.
When a node changes IP we call sync_raft_topology_nodes
from raft_ip_address_updater::on_endpoint_change with
the old IP value in prev_ip parameter.
It's possible that the node crashes right after
we insert a new IP for the host_id, but before we
remove the old IP. In this commit we fix the
possible inconsistency by removing the system.peers
record with the older timestamp. This is what the new
peers_table_read_fixup function is responsible for.
We call this function in all system_keyspace methods
that read the system.peers table. The function
loads the table in memory, decides if some rows
are stale by comparing their timestamps and
removes them.
The new function also removes the records with no
host_id, so we no longer need the get_host_id function.
We'll add a test for the problem this commit fixes
in the next commit.
This is a refactoring commit with no observable
changes in behaviour.
We switch the functions to coroutines; it'll
be easier to work with them this way in the
next commit. Also, we add more const-s
along the way.
When a node changes IP address we need to
remove its old IP from system.peers and
gossiper.
We do this in sync_raft_topology_nodes when
the new IP is saved into system.peers to avoid
losing the mapping if the node crashes
between deleting and saving the new IP. In the
next commit we handle the possible duplicates
in this case by dropping them on the read path.
In subsequent commits, test_change_ip will be
adjusted to ensure that old IPs are removed.
Fixes scylladb/scylladb#16886
Fixes scylladb/scylladb#16691
We introduce the helper 'ensure_alive' which takes a
coroutine lambda and returns a wrapper which
ensures the proper lifetime for it.
It works by moving the input lambda onto the heap and
keeping the ptr alive until the resulting future
is resolved.
We also move the holder acquired from _async_gate
to the 'then' lambda closure, since now these closures
will be kept alive during the lambda coroutine execution.
We'll be adding more code to this lambda in the subsequent
commits, it's easier to work with coroutines.
Alternator TTL doesn't yet work on tables using tablets (this is
issue #16567). Before this patch, it can be enabled on a table with
tablets, and the result is a lot of log spam and nothing will get expired.
So let's make the attempt to enable TTL on a table that uses tablets
into a clear error. The error message points to the issue, and also
suggests how to create a table that uses vnodes, not tablets.
This patch also adds a test that verifies that trying to enable TTL
with tablets is an error. Obviously, this test should be removed
once the issue is solved and TTL begins working with tablets.
Refs #16567
Refs #16807
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17306
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`partition_range_view` and `i_partition`, and drop their operator<<:s.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17331
During shutdown, as all system tables are closed in parallel, there is a
possibility of a race condition between compaction stoppage and the
closure of the compaction_history table. So, quiesce all the compaction
tasks before attempting to close the tables.
Fixes #15721
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#17218
A user asked on the ScyllaDB forum several questions on whether
tombstone_gc works on materialized views. This patch includes two
tests that confirm the following:
1. The tombstone_gc may be set on a view - either during its creation
with CREATE MATERIALIZED VIEW or later with ALTER MATERIALIZED VIEW.
2. The tombstone_gc setting is correctly shown - for both base tables
and views - by the "DESC" statement.
3. The tombstone_gc setting is NOT inherited from a base table to a new
view - if you want this option on a view, you need to set it
separately.
Unfortunately, this test could not be a single-node cql-pytest because
we forbid tombstone_gc=repair when RF=1, and since recently, we forbid
setting RF>1 on a single-node setup. So the new tests are written in
the test/topology framework - which may run multiple tests against
a single three-node cluster.
To write tests over a shared cluster, we need functions which create
temporary keyspaces, tables and views, which are deleted automatically
as soon as a test ends. The test/topology framework was lacking such
functions, so this tests includes them - currently inside the test
file, but if other people find them useful they can be moved to a more
central location.
The new functions, new_test_keyspace(), new_test_table() and
new_materialized_view() are inspired by the identically-named
functions in test/cql-pytest/util.py, but the implementation is
different: Importantly, the new functions here are *async*
context managers, used via "async with", to fit with the rest
of the asynchronous code used in the topology test framework.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17345
`-fno-sanitize-address-use-after-scope` is used to disable the check for
stack-use-after-scope bugs, but this check is only performed when ASan
is enabled. if we pass this option when ASan is not enabled, we'd have
following warning, so let's apply it only when ASan is enabled.
```
clang-16: error: argument unused during compilation:
'-fno-sanitize-address-use-after-scope' [-Werror,-Wunused-command-line-argument]
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17329
Mirroring table::uses_tablets(), provides a convenient and -- more
importantly -- easily discoverable way to determine whether the keyspace
uses tablets or not.
This information is of course already available via the abstract
replication strategy, but as seen in a few examples, this is not easily
discoverable and sometimes people resorted to enumerating the keyspace's
tables to be able to invoke table::uses_tablets().
This overload does not work with tablets because it only has keyspace
and token parameters. The only caller is the other overload, which also
has a table parameter, so it can be made to work with tablets. Inline
this overload into the other and remove it, in preparation for fixing
this method for tablets.
This commit updates the Handling Node Failures page
to specify that the quorum requirement refers to both
schema and topology updates.
Closes scylladb/scylladb#17321
This series makes several changes to how ignored nodes list is treated
by the topology coordinator. First the series makes it global and not
part of a single topology operation, second it extends the list at the
time of removenode/replace invocation and third it bans all nodes in
the list from contacting the cluster ever again.
The main motivation is to have a way to unblock tablet migration in case
of a node failure. Tablet migration knows how to avoid nodes in ignored
nodes list and this patch series provides a way to extend it without
performing any topology operation (which is not possible while tables
migration runs).
Fixes scylladb/scylladb#16108
* 'gleb/ignore-nodes-handling-v2' of github.com:scylladb/scylla-dev:
test: add test for the new ignore nodes behaviour
topology coordinator: cleanup node_state::decommissioning state handling code
topology coordinator: ban ignored nodes just like we ban nodes that are left
storage_service: topology coordinator: validate ignore dead nodes parameters in removenode/replace
topology coordinator: add removed/replaced nodes to ignored_nodes list at the request invocation time
topology coordinator: make ignored_nodes list global and permanent
topology_coordinator: do not cancel rebuild just because some other nodes are dead
topology coordinator: throw more specific error from wait_for_ip() function in case of a timeout
raft_group0: add make_nonvoters function that can make multiple node non voters simultaneously
In a follow-up patch abort_source will be used
inside those methods. Current pattern is that abort_source
is passed everywhere as non const so it needs to be
executed in non const context.
Closes scylladb/scylladb#17312
The test checks that once a node is specified in ignored node list by
one topology operation the information is carried over to the next
operation as well.
The code is shared between decommission and removenode and it has
scattered 'ifs' for different behaviours between those. Change it to
have only one 'if'.
Since a node that was at some point marked as dead, either via
the --ignore-dead-nodes parameter or by being a target of removenode or
replace, can no longer be made "undead", we need to make sure that such
nodes cannot rejoin the cluster any longer. Do that by banning them on the
messaging layer, just like we do for nodes that have left.
Note that the removenode failure test had to be altered since it restarted
a node after removenode failure (which now will not work). Also, since
the check for liveness was removed from the topology coordinator (because
the node is already banned by then), the test case that triggers the
removed code is removed as well.
the quoted message "The minimum block size for crc enabled filesystems is
1024" comes from the output of mkfs.xfs; let's cite the source for
better maintainability.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17094
The true motivation for this patch is a certain problem with configure.py
in scylla-enterprise, which can only be solved by moving the `extra_cxxflags`
lines before configure_seastar(). This patch does that by hoisting
get_extra_cxxflags() up to create_build_system().
But this patch makes sense even if we disregard the real motivation.
It's weird that a function called `write_build_file()` adds additional
build flags on its own.
Closes scylladb/scylladb#17189
This commit renames keyspace::get_effective_replication_map()
to keyspace::get_vnode_effective_replication_map(). This change
is required to ease the analysis of the usage of this function.
When tablets are enabled, then this function shall not be used.
Instead of per-keyspace, per-table replication map should be used.
The rename was performed to distinguish between those two calls.
The next step will be an audit of usages of
keyspace::get_vnode_effective_replication_map().
Refs: scylladb#16626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#17314
All cql-pytest tests use one node, and unsurprisingly most use RF=1.
By default, as part of the "guardrails" feature, we print a warning
when creating a keyspace with RF=1. This warning gets printed on
every cql-pytest run, which creates a "boy who cried wolf" effect
whereby developers get used to seeing these warnings, and won't care
if new warnings start appearing.
The fix is easy - in run.py start Scylla with
minimum-replication-factor-warn-threshold set to -1 instead of the default 3.
Note that we do have cql-pytest tests for this guardrail, but those don't
rely on the default setting of this variable (they can't, cql-pytest
tests can also be run on a Scylla instance run manually by a developer).
Those tests temporarily set the threshold during the test.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17274
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`collection_mutation_view::printer`, and drop its
operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17300
Adds a test reproducing https://github.com/scylladb/scylladb/issues/16759, and the instrumentation needed for it.
Closes scylladb/scylladb#17208
* github.com:scylladb/scylladb:
row_cache_test: test cache consistency during memtable-to-cache merge
row_cache: use preemption_source in update()
utils: preempt: add preemption_source
codespell reports that "statics" could be the misspelling of
"statistics". but "static" here means the static column(s). so
replace "static" with more specific wording.
Refs #589
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17216
when we just want to perform read access to `http_context`, there
is no need to use a non-const reference. so let's add `const` specifier
to make this explicit. this should help with the readability and
maintainability.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17219
Support skipping multiple patterns by allowing them to be passed via
multiple '--skip' arguments to test.py.
Example : `test.py --skip=topology --skip=sstables`
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#17220
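Accepting a repeated flag like this is typically done with argparse's `append` action; a minimal sketch of the idea (not the actual test.py option handling):

```python
import argparse

# Sketch: each occurrence of --skip appends its pattern to a list,
# so `--skip=topology --skip=sstables` yields both patterns.
parser = argparse.ArgumentParser()
parser.add_argument("--skip", action="append", default=[],
                    help="pattern of tests to skip; may be given multiple times")

args = parser.parse_args(["--skip=topology", "--skip=sstables"])
print(args.skip)  # ['topology', 'sstables']
```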
"fro" is short for "from", but the value is an
`optimized_optional<flat_mutation_reader_v2>`. codespell considers
it a misspelling of "for" or "from". neither of them makes sense,
so let's rename it to "reader" for better readability and to
silence the warning, so that genuine warnings can stand out.
this helps make codespell more useful.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17221
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `raft::log`, and drop its
operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17301
This commit removes the Open Source vs. Enterprise matrix
from the Open Source documentation.
In addition, a redirection is added to prevent 404s in the OSS docs,
and the removed page is replaced with a link to the same page
in the Enterprise docs.
This commit must be reverted in enterprise.git, because
we want to keep the matrix in the Enterprise docs.
Fixes https://github.com/scylladb/scylladb/issues/17289
Closes scylladb/scylladb#17295
Alternator Streams doesn't yet work on tables using tablets (this is
issue #16317). Before this patch, an attempt to enable it results in
an unsightly InternalServerError, which isn't terrible - but we can
do better.
So in this patch, we make the attempt to enable Streams and tablets
together into a clear error. The error message points to the open issue,
and also suggests how to create a table that uses vnodes, not tablets.
Unfortunately, there are two slightly different code paths and error
messages for the two cases: one is the creation of a new table (where
the validation happens before the keyspace is actually created), and
the other is an attempt to enable streams on an existing table
in an existing keyspace (which might or might not already be using
tablets).
This patch also adds a test that verifies that trying to enable Streams
with tablets is an error - in both cases (table creation and update).
Obviously, this test - and the validation code - should be removed once
the issue is solved and Alternator Streams begins working with tablets.
Fixes #16497
Refs #16807
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17311
With tablets, we don't use vnode-oriented sstable cleanup.
So let's just remove unused code and bail out silently if sharding is
tablet based. The reason for silence is that we don't want to break
tests that might be reused for tablets, and it's not a problem for
sstable cleanup to be ignored with tablets.
This approach is actually already used in the higher level code,
implementing the cleanup API.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#17296
To unblock tablet migration in case of a node failure we need a way to
dynamically extend a list of ignored_nodes while the migration is
happening. This patch does it by piggybacking on existing topology
operations that assume their target node is already dead. It adds the
target node to the now-global ignored_nodes list when the request is
issued and, for better HA, makes the nodes in ignored_nodes non-voters.
Currently the ignored_nodes list is part of a request (removenode or
replace) and exists only while the request is handled. This patch
changes it to be global and exist outside of any request. Nodes stay
in the list until they are eventually removed and moved to the "left" state.
If a node is specified in the ignore-dead-nodes option for any command
it will be ignored for all other operations that support ignored_nodes
(like tablet migration).
In sync_raft_topology_nodes we execute a system keyspace
update query for each node of the cluster. The system keyspace
tables use schema commitlog which by default enables use_o_dsync.
This means that each write to the commitlog is accompanied by fsync.
For large clusters this can incur hundreds of writes with fsyncs, which
is very expensive. For example, in #17039 for a moderate size cluster
of 50 nodes sync_raft_topology_nodes took almost 5 seconds.
In this commit we solve this problem by running all such update
queries in parallel. The commitlog should batch them and issue
only one write syscall to the OS.
Closes scylladb/scylladb#17243
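The serial-vs-parallel difference can be sketched with asyncio (hypothetical names, not the actual C++ code): issuing all per-node updates concurrently lets the underlying commitlog batch them instead of paying one fsync per node.

```python
import asyncio

async def update_node_row(node_id, log):
    # Stand-in for one per-node system-keyspace update; in the real
    # code each write to the schema commitlog implies an fsync.
    log.append(node_id)

async def sync_all(nodes, log):
    # Run all updates concurrently so the commitlog can batch them
    # into (ideally) a single write syscall, instead of N serial fsyncs.
    await asyncio.gather(*(update_node_row(n, log) for n in nodes))

log = []
asyncio.run(sync_all(range(5), log))
print(sorted(log))  # [0, 1, 2, 3, 4]
```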
Unfortunately, scylladb/python-driver#230 is not fixed yet, so it is
necessary for the sake of our CI's stability to re-create the driver
session after all nodes in the cluster are restarted.
There is one place in test_topology_recovery_basic where all nodes are
restarted but the driver session is not re-created. Even though nodes
are not restarted at once but rather sequentially, we observed a failure
with similar symptoms in a CI run for scylla-enterprise.
Add the missing driver reconnect as a workaround for the issue.
Fixes: scylladb/scylladb#17277
Closes scylladb/scylladb#17278
Instead of adding an asterisk next to "liveness" linking to the glossary, we will temporarily replace it with a hyperlink pending the implementation of tooltip functionality.
Closes scylladb/scylladb#17244
Commit 904bafd069 consolidated the two
existing for_each_tablet() overloads into the one which has a future<>
returning callback. It also added yields to the bodies of said
callbacks. This is unnecessary: the loop in for_each_tablet() already
has a yield per tablet, which should be enough to prevent stalls.
This patch is a follow-up to #17118
Closes scylladb/scylladb#17284
RPC is not ready yet at this point, so we should not set this application state yet.
Also, simplify add_local_application_state as it contains dead code
that will never generate an internal error after 1d07a596bf.
Fixes #16932
Closes scylladb/scylladb#17263
* github.com:scylladb/scylladb:
gossiper: add_local_application_state: drop internal error
transport: controller: do_start_server: do not set_cql_read for maintenance port
this change is a follow-up of 637dd730. the goal is to use
std::filesystem::path for manipulating paths, and to avoid
converting between sstring and fs::path back and forth.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17257
in the same spirit as e84a0991, let's switch the callers who expect
std::string to fmt::format(). to minimize the impact and to reduce
the risk, the switch will be performed piecemeal.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17253
When a node is decommissioned, all tablet replicas need to be moved away
from it. In some cases this may not be possible. If the number of nodes in
the cluster equals the keyspace RF, one cannot decommission any node
because it's not possible to find nodes for every replica.
The new test case validates this constraint is satisfied.
refs: #16195
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17248
A materialized view in CQL allows AT MOST ONE view key column that
wasn't a key column in the base table. This is because if there were
two or more of those, the "liveness" (timestamp, ttl) of these different
columns can change at every update, and it's not possible to pick what
liveness to use for the view row we create.
We made an exception for this rule for Alternator: DynamoDB's API allows
creating a GSI whose partition key and range key are both regular columns
in the base table, and we must support this. We claim that the fact that
Alternator allows neither TTL (Alternator's "TTL" is a different feature)
nor user-defined timestamps, does allow picking the liveness for the view
row we create. But we did it wrong!
We claimed in a comment - and implemented in the code before this patch -
that in Alternator we can assume that both GSI key columns will have the
*same* liveness, and in particular timestamp. But this is only true if
one modifies both columns together! In fact, in general it is not true:
We can have two non-key attributes 'a' and 'b' which are the GSI's key
columns, and we can modify *only* b, without modifying a, in which case
the timestamp of the view modification should be b's newer timestamp,
not a's older one. The existing code took a's timestamp, assuming it
will be the same as b's, which is incorrect. The result was that if
we repeatedly modify only b, all view updates will receive the same
timestamp (a's old timestamp), and a deletion will always win over
all the modifications. This patch includes a reproducing test written by
a user (@Zak-Kent) that demonstrates how after a view row is deleted
it doesn't get recreated - because all the modifications use the same
timestamp.
The fix is, as suggested above, to use the *higher* of the two
timestamps of both base-regular-column GSI key columns as the timestamp
for the new view rows or view row deletions. The reproducer that
failed before this patch passes with it. As usual, the reproducer
passes on AWS DynamoDB as well, proving that the test is correct and
should really work.
Fixes #17119
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17172
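A minimal sketch of the corrected rule (the function name is hypothetical; the real fix lives in the C++ view-update code):

```python
def view_row_timestamp(ts_a, ts_b):
    # The view row (or its deletion) must carry the *higher* of the two
    # timestamps of the base-regular-column GSI key columns. Using only
    # one column's timestamp lets an old deletion shadow all newer
    # updates that touched only the other column.
    return max(ts_a, ts_b)

# 'b' was modified later than 'a'; the view update must use b's timestamp.
print(view_row_timestamp(100, 250))  # 250
```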
This patch adds a simple reproducer for a regression in Scylla 5.4 caused
by commit 432cb02, breaking LIMIT support in GROUP BY.
Refs #17237
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17275
PEP 632 deprecates the distutils module, and it is removed from Python 3.12.
we are actually using the one vendored by setuptools if we are on
3.12. so let's use shutil for finding the ninja executable.
see https://peps.python.org/pep-0632/
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17271
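The replacement boils down to `shutil.which()`, which searches PATH the way `distutils.spawn.find_executable()` did; a sketch (the surrounding configure.py logic is omitted):

```python
import shutil

# shutil.which() returns the full path of the executable, or None
# if it is not found anywhere on PATH.
ninja = shutil.which("ninja") or shutil.which("ninja-build")
print(ninja if ninja else "ninja not found in PATH")
```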
Reimplements stop/start sequence using rolling_restart() which is safe
with regards to UP status propagation and not prone to sudden
connection drop which may cause later CQL queries to time out. It also
ensures that CQL is up on all the remaining nodes when the with_down
callback is executed.
The test was observed to fail in CI like this:
```
cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.157.135.26:9042 datacenter1>: ConnectionException('Pool for 127.157.135.26:9042 is shutdown')})
...
@pytest.mark.repair
@pytest.mark.asyncio
async def test_tablet_missing_data_repair(manager: ManagerClient):
...
for idx in range(0,3):
s = servers[idx].server_id
await manager.server_stop_gracefully(s, timeout=120)
> await check()
```
Hopefully: Fixes #17107
Closes scylladb/scylladb#17252
* github.com:scylladb/scylladb:
test: py: tablets: Fix flakiness of test_tablet_missing_data_repair
test: pylib: manager_client: Wait for driver to catch up in rolling_restart()
test: pylib: manager_client: Accept callback in rolling_restart() to execute with node down
The reason we introduced the tombstone-limit
(query_tombstone_page_limit), was to allow paged queries to return
incomplete/empty pages in the face of large tombstone spans. This works
by cutting the page after the tombstone-limit amount of tombstones were
processed. If the read is unpaged, it is killed instead. This was a
mistake. First, it doesn't really make sense, the reason we introduced
the tombstone limit, was to allow paged queries to process large
tombstone-spans without timing out. It does not help unpaged queries.
Furthermore, the tombstone-limit can kill internal queries done on
behalf of user queries, because all our internal queries are unpaged.
This can cause denial of service.
So in this patch we disable the tombstone-limit for unpaged queries
altogether, they are allowed to continue even after having processed the
configured limit of tombstones.
Fixes: #17241
Closes scylladb/scylladb#17242
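The resulting policy can be sketched as follows (hypothetical names; the actual logic lives in the C++ query code):

```python
def on_tombstone_limit_reached(is_paged: bool) -> str:
    # Paged reads: cut the page short so the client re-drives the query
    # from where it left off, avoiding a timeout over a large tombstone span.
    # Unpaged reads (including internal queries done on behalf of user
    # queries): keep going -- killing them turned the limit into a
    # denial-of-service vector.
    return "cut_page" if is_paged else "continue"

print(on_tombstone_limit_reached(True))   # cut_page
print(on_tombstone_limit_reached(False))  # continue
```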
we recently added -Wextra to configure.py, and this option enables
a bunch of warning options, including `-Wignored-qualifiers`. so
there is no need to enable this specific warning anymore. this change
removes this option from both `configure.py` and the CMake build system.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17272
Because Seastar now defaults to C++23, we downgrade it explicitly to
C++20.
* seastar 289ad5e593...5d3ee98073 (10):
> Update supported C++ standards to C++23 and C++20 (dropping C++17)
> docker: install clang-tools-18
> http: add handler_base::verify_mandatory_params()
> coroutine/exception: document return_exception_ptr()
> http: use structured-binding when appropriate
> test/http: Read full server response before sending next
> doc/lambda-coroutine-fiasco: fix a syntax error
> util/source_location-compat: use __cpp_consteval
> Fix incorrect class name in documentation.
> Add support for missing HTTP PATCH method.
Closes scylladb/scylladb#17268
This change introduces a new test that verifies the
functionality related to tablet_count metric.
It checks that the tablet_count metric is correctly reported
and updated when new tables are created, when tables
are dropped, and when `move_tablet` is executed.
Refs: scylladb#16131
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#17165
In one of the previous patches, we have allowed writing to the
previous CDC generations for `generation_leeway`. This change has
made the information about failing writes to the previous
generation and the "rejecting writes to an old generation" example
obsolete so we remove them.
After the change, a write can only fail if its timestamp is distant
from the node's timestamp. We add the information about it.
Before this patch, writes to the previous CDC generations would
always be rejected. After this patch, they will be accepted if
the write's timestamp is greater than `now - generation_leeway`.
This change was proposed around 3 years ago. The motivation was
to improve user experience. If a client generates timestamps by
itself and its clock is desynchronized with the clock of the node
the client is connected to, there could be a period during
generation switching when writes fail. We didn't consider this
problem critical because the client could simply retry a failed
write with a higher timestamp. Eventually, it would succeed. This
approach is safe because these failed writes cannot have any side
effects. However, it can be inconvenient. Writing to previous
generations was proposed to improve it.
The idea was rejected 3 years ago. Recently, it turned out that
there is a case when the client cannot retry a write with the
increased timestamp. It happens when a table uses CDC and LWT,
which makes timestamps permanent. Once Paxos commits an entry with
a given timestamp, Scylla will keep trying to apply that entry
until it succeeds, with the same timestamp. Applying the entry
involves writing to the CDC log table. If it fails, we get stuck.
It's a major bug with an unknown perfect solution.
Allowing writes to previous generations for `generation_leeway` is
a probabilistic fix that should solve the problem in practice.
Note that allowing writes only to the previous generation might
not be enough. With the Raft-based topology, it is possible to
add multiple nodes concurrently. Moreover, tablets make streaming
instant, which allows the topology coordinator to add multiple nodes
very quickly. So, creating generations with almost identical
timestamps is possible. Then, we could encounter the same bug but,
for example, for a generation before the previous generation.
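The acceptance rule described above can be sketched as a small predicate (hypothetical names; timestamps are in arbitrary units):

```python
def accept_cdc_write(write_ts, now, generation_leeway):
    # A write routed to a previous CDC generation is accepted as long as
    # its timestamp is not too far in the past relative to the node's
    # clock; older writes are rejected, as before this change.
    return write_ts > now - generation_leeway

# With a leeway of 5, a write stamped 3 units ago is accepted,
# one stamped 10 units ago is not.
print(accept_cdc_write(97, 100, 5))  # True
print(accept_cdc_write(90, 100, 5))  # False
```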
In a mixed cluster (5.4.1-20231231.3d22f42cf9c3 and
5.5.0~dev-20240119.b1ba904c4977), in the rolling upgrade test, we saw
repair never finishing.
The following was observed:
rpc - client 127.0.0.2:65273 msg_id 5524: caught exception while
processing a message: std::out_of_range (deserialization buffer
underflow)
It turns out the repair rpc message was not compatible between the two
versions. Even with an rpc stream verb, new rpc parameters must come
after the rpc::source<> parameter. The rpc::source<> parameter is not
special: it does not have to be the last parameter.
For example, it should be:
void register_repair_get_row_diff_with_rpc_stream(
std::function<future<rpc::sink<repair_row_on_wire_with_cmd>> (
const rpc::client_info& cinfo, uint32_t repair_meta_id,
rpc::source<repair_hash_with_cmd> source, rpc::optional<shard_id> dst_cpu_id_opt)>&& func);
not:
void register_repair_get_row_diff_with_rpc_stream(
std::function<future<rpc::sink<repair_row_on_wire_with_cmd>> (
const rpc::client_info& cinfo, uint32_t repair_meta_id,
rpc::optional<shard_id> dst_cpu_id_opt, rpc::source<repair_hash_with_cmd> source)>&& func);
Fixes #16941
Closes scylladb/scylladb#17156
Recently we added a trick to allow running cql-pytests either with or
without tablets. A single fixture, test_keyspace, uses one of two separate
fixtures, test_keyspace_tablets or test_keyspace_vnodes, as requested.
The problem is that even if test_keyspace doesn't use its
test_keyspace_tablets fixture (it doesn't, if the test isn't
parameterized to ask for tablets explicitly), it's still a fixture,
and it causes the test to be skipped. This causes every test to be
skipped when running on Cassandra or old Scylla which doesn't support
tablets.
The fix is simple - the internal fixture test_keyspace_tablets should
yield None instead of skipping. It is the caller, test_keyspace, which
now skips the test if tablets are requested but test_keyspace_tablets
is None.
Fixes #17266
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17267
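The control flow can be sketched in plain Python (no real pytest fixtures here; `Skipped` stands in for pytest.skip(), and the keyspace names are placeholders):

```python
class Skipped(Exception):
    """Stand-in for the exception pytest.skip() raises."""

def tablets_fixture(tablets_supported):
    # The inner fixture: return None when tablets are unsupported,
    # instead of skipping -- merely being a dependency of the outer
    # fixture must not skip every test.
    return "tablets_ks" if tablets_supported else None

def keyspace_fixture(want_tablets, tablets_supported):
    ks = tablets_fixture(tablets_supported)
    if want_tablets:
        if ks is None:
            # Only the caller skips, and only when tablets were
            # explicitly requested by the test parameterization.
            raise Skipped("tablets not supported")
        return ks
    return "vnodes_ks"

# A non-tablets test runs fine even on a server without tablets support:
print(keyspace_fixture(want_tablets=False, tablets_supported=False))  # vnodes_ks
```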
managed_bytes is implemented as a chain of blob_storage objects.
Each blob_storage contains 24 bytes of metadata. But in the most
common case -- when there is only a single element in the chain --
16 bytes of this metadata is trivial/unused.
This is regrettable waste because managed_bytes is used for every
database cell in the memtables and cache. It means that every value
of size >= 7 bytes (smaller ones fit in the inline storage of
managed_bytes) receives 16 bytes of useless overhead.
To correct that, this series adds to managed_bytes an alternative storage
layout -- used for buffers small enough to fit in one fragment -- which only
stores the necessary minimum of metadata. (That is: a pointer to the parent,
to facilitate moving the storage during memory defragmentation).
This saves 16 bytes on every cell greater than 15 bytes. Which includes e.g.
every live cell with value bigger than 6 bytes, which likely applies to most cells.
Before:
```
$ build/release/scylla perf-simple-query --duration 10
median 218692.88 tps ( 61.1 allocs/op, 13.1 tasks/op, 41762 insns/op, 0 errors)
$ build/release/scylla perf-simple-query --duration 10 --write
median 173511.46 tps ( 58.3 allocs/op, 13.2 tasks/op, 53258 insns/op, 0 errors)
$ build/release/test/perf/mutation_footprint_test -c1 --row-count=20 --partition-count=100 --data-size=8 --column-count=16
- in cache: 2580222
- in memtable: 2549852
```
After:
```
$ build/release/scylla perf-simple-query --duration 10
median 218780.89 tps ( 61.1 allocs/op, 13.1 tasks/op, 41763 insns/op, 0 errors)
$ build/release/scylla perf-simple-query --duration 10 --write
median 173105.78 tps ( 58.3 allocs/op, 13.2 tasks/op, 52913 insns/op, 0 errors)
$ build/release/test/perf/mutation_footprint_test -c1 --row-count=20 --partition-count=100 --data-size=8 --column-count=16
- in cache: 2068238
- in memtable: 2037696
```
Closes scylladb/scylladb#14263
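The arithmetic behind the claim, as a quick sketch (byte counts taken from the description above, not from reading the C++ layout): the single-fragment layout keeps only the 8-byte parent pointer out of the 24 bytes of metadata, so every externally-stored cell gets 16 bytes cheaper.

```python
# Back-of-envelope numbers from the commit message.
OLD_METADATA = 24   # bytes of blob_storage metadata per fragment
NEW_METADATA = 8    # single-fragment layout keeps only the parent pointer

saving_per_cell = OLD_METADATA - NEW_METADATA
cells = 1_000_000
print(f"{saving_per_cell} bytes/cell, "
      f"{saving_per_cell * cells / 2**20:.0f} MiB per million cells")
```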
* github.com:scylladb/scylladb:
utils: managed_bytes: optimize memory usage for small buffers
utils: managed_bytes: rewrite managed_bytes methods in terms of managed_bytes_view
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `gc_clock::time_point`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17254
In particular, `inet_address(const sstring& addr)` is
dangerous, since a function like
`topology::get_datacenter(inet_address ep)`
might accidentally convert an `sstring` argument
into an `inet_address` (which would most likely
throw an obscure std::invalid_argument if the datacenter
name does not look like an inet_address).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#17260
Currently, since the data_value(bool) ctor
is implicit, pointers of any kind are implicitly
convertible to data_value via intermediate conversion
to `bool`.
This is error prone, since it allows unsafe comparison
between e.g. an `sstring` with `some*` by implicit
conversion of both sides to `data_value`.
For example:
```
sstring name = "dc1";
struct X {
sstring s;
};
X x(name);
auto p = &x;
if (name == p) {}
```
Refs #17261
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#17262
After 1d07a596bf, which
dropped before_change notifications, there is no sense
in getting the local endpoint_state_ptr twice (before
and after the notifications) and calling on_internal_error
if the state isn't found after the notifications.
Just throw a runtime_error if the endpoint state is not
found; otherwise, use it.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
RPC is not ready yet at this point, so we should not
set this application state yet.
This is indicated by the following warning from
`gossiper::add_local_application_state`:
```
WARN 2024-01-22 23:40:53,978 [shard 0:stmt] gossip - Fail to apply application_state: std::runtime_error (endpoint_state_map does not contain endpoint = 127.227.191.13, application_states = {{RPC_READY -> Value(1,1)}})
```
That should really be an internal error, but
it can't be, because of this bug.
Fixes #16932
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `alternator::calculate_value_caller`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17259
managed_bytes is implemented as a chain of blob_storage objects.
Each blob_storage contains 24 bytes of metadata. But in the most
common case -- when there is only a single element in the chain --
16 bytes of this metadata is trivial/unused.
This is regrettable waste because managed_bytes is used for every
database cell in the memtables and cache. It means that every value
of size >= 7 bytes (smaller ones fit in the inline storage of
managed_bytes) receives 16 bytes of useless overhead.
To correct that, this patch adds to managed_bytes an alternative storage
layout -- used for buffers small enough to fit in one contiguous
fragment -- which only stores the necessary minimum of metadata.
(That is: a pointer to the parent, to facilitate moving the storage during
memory defragmentation).
Reimplement stop/start sequence using rolling_restart() which is safe
with regards to UP status propagation and not prone to sudden
connection drop which may cause later CQL queries to time out. It also
ensures that CQL is up on all the remaining nodes when the with_down
callback is executed.
Hopefully: Fixes #17107
This patch adds a reproducer test for an issue #16382.
See scylladb/seastar#2044 for details of the problem.
The test is enabled only in dev mode since it requires
error injection mechanism. The patch adds a new injection
into storage_proxy::handle_read to simulate the problem
scenario - the node is shutting down and there are some
unfinished pending replica requests.
Closes scylladb/scylladb#16776
Some methods of managed_bytes contain the logic needed to read/write the
contents of managed_bytes, even though this logic is already present in
managed_bytes_{,mutable}_view.
Reimplementing those methods by using the views as intermediates allows us to
remove some code and makes the responsibilities cleaner -- after the change,
managed_bytes contains the logic of allocating and freeing the storage,
while views provide read/write access to the storage.
This change will simplify the next patch which changes the internals of
managed_bytes.
row_level_repair and repair_meta keep a reference to a table.
If the table is dropped during repair, its object is destructed, leaving
a dangling reference.
Delete {row_level_repair,repair_meta}::_cf and replace their usages.
Fixes: #17233.
Closes scylladb/scylladb#17234
* github.com:scylladb/scylladb:
repair: delete _cf from repair_meta
repair: delete _cf from row_level_repair
This PR implements a procedure that upgrades existing clusters to use
raft-based topology operations. The procedure does not start
automatically, it must be triggered manually by the administrator after
making sure that no topology operations are currently running.
Upgrade is triggered by sending a `POST
/storage_service/raft_topology/upgrade` request. This causes the
topology coordinator to start; it drives the rest of the process: it
builds the `system.topology` state based on information observed in
gossip and tells all nodes to switch to raft mode. Then, the topology
coordinator runs normally.
Upgrade progress is tracked in a new static column `upgrade_state` in
`system.topology`.
The procedure also serves as an extension to the current recovery
procedure on raft. The current recovery procedure requires restarting
nodes in a special mode which disables raft, performing `nodetool
removenode` on the dead nodes, cleaning up some state on the nodes and
restarting them so that they automatically rebuild group 0. Raft
topology fits into the existing procedure by falling back to legacy
topology operations after disabling raft. After rebuilding group 0,
upgrade needs to be triggered again.
Because upgrade is manual and it might not be convenient for
administrators to run it right after upgrading the cluster, we allow the
cluster to operate in legacy topology operations mode until upgrade,
which includes allowing new nodes to join. In order to allow it, nodes
now ask the cluster about the mode they should use to join before
proceeding by using a new `JOIN_NODE_QUERY` RPC.
The procedure is explained in more detail in `topology-over-raft.md`.
Fixes: https://github.com/scylladb/scylladb/issues/15008
Closes scylladb/scylladb#17077
* github.com:scylladb/scylladb:
test/topology_custom: upgrade/recovery tests for topology on raft
cdc/generation_service: in legacy mode, fall back to raft tables
system_keyspace: add read_cdc_generation_opt
cdc/generation_service: turn off gossip notifications in raft topo mode
cql_test_env: move raft_topology_change_enabled var earlier
group0_state_machine: pull snapshot after raft topology feature enabled
storage_service: disable persistent feature enabler on upgrade
storage_service: replicate raft features to system.peers
storage_service: gossip tokens and cdc generation in raft topology mode
API: add api for triggering and monitoring topology-on-raft upgrade
storage_service: infer which topology operations to use on startup
storage_service: set the topology kind value based on group 0 state
raft_group0: expose link to the upgrade doc in the header
feature_service: fall back to checking legacy features on startup
storage_service: add fiber for tracking the topology upgrade progress
gms: feature_service: add SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
topology_coordinator: implement core upgrade logic
topology_coordinator: extract top-level error handling logic
storage_service: initialize discovery leader's state earlier
topology_coordinator: allow for custom sharding info in prepare_and_broadcast_cdc_generation_data
topology_coordinator: allow for custom sharding info in prepare_new_cdc_generation_data
topology_coordinator: remove outdated fixme in prepare_new_cdc_generation_data
topology_state_machine: introduce upgrade_state
storage_service: disallow topology ops when upgrade is in progress
raft_group0_client: add in_recovery method
storage_service: introduce join_node_query verb
raft_group0: make discover_group0 public
raft_group0: filter current node's IP in discover_group0
raft_group0: remove my_id arg from discover_group0
storage_service: make _raft_topology_change_enabled more advanced
docs: document raft topology upgrade and recovery
per its description, "`/storage_service/describe_ring/`" returns the
token ranges of an arbitrary keyspace. actually, it returns the
first keyspace with a non-local, vnode-based strategy. this API
is not used by nodetool, nor is it exercised in dtest.
scylla-manager has a wrapper for this API, but that wrapper
is not used anywhere.
in this change, this API is dropped.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17197
Now all the logged arguments are lazily evaluated (the node* format
arguments and the backtrace), so the preliminary log-level checks are not needed.
indentation is deliberately left broken
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This helper returns lazy_eval-ed current_backtrace(), so it will be
generated and printed only if logger is really going to do it with its
current log-level.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently to print node information there's a debug_format(node*) helper
function that returns an sstring object. Here's a formatter
that's more flexible and convenient, plus a node_printer wrapper, since
formatters cannot format non-void pointers.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Equip it with a :v specifier that turns verbose mode on and prints much
more data about the node. The main user will appear in the next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently it waits for the topology state machine to be idle, so it allows
only one tablet to be moved at a time. We should allow it to start migration
if the current transition state is
- topology::transition_state::tablet_migration or
- topology::transition_state::tablet_draining
to allow starting parallel tablet movement. That will be useful when
scripting a custom rebalancing algorithm.
In this change, we wait until the topology state machine is idle or
in either of the above two states.
Fixes #16437
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17203
Adds three tests for the new upgrade procedure:
- test_topology_upgrade - upgrades a cluster operating in legacy mode to
use raft topology operations,
- test_topology_recovery_basic - performs recovery on a three-node
cluster, no node removal is done,
- test_topology_majority_loss - simulates a majority loss scenario, i.e.
removed two nodes out of three, performs recovery to rebuild the
raft topology state and re-add two nodes back.
When a node enters recovery after being in raft topology mode, topology
operations switch back to legacy mode. We want CDC to keep working when
that happens, so we need the legacy code to be able to access
generations created back in raft mode - so that the node can still
properly serve writes to CDC log tables.
In order to make this possible, modify the legacy logic to also look for
a cdc generation in raft tables, if it is not found in legacy tables.
The `system_keyspace::read_cdc_generation` loads a cdc generation from
the system tables. One of its preconditions is that the generation
exists - this precondition is quite easy to satisfy in raft mode, and
the function was designed to be used solely in that mode.
In legacy mode however, in case when we revert from raft mode through
recovery, it might be necessary to use generations created in raft mode
for some time. In order to make the function useful as a fallback in
case lookup of a generation in legacy mode fails, introduce a relaxed
variant of `read_cdc_generation` which returns std::nullopt if the
generation does not exist.
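The strict/relaxed split described above can be sketched with stdlib types only (the types, the in-memory "table", and the function names below are hypothetical stand-ins for the real system-table accessors):

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>

// Hypothetical stand-ins for the real system-table types.
using cdc_generation_id = int;
struct cdc_generation { cdc_generation_id id; };

// Illustrative in-memory "system table".
std::map<cdc_generation_id, cdc_generation> generations = {{1, {1}}};

// Relaxed variant: returns std::nullopt when the generation is
// missing, so it can serve as a fallback in legacy mode.
std::optional<cdc_generation> read_cdc_generation_opt(cdc_generation_id id) {
    auto it = generations.find(id);
    if (it == generations.end()) {
        return std::nullopt;
    }
    return it->second;
}

// Strict variant: in raft mode the generation is known to exist,
// so a missing entry is an internal error.
cdc_generation read_cdc_generation(cdc_generation_id id) {
    auto g = read_cdc_generation_opt(id);
    if (!g) {
        throw std::runtime_error("cdc generation not found");
    }
    return *g;
}
```

The legacy lookup path can then try its own tables first and fall back to the relaxed variant without tripping the strict precondition.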
In raft topology mode CDC information is propagated through group 0.
Prevent the generation service from reacting to gossiper notifications
after we made the switch to raft mode.
Pulling a snapshot of the raft topology is done via new rpc verb
(RAFT_PULL_TOPOLOGY_SNAPSHOT). If the recipient runs an older version of
scylla and does not understand the verb, sending it will result in an
error. We usually use cluster features to avoid such situations, but in
the case when a node joins the cluster, it doesn't have access to
features yet. Therefore, we need to enable pulling snapshots in two
situations:
- when the SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES feature becomes enabled,
- in case when starting group 0 server when joining a cluster that uses
raft-based topology.
When starting in legacy mode, a gossip event listener called persistent
feature enabler is registered. This listener marks a feature as enabled
when it notices, in gossip, that all nodes declare support for the
feature.
With raft-based topology, features are managed in group 0 instead and do
not rely on the persistent feature enabler at all. Make the listener
look at the raft_topology_change_enabled() method and prevent it from
enabling more features after that method starts returning true.
This is necessary for cluster features to work after we switch from raft
topology mode to legacy topology mode during recovery, because
information in system.peers is used during legacy cluster feature check
and when enabling features.
A mixed raft/legacy cluster can happen when entering recovery mode, i.e.
when the group 0 upgrade state is set to 0 and a rolling restart is
performed. Legacy nodes expect at least information about tokens,
otherwise an internal error occurs in the handle_state_normal function.
Therefore, make nodes that use raft topology behave well with respect to
other nodes.
Implements the /storage_service/raft_topology/upgrade route. The route
supports two methods: POST, which triggers the cluster-wide upgrade to
topology-on-raft, and GET which reports the status of the upgrade.
Adds a piece of logic to storage_service::join_cluster which chooses the
mode in which it will boot.
If the experimental raft topology flag is disabled, it will fall back to
legacy node operations.
When the node starts for the first time, it will perform group 0
discovery. If the node creates a cluster, it will start it in raft
topology mode. If it joins an existing one, it will ask the node chosen
by the discovery algorithm about which joining method to use.
If the node is already a part of the cluster, it will base its decision
on the group0 state.
When booting for the first time, the node determines whether to use raft
mode or not by asking the cluster, or by going straight to raft mode
when it creates a new cluster by itself. This happens before joining
group 0. However, right after joining group 0, the `upgrade_state`
column from `system.topology` is supposed to control which operations
the node is supposed to be using.
In order to have a single source of control over the flag (either
storage_service code or group 0 code), the
`_manage_topology_change_kind_from_group0` flag is added which controls
whether the `_topology_change_kind_enabled` flag is controlled from
group 0 or not.
When checking features on startup (i.e. whether support for any feature
was revoked in an unsafe way), it might happen that upgrade to raft
topology didn't finish yet. In that case, instead of loading an empty
set of features - which supposedly represents the set of features that
were enabled until last boot - we should fall back to loading the set
from the legacy `enabled_features` key in `system.scylla_local`.
The topology coordinator fiber is not started if a node starts in legacy
topology mode. We need to start the raft state monitor fiber after all
preconditions for starting upgrade to raft topology are met.
Add a fiber which is spawned only in legacy mode that will wait until:
- The schema-on-raft upgrade finishes,
- The SUPPORTS_CONSISTENT_CLUSTER_MANAGEMENT feature is enabled,
- The upgrade is triggered by the user.
and, after that, will spawn the raft state monitor fiber.
All nodes being capable of support for raft topology is a prerequisite
for starting upgrade to raft topology. The newly introduced feature will
track this prerequisite.
codespell reports the following warnings:
```
Error: ./docs/kb/flamegraph.svg:1: writen ==> written
Error: ./docs/kb/flamegraph.svg:1: writen ==> written
Error: ./docs/kb/flamegraph.svg:1: storag ==> storage
Error: ./docs/kb/flamegraph.svg:1: storag ==> storage
```
these misspellings come from the flamegraph, which can be viewed
at https://opensource.docs.scylladb.com/master/kb/flamegraph.html
they are very likely truncated function names displayed
in the frames. the author of the article is not responsible for the
spelling of these names, nor can we change them in a meaningful way.
so add the file to the skip list.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17215
we have three format()s in our arsenal:
* seastar::format()
* fmt::format()
* std::format()
the first one is used most frequently. but it has two limitations:
1. it returns seastar::sstring instead of std::string. under some
circumstances, the caller of the format() function actually
expects std::string, in that case a deep copy is performed to
construct an instance of std::string. this incurs unnecessary
performance overhead. but this limitation is a by-design behavior.
2. it does not do compile-time format check. this can be improved
at the Seastar's end.
to address these two problems, we switch the callers who expect
std::string to fmt::format(). to minimize the impact and to reduce
the risk, the switch will be performed piecemeal.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17212
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `error_injection_at_startup`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17211
Move it before the topology coordinator is started. This way, the
topology coordinator will see non-empty state when it is started and it
will allow us to assert that the topology coordinator is never started
for an empty system.topology table.
Extend the prepare_and_broadcast_cdc_generation_data function like we
did in the case of prepare_new_cdc_generation_data - the topology
coordinator state building process not only has to create a new
generation, but also broadcast it.
During the topology coordinator state build phase, a new cdc generation
will be created. We can reuse prepare_new_cdc_generation_data for that.
Currently, it always takes sharding information (shard count + ignore
msb) from the topology state machine - which won't be available yet at
the point of building the topology, so extend the function so that it
can accept a custom source of sharding information.
The FIXME mentions that token metadata should return host ID for given
token (instead of, presumably, an IP) - but that is already the case, so
let's remove the fixme.
Forbid starting new topology changes while upgrade to topology on raft
is in progress. While this does not take into account any ongoing
topology operations, it makes sure that at the end of the upgrade no
node will try to perform any legacy topology operations.
On top of the capabilities of the java-nodetool command, tablet support is also implemented: in addition to the existing keyspace parameter, an optional table parameter is also accepted and forwarded to the REST API. For tablet keyspaces this is required to get a ring description.
The command comes with tests and all tests pass with both the new and the current nodetool implementations.
Refs: https://github.com/scylladb/scylladb/issues/15588
Refs: https://github.com/scylladb/scylladb/issues/16846
Closes scylladb/scylladb#17163
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement describering
tools/scylla-nodetool.cc: handle API request failures gracefully
test/nodetool: util.py: add check_nodetool_fails_with_all()
It holds back global token metadata barrier during streaming, which
limits parallelism of load balancing.
Topology transition is protected by the means of topology_guard.
Closes scylladb/scylladb#17230
repair_meta keeps a reference to a table. If the table is dropped
during repair, its object is destructed, leaving a dangling reference.
Delete repair_meta::_cf and replace its usages with appropriate
methods.
row_level_repair keeps a reference to a table. If the table is dropped
during repair, its object is destructed, leaving a dangling reference.
Delete row_level_repair::_cf and replace its usages with appropriate
methods.
Also implementing tablet support, which basically just means that a new
table parameter is also accepted and forwarded to the API, in addition
to the existing keyspace one.
Currently, error handling is done via catching
http::unexpected_status_error and re-throwing an std::runtime_error.
Turns out this no longer works, because this error will only be thrown
by the http client, if the request had an expected reply code set.
The scylla_rest_client doesn't set an expected reply code, so this
exception was never thrown for some time now.
Furthermore, even when the above worked, it was not too user-friendly as
the error message only included the reply-code, but not the reply
itself.
So in this patch this is fixed:
* The handling of http::unexpected_status_error is removed, we don't
want to use this mechanism, because it yields very terse error
messages.
* Instead, the status code of the request is checked explicitly and all
cases where it is not 200 are handled.
* A new api_request_failed exception is added, which is thrown for all
non-200 statuses with the extracted error message from the server (if
any).
* This exception is caught by main, the error message is printed and
scylla-nodetool returns with a new distinct error-code: 4.
With this, all cases where the request fails on ScyllaDB are handled and
we shouldn't hit cases where a nodetool command fails with some
obscure JSON parsing error, because the error reply has different JSON
schema than the expected happy-path reply.
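A minimal sketch of the explicit status check described above (the type and function names are illustrative, not the actual scylla-nodetool code):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Illustrative exception for non-200 replies; carries the error
// message extracted from the server reply (if any) in what().
class api_request_failed : public std::runtime_error {
public:
    api_request_failed(int status, const std::string& body)
        : std::runtime_error("API request failed, status "
                             + std::to_string(status) + ": "
                             + (body.empty() ? "(no error message)" : body)) {}
};

// Called on every reply instead of relying on the http client to
// throw http::unexpected_status_error.
void check_reply(int status, const std::string& body) {
    if (status != 200) {
        throw api_request_failed(status, body);
    }
}
```

main() can then catch this one exception type, print what(), and exit with the distinct error code.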
Similar to the existing check_nodetool_fails_with() but checks that all
error messages from expected_errors are contained in stderr.
While at it, use list as the typing hint, instead of typing.List.
both of its callers are passing parent_path() and filename() to
it. so let the callee do this. it's simpler this way.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17225
before this change, when performing `stream_transfer_task`, if an
exception is raised, we check if the table being streamed is still
around. if it is missing, we just skip the table, as it should have
been dropped during streaming; otherwise we consider it a failure and
report it back to the peer. this behavior was introduced by
953af382.
but we perform the streaming on all shards in parallel, and if any
of the shards fails because of the dropped table, the exception is
thrown. the current shard is not necessarily the one which
throws the exception. actually, the current shard might still be
waiting for a write lock for removing the table from the database's
table metadata. in that case, we consider the streaming RPC call a
failure even if the table is already removed on some shard(s), and
the peer would fail to bootstrap because of the streaming failure.
in this change, before catching all exceptions, we handle
`no_such_column_family`, and do not fail the streaming in that case.
please note, we don't touch other tables, so we can just assume that
`no_such_column_family` is thrown only if the table to be transferred
is missing. that's why `assert()` is added.
Fixes #15370
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17160
Due to the potentially large number of per-table metrics, ScyllaDB uses
configuration to determine what metrics will be reported. The user can
decide if they want per-table-per-shard metrics, per-table-per-instance
metrics, or none.
This patch uses the same logic for tablet metrics registration.
It adds a new metrics group tablets with one metric inside it - count.
So, scylla_tablets_count will report the number of tablets per shard.
The existing per-table metrics will be reported aggregated or not like
the other per-table metrics.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Closes scylladb/scylladb#17182
Use `parallel_for_each_table` instead of `for_each_table_gently` in
`repair_service::load_history`, with a parallelism of 16 on each shard,
to reduce bootstrap time.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `gms::gossip_digest_ack2`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17153
this change is a follow-up of 637dd730. the goal is to use
std::filesystem::path for manipulating paths, and to avoid
converting between sstring and fs::path back and forth.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17214
state_to_dir(sstable_state) translates the enum to the corresponding
directory component, and it returns a `seastar::sstring`. not all
the callers of this function expect a full-blown sstring instance,
on the contrary, quite a few of them just want a string-alike object
which represents the directory component, so they can use it, for
instance to compose a path, or just format the given `state` enum.
so to avoid the overhead of creating/destroying the `seastar::sstring`
instance, let's switch to `std::string_view`. with this change, we
will be able to implement the fmt::formatter for `sstable_state`
without the help of the formatter of sstring.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17213
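Roughly, the change looks like this (the enum values and directory names below are illustrative; the point is only the return-type switch from sstring to string_view):

```cpp
#include <cassert>
#include <string_view>

// Illustrative subset of the sstable state enum.
enum class sstable_state { normal, staging, quarantine, upload };

// Returning std::string_view instead of seastar::sstring avoids
// allocating a string for callers that only compose a path or
// format the given state; the literals have static storage duration,
// so the views never dangle.
constexpr std::string_view state_to_dir(sstable_state s) {
    switch (s) {
    case sstable_state::normal:     return "";
    case sstable_state::staging:    return "staging";
    case sstable_state::quarantine: return "quarantine";
    case sstable_state::upload:     return "upload";
    }
    return "";
}
```

Since the function no longer depends on sstring, a fmt::formatter for `sstable_state` can be written directly on top of it.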
To facilitate testing the state of cache after the update is preempted
at various points, pass a preemption_source& to update() instead of
calling the reactor directly.
In release builds, the calls to preemption_source methods should compile
to the same direct reactor calls as today. Only in dev mode do they
add an extra branch. (However, the `preemption_source&` argument has
to be passed around in any mode.)
While `preemption_check` can be passed to functions to control
their preemption points, there is no way to inspect the
state of the system after the preemption results in a yield.
`preemption_source` is a superset of `preemption_check`,
which also allows for customizing the yield, not just the preemption
check. An implementation passed by a test can hook the yield to
put the tested function to sleep, run some code, and then wake the
function up.
We use the preprocessor to minimize the impact on release builds.
Only dev-mode preemption_source is hookable. When it's used in other
modes, it should compile to direct reactor calls, as if it wasn't used.
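The hookable idea can be sketched with plain stdlib types (the real class integrates with the Seastar reactor; the names and shape below are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hookable preemption source: a test can override both the preemption
// check and what happens on a yield.
struct preemption_source {
    std::function<bool()> need_preempt = [] { return false; };
    std::function<void()> yield_hook = [] {};

    bool should_preempt() const { return need_preempt(); }
    // A test can run arbitrary code here, e.g. inspect the cache
    // state mid-update, before waking the function up.
    void yield() { yield_hook(); }
};

// A long-running loop that consults the preemption source, the way
// update() does after the change. Returns how many times it yielded.
std::size_t process_items(std::size_t n, preemption_source& ps) {
    std::size_t yields = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (ps.should_preempt()) {
            ps.yield();
            ++yields;
        }
    }
    return yields;
}
```

In release builds the default callbacks would be replaced by the direct reactor calls; only the hookable variant pays for the indirection.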
It tells whether the current node currently operates in recovery mode or
not. It will be vital for storage_service in determining which topology
operations to use at startup.
The `discover_group0` function returns only after it either finds a node
that belongs to some group 0, or learns that the current node is
supposed to create a new one. It will be very helpful to storage_service
in determining which topology mode to use.
This was previously done by `setup_group0`, which was always an
(indirect) caller of `discover_group0`. As we want to make
`discover_group0` public, it's more convenient for the callers if the
called method takes care of sanitizing the argument.
The goal is to make `discover_group0` public. The `my_id` argument was
always set to `this->load_my_id()`, so we can get rid of it and it will
make it more convenient to call `discover_group0` from the outside.
Currently, nodes either operate in the topology-on-raft mode or legacy
mode, depending on whether the experimental topology on raft flag is
enabled. This also affects the way nodes join the cluster, as both modes
have different procedures.
We want to allow joining nodes in legacy mode until the cluster is
upgraded. Nodes should automatically choose the best method. Therefore,
the existing boolean _raft_topology_change_enabled flag is extended into
an enum with the following variants:
- unknown - the node still didn't decide in which mode it will operate
- legacy - the node uses legacy topology operations
- upgrading_to_raft - the node is upgrading to use raft topology
operations
- raft - the node uses raft topology operations
Currently, only the `legacy` and `raft` variants are utilized, but this
will change in the commits that follow.
Additionally, the `_raft_experimental_topology` bool flag is introduced
which retains the meaning of the old `_raft_topology_change_enabled` but
has a more fitting name. It is explicitly needed in
`topology_state_load`.
The table query param is added to get the describe_ring result for a
given table.
Both vnode tables and tablet tables can use this table param, so it is
easier for users to use.
If the table param is not provided by user and the keyspace contains
tablet table, the request will be rejected.
E.g.,
curl "http://127.0.0.1:10000/storage_service/describe_ring/system_auth?table=roles"
curl "http://127.0.0.1:10000/storage_service/describe_ring/ks1?table=standard1"
Refs #16509
Closes scylladb/scylladb#17118
* github.com:scylladb/scylladb:
tablets: Convert to use the new version of for_each_tablet
storage_service: Add describe_ring support for tablet table
storage_service: Mark host2ip as const
tablets: Add for_each_tablet_gently
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `dht::ring_position`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17194
It uses only compile-time constants to produce the value, so deserves
this marking
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17181
Reduces footprint from hundreds of MB to a few MB.
Issue could be reproduced with:
./build/dev/test/boost/mutation_writer_test --run_test=test_token_group_based_splitting_mutation_writer -- -m 500M --smp 1 --random-seed 1848215131
Fixes #17076.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#17187
When creating a keyspace, scylla allows setting an RF value larger than the number of nodes in the DC. With vnodes, when new nodes are bootstrapped, new tokens are inserted, thus catching up with the RF. With tablets, that's not the case, as the replica set remains unchanged.
With tablets, it's a good chance not to mimic the vnodes behavior and to require as many nodes to be up and running as the requested RF. This patch implements this in a lazy manner -- when creating a keyspace the RF can be anything, but when a new table is created the topology should meet the RF requirements. If not met, the user can bootstrap new nodes or ALTER KEYSPACE.
closes: #16529
Closes scylladb/scylladb#17079
* github.com:scylladb/scylladb:
tablets: Make sure topology has enough endpoints for RF
cql-pytest: Disable tablets when RF > nodes-in-DC
test: Remove test that configures RF larger than the number of nodes
keyspace_metadata: Include tablets property in DESCRIBE
although "welp" is more emotionally expressive, it is considered
a misspelling of "whelp" by codespell. that's why this comment stands
out. but from a non-native speaker's point of view, we can probably
use more descriptive words to explain what "welp" is for, in plain
English.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17183
In this PR we add the tests for two scenarios, related to the use of IPs in raft topology.
* When the replaced node transitions to the `LEFT` state we used to
remove the IP of such node from gossiper. If we replace with same IP,
this caused the IP of the new node to be removed from gossiper. This
problem was fixed by #16820, this PR adds a regression test for it.
* When a node is restarted after decommissioning some other node, the
restarting node tries to apply the raft log, this log contains a
record about the decommissioned node, and we got stuck trying to resolve
its IP. This was fixed by #16639 - we excluded IPs from the Raft log
application code and moved it entirely to host_id-s. This PR adds a
regression test for this case.
Closes scylladb/scylladb#15967
Closes scylladb/scylladb#14803
Closes scylladb/scylladb#17180
* github.com:scylladb/scylladb:
test_topology_ops: check node restart after decommission
test_replace_reuse_ip: check other servers see the IP
For efficiency, if a base-table update generates many view updates that
go to the same partition, they are collected as one mutation. If this
mutation grows too big it can lead to memory exhaustion, so since
commit 7d214800d0 we split the output
mutation into mutations no longer than 100 rows (max_rows_for_view_updates)
each.
This patch fixes a bug where this split was done incorrectly when
the update involved range tombstones, a bug which was discovered by
a user in a real use case (#17117).
Range tombstones are read in two parts, a beginning and an end, and the
code could split the processing between these two parts, with the result
that some of the range tombstones in the update could be missed - and the
view could miss some deletions that happened in the base table.
This patch fixes the code in two places to avoid breaking up the
processing between range tombstones:
1. The counter "_op_count" that decides where to break the output mutation
should only be incremented when adding rows to this output mutation.
The existing code strangely incremented it on every read (!?), which
resulted in the counter being incremented on every *input* fragment,
and in particular could reach the limit 100 between two range
tombstone pieces.
2. Moreover, the length of output was checked in the wrong place...
The existing code could get to 100 rows, not check at that point,
read the next input - half a range tombstone - and only *then*
check that we reached 100 rows and stop. The fix is to calculate
the number of rows in the right place - exactly when it's needed,
not before the step.
The first change needs more justification: The old code, which incremented
_op_count on every input fragment and not just output fragments, did not
fit the stated goal of its introduction - to avoid large allocations.
In one test it resulted in breaking up the output mutation to chunks of
25 rows instead of the intended 100 rows. But, maybe there was another
goal, to stop the iteration after 100 *input* rows and avoid the possibility
of stalls if there are no output rows? It turns out the answer is no -
we don't need this _op_count increment to avoid stalls: The function
build_some() uses `co_await on_results()` to run one step of processing
one input fragment - and `co_await` always checks for preemption.
I verified that indeed no stalls happen by using the existing test
test_long_skipped_view_update_delete_with_timestamp. It generates a
very long base update where all the view updates go to the same partition,
but all but the last few updates don't generate any view updates.
I confirmed that the fixed code loops over all these input rows without
increasing _op_count and without generating any view update yet, but it
does NOT stall.
This patch also includes two tests reproducing this bug and confirming
it is fixed, and also two additional tests for breaking up long deletions
that I wanted to make sure don't fail after this patch (they don't).
By the way, this fix would have also fixed issue #12297 - which we
fixed a year ago in a different way. That issue happened when the code
went through 100 input rows without generating *any* output rows,
and incorrectly concluding that there's no view update to send.
With this fix, the code no longer stops generating the view
update just because it saw 100 input rows - it would have waited
until it generated 100 output rows in the view update (or the
input is really done).
Fixes #17117
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#17164
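The two fixes can be illustrated with a generic chunking loop (a simplified stand-in, not the actual view-update builder): the counter grows only when an output row is emitted, and the size check happens before starting the next input step rather than mid-step.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Input fragment: a value plus whether it actually produces an output row
// (many base-table fragments generate no view update at all).
struct fragment {
    int value;
    bool produces_output;
};

// Split the output rows into chunks of at most max_rows. The chunk size
// is checked before processing the next fragment, and only emitted
// output rows are counted -- input-only fragments never advance the
// counter, so they cannot trigger a premature split.
std::vector<std::vector<int>> build_chunks(const std::vector<fragment>& input,
                                           std::size_t max_rows) {
    std::vector<std::vector<int>> chunks;
    std::vector<int> current;
    for (const auto& f : input) {
        if (current.size() >= max_rows) {   // check in the right place: before the step
            chunks.push_back(std::move(current));
            current.clear();
        }
        if (f.produces_output) {            // count only *output* rows
            current.push_back(f.value);
        }
    }
    if (!current.empty()) {
        chunks.push_back(std::move(current));
    }
    return chunks;
}
```

With the old behavior (counting every input), a run of output-free fragments would have forced undersized chunks; here they simply pass through.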
The new rpc::optional parameter must come after any existing parameters,
including the rpc::source parameters, otherwise it will break
compatibility.
The regression was introduced in:
```
commit fd3c089ccc
Author: Tomasz Grabiec <tgrabiec@scylladb.com>
Date: Thu Oct 26 00:35:19 2023 +0200
service: range_streamer: Propagate topology_guard to receivers
```
We need to backport this patch ASAP before we release anything that
contains commit fd3c089ccc.
Refs: #16941
Fixes: #17175
Closes scylladb/scylladb#17176
RF values appear as strings and strategy classes convert them to integers. This PR removes some duplication of effort in the converting code.
Closes scylladb/scylladb#17132
* github.com:scylladb/scylladb:
network_topology_strategy: Do not walk list of datacenters twice
replication_strategy: Do not convert string RF into int twice
abstract_replication_strategy: Make validate_replication_factor return value
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `data_dictionary::user_types_metadata`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17140
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `exceptions::exception_code`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17151
C++20 introduced a new overload of std::ostringstream::str() that is selected when the mentioned member function is called on r-value.
The new overload returns a string, that is move-constructed from the underlying string instead of being copy-constructed.
This change applies std::move() on stringstream objects before calling str() member function to avoid copying of the underlying buffer.
It also removes a helper function `inet_addr_type_impl::to_sstring()` - it was used only in two places. It was replaced with `fmt::to_string()`.
Closes scylladb/scylladb#16991
* github.com:scylladb/scylladb:
use fmt::to_string() for seastar::net::inet_address
types/types.cc: move stringstream content instead of copying it
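The pattern, in isolation, is a minimal sketch like this (function name is illustrative):

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>

// Since C++20, str() called on an rvalue ostringstream has an overload
// that moves the underlying buffer out instead of copying it.
std::string build_message(int n) {
    std::ostringstream oss;
    oss << "value=" << n;
    return std::move(oss).str(); // move-constructs the result, no buffer copy
}
```

`oss` is not used after the move, so this is safe; for large buffers it saves an allocation and a copy.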
so we can tighten our dependencies a little bit. there are only three places where we are using the `date` library. also, there is no need to reinvent the wheel when there are ready-to-use solutions.
Closes scylladb/scylladb#17177
* github.com:scylladb/scylladb:
types: use {fmt} to format boolean
types: use {fmt} to format time
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `dht::sharder`, and drop
its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17178
* seastar 85359b28...289ad5e5 (19):
> net/dpdk: use user-defined literal when appropriate
> io_tester: Allow running on non-XFS fs
> io: Apply rate-factor early
> circular_buffer: make iterator default constructible
> net/posix: add a way to change file permissions of unix domain socket
> resource: move includes to the top of the source file
> treewide: replace calls to future::get0() by calls to future::get()
> core/future: add as_ready_future utility
> build: do not expose -Wno-error=#warnings
> coroutine: remove remnants of variadic futures
> build: prevent gcc -Wstringop-overflow workaround from affecting clang
> util/spinlock: use #warning instead of #warn
> io_tester: encapsulate code into allocate_and_fill_buffer()
> io_tester: make maybe_remove_file a function
> future: remove tuples from get0_return_type
> circular_buffer_fixed_capacity: use std::uninitialized_move() instead of open-coding
> rpc/rpc_types: do not use integer literal in preprocessor macro
> future: use "(T(...))" instead of "{T(...)}" in uninitialized_set()
> net/posix: include used header
Closes scylladb/scylladb#17179
Adds a missing logging import in the scylladb_common_images extension file, which prevented the enterprise build from compiling.
Additionally, it standardizes logging handling across the extensions and removes "ami" references in Azure and GCP extensions.
Closes scylladb/scylladb#17137
also disable some more warnings which were failing the build after `-Wextra` was enabled. we can fix them on a case-by-case basis if they are genuine issues, but before that, we just disable them.
the goal of this change is to reduce the discrepancies between the compile options used by CMake and those used by configure.py. the side effect is that we enable some more warnings implied by `-Wextra`; for instance, `-Wsign-compare` is enabled now. for the full list of the warnings enabled when building with Clang, please see https://clang.llvm.org/docs/DiagnosticsReference.html#wextra.
Closes scylladb/scylladb#17131
* github.com:scylladb/scylladb:
configure.py: add -Wextra to cflags
test/tablets: do not compare signed and unsigned
There used to be a problem with restarting a node after
decommissioning some other node - the restarting node
tries to apply the raft log, this log contains a record
about the decommissioned node, and we got stuck trying
to resolve its IP.
This was fixed in #16639 - we excluded IPs from
the Raft log application code and moved it entirely
to host_ids.
In this commit we add a regression test
for this case. We move the decommission_node
call before server_stop/server_start. We need
to add one more server to retain majority when
the node is decommissioned, otherwise the topology
coordinator won't migrate from the stopped node
before replacing it, and we'll get an error.
Closes #14803
The replaced node transitions to LEFT state, and
we used to remove the IPs of such nodes from gossiper.
If we replace with same IP, this caused the IP of the
new node to be removed from gossiper.
This problem was fixed by #16820; this commit
adds a regression test for it.
Closes #15967
This PR removes the following pages:
- ScyllaDB Open Source Features
- ScyllaDB Enterprise Features
They were outdated, incomplete, and misleading. They were also redundant, as the per-release updates are added as Release Notes.
With this update, the features listed on the removed pages are added under the common page: ScyllaDB Features.
In addition, a reference to the Enterprise-only Features section is added.
Note: No redirections are added because no file paths or URLs are changed with this PR.
Fixes https://github.com/scylladb/scylladb/issues/13485
Refs https://github.com/scylladb/scylladb/issues/16496
(nobackport)
Closes scylladb/scylladb#17150
* github.com:scylladb/scylladb:
Update docs/using-scylla/features.rst
doc: remove the OSS and Enterprise Features pages
This PR:
- Adds the upgrade guide from ScyllaDB Open Source 5.4 to ScyllaDB Enterprise 2024.1. Note: The need to include the "Restore system tables" step in rollback has been confirmed; see https://github.com/scylladb/scylladb/issues/11907#issuecomment-1842657959.
- Removes the 5.1-to-2022.2 upgrade guide (unsupported versions).
Fixes https://github.com/scylladb/scylladb/issues/16445
Closes scylladb/scylladb#16887
* github.com:scylladb/scylladb:
doc: fix the OSS version number
doc: metric updates between 2024.1 and 5.4
doc: remove the 5.1-to-2022.2 upgrade guide
doc: add the 5.4-to-2024.1 upgrade guide
instead of filtering the keyspaces manually, let's reuse
`database::get_non_local_strategy_keyspaces_erms()`. less
repetition and more future-proof this way.
Fixes #16974
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17121
Validate replication strategy constraints in /storage_service/tablets/move API:
- replicas are not on the same node
- replicas don't move across DC (violates RF in each DC)
- availability is not reduced due to rack overloading
Add flag to force tablet move even though dc/rack constraints aren't fulfilled.
Test for the change: https://github.com/scylladb/scylla-dtest/pull/3911.
Fixes: #16379.
Closes scylladb/scylladb#16648
* github.com:scylladb/scylladb:
api: service: add force param to move_tablet api
service: validate replication strategy constraints
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `dht::ring_position_ext` and
`dht::ring_position_view`, and drop their operator<<.
Refs #13245
Closes scylladb/scylladb#17128
* github.com:scylladb/scylladb:
db: add formatter for dht::ring_position_ext
db: add formatter for dht::ring_position_view
This change removes inet_addr_type_impl::to_sstring()
and replaces its usages with fmt::to_string().
The removed helper performed unneeded copying via
std::ostringstream::str().
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
C++20 introduced a new overload of std::ostringstream::str()
that is selected when the mentioned member function is called
on an r-value.
The new overload returns a string that is move-constructed
from the underlying string instead of being copy-constructed.
This change applies std::move() on stringstream objects before
calling str() member function to avoid copying of the underlying
buffer.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
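As a standalone illustration of the C++20 behavior described above (not Scylla code), compare the copying lvalue overload of `str()` with the move-enabled rvalue overload:

```cpp
#include <sstream>
#include <string>
#include <utility>

// Pre-C++20 style: str() on an lvalue copies the internal buffer.
std::string format_copying(int x) {
    std::ostringstream os;
    os << "value=" << x;
    return os.str();              // lvalue overload: copies the buffer
}

// C++20: calling str() on an rvalue selects the &&-qualified overload,
// which move-constructs the result from the internal buffer.
std::string format_moving(int x) {
    std::ostringstream os;
    os << "value=" << x;
    return std::move(os).str();   // rvalue overload: moves the buffer
}
```

Both functions produce the same string; only the rvalue form avoids the extra copy.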
After changing `left_token_ring` from a node state to a transition
state in scylladb/scylladb#17009, we do the same for
`rollback_to_normal`. `rollback_to_normal` was created as a node
state because `left_token_ring` was a node state.
This change will allow us to distinguish a failed removenode from
a failed decommission in the `rollback_to_normal` handler.
Currently, we use the same logic for both of them, so it's not
required. However, this might change, as it has happened with the
decommission and the failed bootstrap/replace in the
`left_token_ring` state (scylladb/scylladb#16797). We are making
this change now because it would be much harder after branching.
Fixes scylladb/scylladb#17032
Closes scylladb/scylladb#17136
* github.com:scylladb/scylladb:
docs: dev: topology-over-raft: align indentation
docs: dev: topology-over-raft: document the rollback_to_normal state
topology_coordinator: improve logs in rollback_to_normal handler
raft topology: make rollback_to_normal a transition state
TCP sockets and unix domain sockets don't share common listen options
other than `socket_address`. For unix domain sockets, the available options will be
expanded to also cover filesystem permissions and the owner of the socket.
Storing listen options for both types of sockets in one structure would become messy.
For now, both use `listen_cfg`.
In a single cql controller, only sockets of one type are created, thus it
can be easily split into two cases.
Isolate maintenance socket from `listen_cfg`.
In a previous PR (https://github.com/scylladb/scylladb/pull/16840), we enabled tablets by default when running the cql-pytest suite. To handle tests which were failing with tablets enabled, we used a new fixture, `xfail_tablets`, to mark them as xfail. This means that we effectively lost test coverage, as these tests can now freely fail and no-one will notice if this is due to a new regression. To restore test coverage, this PR re-enables all the previously disabled tests by parametrizing each one of them to run with both vnodes and tablets, and marking as xfail only the tablet variant. After these tests are fixed with tablets (or the underlying functionality they test is fixed to work with tablets), we will keep running them with both vnodes and tablets, because these tests apparently *do* care which replication method is used.
Together with https://github.com/scylladb/scylladb/pull/16802, this means all previously disabled tests are re-enabled and no coverage is lost.
Closes scylladb/scylladb#16945
* github.com:scylladb/scylladb:
test/cql-pytest: conftest.py: remove xfail_tablets fixture
test/cql-pytest: test_tombstone_limit.py: re-enable disabled tests
test/cql-pytest: test_describe.py: re-enable disabled tests
test/cql-pytest: test_cdc.py: re-enable disabled tests
test/cql-pytest: add parameter support to test_keyspace
When creating a keyspace, scylla allows setting an RF value larger than
the number of nodes in the DC. With vnodes, when new nodes are bootstrapped,
new tokens are inserted, thus catching up with the RF. With tablets, that's
not the case, as the replica set remains unchanged.
With tablets, it's a good chance not to mimic the vnodes behavior and to
require as many nodes to be up and running as the requested RF. This
patch implements this in a lazy manner -- when creating a keyspace, the RF
can be anything, but when a new table is created, the topology should meet
the RF requirements. If not met, the user can bootstrap new nodes or ALTER
KEYSPACE.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
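A minimal standalone sketch of the lazy validation described above; all names here are invented for illustration and do not come from the Scylla code:

```cpp
#include <stdexcept>

// Hypothetical model: CREATE KEYSPACE accepts any RF unconditionally,
// but CREATE TABLE checks that the DC has enough nodes to satisfy it.
struct keyspace_model {
    int rf = 0;

    void create(int requested_rf) {
        rf = requested_rf;  // accepted even if it exceeds the node count
    }

    void create_table(int nodes_in_dc) const {
        if (rf > nodes_in_dc) {
            throw std::runtime_error(
                "RF exceeds node count; bootstrap more nodes or ALTER KEYSPACE");
        }
    }
};
```

Creating a table fails until enough nodes have been bootstrapped (or the keyspace is altered), matching the lazy-check behavior.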
All the cql-pytests run against a single scylla node, but the
new_random_keyspace() helper may request an RF in the range of 1 through 6,
so tablets need to be explicitly disabled when the RF is too large.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When tablets are enabled and a keyspace being described has them
explicitly disabled or set to a non-automatic initial value of zero,
include this in the returned describe statement too.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Since `t.parallel_foreach_table_state` may yield,
we should not access `type` through the calling
lambda's capture when invoking `stop_compaction`,
since that capture is lost when the lambda returns
if `parallel_foreach_table_state` returns an
unavailable future.
Instead, change all captures to `[&]` so we access
the `type` variable held by the coroutine frame.
Fixes #16975
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#17143
This commit removes the following pages:
- ScyllaDB Open Source Features
- ScyllaDB Enterprise Features
They were outdated, incomplete, and misleading.
They were also redundant, as the per-release
updates are added as Release Notes.
With this update, the features listed on the removed
pages are added under the common page: ScyllaDB Features.
Note: No redirections are added, because no file paths
or URLs are changed with this commit.
Fixes https://github.com/scylladb/scylladb/issues/13485
Refs https://github.com/scylladb/scylladb/issues/16496
get0() dates back to the days when Seastar futures carried tuples, and get0() was a way to get the first (and usually only) element. Now it's a distraction, and Seastar is likely to deprecate and remove it.
Replace with seastar::future::get(), which does the same thing.
Closes scylladb/scylladb#17130
* github.com:scylladb/scylladb:
treewide: replace seastar::future::get0() with seastar::future::get()
sstable: capture return value of get0() using auto
utils: result_loop: define result_type with decayed type
[avi: add another one that snuck in while this was cooking]
Commit e81fc1f095 accidentally broke the control
flow of row_cache::do_update().
Before that commit, the body of the loop was wrapped in a lambda.
Thus, to break out of the loop, `return` was used.
The bad commit removed the lambda, but didn't update the `return` accordingly.
Thus, since the commit, the statement doesn't just break out of the loop as
intended, but also skips the code after the loop, which updates `_prev_snapshot_pos`
to reflect the work done by the loop.
As a result, whenever `apply_to_incomplete()` (the `updater`) is preempted,
`do_update()` fails to update `_prev_snapshot_pos`. It remains in a
stale state, until `do_update()` runs again and either finishes or
is preempted outside of `updater`.
If we read a partition processed by `do_update()` but not covered by
`_prev_snapshot_pos`, we will read stale data (from the previous snapshot),
which will be remembered in the cache as the current data.
This results in outdated data being returned by the replica.
(And perhaps in something worse if range tombstones are involved.
I didn't investigate this possibility in depth).
Note: for queries with CL>1, occurrences of this bug are likely to be hidden
by reconciliation, because the reconciled query will only see stale data if
the queried partition is affected by the bug on *all* queried replicas
at the time of the query.
Fixes #16759
Closes scylladb/scylladb#17138
Check whether tablet move meets replication strategy constraints, i.e.
replicas aren't on the same node, replicas don't move across DCs
or HA isn't reduced due to rack overloading. Throw if constraints
are broken.
The amount of standard Lua libraries loaded for the sstable-script was
limited, due to fears that some libraries (like the io library) could
expose methods which, if used from the script, could interfere with
seastar's asynchronous architecture. So initially only the table and
string libraries were loaded.
This patch adds two more libraries to be loaded: math and os. The
former is self-explanatory and the latter contains methods to work with
date/time, obtain the values of environment variables, as well as launch
external processes. None of these should interfere with seastar; on the
other hand, the facilities they provide can come in very handy for
sstable scripts.
Closes scylladb/scylladb#17126
In one of the previous patches, we changed the `rollback_to_normal`
state from a node state to a transition state. We document it
in this patch. The node state wasn't documented, so there is
nothing to remove.
After making `rollback_to_normal` a transition state, we can
distinguish a failed decommission from a failed bootstrap in the
`rollback_to_normal` handler. We use it to make logs more
descriptive.
After changing `left_token_ring` from a node state to a transition
state in scylladb/scylladb#17009, we do the same for
`rollback_to_normal`. `rollback_to_normal` was created as a node
state because `left_token_ring` was a node state.
This change will allow us to distinguish a failed removenode from
a failed decommission in the `rollback_to_normal` handler.
Currently, we use the same logic for both of them, so it's not
required. However, this might change, as it has happened with the
decommission and the failed bootstrap/replace in the
`left_token_ring` state (scylladb/scylladb#16797). We are making
this change now because it would be much harder after branching.
The change also simplifies the code in
`topology_coordinator::rollback_current_topology_op`.
Moving the `rollback_to_normal` handler from
`handle_node_transition` to `handle_topology_transition` created a
large diff. There is only one change - adding
`auto node = get_node_to_work_on(std::move(guard));`.
get0() dates back to the days when Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.
Replace with seastar::future::get(), which does the same thing.
instead of capturing the return value of `get0()` with a reference
type, use a plain type. as `get0()` returns a plain `T` while `get()`
returns a `T&&`, to avoid the value referenced by `T&&` getting destroyed
after the expression, let's use a plain `auto` instead of `auto&&`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change prepares for replacing `seastar::future::get0()` with
`seastar::future::get()`. the former's return type is a plain `T`,
while the latter is `T&&`. in this case `T` is
`boost::outcome::result<..>`. in order to extract its `error_type`,
we need to get its decayed type. since `std::remove_reference_t<T>`
also returns `T`, let's use it so it works with both `get0()` and `get()`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
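The point about decayed types can be shown with a standalone illustration: `std::remove_reference_t` maps both a plain `T` (what `get0()` returns) and a `T&&` (what `get()` returns) to the same plain `T`, so type extraction works with either.

```cpp
#include <type_traits>

// remove_reference_t strips both rvalue and lvalue references, and is a
// no-op on a plain type, so it is safe to apply to the return type of
// either get0() (plain T) or get() (T&&).
static_assert(std::is_same_v<std::remove_reference_t<int>, int>);
static_assert(std::is_same_v<std::remove_reference_t<int&&>, int>);
static_assert(std::is_same_v<std::remove_reference_t<int&>, int>);
```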
also disable some more warnings which are failing the build after
`-Wextra` is enabled. we can fix them on a case-by-case basis, if
they are genuine issues. but before that, we just disable them.
the goal of this change is to reduce the discrepancies between
the compile options used by CMake and those used by configure.py.
the side effect is that we enable some more warnings enabled by
`-Wextra`; for instance, `-Wsign-compare` is enabled now. for
the full list of the enabled warnings when building with Clang,
please see https://clang.llvm.org/docs/DiagnosticsReference.html#wextra.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change should silence following warning:
```
test/boost/tablets_test.cc:1600:27: error: comparison of integers of different signs: 'int' and 'unsigned int' [-Werror,-Wsign-compare]
    for (int i = 0; i < smp::count * 20; i++) {
                      ~ ^ ~~~~~~~~~~~~~~~
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The constructor of that class walks the provided options to get per-DC
replication factors. It does it twice -- first to populate the dc:rf
map, second to calculate the sum of the provided RF values. The latter
loop can be optimized away.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
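A minimal standalone sketch (names hypothetical, not the actual Scylla constructor) of folding the two loops into one pass:

```cpp
#include <map>
#include <string>
#include <utility>

// Single pass over the options: populate the dc -> rf map and
// accumulate the total RF at the same time, instead of walking the
// options twice.
std::pair<std::map<std::string, int>, int>
parse_rf_options(const std::map<std::string, int>& opts) {
    std::map<std::string, int> dc_rf;
    int total = 0;
    for (const auto& [dc, rf] : opts) {
        dc_rf.emplace(dc, rf);
        total += rf;
    }
    return {dc_rf, total};
}
```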
There are two replication strategy classes that validate a string RF and
then convert it into an integer. Since the validation helper returns the
parsed value, it can be used directly, avoiding the second conversion.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The helper in question checks whether a string RF is indeed an integer.
Make this helper return the "checked" integer value, because it does this
conversion anyway. And rename it to parse_... to reflect what it now does.
The next patches will make use of this change.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
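A standalone sketch of what such a parse-and-validate helper might look like; the name and exact semantics here are illustrative, not the actual Scylla implementation:

```cpp
#include <charconv>
#include <stdexcept>
#include <string_view>

// Validate that the string RF is a non-negative integer and return the
// parsed value, so callers do not need a second conversion.
long parse_replication_factor(std::string_view rf) {
    long value = 0;
    auto [ptr, ec] = std::from_chars(rf.data(), rf.data() + rf.size(), value);
    if (ec != std::errc{} || ptr != rf.data() + rf.size() || value < 0) {
        throw std::invalid_argument(
            "replication factor must be a non-negative integer");
    }
    return value;
}
```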
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `dht::ring_position_ext`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `dht::ring_position_view`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Other than being fmt v10 compatible, it's also shorter and easier to
read, thanks to fmt::join() helper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17115
This kills three birds with one stone
1. fixes broken indentation
2. re-uses new_options local variable
3. stops using string literal to check storage type
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17111
This is a follow-up of "storage_service: Run stream_ranges cmd in streaming group" to fix indentation and drop an unnecessary co_return.
Refs: #17090
Closes scylladb/scylladb#17114
* github.com:scylladb/scylladb:
storage_service: Drop unnecessary co_return in raft_topology_cmd_handler
storage_service: Fix indentation for stream_ranges
this comment has already served its purpose when rewriting
C* in C++. since we've re-implemented it, there is no need to keep it
around.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17120
The latter suite is now tablets-aware, and the tablets cases from the former one can happily work with a single shared scylla instance.
Closes scylladb/scylladb#17101
* github.com:scylladb/scylladb:
test/topology_custom: Remove test_tablets.py
test/topology: Move test_tablet_change_initial_tablets
test/topology: Move test_tablet_explicit_disabling
test/topology: Move test_tablet_default_initialization
test/topology: Move test_tablet_change_replication_strategy
test/topology: Move test_tablet_change_replication_vnode_to_tablets
cql-pytest: Add skip_without_tablets fixture
At the end of the test, we wait until a restarted node receives a
snapshot from the leader, and then verify that the log has been
truncated.
To check the snapshot, the test used the `system.raft_snapshots` table,
while the log is stored in `system.raft`.
Unfortunately, the two tables are not updated atomically when Raft
persists a snapshot (scylladb/scylladb#9603). We first update
`system.raft_snapshots`, then `system.raft` (see
`raft_sys_table_storage::store_snapshot_descriptor`). So after the wait
finishes, there's no guarantee the log has been truncated yet -- there's
a race between the test's last check and Scylla doing that last delete.
But we can check the snapshot using `system.raft` instead of
`system.raft_snapshots`, as `system.raft` has the latest ID. And since
1640f83fdc, storing that ID and truncating
the log in `system.raft` happens atomically.
Closes scylladb/scylladb#17106
`#warning` is a preprocessor directive in C/C++, while `#warn` is not. the
reason we haven't run into the build failure caused by this is likely
that we are only building on amd64/aarch64 with libstdc++ at the time
of writing.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17074
according to the documentation of "nodetool cleanup":
> Triggers removal of data that the node no longer owns
currently, scylla performs cleanup by rewriting the sstables. but
commitlog segments may still contain mutations to the tables
which are dropped during sstable rewriting. when the scylla server
restarts, the dirty mutations are replayed to the memtable. if
any of these dirty mutations changes the tables cleaned up, the
stale data are reapplied. this would lead to data resurrection.
so, in this change we follow the same model as major compaction,
where we
1. force a new active segment,
2. flush the tables being cleaned up,
3. perform cleanup using compaction
Fixes #4734
Closes scylladb/scylladb#16757
* github.com:scylladb/scylladb:
storage_service: fall back to local cleanup in cleanup_all
compaction: format flush_mode without the helper
compaction_manager: flush all tables before cleanup
replica: table: pass do_flush to table::perform_cleanup_compaction()
api, compaction: promote flush_mode
Otherwise it will inherit the rpc verb's scheduling group, which is gossip. As a result, streaming runs in the wrong scheduling group.
Fixes #17090
Closes scylladb/scylladb#17097
* github.com:scylladb/scylladb:
streaming: Verify stream consumer runs inside streaming group
storage_service: Run stream_ranges cmd in streaming group
During startup, the contents of the data directory are verified to ensure that they have the right owner and permissions. Verifying all the contents together, which includes files that will be read and closed immediately and files that will be held open for longer durations, can lead to memory fragmentation in the dentry/inode cache.
Mitigate this by updating the verification in such a way that these two sets of files are verified separately, ensuring their separation in the dentry/inode cache.
Fixes https://github.com/scylladb/scylladb/issues/14506
Closes scylladb/scylladb#16952
* github.com:scylladb/scylladb:
directories: prevent inode cache fragmentation by orderly verifying data directory contents
directories: skip verifying data directory contents during startup
directories: co-routinize create_and_verify
This PR fixes a bug where certain calls to the `mintimeuuid()` CQL function with large negative timestamps could crash Scylla. It turns out we already had protections in place against very large positive timestamps, but very negative timestamps could still cause bugs.
The actual fix in this series is just a few lines, but the bigger effort was improving the test coverage in this area. I added tests for the "date" type (the original reproducer for this bug used totimestamp(), which takes a date parameter), and also reproducers for this bug directly, without the totimestamp() function, and one with that function.
Finally, this PR also replaces the assert(), which made this molehill of a bug into a mountain, by a throw.
Fixes #17035
Closes scylladb/scylladb#17073
* github.com:scylladb/scylladb:
utils: replace assert() by on_internal_error()
utils: add on_internal_error with common logger
utils: add a timeuuid minimum, like we had maximum
test/cql-pytest: tests for "date" type
This change introduces a new metric called tablet_count
that is recalculated during construction of the table object
and on each call to table::update_effective_replication_map().
To get the count of tablets on the current shard, the tablet map
is traversed and, for each tablet_id, tablet_map::get_shard()
is called. Its return value is compared with this_shard_id().
The new metric is maintained and exposed only for tables
that use tablets.
Refs: scylladb#16131
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#17056
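A minimal standalone sketch of the counting logic described above, with the tablet map modeled as a plain vector of owning shard ids (the real code walks a `tablet_map` and calls `tablet_map::get_shard()` per tablet_id):

```cpp
#include <cstddef>
#include <vector>

// Count the tablets whose owning shard matches the current shard.
// tablet_to_shard[i] models tablet_map::get_shard(tablet_id i).
size_t count_local_tablets(const std::vector<unsigned>& tablet_to_shard,
                           unsigned this_shard) {
    size_t count = 0;
    for (unsigned shard : tablet_to_shard) {
        if (shard == this_shard) {
            ++count;
        }
    }
    return count;
}
```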
this series addresses a couple of `-Wsign-compare` warnings surfaced in the tree.
Closes scylladb/scylladb#17091
* github.com:scylladb/scylladb:
tablet_allocator: do not compare signed and unsigned
replica: table: do not compare signed with unsigned
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `db::write_type`, and drop
its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17093
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `gms::application_state`,
but its operator<< is preserved, as it is still used by the generic
homebrew formatter for `std::unordered_map<>`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17096
During startup, the contents of the data directory are verified to ensure
that they have the right owner and permissions. Verifying all the
contents, which includes files that will be read and closed immediately,
and files that will be held open for longer durations, together, can
lead to memory fragmentation in the dentry/inode cache.
Prevent this by updating the verification in such a way that these two
sets of files are verified separately, ensuring their separation in
the dentry/inode cache.
Fixes #14506
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
This is in preparation for a subsequent patch that will verify the
contents of the data directory in a specific order.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
before this change, if no keyspaces are specified,
scylla-nodetool just enumerates all non-local keyspaces, and
calls "/storage_service/keyspace_cleanup" on them one after another.
this is not very efficient, as each such RESTful API call
forces a new active commitlog segment, and flushes all tables.
so, if the target node of this command has N non-local keyspaces,
it would repeat the steps above N times. this is not necessary.
and after a topology change, we would like to run a global
"nodetool cleanup" without specifying the keyspace, so this
is a typical use case which we do care about.
to address this performance issue, in this change, we improve
an existing RESTful API call, "/storage_service/cleanup_all", so
that if the topology coordinator is not enabled, we fall back to
a local cleanup to clean up all non-local keyspaces.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
since flush_mode is moved out of major_compaction_task_impl, let's
drop the helper hosted in that class as well, and implement the
formatter without it.
please note, the `__builtin_unreachable()` is dropped. this should
not change the behavior of the formatter. we don't put it in the
`default` branch, in the hope that `-Wswitch` can warn us in the case
when another enumerator is added to `flush_mode` but we fail to
handle it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
according to the documentation of "nodetool cleanup":
> Triggers removal of data that the node no longer owns
currently, scylla performs cleanup by rewriting the sstables. but
commitlog segments may still contain mutations to the tables
which are dropped during sstable rewriting. when the scylla server
restarts, the dirty mutations are replayed to the memtable. if
any of these dirty mutations changes the tables cleaned up, the
stale data are reapplied. this would lead to data resurrection.
so, in this change we follow the same model as major compaction:
1. force a new active segment,
2. flush all tables,
3. perform cleanup using compaction, which rewrites the sstables
of the specified tables
because we already `flush()` all tables in
`cleanup_keyspace_compaction_task_impl::run()`, there is no need to
call `flush()` again, in `table::perform_cleanup_compaction()`, so
the `flush()` call is dropped in this function, and the tests using
this function are updated to call `flush()` manually to preserve
the existing behavior.
there are two callers of `cleanup_keyspace_compaction_task_impl`:
* one is `storage_service::sstable_cleanup_fiber()`, which listens
for the events fired by topology_state_machine, which is in turn
driven by, for instance, the "/storage_service/cleanup_all" API,
and cleans up all keyspaces one after another.
* another is "/storage_service/keyspace_cleanup", which cleans up
the specified keyspace.
in the first use case, we can force a new active segment a single
time, so another parameter to the ctor of
`cleanup_keyspace_compaction_task_impl` is introduced to specify if
the `db.flush_all_tables()` call should be skipped.
please note, there are two possible optimizations,
1. force new active segment only if the mutations in it touches the
tables being cleaned up
2. after forcing new active segment, only flush the (mem)tables
mutated by the non-active segments
but let's leave them for follow-up changes. this change is a
minimal fix for data resurrection issue.
Fixes #16757
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
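The sequencing described above can be modeled in a tiny standalone sketch (all names invented): force a new active commitlog segment, optionally flush all tables (skipped when the caller, e.g. cleanup_all, has already done it once for all keyspaces), then rewrite the sstables:

```cpp
#include <string>
#include <vector>

// Hypothetical model of the cleanup sequence; each step is recorded so
// the ordering can be inspected.
std::vector<std::string> cleanup_steps(bool skip_flush_all_tables) {
    std::vector<std::string> steps;
    steps.push_back("force_new_active_segment");
    if (!skip_flush_all_tables) {
        steps.push_back("flush_all_tables");  // done once by cleanup_all
    }
    steps.push_back("perform_cleanup_compaction");
    return steps;
}
```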
this parameter defaults to do_flush::yes, so the existing behavior is
preserved. this change prepares for a change which flushes all
tables before performing cleanup on the tables on demand.
please note, we cannot pass compaction::flush_mode to this function,
as it is used by compaction/task_manager_module.hh, if we want to
share it by both database.hh and compaction/task_manager_module.hh,
we would have to find it a new home. so `table::do_flush` boolean
tag is reused instead.
Refs #16757
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
`available_shards` could be negative when `resize_plan` is empty, and
the loop to build `resize_plan` stops at the next iteration after
`available_shards` is assigned a negative number. so, instead of
making it an `unsigned`, let's just compare it using `std::cmp_less()`.
this change should silence following warning:
```
/home/kefu/dev/scylladb/service/tablet_allocator.cc:529:60: error: comparison of integers of different signs: 'long' and 'const size_t' (aka 'const unsigned long') [-Werror,-Wsign-compare]
529 | if (resize_plan.size() > 0 && available_shards < size_desc.shard_count) {
| ~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change helps to silence the following warning:
```
/home/kefu/dev/scylladb/replica/table.cc:1952:26: error: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Werror,-Wsign-compare]
1952 | for (auto id = 0; id < _storage_groups.size(); id++) {
| ~~ ^ ~~~~~~~~~~~~~~~~~~~~~~
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Otherwise it will inherit the rpc verb's scheduling group which is
gossip. As a result, it causes the streaming runs in the wrong scheduling
group.
Fixes #17090
Instead of std::out_of_range(). Accessing a non-existing column is a
serious bug and the backtrace coming with `on_internal_error()` can be
very valuable when debugging it. As can be the coredump that is possible
to trigger with `--abort-on-internal-error`.
This change follows another similar change to `schema::column_at()`.
This should help us get to the bottom of the mysterious repair failures
caused by invalid column access, seen in
https://github.com/scylladb/scylladb/issues/16821.
Refs: https://github.com/scylladb/scylladb/issues/16821
Closes scylladb/scylladb#17080
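The intended behaviour can be sketched as follows. This is a simplified stand-in, not the real `column_mapping` API; `on_internal_error_stub` models what `on_internal_error()` does by default (log a backtrace and throw):

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for seastar::on_internal_error(): in the real code it logs a
// backtrace and throws, or aborts under --abort-on-internal-error.
[[noreturn]] inline void on_internal_error_stub(const std::string& msg) {
    throw std::runtime_error("internal error: " + msg);
}

// Sketch of a bounds-checked column accessor in the spirit of
// column_mapping::regular_column_at(): an out-of-range index is a serious
// bug, not a recoverable condition, so it is reported as an internal error
// rather than std::out_of_range.
inline const std::string& regular_column_at(const std::vector<std::string>& columns,
                                            std::size_t idx) {
    if (idx >= columns.size()) {
        on_internal_error_stub("column index " + std::to_string(idx) + " out of range");
    }
    return columns[idx];
}
```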
* github.com:scylladb/scylladb:
schema: column_mapping::{static,regular}_column_at(): use on_internal_error()
schema: column_mapping: move column accessors out-of-line
In issue #17035 we had a situation where a certain input timestamp
could result in the create_time() utility function getting called on
a timestamp that cannot be represented as timeuuid, and this resulted
in an *assertion failure*, and a crash.
I guess we used an assertion because we believed that callers try to
avoid calling this function on excessively large timestamps, but
evidently, they didn't try hard enough and we got a crash.
The code in UUID_gen.hh has changed a lot over the years and has become
very convoluted; it is almost impossible to understand all the
code paths that could lead to this assertion failure. So it's better
to replace this assertion with an on_internal_error, which by default
is just an exception - and also logs the backtrace of the failure.
Issue #17035 would have been much less serious if we had an exception
instead of an assert.
Refs #17035
Refs #7871, Refs #13970 (removes an assert)
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Seastar's on_internal_error() is a useful replacement for assert(),
but it requires each caller to supply a logger - which is often
inconvenient, especially when the caller is a header file.
So in this patch we introduce a utils::on_internal_error() function
which is the same as seastar::on_internal_error() (the former calls
the latter), except it uses a single logger instead of asking the caller
to pass a logger.
Refs #7871
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
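The wrapper pattern described above can be sketched like this. The names and the plain `throw` are illustrative stand-ins; the real function forwards to seastar::on_internal_error() with a shared seastar logger:

```cpp
#include <iostream>
#include <stdexcept>
#include <string>
#include <string_view>

namespace utils {

// Sketch of utils::on_internal_error(): supplies one project-wide logger
// itself, so callers (including header files) need not pass their own.
[[noreturn]] inline void on_internal_error(std::string_view reason) {
    static const char* logger_name = "internal_error"; // the single shared logger
    std::cerr << logger_name << ": " << reason << '\n';
    throw std::runtime_error(std::string(reason));
}

} // namespace utils
```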
C++20 introduced a new overload of std::stringstream::str()
that is selected when the mentioned member function is called
on an r-value.
The new overload returns a string that is move-constructed
from the underlying string instead of being copy-constructed.
This change applies std::move() on stringstream objects before
calling str() member function to avoid copying of the underlying
buffer.
Moreover, it introduces usage of std::stringstream::view() when
checking if the stream contains some characters. It skips another
copy of the underlying string, because std::string_view is returned.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#17084
This reverts commit 370fbd346c, reversing
changes made to 0912d2a2c6.
This makes scylla-manager mis-interpret the data_file_directories
somehow, issue #17078
The motivation for tablet resizing is that we want to keep the average tablet size reasonable, such that load rebalancing can remain efficient. Too large a tablet makes migration inefficient, slowing down the balancer.
If the avg size grows beyond the upper bound (split threshold), then the balancer decides to split. A split spans all tablets of a table, due to the power-of-two constraint.
Likewise, if the avg size decreases below the lower bound (merge threshold), then a merge takes place in order to grow the avg size. Merge is not implemented yet, although this series lays the foundation for it to be implemented later on.
A resize decision can be revoked if the avg size changes and the decision is no longer needed. For example, let's say a table is being split and the avg size drops below the target size (which is 50% of the split threshold and 100% of the merge one). That means that after the split, the avg size would drop below the merge threshold, causing a merge right after the split, which is wasteful, so it's better to just cancel the split.
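The threshold logic above can be sketched as follows. The constants and function names are assumptions for illustration, not the actual load-balancer code:

```cpp
#include <cstdint>

enum class resize_type { none, split, merge };

// Hypothetical constants: the target size is 50% of the split threshold
// and equals the merge threshold, as described above.
constexpr uint64_t split_threshold = 10ULL << 30;     // e.g. 10 GiB per tablet
constexpr uint64_t target_size = split_threshold / 2; // 50% of split threshold
constexpr uint64_t merge_threshold = target_size;     // merge kicks in below this

inline resize_type decide(uint64_t avg_tablet_size) {
    if (avg_tablet_size > split_threshold) {
        return resize_type::split;  // average too large: halve it by splitting
    }
    if (avg_tablet_size < merge_threshold) {
        return resize_type::merge;  // average too small: grow it by merging
    }
    return resize_type::none;
}

// A pending split is revoked if the average drops below the target size,
// since completing it would immediately push the table into merge territory.
inline bool should_revoke_split(uint64_t avg_tablet_size) {
    return avg_tablet_size < target_size;
}
```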
Tablet metadata gains 2 new fields for managing this:
resize_type: resize decision type, can be one of "merge", "split", or "none".
resize_seq_number: a sequence number that works as the global identifier of the decision (monotonically increasing, increased by 1 on every new decision emitted by the coordinator).
A new RPC was implemented to pull stats from each table replica, such that the load balancer can calculate the avg tablet size and know the "split status" for a given table. Avg size is aggregated carefully while taking the RF of each DC into account (which might differ).
When a table is done splitting its storage, it loads (mirrors) the resize_seq_number from tablet metadata into its local state (in other words, "my split status is ready"). If a table is split ready, the coordinator will see that the table's seq number is the same as the one in tablet metadata. This helps to distinguish stale decisions from the latest one (in case decisions are revoked and re-emitted later on). The status is also aggregated carefully, by taking the minimum among all replicas, so the coordinator will only update topology when all replicas are ready.
When the load balancer emits a split decision, replicas detect the need to split with a "split monitor" that is awakened once a table has its replication metadata updated and indicates the need for a split (i.e. the resize_type field is "split").
The split monitor will start the splitting of compaction groups (using the mechanism introduced in 081f30d149) for the table. And once the splitting work is completed, the table updates its local state as having completed the split.
When the coordinator pulls the split status of all replicas for a table via RPC, the balancer can see whether that table is ready for "finalizing" the decision, which is about updating tablet metadata to split each tablet into two. Once table replicas have their replication metadata updated with the new tablet count, they can appropriately update their set of compaction groups (that were previously split in the preparation step).
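The minimum-based readiness aggregation can be sketched like this (names are illustrative, not the actual coordinator code):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Each replica mirrors the resize_seq_number it has finished preparing for.
// The coordinator aggregates by taking the minimum, so the decision is only
// finalized once every replica has caught up with the current seq number.
inline bool split_ready(const std::vector<uint64_t>& replica_seq_numbers,
                        uint64_t current_seq_number) {
    if (replica_seq_numbers.empty()) {
        return false;
    }
    uint64_t agreed = *std::min_element(replica_seq_numbers.begin(),
                                        replica_seq_numbers.end());
    // A stale (revoked and re-emitted) decision has an older seq number,
    // so equality distinguishes it from the latest one.
    return agreed == current_seq_number;
}
```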
Fixes #16536.
Closes scylladb/scylladb#16580
* github.com:scylladb/scylladb:
test/topology_experimental_raft: Add tablet split test
replica: Bypass reshape on boot with tablets temporarily
replica: Fix table::compaction_group_for_sstable() for tablet streaming
test/topology_experimental_raft: Disable load balancer in test fencing
replica: Remap compaction groups when tablet split is finalized
service: Split tablet map when split request is finalized
replica: Update table split status if completed split compaction work
storage_service: Implement split monitor
topology_cordinator: Generate updates for resize decisions made by balancer
load_balancer: Introduce metrics for resize decisions
db: Make target tablet size a live-updateable config option
load_balancer: Implement resize decisions
service: Wire table_resize_plan into migration_plan
service: Introduce table_resize_plan
tablet_mutation_builder: Add set_resize_decision()
topology_coordinator: Wire load stats into load balancer
storage_service: Allow tablet split and migration to happen concurrently
topology_coordinator: Periodically retrieve table_load_stats
locator: Introduce topology::get_datacenter_nodes()
storage_service: Implement table_load_stats RPC
replica: Expose table_load_stats in table
replica: Introduce storage_group::live_disk_space_used()
locator: Introduce table_load_stats
tablets: Add resize decision metadata to tablet metadata
locator: Introduce resize_decision
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `tracing::span_id`, and drop
its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17058
this change is a cleanup, so it only returns tests, to be more symmetric
with `junit_tests()`. this allows us to drop the dummy `get_test_case()`
in `PythonTestSuite`, as only the BoostTest will be asked for
`get_test_case()` after this change.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16961
The persisted snapshot index may be 0 if the snapshot was created in an
older version of Scylla, which means snapshot transfer won't be
triggered to a bootstrapping node. Commands present in the log may not
cover all schema changes --- group 0 might have been created through the
upgrade procedure, on a cluster with existing schema. So a
deployment with an index=0 snapshot is broken and we need to fix it. We can
use the new `raft::server::trigger_snapshot` API for that.
use the new `raft::server::trigger_snapshot` API for that.
Also add a test.
Fixes scylladb/scylladb#16683
Closes scylladb/scylladb#17072
* github.com:scylladb/scylladb:
test: add test for fixing a broken group 0 snapshot
raft_group0: trigger snapshot if existing snapshot index is 0
we add `-DBOOST_TEST_DYN_LINK` to the cflags when `--static-boost` is
not passed to `configure.py`, but we never pass this option to
`configure.py` in our CI/CD. also, we don't install `boost-static` in
`install-dependencies.sh`, so the linker always uses the boost shared
libraries when building scylla and other executables in this project.
this fact has been verified with the latest master HEAD, after building
scylla from `build.ninja`, which was in turn created by `configure.py`.
Seastar::seastar_testing exposes `Boost::dynamic_linking` in its public
interface, and `Boost::dynamic_linking` exposes `-DBOOST_ALL_DYN_LINK`
as one of its cflags.
so, when building tests using CMake, they are compiled with
`-DBOOST_ALL_DYN_LINK`, while when building tests using `configure.py`,
they are compiled with `-DBOOST_TEST_DYN_LINK`. the former is exposed
by `Boost::dynamic_linking`, the latter is hardwired in
`configure.py`. but the net results are identical. it would be better
to use identical cflags in these two build systems, so let's use
`-DBOOST_ALL_DYN_LINK` in `configure.py` as well. furthermore, this is what
non-static-boost implies.
please note, we don't consume the cflags exposed by
`seastar-testing.pc`, so they don't override the ones we set using
`configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17070
Instead of std::out_of_range(). Accessing a non-existing column is a
serious bug and the backtrace coming with on_internal_error() can be
very valuable when debugging it. As can be the coredump that is possible
to trigger with --abort-on-internal-error.
This change follows another similar change to schema::column_at().
Our time-handling code in UUID_gen.hh is very fragile for very large
timestamps, because the different types - such as Cassandra "timestamp"
and Timeuuid use very different resolution and ranges.
In issue #17035 we discovered a situation where a certain CQL
"timestamp"-type value could cause an assertion failure and a crash
in the create_time() function that creates a timeuuid - because that
timestamp didn't fit in the space a timeuuid has for it.
We already added a limit in the past, UUID_UNIXTIME_MAX, beyond which
we refuse timestamps, to avoid these assertion failures. However, we
missed the possibility of *negative* timestamps (which are allowed in
CQL), and indeed a negative timestamp (or a timestamp which was "wrapped"
to a negative value) is what caused issue #17035.
So this patch adds a second limit, UUID_UNIXTIME_MIN - limiting the
most negative timestamp that we support to well below the area which
causes problems, and adds tests that reproduce #17035 and that we
didn't break anything else (e.g., negative timestamps are still
allowed - just not extremely negative timestamps).
Fixes #17035.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
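The two-sided bound can be sketched as follows. The constants here are placeholders for illustration, not the real UUID_UNIXTIME_MAX/MIN values in UUID_gen.hh:

```cpp
#include <cstdint>
#include <stdexcept>

// Placeholder bounds: timestamps outside this range are assumed not to be
// representable as a timeuuid. The real limits live in UUID_gen.hh.
constexpr int64_t uuid_unixtime_max_ms = 1'000'000'000'000'000;
constexpr int64_t uuid_unixtime_min_ms = -1'000'000'000'000'000;

// Reject out-of-range timestamps with an exception (the real code reports
// an internal error), instead of tripping an assert and crashing. Note that
// moderately negative timestamps remain allowed.
inline int64_t checked_uuid_timestamp(int64_t unix_ms) {
    if (unix_ms > uuid_unixtime_max_ms || unix_ms < uuid_unixtime_min_ms) {
        throw std::runtime_error("timestamp out of timeuuid range");
    }
    return unix_ms;
}
```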
Add error handling to rebuild instead of retrying it until it succeeds.
* 'gleb/rebuild-fail-v2' of github.com:scylladb/scylla-dev:
test: add test for rebuild failure
test: add expected_error to rebuild_node operation
topology_coordinator: Propagate rebuild failure to the initiator
This patch adds a few simple tests for the values of the "date" column
type, and how it can be initialized from string or integers, and what do
those values mean.
Two of the tests reproduce issue #17066, where validation is missing
for values that don't fit in a 32-bit unsigned integer.
Refs #17066
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
C++20 introduced a new overload of std::stringstream::str()
that is selected when the mentioned member function is called
on an r-value.
The new overload returns a string that is move-constructed
from the underlying string instead of being copy-constructed.
This change applies std::move() on stringstream objects before
calling str() member function to avoid copying of the underlying
buffer.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#17064
Add workaround for scylladb/python-driver#295.
Also, an assertion made at the end of the test was false; it is fixed,
with an appropriate comment added.
Closes scylladb/scylladb#17071
* github.com:scylladb/scylladb:
test_raft_snapshot_request: fix flakiness
test: topology/util: update comment for `reconnect_driver`
It's pretty hairy in its future-promises form, with coroutines it's
much easier to read
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#17052
In a cluster with a group 0 snapshot at index 0 (such a group 0 might
be established in a 5.2 cluster, then preserved once it upgrades to 5.4
or later), no snapshot transfer will be triggered when a node is
bootstrapped. This way the new node might not obtain the full schema, or
obtain incorrect schema, like in scylladb/scylladb#16683.
Simulate this scenario in a test case using the RECOVERY mode and error
injections. Check that the newly added logic for creating a new snapshot
if such situation is detected helps in this case.
The persisted snapshot index may be 0 if the snapshot was created in an
older version of Scylla, which means snapshot transfer won't be
triggered to a bootstrapping node. Commands present in the log may not
cover all schema changes --- group 0 might have been created through the
upgrade procedure, on a cluster with existing schema. So a
deployment with an index=0 snapshot is broken and we need to fix it. We can
use the new `raft::server::trigger_snapshot` API for that.
Fixes scylladb/scylladb#16683
The issues mentioned in the comment before are already fixed.
Unfortunately, there is another, opposite issue which this function can
be used for. The previous issue was about the existing driver session
not reconnecting. The current issue is about the existing driver session
reconnecting too much... (and in the middle of queries.)
Waiting for CQL connections is not enough. For the queries to succeed,
nodes must see each other. We have to wait for this, otherwise the test
will be flaky.
Fixes #17029
Closes scylladb/scylladb#17040
We do not support tablet resharding yet. All tablet-related code assumes that the (host_id, shard) tablet replica is always valid. Violating this leads to undefined behaviour: errors in the tablet load balancer and potential crashes.
Avoid this by refusing to start if the need to reshard is detected. Be as lenient as possible: check all tablets with a replica on this node, and only refuse startup if at least one tablet has an invalid replica shard.
Startup will fail as:
ERROR 2024-01-26 07:03:06,931 [shard 0:main] init - Startup failed: std::runtime_error (Detected a tablet with invalid replica shard, reducing shard count with tablet-enabled tables is not yet supported. Replace the node instead.)
Refs: #16739
Fixes: #16843
Closes scylladb/scylladb#17008
* github.com:scylladb/scylladb:
test/topolgy_experimental_raft: test_tablets.py: add test for resharding
test/pylib: manager[_client]: add update_cmdline()
main: refuse startup when tablet resharding is required
locator: tablets: add check_tablet_replica_shards()
`db::config` is a class that is used in many places across the code base. When it is changed, its clients' code needs to be recompiled. It represents the configuration of the database. Some fields of the configuration that describe the locations of directories may be empty. In such cases the `db::config::setup_directories()` function is called - it modifies the provided configuration. Such modification is not good - it is better to keep `db::config` intact.
This PR:
- extends the public interface of utils::directories class to provide required directory paths to the users
- removes 'db::config::setup_directories()' to avoid altering the fields of configuration object
- replaces usages of db::config object with utils::directories object in places that require obtaining paths to dirs
Fixes: scylladb#5626
Closes scylladb/scylladb#16787
* github.com:scylladb/scylladb:
utils/directories: make utils::directories::set an internal type
db::config: keep dir paths unchanged
cql_transport/controler: use utils::directories to get paths of dirs
service/storage_proxy: use utils::directories to get paths of dirs
api/storage_service.cc: use utils::directories to get paths of dirs
tools/scylla-sstable.cc: use utils::directories to get paths
db/commitlog: do not use db::config to get dirs
Use utils::directories to get dirs paths in replica::database
Allow utils::directories to provide paths to dirs
Clean-up of utils::directories
When a node is in the `left_token_ring` state, we don't know how
it has ended up in this state. We cannot distinguish a node that
has finished decommissioning from a node that has failed bootstrap.
The main problem it causes is that we incorrectly send the
`barrier_and_drain` command to a node that has failed
bootstrapping or replacing. We must do it for a node that has
finished decommissioning because it could still coordinate
requests. However, since we cannot distinguish nodes in the
`left_token_ring` state, we must send the command to all of them.
This issue appeared in scylladb/scylladb#16797 and this PR is
a follow-up that fixes it.
The solution is changing `left_token_ring` from a node state
to a transition state.
Fixes scylladb/scylladb#16944
Closes scylladb/scylladb#17009
* github.com:scylladb/scylladb:
docs: dev: topology-over-raft: document the left_token_ring state
topology_coordinator: adjust reason string in left_token_ring handler
raft topology: make left_token_ring a transition state
topology_coordinator: rollback_current_topology_op: remove unused exclude_nodes
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `db::read_repair_decision`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17033
This allows the user of `raft::server` to cause it to create a snapshot
and truncate the Raft log (leaving no trailing entries; in the future we
may extend the API to specify the number of trailing entries to leave, if
needed). In a later commit we'll add a REST endpoint to Scylla to
trigger group 0 snapshots.
One use case for this API is to create group 0 snapshots in Scylla
deployments which upgraded to Raft in version 5.2 and started with an
empty Raft log with no snapshot at the beginning. This causes problems,
e.g. when a new node bootstraps to the cluster, it will not receive a
snapshot that would contain both schema and group 0 history, which would
then lead to inconsistent schema state and trigger assertion failures as
observed in scylladb/scylladb#16683.
In 5.4 the logic of initial group 0 setup was changed to start the Raft
log with a snapshot at index 1 (ff386e7a44)
but a problem remains with these existing deployments coming from 5.2,
we need a way to trigger a snapshot in them (other than performing 1000
arbitrary schema changes).
Another potential use case in the future would be to trigger snapshots
based on external memory pressure in tablet Raft groups (for strongly
consistent tables).
The PR adds the API to `raft::server` and a HTTP endpoint that uses it.
In a follow-up PR, we plan to modify group 0 server startup logic to automatically
call this API if it sees that no snapshot is present yet (to automatically
fix the aforementioned 5.2 deployments once they upgrade.)
Closes scylladb/scylladb#16816
* github.com:scylladb/scylladb:
raft: remove `empty()` from `fsm_output`
test: add test for manual triggering of Raft snapshots
api: add HTTP endpoint to trigger Raft snapshots
raft: server: add `trigger_snapshot` API
raft: server: track last persisted snapshot descriptor index
raft: server: framework for handling server requests
raft: server: inline `poll_fsm_output`
raft: server: fix indentation
raft: server: move `io_fiber`'s processing of `batch` to a separate function
raft: move `poll_output()` from `fsm` to `server`
raft: move `_sm_events` from `fsm` to `server`
raft: fsm: remove constructor used only in tests
raft: fsm: move trace message from `poll_output` to `has_output`
raft: fsm: extract `has_output()`
raft: pass `max_trailing_entries` through `fsm_output` to `store_snapshot_descriptor`
raft: server: pass `*_aborted` to `set_exception` call
these words are either
* shortened words: strategy => strat, read_from_primary => fro
* or acronyms: node_or_data => nd
before we rename them with better names, let's just add them to the
ignore word list.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17002
Previously, utils::directories::set could have been used by
clients of utils::directories class to provide dirs for creation.
Due to moving the responsibility for providing paths of dirs from
db::config to utils::directories, such usage is no longer the case.
This change:
- defines utils::directories::set in utils/directories.cc to disallow
its usage by the clients of utils::directories
- makes utils::directories::create_and_verify() member function
private; now it is used only by the internals of the class
- introduces a new member function to utils::directories called
create_and_verify_sharded_directory() to limit the functionality
provided to clients
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change is intended to ensure that
db::config fields related to directories
are not changed. To achieve that, a member
function called setup_directories() is
removed.
The responsibility for directories paths
has been moved to utils::directories,
which may generate default paths if the
configuration does not provide a specific
value.
Fixes: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change replaces usage of db::config with
usage of utils::directories to get paths of
directories in cql_transport/controler.
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change replaces usage of db::config with
usage of utils::directories to get paths of
directories in service/storage_proxy.
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change replaces usage of db::config with usage
of utils::directories in api/storage_service.cc in
order to get the paths of directories.
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change replaces usage of db::config with usage
of utils::directories to get paths of directories
in tools/scylla-sstable.cc.
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change removes usage of db::config to
get path of commitlog_directory. Instead, it
introduces a new parameter to directly pass
the path to db::commitlog::config::from_db_config().
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change replaces the usage of db::config with
usage of utils::directories to get dirs paths in
replica::database class.
Moreover, it adjusts tests that require construction
of replica::database - its constructor has been
changed to accept utils::directories object.
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change extends utils::directories class in
the following way:
- adds new member variables that correspond to
fields from db::config that describe paths
of directories
- introduces a public interface to retrieve the
values of the new members
- allows construction of utils::directories
object based on db::config to setup internal
member variables related to paths to dirs
The new members of utils::directories are overridden
when the provided values are empty. The way of setting
paths is taken from db::config.
To ensure that the new logic works correctly
`utils_directories_test` has been created.
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
This change is intended to clean-up files in which
utils::directories class is defined to ease further
extensions.
The preparation consists of:
- removal of `using namespace` from directories.hh to
avoid namespace pollution in files, that include this
header
- explicit inclusion of headers, that were missing or
were implicitly included to ensure that directories.hh
is self-sufficient
- defining directories::set class outside of its parent
to improve readability
Refs: scylladb#5626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Similar to the existing update_config(). Updates the command-line
arguments of the specified nodes, merging the new options into the
existing ones. Needs a restart to take effect.
We do not support tablet resharding yet. All tablet-related code assumes
that the (host_id, shard) tablet replica is always valid. Violating this
leads to undefined behaviour: errors in the tablet load balancer and
potential crashes.
Avoid this by refusing to start if the need to reshard is detected.
Be as lenient as possible: check all tablets with a replica on this node,
and only refuse startup if at least one tablet has an invalid replica
shard.
Startup will fail as:
ERROR 2024-01-26 07:03:06,931 [shard 0:main] init - Startup failed: std::runtime_error (Detected a tablet with invalid replica shard, reducing shard count with tablet-enabled tables is not yet supported. Replace the node instead.)
Checks that all tablets with a replica on this node have a valid
replica shard (< smp::count).
Will be used to check whether the node can start-up with the current
shard-count.
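The check can be sketched with simplified stand-in types (not the real `locator::tablets` API):

```cpp
#include <vector>

struct tablet_replica {
    int host_id;     // simplified stand-in for locator::host_id
    unsigned shard;
};

// Sketch of check_tablet_replica_shards(): every tablet replica placed on
// this host must refer to a shard that actually exists, i.e.
// shard < shard_count (smp::count in Scylla). Replicas on other hosts are
// not this node's concern.
inline bool tablet_replica_shards_valid(const std::vector<tablet_replica>& replicas,
                                        int this_host, unsigned shard_count) {
    for (const auto& r : replicas) {
        if (r.host_id == this_host && r.shard >= shard_count) {
            return false; // startup must be refused: resharding not supported
        }
    }
    return true;
}
```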
In one of the previous patches, we changed the `left_token_ring`
state from a node state to a transition state. We document it
in this patch. The node state wasn't documented, so there is
nothing to remove.
A node can be in the `left_token_ring` state after:
- a finished decommission,
- a failed bootstrap,
- a failed replace.
When a node is in the `left_token_ring` state, we don't know how
it has ended up in this state. We cannot distinguish a node that
has finished decommissioning from a node that has failed bootstrap.
The main problem it causes is that we incorrectly send the
`barrier_and_drain` command to a node that has failed
bootstrapping or replacing. We must do it for a node that has
finished decommissioning because it could still coordinate
requests. However, since we cannot distinguish nodes in the
`left_token_ring` state, we must send the command to all of them.
This issue appeared in scylladb/scylladb#16797 and this patch is
a follow-up that fixes it.
The solution is changing `left_token_ring` from a node state
to a transition state.
Regarding implementation, most of the changes are simple
refactoring. The less obvious are:
- Before this patch, in `system_keyspace::left_topology_state`, we
had to keep the ignored nodes' IDs for replace to ensure that the
replacing node will have access to it after moving to the
`left_token_ring` state, which happens when replace fails. We
don't need this workaround anymore. When we enter the new
`left_token_ring` transition state, the new node will still be in
the `decommissioning` state, so it won't lose its request param.
- Before this patch, a decommissioning node lost its tokens
while moving to the `left_token_ring` state. After the patch, it
loses tokens while still being in the `decommissioning` state. We
ensure that all `decommissioning` handlers correctly handle a node
that lost its tokens.
Moving the `left_token_ring` handler from `handle_node_transition`
to `handle_topology_transition` created a large diff. There are
only three changes:
- adding `auto node = get_node_to_work_on(std::move(guard));`,
- adding `builder.del_transition_state()`,
- changing error logged when `global_token_metadata_barrier` fails.
The `exclude_nodes` variable was unused, but it wasn't a bug.
The `left_token_ring` and `rollback_to_normal` handlers correctly
compute excluded nodes on their own.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for dht::decorated_key and
repair_sync_boundary.
please note, before this change, repair_sync_boundary was using
the operator<<-based formatter of `dht::decorated_key`, so we are
updating both of them in a single commit.
because we still use the homebrew generic formatter of vector<>
to format vector<repair_sync_boundary> and vector<dht::decorated_key>,
their operator<< are preserved.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16994
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
cassandra::ConsistencyLevel::type.
please note, the operator<< for `cassandra::ConsistencyLevel::type`
is generated using `thrift` command line tool, which does not emit
specialization for fmt::formatter yet, so we need to use
`fmt::ostream_formatter` to implement the formatter for this type.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17013
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `db::replay_position`,
and drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17014
This series does a similar change to Alternator as was done recently to CQL:
1. If the "tablets" experimental feature is enabled, new Alternator tables will use tablets automatically, without requiring an option on each new table. A default choice of initial_tablets is used. These choices can still be overridden per-table if the user wants to.
3. In particular, all test/alternator tests will also automatically run with tablets enabled
4. However, some tests will fail on tablets because they use features that haven't yet been implemented with tablets - namely Alternator Streams (Refs #16317) and Alternator TTL (Refs #16567). These tests will - until those features are implemented with tablets - continue to be run without tablets.
5. An option is added to the test/alternator/run to allow developers to manually run tests without tablets enabled, if they wish to (this option will be useful in the short term, and can be removed later).
Fixes #16355
Closes scylladb/scylladb#16900
* github.com:scylladb/scylladb:
test/alternator: add "--vnodes" option to run script
alternator: use tablets by default, if available
test/alternator: run some tests without tablets
in general, the user should save the output of `DESC foo.bar` to a file,
and pass the path to the file as the argument of the `--schema-file`
option of `scylla sstable` commands. the CQL statement generated
by the `DESC` command always includes the keyspace name of the table.
but in case the user creates the CQL statement manually and misses
the keyspace name, they would get the following assertion failure
```
scylla: cql3/statements/cf_statement.cc:49: virtual const sstring &cql3::statements::raw::cf_statement::keyspace() const: Assertion `_cf_name->has_keyspace()' failed.
```
this is not a great user experience.
so, in this change, we check for the existence of the keyspace before
looking it up, and throw a runtime error with a better error message.
so when the CQL statement does not have the keyspace name, the new
error message would look like:
```
error processing arguments: could not load schema via schema-file: std::runtime_error (tools::do_load_schemas(): CQL statement does not have keyspace specified)
```
since this check is only performed by `do_load_schemas()` which
care about the existence of keyspace, and it only expects the
CQL statement to create table/keyspace/type, we just override the
new `has_keyspace()` method of the corresponding types derived
from `cf_statement`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16981
In this commit, we postpone the start-up
of the hint manager until we obtain information
about other nodes in the cluster.
When we start the hint managers, one of the
things that happen is creating endpoint
managers -- structures managed by
db::hints::manager. Whether we create
an instance of endpoint manager depends on
the value returned by host_filter::can_hint_for,
which, in turn, may depend on the current state
of locator::topology.
If locator::topology is incomplete, some endpoint
managers may not be started even though they
should (because the target node IS part of the
cluster and we SHOULD send hints to it if there
are some).
A situation like that can happen because we
start the hint managers too early. This commit
aims to solve that problem. We only start
the hint managers when we've gathered information
about the other nodes in the cluster and created
the locator::topology using it.
Hinted Handoff is not negatively affected by these
changes since in between the previous point of
starting the hint managers and the current one,
all of the mutations performed by
service::storage_proxy target the local node, so
no hints would need to be generated anyway.
Fixes scylladb/scylladb#11870 Closes scylladb/scylladb#16511
An anonymous namespace implies internal linkage for its members.
When it is defined in a header, each translation unit
that includes such a header defines its own unique instance
of the members of the unnamed namespace that are ODR-used within
that translation unit.
This can lead to unexpected results, including code bloat
or undefined behavior due to ODR violations.
This PR removes unnamed namespaces from header files.
References:
- [CppCoreGuidelines: "SF.21: Don’t use an unnamed (anonymous) namespace in a header"](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#sf21-dont-use-an-unnamed-anonymous-namespace-in-a-header)
- [SEI CERT C++: "DCL59-CPP. Do not define an unnamed namespace in a header file"](https://wiki.sei.cmu.edu/confluence/display/cplusplus/DCL59-CPP.+Do+not+define+an+unnamed+namespace+in+a+header+file)
Closes scylladb/scylladb#16998
* github.com:scylladb/scylladb:
utils/config_file_impl.hh: remove anonymous namespace from header
mutation/mutation.hh: remove anonymous namespace from header
This RESTful API is a Scylla-specific extension and is only used
by scylla-nodetool. Currently, the Java-based nodetool does not use
it at all, so mark it with "scylla_only".
One can verify this change with:
```
pytest --mode=debug --nodetool=cassandra test_cleanup.py::test_cleanup
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17001
We should allow users to run nodetool tests without `test.py`, but there
is a good chance that the host could be reused by multiple tests or
multiple users who could be using port 12345. By randomizing the IP and
port, they have a better chance of completing the test without running
into a used-port problem.
Closes scylladb/scylladb#16996
* github.com:scylladb/scylladb:
test/nodetool: return a randomized address if not running with unshare
test/nodetool: return an address from loopback_network fixture
In this mode, the node is not reachable from the outside, i.e.
* it refuses all incoming RPC connections,
* it does not join the cluster, thus
* all group0 operations are disabled (e.g. schema changes),
* all cluster-wide operations are disabled for this node (e.g. repair),
* other nodes see this node as dead,
* it cannot read or write data from/to other nodes,
* it does not open Alternator and Redis transport ports and the TCP CQL port.
The only way to make CQL queries is to use the maintenance socket. The node serves only local data.
To start the node in maintenance mode, use the `--maintenance-mode true` flag or set `maintenance_mode: true` in the configuration file.
REST API works as usual, but some routes are disabled:
* authorization_cache
* failure_detector
* hinted_hand_off_manager
This PR also updates the maintenance socket documentation:
* add cqlsh usage to the documentation
* update the documentation to use `WhiteListRoundRobinPolicy`
Fixes#5489.
Closes scylladb/scylladb#15346
* github.com:scylladb/scylladb:
test.py: add test for maintenance mode
test.py: generalize usage of cluster_con
test.py: when connecting to node in maintenance mode use maintenance socket
docs: add maintenance mode documentation
main: add maintenance mode
main: move some REST routes initialization before joining group0
message_service: add sanity check that rpc connections are not created in the maintenance mode
raft_group0_client: disable group0 operations in the maintenance mode
service/storage_service: add start_maintenance_mode() method
storage_service: add MAINTENANCE option to mode enum
service/maintenance_mode: add maintenance_mode_enabled bool class
service/maintenance_mode: move maintenance_socket_enabled definition to separate file
db/config: add maintenance mode flag
docs: add cqlsh usage to maintenance socket documentation
docs: update maintenance socket documentation to use WhiteListRoundRobinPolicy
The tests in this file that are related to partition scans are failing
with tablets, and were hence disabled with xfail_tablets. This means we
are losing test coverage, so parametrize these tests to run with both
vnodes and tablets, and mark them as xfail only when running with
tablets.
This test file has two tests disabled:
* test_desc_cluster - due to #16789
* test_whitespaces_in_table_options - due to #16317
They are disabled via xfail, because they do not work with tablets. This
means we lose test coverage of the respective functionality.
This patch re-enables the two tests, by parametrizing them to run with
both vnodes and tablets:
* test_desc_cluster - when run with tablets, endpoint info is not
validated. The test is still useful because it checks that DESC
CLUSTER doesn't break with tablets. A FIXME with a link to #16789
is left.
* test_whitespaces_in_table_options - marked xfail when run with
tablets, but not when run with vnodes, thus we regain the test
coverage.
The tests in this file are currently all marked with xfail_tablets,
because tablets are not enabled by default in the cql-pytest suite and
CDC doesn't currently work with tablets at all. This however means that
the CDC functionality loses test coverage. So instead of a blanket
xfail, parametrize these tests to run with both vnodes and tablets, and
add a targeted xfail for the tablets parameter. This way no coverage
is lost, the tests are still running with vnodes (and will fail if
regressions are introduced), and they are allowed to xfail with tablets
enabled.
We could simply make these tests only run with vnodes for now. But
looking forward, after the CDC functionality is fixed to work with
tablets, we want to verify that it works with both vnodes and tablets.
So we run the test with both and leave the xfail as a reminder that a
fix is required.
Tests can now request to be run against both tablets and vnodes, via:
@pytest.mark.parametrize("test_keyspace", ["tablets", "vnodes"], indirect=True)
This will set request.param for the test_keyspace fixture, which can
create the keyspace according to the requested parameter. This way,
tests can conveniently opt-in to be run against both replication
methods.
When not parameterized like this, the test_keyspace fixture will create
a keyspace as before -- with tablets, if support is enabled.
They are directories, and we are concatenating strings to build the paths
to the sstable components, so it would be more elegant to use fs::path
for manipulating paths.
This change was inspired by the discussion on passing the relative
path of an sstable to `scylla sstable`, where we use
`path::parent_path()` as the dir of the sstable, and then concatenate
it with the filename component. But if the `parent_path()` method
returns an empty string, we end up with a path like
"/me-42-big-TOC.txt", which is not reachable; what we should be
reading is "me-42-big-TOC.txt". So we would be better off either
using `fs::path` or enforcing an absolute path.
Since we are already using "/" as the separator and concatenating strings,
this is an opportunity to switch over to `fs::path` to address
the problem and to avoid the string concatenation.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16982
An anonymous namespace implies internal linkage for its members.
When it is defined in a header, each translation unit
that includes such a header defines its own unique instance
of the members of the unnamed namespace that are ODR-used within
that translation unit.
This can lead to unexpected results, including code bloat
or undefined behavior due to ODR violations.
This change aligns the code with the following guidelines:
- CppCoreGuidelines: "SF.21: Don’t use an unnamed (anonymous)
namespace in a header"
- SEI CERT C++: "DCL59-CPP. Do not define an unnamed namespace
in a header file"
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
C++20 introduced a new overload of std::ostringstream::str()
that is selected when the mentioned member function is called
on an r-value.
The new overload returns a string that is move-constructed
from the underlying string instead of being copy-constructed.
This change applies std::move() to stringstream objects before
calling the str() member function to avoid copying the underlying
buffer.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#16990
An anonymous namespace implies internal linkage for its members.
When it is defined in a header, each translation unit
that includes such a header defines its own unique instance
of the members of the unnamed namespace that are ODR-used within
that translation unit.
This can lead to unexpected results, including code bloat
or undefined behavior due to ODR violations.
This change aligns the code with the following guidelines:
- CppCoreGuidelines: "SF.21: Don’t use an unnamed (anonymous)
namespace in a header"
- SEI CERT C++: "DCL59-CPP. Do not define an unnamed namespace
in a header file"
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
We should allow users to run nodetool tests without `test.py`, but there
is a good chance that the host could be reused by multiple tests or
multiple users who could be using port 12345. By randomizing the IP and
port, they have a better chance of completing the test without running
into a used-port problem.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
* rename "maybe_setup_loopback_network" to "server_address"
* return an address from the fixture
This change prepares for bringing back the randomized IP and port.
In case users run this test without test.py, by randomizing the
IP and port they have a better chance of completing the test
without running into a used-port problem.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Without it, table loading fails as reshape mixes sstables from
different tablets together, and now we have a guard for that:
Unable to load SSTable ...-big-Data.db that belongs to tablets 1 and 31
The proper fix is to make reshape compaction-group aware.
It will be fixed, but not now.
Refs #16966.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
It might happen that an sstable being streamed during migration is not
split yet, therefore it should be added to the main compaction group,
allowing the streaming stage to start split work on it, and not
fool the coordinator into thinking it can proceed with split execution,
which would cause problems.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This is easier to reproduce after changes in the load balancer to
emit resize decisions, which in turn results in the topology version
being incremented, and that might race with fencing tests that
manipulate the topology version manually.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
When the coordinator executes a split, i.e. commits the new tablet map with
each tablet split in two, all replicas must then proceed with
remapping of the compaction groups that were previously split.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
When load balancer emits finalize request, the coordinator will
now react to it by splitting each tablet in the current tablet
map and then committing the new map.
There can be no active migration while we do it.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The table replica tells the coordinator that its split status
is ready by loading the sequence number from tablet metadata
into its local state, which is pulled periodically by the
coordinator via RPC.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This implements the ability in load balancer to emit split or merge
requests, cancel ongoing ones if they're no longer needed, and
also finalize those that are ready for the topology changes.
That's all based on average tablet size, collected by coordinator
from all nodes, and split and merge thresholds.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Lack of synchronization could lead the coordinator to think that a
pending replica in migration has split-ready status, when in reality
the leaving replica escaped the split-ready check after the status
had already been pulled at the destination by the coordinator.
Example:
1) Coordinator pulls split status (ready) from destination replica
2) Migration sends a non-split tablet into destination
3) Coordinator pulls split status (ready) from source after
transition stage of migration moved to cleanup (so there's no
longer a leaving replica in it).
4) Migration completes, but compaction group is not split yet.
Coordinator thinks destination is ready.
To solve it, streaming now guarantees that the pending replica is
split before returning, so migration can only advance to the next
stage after the pending replica is split, if
there's a split request emitted.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This implements the fiber that aggregates per-table stats that will
be fed into the load balancer to make resize decisions (split,
merge, or revoke ongoing ones).
Initially, the stats will be refreshed every 60s, but the idea
is that eventually we make the frequency table-based, where
the size of each table is taken into account.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This implements the RPC for collecting table stats.
Since both the leaving and the pending replica can be accounted during
tablet migration, the RPC handler will look at the tablet transition
info and account only either the leaving or the pending replica based
on the tablet migration stage. Replicas that are not leaving or
pending, of course, don't contribute to the anomaly in the
reported size.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This is the table replica state that coordinator will aggregate
from all nodes and feed into the load balancer.
A tablet filter is added to not double-account migrating tablets,
so only one of the pending or leaving tablet replicas will be accounted
based on the current migration stage. More details can be found in
the patch that implements the filter.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
These are per-table stats that will be aggregated from all nodes, by
the coordinator, in order to help the load balancer make resize
decisions.
size_in_bytes is the total aggregated table size, so the coordinator
becomes responsible for taking into account the RF of each DC and
also the tablet count, for computing an accurate average size.
split_ready_seq_number is the minimum sequence number among all
replicas. If the coordinator sees that all replicas store the seq number
of the current split, then it knows all replicas are ready for the
next stage in the split process.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The new metadata describes the ongoing resize operation (which can be
a merge, a split, or none) that spans the tablets of a given table.
That's managed by group0, so down nodes will be able to see the
decision when they come back up and see the changes to the
metadata.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
resize_decision is the metadata that says whether the tablets of a table
need a split, a merge, or nothing. That will be recorded in tablet metadata,
and therefore stored in group0.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
To avoid data resurrection, mutations deleted by cleanup operations should be skipped during commitlog replay.
This series implements the above for tablet cleanups, by using a new system table which holds records of cleanup operations.
Fixes #16752 Closes scylladb/scylladb#16888
* github.com:scylladb/scylladb:
test: test_tablets: add a test for cleanup after migration
test: pylib: add ScyllaCluster.wipe_sstables
test: boost: add commitlog_cleanup_test
db: commitlog_replayer: ignore mutations affected by (tablet) cleanups
replica: table: garbage-collect irrelevant system.commitlog_cleanups records
db: commitlog: add min_position()
replica: table: populate system.commitlog_cleanups on tablet cleanup
db: system_keyspace: add system.commitlog_cleanups
replica: table: refresh compound sstable set after tablet cleanup
db::schema_tables::all_table_names() returns std::vector<sstring>.
Using a range-for loop without a reference results in copying each
element of the traversed container. Such copying is redundant.
This change uses a const reference to avoid the copies.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#16983
We didn't send the `barrier_and_drain` command to a
decommissioning node that could still be coordinating requests. It
could happen that a decommissioning node sent a request with an
old topology version after normal nodes received the new fence
version. Then, the request would fail on replicas with the stale
topology exception.
This PR fixes this problem by modifying `exec_global_command`.
From now on, it sends `barrier_and_drain` to a decommissioning
node.
We also stop filtering stale topology exceptions in
`test_topology_ops`. We added this filter after detecting the bug
fixed by this PR.
Fixes scylladb/scylladb#15804 Fixes scylladb/scylladb#16579 Fixes scylladb/scylladb#16642 Closes scylladb/scylladb#16797
* github.com:scylladb/scylladb:
test: test_topology_ops: remove failed mutations filter
raft topology: send barrier_and_drain to a decommissioning node
raft topology: ensure at most one transitioning node
Before this change, we used a random address when launching the
rest_api_mock server, but there is a chance that the randomly
picked address conflicts with an already-used address on the
host. The subprocess fails right away with a returncode of
1 upon this failure, but we just continue on and check the readiness
of the already-dead server. We have actually seen test failures
caused by the EADDRINUSE failure, and when we checked the readiness
of rest_api_mock by sending an HTTP request and reading the
response, what we got was not a JSON-encoded response but a webpage,
which was likely the one returned by a minio server.
In this change, we
* specify the "launcher" option of the nodetool
test suite as "unshare", so that all its tests are launched
in separate namespaces.
* do not use a random address for the mock server, as the network
namespaces are separated.
Fixes #16542 Closes scylladb/scylladb#16773
* github.com:scylladb/scylladb:
test/nodetool: run nodetool tests using "unshare"
test.py: add "launcher" option support
The test checks that in maintenance mode server A is not available to other
nodes and to clients. It is possible to connect via the maintenance socket
to server A and perform local CQL operations.
A node in maintenance mode doesn't have the regular CQL port open.
To connect to the node, the Scylla cluster needs to use the node's maintenance socket.
In maintenance mode:
* Group0 doesn't start and the node doesn't join the token ring, so it behaves as a dead
node to others,
* Group0 operations are disabled and result in an error,
* Only the maintenance socket listens for CQL requests,
* The storage service initialises token_metadata with the local node as the only node
on the token ring.
Maintenance mode is enabled by passing the --maintenance-mode flag.
Maintenance mode starts before group0 is initialised.
Move REST endpoints that don't need a connection with other nodes to before joining group0.
This way, they can be initialized in the maintenance mode.
Move `snapshot_ctl` along with routes because of snapshots API and tasks API.
Its constructor is a noop, so it is safe to move it.
In maintenance mode, the node doesn't communicate with other nodes, so it doesn't
start or apply group0 operations. Users can still try to start one, e.g. change
the schema, but the node can't allow it.
Init _upgrade_state with recovery in the maintenance mode.
Throw an error if the group0 operation is started in maintenance mode.
In maintenance mode, other nodes won't be available, thus we disable joining
the token ring, and the token metadata won't be populated with the local node's endpoint.
When a CQL query is executed it checks the `token_metadata` structure and fails if it is empty.
Add a method that initialises `token_metadata` with the local node as the only node in the token ring.
join_cluster and start_maintenance_mode are incompatible.
To make sure that only one is called when the node starts, add the MAINTENANCE option.
start_maintenance_mode sets _operation_mode to MAINTENANCE.
join_cluster sets _operation_mode to STARTING.
set_mode will result in an internal error if:
* it tries to set MAINTENANCE mode when the _operation_mode is other than NONE,
i.e. start_maintenance_mode is called after join_cluster (or it is called during
the drain, but it also shouldn't happen).
* it tries to set STARTING mode when the mode is set to MAINTENANCE,
i.e. join_cluster is called after start_maintenance_mode.
While the cleanup is ongoing. Otherwise, a concurrent table drop might
trigger a use-after-free, as we have seen in dtests recently.
Fixes: #16770 Closes scylladb/scylladb#16874
Before this change, we always cast the wait duration to milliseconds,
even if it could be using a higher resolution. Actually,
`std::chrono::steady_clock` uses `nanosecond` for its duration,
so if we inject a deadline using `steady_clock`, we could be woken
earlier due to the narrowing of the duration type caused by the
duration_cast.
In this change, we just use the duration as it is. This should allow
the caller to use the resolution provided by Seastar without losing
precision. The tests are updated to print the time duration
instead of the count to provide information with a higher resolution.
Fixes #15902 Closes scylladb/scylladb#16264
* github.com:scylladb/scylladb:
tests: utils: error injection: print time duration instead of count
error_injection: do not cast to milliseconds when injecting timeout
New tablet replicas are allocated and rebuilt synchronously with node
operations. They are safely rebuilt from all existing replicas.
The list of ignored nodes passed to node operations is respected.
Tablet scheduler is responsible for scheduling tablet rebuilding transition which
changes the replicas set. The infrastructure for handling decommission
in tablet scheduler is reused for this.
Scheduling is done incrementally, respecting per-shard load
limits. Rebuilding transitions are recognized by load calculation to
affect all tablet replicas.
New kind of tablet transition is introduced called "rebuild" which
adds new tablet replica and rebuilds it from existing replicas. Other
than that, the transition goes through the same stages as regular
migration to ensure safe synchronization with request coordinators.
In this PR we simply stream from all tablet replicas. Later we should
switch to calling repair to avoid sending excessive amounts of data.
Fixes https://github.com/scylladb/scylladb/issues/16690.
Closes scylladb/scylladb#16894
* github.com:scylladb/scylladb:
tests: tablets: Add tests for removenode and replace
tablets: Add support for removenode and replace handling
topology_coordinator: tablets: Do not fail in a tight loop
topology_coordinator: tablets: Avoid warnings about ignored failed future
storage_service, topology: Track excluded state in locator::topology
raft topology: Introduce param-less topology::get_excluded_nodes()
raft topology: Move get_excluded_nodes() to topology
tablets: load_balancer: Generalize load tracking
tablets: Introduce get_migration_streaming_info() which works on migration request
tablets: Move migration_to_transition_info() to tablets.hh
tablets: Extract get_new_replicas() which works on migration request
tablets: Move tablet_migration_info to tablets.hh
tablets: Store transition kind per tablet
We added this filter after detecting a bug in the Raft-based
topology. We weren't sending `barrier_and_drain` commands to a
decommissioning node that could still be coordinating requests.
It could cause stale topology exceptions on replicas if the
decommissioning node sent a request with an old topology version
after normal nodes received the new fence version.
This bug has been fixed in the previous commit, so we remove the
filter.
Before this patch, we didn't send the `barrier_and_drain` command
to a decommissioning node that could still be coordinating
requests. It could happen that a decommissioning node sent
a request with an old topology version after normal nodes received
the new fence version. Then, the request would fail on replicas
with the stale topology exception.
We fix this problem by modifying `exec_global_command`. From now
on, it sends `barrier_and_drain` to a decommissioning node, which
can also be in the `left_token_ring` state.
We add a sanity check to ensure at most one transitioning node at
a time. If there is more, something must have gone wrong.
In the future, we might implement concurrent topology operations.
Then, we will remove this sanity check.
We also extend the comment describing `transition_nodes` so that
it better explains why we use a map and how it should be handled.
Before this change, we used a random address when launching the
rest_api_mock server, but there is a chance that the randomly
picked address conflicts with an already-used address on the
host. The subprocess fails right away with a returncode of
1 upon this failure, but we just continue on and check the readiness
of the already-dead server. We have actually seen test failures
caused by the EADDRINUSE failure, and when we checked the readiness
of rest_api_mock by sending an HTTP request and reading the
response, what we got was not a JSON-encoded response but a webpage,
which was likely the one returned by a minio server.
In this change, we
* specify the "launcher" option of the nodetool
test suite as "unshare", so that all its tests are launched
in separate namespaces.
* use a fixed address for the mock server, as the network
namespaces are not shared anymore
* add an option in `nodetool/conftest.py`, so that it can optionally
set up the lo network interface when it is launched in a separate
new network namespace.
Fixes#16542
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Before this change, all "tool" test suites used "pytest" to launch their
tests, but some of the tests might need a dedicated namespace so they
do not interfere with each other. Fortunately, "unshare(1)" allows us
to run a program in new namespaces.
In this change, we add a "launcher" option to the "tool" test suites, so
that these tests can run with the specified "launcher" instead of the
default. If "launcher" is not specified, its default value of
"pytest" is used.
Refs #16542
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Before this patch, we received the internal server error
"Attempted to create key component from empty optional" when null was used in
multi-column relations.
This patch adds a null check for each element of each tuple in the
expression and generates an invalid request error if it finds such an element.
Modified a cassandra test and added a new one that checks the occurrence of null values in tuples.
Added a test that checks what happens when the wrong number of items is entered in a tuple.
Fixes #13217 Closes scylladb/scylladb#16415
The test TestScyllaSstableSchemaLoading.test_fail_schema_autodetect was
observed to be flaky: sometimes failing on local setups, but not in CI.
As it turns out, this is because, when run via test.py, the test's
working directory is the root directory of scylla.git. In this case,
scylla-sstable will find and read conf/scylla.yaml. After having done
so, it will try to look in the default data directory
(/var/lib/scylla/data) for the schema tables. If the local machine
happens to have a scylla data-dir set up at the above-mentioned location,
it will read the schema tables and will succeed in finding the tested
table (which is a system table, so it is always present). This will fail
the test, as the test expects the opposite -- the table not being found.
The solution is to change the test's working directory to the random
temporary work dir, so that the local environment doesn't interfere with
it.
Fixes: #16828 Closes scylladb/scylladb#16837
This PR contains improvements related to usage of std::vector and looping over containers in the range-for loop.
It is advised to use `std::vector::reserve()` to avoid unneeded memory allocations when the total size is known beforehand.
When looping over a container that stores non-trivial types usage of const reference is advised to avoid redundant copies.
Closes scylladb/scylladb#16978
* github.com:scylladb/scylladb:
api/api.hh: use const reference when looping over container
api/api.hh: use std::vector::reserve() when the total size is known
The gossiper topology change code calls left/joined notifiers when a
node leaves or joins the cluster. This code is not executed in topology coordinator
mode, so the coordinator needs to call those notifiers by itself. The
series adds the calls.
Fixes scylladb/scylladb#15841
* 'gleb/raft-topo-notifications-v1' of github.com:scylladb/scylla-dev:
storage service: topology coordinator: call notify_joined() when a node joins a cluster
storage service: topology coordinator: call notify_left() when a node leaves a cluster
storage_service: drop redundant check from notify_joined()
Instead of casting / comparing the count of the duration unit, let's just
compare the durations, so that boost.test is able to print the duration
in a more informative and user-friendly way (line wrapped):
test/boost/error_injection_test.cc(167): fatal error:
in "test_inject_future_disabled":
critical check wait_time > sleep_msec has failed [23839ns <= 10ms]
Refs #15902
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Before this change, we always cast the wait duration to milliseconds,
even if it could be using a higher resolution. Actually,
`std::chrono::steady_clock` uses `nanosecond` for its duration,
so if we inject a deadline using `steady_clock`, we could be woken
earlier due to the narrowing of the duration type caused by the
duration_cast.
In this change, we just use the duration as it is. This should allow
the caller to use the resolution provided by Seastar without losing
precision.
Fixes #15902
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
When the topology coordinator is used for topology changes, the gossiper-based
code that calls notify_joined() is not called. The coordinator needs
to call it itself, but only once, when the node becomes
normal. For that, the patch changes the state loading code to remember the
old set of nodes in the normal state, to check whether a node that is normal after
the new state is loaded was not in the normal state before.
The sstable writer held the effective_replication_map_ptr while writing
sstables, which is both a layering violation and slows down tablet load
balancing. It was needed in order to ensure the sharder was stable. But
it turns out that sharding metadata is unnecessary for tablets, so just
skip the whole thing when writing an sstable for tablets.
Closes scylladb/scylladb#16953
* github.com:scylladb/scylladb:
sstables: writer: don't require effective_replication_map for sharding metadata
schema: provide method to get sharder, iff it is static
This mini-series contains two bug fixes that were found as part of testing coverage reporting in CI:
ref: https://github.com/scylladb/scylladb/pull/16895
1. The html-fixup which is triggered when using `test/pylib/coverage_utils.py lcov-tools genhtml...` rendered incorrect links when there were multiple links in the same line.
2. For files that contained `,` in their name the output was simply wrong and resulted in lcov not being able to find such files for the purpose of filtering or generating reports.
The aforementioned draft PR served as a testing bed for finding and fixing those bugs.
Closes scylladb/scylladb#16977
* github.com:scylladb/scylladb:
lcov_utils.py: support source files that contain commas in their name
coverage_utils.py: make regular expression lazy in html-fixup
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for db::schema_tables::table_kind,
and its operator<<() is still used by the homebrew generic formatter
for std::map<>, so it is preserved.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16972
When a reference is not used in the range-for loop,
each element of the container is copied. Such copying
is not a problem for scalar types. However, in the case
of non-trivial types it may cause unneeded overhead.
This change replaces copying with const references
to avoid copying of types like seastar::sstring etc.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
When growing via push_back(), std::vector may need to reallocate
its internal block of memory due to not enough space. It is advised
to allocate the required space before appending elements if the
size is known beforehand.
This change introduces usage of std::vector::reserve() in api.hh
to ensure that push_back() does not cause reallocations.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
As part of the parsing, every line of an lcov file was modeled as
INFO_TYPE:field[,field]...
However specifically for info type "SF" which represents the source file
there can only be one field.
This caused files that use ',' in their names to be cut down at
the first ',' and as a result not handled correctly by lcov_utils.py,
especially when rewriting a file.
This patch adds a special handling for the "SF" INFO_TYPE.
ref: `man geninfo`
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
The html-fixup procedure was created because of a bug in genhtml (`man
genhtml` for details about what genhtml is). The bug is that genhtml
doesn't account for file names that contain illegal URL characters (ref:
https://stackoverflow.com/a/1547940/2669716). html-fixup converts those
characters to the %<octet> notation (i.e. a space character becomes %20,
etc.). However, the regular expression used to detect links was eager,
which didn't account for multiple links in the same line. This was
discovered while browsing one of the reports and noticing that the links
that are meant to alternate between the code view and function view of a
source got scrambled and unusable after html-fixup.
This change makes the regex that is used to detect links lazy so it can
handle multiple links in the same line in an html file correctly.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Loading schemas of views and indexes was not supported, with either `--schema-file`, or when loading schema from schema sstables.
This PR addresses both:
* When loading schema from CQL (file), `CREATE MATERIALIZED VIEW` and `CREATE INDEX` statements are now also processed correctly.
* When loading schema from schema tables, `system_schema.views` is also processed, when the table has no corresponding entry in `system_schema.tables`.
Tests are also added.
Fixes: #16492
Closes scylladb/scylladb#16517
* github.com:scylladb/scylladb:
test/cql-pytest: test_tools.py: add schema-loading tests for MV/SI
test/cql-pytest: test_tools.py: extract some fixture logic to functions
test/cql-pytest: test_tools.py: extract common schema-loading facilities into base-class
tools/schema_loader: load_schema_from_schema_tables(): add support for MV/SI schemas
tools/schema_loader: load_one_schema_from_file(): add support for view/index schemas
test/boost/schema_loader_test: add test for mvs and indexes
tools/schema_loader: load_schemas(): implement parsing views/indexes from CQL
replica/database: extract existing_index_names and get_available_index_name
tools/schema_loader: make real_db.tables the only source of truth on existing tables
tools/schema_loader: table(): store const keyspace&
tools/schema_loader: make database,keyspace,table non-movable
cql3/statements/create_index_statement: build_index_schema(): include index metadata in returned value
cql3/statements/create_index_statement: make build_index_schema() public
cql3/statements/create_index_statement: relax some method's dependence on qp
cql3/statements/create_view_statement: make prepare_view() public
Native histograms (also known as sparse histograms) are an experimental Prometheus feature.
They use protobuf as the reporting layer.
Native histograms offer the benefit of high resolution at a lower resource cost.
This series allows sending histograms in the native histogram format over protobuf.
By default, protobuf support is disabled. To use protobuf with native histograms, the command line flag prometheus_allow_protobuf should be set to true, and the Prometheus server should send the Accept header with protobuf.
Fixes #12931
Closes scylladb/scylladb#16737
* github.com:scylladb/scylladb:
main.cc: Add prometheus_allow_protobuf command line
histogram_metrics_helper: support native histogram
config: Add prometheus_allow_protobuf flag
Add an empty line before the list of different checksums in
validate-checksums' description. Otherwise the list is not rendered.
Closes scylladb/scylladb#16401
we deduce the paths to other SSTable components from the one
specified on the command line. for instance, if
/a/b/c/me-really-big-Data.db is fed to `scylla sstable`, the tool
would try to read /a/b/c/me-really-big-TOC.txt for the list of
other components. this works fine if the full path is specified
on the command line.
but if a relative path is specified, like "me-really-big-Data.db",
this does not work anymore. before this change, the tool
would be reading "/me-really-big-TOC.txt", which does not exist
under most circumstances, while $PWD/me-really-big-TOC.txt should
exist if the SSTable is sane.
after this change, we always convert the specified path to
its canonical representation, no matter whether it is relative or absolute.
this enables us to get the correct parent path when trying
to read, for instance, the TOC component.
Fixes #16955
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16964
To avoid data resurrection, mutations deleted by cleanup operations
have to be skipped during commitlog replay.
This patch implements this, based on the metadata recorded on cleanup
operations into system.commitlog_cleanups.
Currently, rows in system.commitlog_cleanups are only dropped on node restart,
so the table can accumulate an unbounded number of records.
This probably isn't a problem in practice, because tablet cleanups aren't that
frequent, but this patch adds a countermeasure anyway.
This patch makes the choice to delete the unneeded records right when new records
are added. This isn't ideal -- it would be more natural if the unneeded records
were deleted as soon as they become unneeded -- but it does the job with a
minimal amount of code.
Add a helper function which returns the minimum replay position
across all existing or future commitlog segments.
Only positions greater than or equal to it can be replayed on the next reboot.
We will use this helper in a future patch to garbage collect some cleanup
metadata which refers to replay positions.
To avoid data resurrection after cleanup, we have to filter out the
cleaned mutations during commitlog replay.
In this patch, we get tablet cleanup to record the affected set of mutations
to system.commitlog_cleanups. In a later patch, we will use these records
for filtering during commitlog replay.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for rjson::value, and drop its
operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16956
When the topology coordinator is used for topology changes, the gossiper-
based code that calls notify_left() is not called. The coordinator needs
to call it itself.
Currently, we pass an effective_replication_map_ptr to sstable_writer,
so that we can get a stable dht::sharder for writing the sharding metadata.
This is needed because with tablets, the sharder can change dynamically.
However, this is both bad and unnecessary:
- bad: holding on to an effective_replication_map_ptr is a barrier
for topology operations, preventing tablet migrations (etc) while
an sstable is being written
- unnecessary: tablets don't require sharding metadata at all, since
two tablets cannot overlap (unlike two sstables from different shards in
the same node). So the first/last key is sufficient to determine the
shard/tablet ownership.
Given that, just pass the sharder for vnode sstables, and don't generate
sharding metadata for tablet sstables.
The current get_sharder() method only allows getting a static sharder
(since a dynamic sharder needs additional protection), and it
chooses to abort if someone attempts to get a dynamic sharder.
In one case, it's useful to get a sharder only if it's static, so
provide a method to do that. This is for providing sstable sharding
metadata, which isn't useful with tablets.
The `topology_coordinator` is a large class (>1000 loc) which resides in
an even larger source file (storage_service.cc, ~7800 loc). This PR
moves the topology_coordinator class out of the storage_service.cc file
in order to improve modularity and recompilation times during
development.
As a first step, the `topology_mutation_builder` and
`topology_node_mutation_builder` classes are also moved from
storage_service.cc to their own, new header/source files as they are an
important abstraction used both by the topology coordinator code and
some other code in storage_service.cc that won't be moved.
Then, the `topology_coordinator` is moved out. The
`topology_coordinator` class is completely hidden in the new
topology_coordinator.cc file and can only be started and waited on to
finish via the new `run_topology_coordinator` function.
Fixes: scylladb/scylladb#16605
Closes scylladb/scylladb#16609
* github.com:scylladb/scylladb:
service: move topology coordinator to a separate file
storage_service: introduce run_topology_coordinator function
service: move topology mutation builder out of storage_service
storage_service: detemplate topology_node_mutation_builder::set
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
cql3::statements::statement_type. and its operator<<() is dropped.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16948
The topology coordinator is a large class that sits in an even larger
storage_service.cc file. For the sake of code modularization and
reducing recompilation time, move the topology coordinator outside
storage_service.cc.
The topology_coordinator class is moved to the new
topology_coordinator.cc unchanged. Along with it, the following items
are moved:
- wait_for_ip function - it's used both by storage_service and
topology_coordinator, so in order for the new topology_coordinator.cc
not to depend on storage service, it is moved to the new file,
- raft_topology logger - for the same reason as wait_for_ip,
- run_topology_coordinator - serves as the main interface for the
topology coordinator. The topology coordinator class is not exposed at
all, it's only possible to start the coordinator and wait until it
shuts down itself via that function.
Nobody remembered to keep this function up to date when adding stuff to
`fsm_output`.
Turns out that it's not being used by any Raft logic but only in some
tests. That use case can now be replaced with `fsm::has_output()` which
is also being used by `raft::server` code.
This uses the `trigger_snapshot()` API added in previous commit on a
server running for the given Raft group.
It can be used for example in tests or in the context of disaster
recovery (ref scylladb/scylladb#16683).
This allows the user of `raft::server` to ask it to create a snapshot
and truncate the Raft log. In a later commit we'll add a REST endpoint
to Scylla to trigger group 0 snapshots.
One use case for this API is to create group 0 snapshots in Scylla
deployments which upgraded to Raft in version 5.2 and started with an
empty Raft log with no snapshot at the beginning. This causes problems,
e.g. when a new node bootstraps to the cluster, it will not receive a
snapshot that would contain both schema and group 0 history, which would
then lead to inconsistent schema state and trigger assertion failures as
observed in scylladb/scylladb#16683.
In 5.4 the logic of initial group 0 setup was changed to start the Raft
log with a snapshot at index 1 (ff386e7a44)
but a problem remains with these existing deployments coming from 5.2,
we need a way to trigger a snapshot in them (other than performing 1000
arbitrary schema changes).
Another potential use case in the future would be to trigger snapshots
based on external memory pressure in tablet Raft groups (for strongly
consistent tables).
Extracts a part of the logic of the raft_state_monitor_fiber method into
a separate function. It will be moved to a separate file in the next
commit along with the topology coordinator, and will serve as the only
way of interaction with the topology coordinator while the class itself
will remain hidden.
The topology_coordinator class is now directly constructed on the stack
(or rather in the coroutine frame), the indirection via shared_ptr is no
longer needed.
Before the introduction of PR #15524 the removal had always been invoked
via a finally() continuation. In spite of making flush() noexcept, the
mentioned PR modified the logic: if flush() returns an exceptional future,
the removal is not performed.
This change restores the old behavior - the removal operation is always called.
Now the logic of compaction_group::stop() is as follows:
- first, it waits for completion of flush() via
seastar::coroutine::as_future() to avoid a premature exception
- then it executes compaction_manager.remove()
- in the end it inspects the future returned from flush()
and re-throws the exception if the operation failed
Fixes: scylladb#16751
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#16940
Alternator incorrectly refuses an empty tag value for TagResource, but DynamoDB does allow this case and it's useful (note that an empty tag key is rightly forbidden). So this short series fixes this case, and adds additional tests for TagResource which cover this case and other cases we forgot to cover in tests.
Fixes #16904.
Closes scylladb/scylladb#16910
* github.com:scylladb/scylladb:
test/alternator: add more tests for TagResource
alternator: allow empty tag value
There are currently two options for how to "request" the number of initial tablets for a table:
1. specify it explicitly when creating a keyspace
2. let scylla calculate it on its own
Neither is very nice. The former doesn't take the cluster layout into consideration. The latter does, but starts with one tablet per shard, which can be too low if the amount of data grows rapidly.
Here's a (maybe temporary) proposal to facilitate at least perf tests -- the --tablets-initial-scale-factor option that enhances option number two above by multiplying the calculated number of tablets by the configured factor. This is what we currently do to run perf tests by patching scylla; with the option it is going to be more convenient.
Closes scylladb/scylladb#16919
* github.com:scylladb/scylladb:
config: Add --tablets-initial-scale-factor
tablet_allocator: Add initial tablets scale to config
tablet_allocator: Add config
This patch adds the prometheus_allow_protobuf command line support.
When set to true, Prometheus will accept protobuf requests and will
reply with the protobuf protocol.
This will also enable the experimental Prometheus Native Histograms.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
approx_exponential_histogram uses logic similar to Prometheus native
histograms; to allow Prometheus to send its data in the native histogram
format it needs to report the schema and min id (id of the first bucket).
This patch updates to_metrics_histogram to set those optional parameters,
leaving it to Prometheus to decide in what format the histogram will
be reported.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Native histograms (also known as sparse histograms) are an experimental
Prometheus feature. They use protobuf as the reporting layer. The
prometheus_allow_protobuf flag allows the user to enable protobuf
protocol. When this flag is set to true, and the Prometheus server sends
in the request that it accepts protobuf, the result will be in protobuf
protocol.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
The topology_mutation_builder, topology_node_mutation_builder and
topology_request_tracking_mutation_builder are currently used by
storage service - mainly, but not exclusively, by the topology
coordinator logic. As we are going to extract the topology coordinator
to a separate file, we need to move the builders to their own file as
well so that they will be accessible both by the topology coordinator
and the storage service.
One of the overloads of `topology_node_mutation_builder::set` is a
template which takes a std::set of things that convert to a sstring.
This was done to support sets of strings of different types (e.g.
sstring, string_view) but it turns out that only sstring is used at the
moment.
De-template the method as it is unnecessary for it to be a template.
Moreover, the `topology_node_mutation_builder` is going to be moved in
the next commit of the PR to a separate file, so not having template
methods makes the task simpler.
Issue #16904 discovered that Alternator refuses to allow an empty tag
value while it's useful (and DynamoDB allows it). This brought to my
attention that our test coverage of the TagResource operation was lacking.
So this patch adds more tests for some corner cases of TagResource which
we missed, including the allowed lengths of tag keys and values.
These tests reproduce #16904 (the case of empty tag value) and also #16908
(allowing and correctly counting unicode letters), and also add
regression testing to cases which we already handled correctly.
As usual, all the new tests also pass on DynamoDB.
Refs #16904
Refs #16908
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The existing code incorrectly forbade setting a tag on a table to an empty
string value, but this is allowed by DynamoDB and is useful, so we fix it
in this patch.
While at it, improve the error-checking code for tag parameters to
cleanly detect more cases (like missing or non-string keys or values).
The following patch is a test that fails before this patch (because
it fails to insert a tag with an empty value) and passes after it.
Fixes #16904.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
test/cql-pytest/run.py was recently modified to add the "tablets"
experimental feature, so test/alternator/run now also runs Scylla by
default with tablets enabled.
This is the correct default going forward, but in the short term it
would be nice to also have an option to easily do a manual test run
*without* tablets.
So this patch adds a "--vnodes" option to the test/alternator/run script.
This option causes "run" to run Scylla without enabling the "tablets"
experimental feature.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Before this patch, Alternator tables did not use tablets even if this
feature was available - tablets had to be manually enabled per table
by using a tag. But recently we changed CQL to enable tablets by default
on all keyspaces (when the experimental "tablets" option is turned on),
so this patch does the same for Alternator tables:
1. When the "tablets" experimental feature is on, new Alternator tables
will use tablets instead of vnodes. They will use the default choice
of initial_tablets.
2. The same tag that in the past could be used to enable tablets on a
specific table, now can be used to disable tablets or change the
default initial_tablets for a specific table at creation time.
Fixes #16355
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
If an Alternator table uses tablets (we'll turn this on in a following
patch), some tests are known to fail because of features not yet
supported with tablets, namely:
Refs #16317 - Support Alternator Streams with tablets (CDC)
Refs #16567 - Support Alternator TTL with tablets
This patch changes all tests failing on tablets due to one of these two
known issues to explicitly ask to disable tablets when creating their
test table. This means that at least we continue to test these two
features (Streams and TTL) even if they don't yet work with tablets.
We'll need to remember to remove this override when tablet support
for CDC and Alternator TTL arrives. I left a comment in the right
places in the code with the relevant issue numbers, to remind us what
to change when we fix those issues.
This patch also adds xfail_tablets and skip_tablets fixtures that can
be used to xfail or skip tests when running with tablets - but we
don't use them yet - and may never use them, but since I already wrote
this code it won't hurt having it, just in case. When running without
tablets, or against an older Scylla or on DynamoDB, the tests with
these marks are run normally.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This PR fixes test_tablet_missing_data_repair and enables the test again.
If a node is not UP yet, repair in the test will be a partial repair. A partial repair will not repair all the data, which causes the check of rows after repair to fail. Check that nodes see each other as UP before repair.
Closes scylladb/scylladb#16930
* github.com:scylladb/scylladb:
test: Enable test_tablet_missing_data_repair again
test: Wait for nodes to be up when repair
test: Check repair status in ScyllaRESTAPIClient
This commit improves the developer-oriented section
of the core documentation:
- Added links to the developer sections in the new
Get Started guide (Develop with ScyllaDB and
Tutorials and Example Projects) for ease of access.
- Replaced the outdated Learn to Use ScyllaDB page with
a link to the up-to-date page in the Get Started guide.
This involves removing the learn.rst file and adding
an appropriate redirection.
- Removed the Apache Copyrights, as this page does not
need it.
- Removed the Features panel box as there was only one
feature listed, which looked weird. Also, we are in
the process of removing the Features section.
Closes scylladb/scylladb#16800
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for enum_option<>. since its
operator<<() is still used by the homebrew generic formatter for
formatting vector<>, operator<<() is preserved.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16917
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
cql3::authorized_prepared_statements_cache_key, and remove its
operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16924
it seems that the tree builds just fine with this warning enabled.
and narrowing is a potentially unsafe numeric conversion. so let's
enable this warning option.
this change also helps to reduce the difference between the rules
generated by configure.py and those generated by CMake.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16929
New tablet replicas are allocated synchronously with node
operations. They are safely rebuilt from all existing replicas.
The list of ignored nodes passed to node operations is respected.
Tablet scheduler is responsible for scheduling tablet transition which
changes the replicas set. The infrastructure for handling decommission
in tablet scheduler is reused for this.
Scheduling is done incrementally, respecting per-shard load
limits. Rebuilding transitions are recognized by load calculation to
affect all tablet replicas.
New kind of tablet transition is introduced called "rebuild" which
adds new tablet replica and rebuilds it from existing replicas. Other
than that, the transition goes through the same stages as regular
migration to ensure safe synchronization with request coordinators.
In this PR we simply stream from all tablet replicas. Later we should
switch to calling repair to avoid sending excessive amounts of data.
Fixes #16690.
If streaming or cleanup RPC fails, we would retry immediately. That
fills the logs with errors. Throttle them by sleeping on error before
the same action is retried.
This patch removes some duplication of logic and implicit assumptions
by creating clear algebra for load impact calculation and its
application to state of the load balancer.
Will make adding new kinds of tablet transitions with different impact
on load much easier.
Will be used by tablet load balancer to compute impact on load of
planned migrations. Currently, the logic is hard coded in the load
balancer and may get out of sync with the logic we have in
get_migration_streaming_info() for already running tablet transitions.
The logic will become more complex for rebuild transition, so use
shared code to compute it.
Previous patch taught tablets allocator to multiply the initial tablets
count by some value. This patch makes this factor configurable
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When allocating tablets for a table for the first time, their initial count
is calculated so that each shard in a cluster gets one tablet. It may
happen that more than one initial tablet per shard is better, e.g. perf
tests typically rely on that.
It's possible to specify the initial tablets count when creating a
keyspace, but this number doesn't take the cluster topology into
consideration and may also not be very nice.
As a temporary solution (e.g. for perf tests) we may add a configuration
option that scales the initial number of calculated tablets by some factor.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The tablet allocator is a sharded service that starts in main; it's worth
equipping it with a config. The next patches will fill it with some payload.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
this change addresses the possible data resurrection after the
"nodetool compact" and "nodetool flush" commands, and prepares for
the fix of a similar data resurrection issue after "nodetool cleanup".
active commitlog segments are recycled in the background once they are
discarded.
and there is a chance that we could have data resurrection even after
"nodetool cleanup", because the mutations in commitlog's active segments
could change the tables which are supposed to be cleaned by
"nodetool cleanup". so, as a solution to address this problem in the
pre-tablets era, we force new active segments of commitlog, and flush the
involved memtables. since the active segments are discarded in the
background, the completion of "nodetool cleanup" does not guarantee
that these mutations won't be applied to the memtable when the server
restarts, if it is killed right away.
the same applies to "force_flush", "force_compaction" and
"force_keyspace_compaction" API calls which are used by nodetool as
well. quote from Benny's comment
> If major compaction doesn't wait for the commitlog deletion it is
> also exposed to data resurrection since theoretically it could purge
> tombstones based on the assumption that commitlog would not resurrect
> data that they might shadow, BUT on a crash/restart scenario commitlog
> replay would happen since the commitlog segments weren't deleted -
> breaking the contract with compaction.
so to ensure that the active segments are reclaimed upon completion of
the "nodetool cleanup", "nodetool compact" and "nodetool flush" commands,
let's wait for pending deletes in `database::flush_all_tables()`, so that the
caller waits until the reclamation of deleted active segments completes.
Refs #4734
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16915
This enhancement formats descriptions in config.cc using the standard markup language reStructuredText (RST).
By doing so, it improves the rendering of these descriptions in the documentation, allowing you to use various directives like admonitions, code blocks, ordered lists, and more.
Closes scylladb/scylladb#16311
The name of the keyspace being part of the partition key is not useful;
the table_id already uniquely identifies the table. The keyspace name
being part of the key means that code wanting to interact with this
table often has to resolve the table id just to be able to provide the
keyspace name. This is counterproductive, so make keyspace_name
just a static column instead, just like table_name already is.
Fixes: #16377
Closes scylladb/scylladb#16881
Before the patch we called `gossiper.remove_endpoint` for the IPs of the
left nodes. The problem is that in the replace-with-same-ip scenario we
called `gossiper.remove_endpoint` for IP which is used by the new,
replacing node. The `gossiper.remove_endpoint` method puts the IP into
quarantine, which means gossiper will ignore all events about this IP
for `quarantine_delay` (one minute by default). If we immediately
replace just replaced node with the same IP again, the bootstrap will
fail since the gossiper events are blocked for this IP, and we won't be
able to resolve an IP for the new host_id.
Another problem was that we called the gossiper.remove_endpoint method,
which doesn't remove an endpoint from `_endpoint_state_map`, only from
the live and unreachable lists. This means the IP will keep circulating in
the gossiper message exchange between cluster nodes until full cluster
restart.
This patch fixes both of these problems. First, we rely on the fact that
when topology coordinator moves the `being_replaced` node to the left
state, the IP of the `replacing` node is known to all nodes. This means
before removing an IP from the gossiper we can check if this IP is
currently used by another node in the current raft topology. This is
done by constructing the `used_ips` map based on normal and transition
nodes. This map is cached to avoid quadratic behaviour.
Second, we call `gossiper.force_remove_endpoint`, not
`gossiper.remove_endpoint`. This function removes an IP from
`_endpoint_state_map`, as well as from live and unreachable lists.
Closes scylladb/scylladb#16820
* github.com:scylladb/scylladb:
get_peer_info_for_update: update only required fields in raft topology mode
get_peer_info_for_update: introduce set_field lambda
storage_service::on_change: fix indent
storage_service::on_change: skip handle_state functions in raft topology mode
test_replace_different_ip: check old IP is removed from gossiper
test_replace: check two replace with same IP one after another
storage_service: sync_raft_topology_nodes: force_remove_endpoint for left nodes only if an IP is not used by other nodes
Skip test_tablet_missing_data_repair, it is failing a lot, breaking
promotion and CI. Can't revert because the PR introducing it was already
piled on. So disable it while it is investigated.
Refs: #16859
Closes scylladb/scylladb#16879
Standard containers don't have constructors that take ranges;
instead people use boost::copy_range or C++23 std::ranges::to.
Make the API more uniform by removing this special constructor.
The only caller, in a test, is adjusted.
Closes scylladb/scylladb#16905
Running test/cql-pytest/run now defaults to enabling the "tablets"
experimental feature when running Scylla - and tests detect this and
use this feature as appropriate. This is the correct default going
forward, but in the short term it would be nice to also have an
option to easily do a manual test run *without* tablets.
So this patch adds a "--vnodes" option to the test/cql-pytest/run
script. This option causes "run" to run Scylla without enabling the
"tablets" experimental feature.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16896
This commit adds the information that
ScyllaDB Enterprise 2024.1 is based
on ScyllaDB Open Source 5.4
to the OSS vs. Enterprise matrix.
Closes scylladb/scylladb#16880
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for cql3::prepared_cache_key_type
and cql3::prepared_cache_key_type::cache_key_type, and remove
their operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16901
This PR:
- Removes the redundant information about previous versions from the Create Cluster page.
- Fixes language mistakes on that page, and replaces "Scylla" with "ScyllaDB".
(nobackport)
Closes scylladb/scylladb#16885
* github.com:scylladb/scylladb:
doc: fix the language on the Create Cluster page
doc: remove redundant info about old versions
When a base table is altered, so are the views that might
refer to the added column (which includes "SELECT *" views and also
views that might need to use this column for row lifetime (virtual
columns)).
However the query processor implementation for views change notification
was an empty function.
Since views are tables, the query processor needs to at least treat them
as such (and maybe in the future, do also some MV specific stuff).
This commit adds a call to `on_update_column_family` from within
`on_update_view`.
The side effect, as of today, is that prepared statements for views
which changed due to a base table change will be invalidated.
Fixes https://github.com/scylladb/scylladb/issues/16392
This series also adds a test which fails without this fix and passes when the fix is applied.
Closes scylladb/scylladb#16897
* github.com:scylladb/scylladb:
Add test for mv prepared statements invalidation on base alter
query processor: treat view changes at least as table changes
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for cql3::ut_name, and remove
their operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16890
Issue #16392 describes a bug where, when a base table is altered, its
materialized views' prepared statements are not invalidated, which in
turn causes them to return missing data.
This test reproduces this bug and serves as a regression test for this
problem.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
When a base table is altered, so are the views that might
refer to the added column (which includes "SELECT *" views and also
views that might need to use this column for row lifetime (virtual
columns)).
However the query processor implementation for views change notification
was an empty function.
Since views are tables, the query processor needs to at least treat them
as such (and maybe in the future, do also some MV specific stuff).
This commit adds a call to `on_update_column_family` from within
`on_update_view`.
The side effect, as of today, is that prepared statements for views
which changed due to a base table change will be invalidated.
Fixes #16392
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
This commit removes the 5.1-to-2022.2 upgrade
guide - the upgrade guide for versions we
no longer support.
We should remove it while adding the 5.4-to-2024.1
upgrade guide (the previous commit).
Some fields of system.peers table are updated
through raft, we don't need to peek them from gossiper.
The goal of the patch is to declare explicitly
which code is responsible for which fields.
In particular, in raft topology mode we don't
need to update raft-managed fields since
it's done in topology_state_load and
raft_ip_address_updater.
This is a refactoring commit. In the next commit
we'll add a parameter to this unified lambda and
this is easy to do if we have only one lambda and
not three.
We don't need them in raft topology mode since the token_metadata
update happens in topology_state_load function. We lift the
_raft_topology_change_enabled checks from those functions to on_change.
In this commit we modify the existing
test_replace_different_ip. We add the check that the old
IP is not contained in alive or down lists, which
means it's completely wiped from gossiper. This test is failing
without the force_remove_endpoint fix from
a previous commit. We also check that the state of
local system.peers table is correct.
This commit removes the upgrade guides
from ScyllaDB Open Source to Enterprise
for versions we no longer support.
In addition, it removes a link to
one of the removed pages from
the Troubleshooting section (the link is
redundant).
Closes scylladb/scylladb#16249
This mini-set includes code coverage support for ScyllaDB, it provides:
1. Support for building ScyllaDB with coverage support.
2. Utilities for processing coverage profiling data
3. test.py support for generation and processing of coverage profiling into lcov trace files which can later be used to produce HTML or textual coverage reports.
Refs #16323
Closes scylladb/scylladb#16784
* github.com:scylladb/scylladb:
Add code coverage documentation
test.py: support code coverage
code coverage: Add libraries for coverage handling
test.py: support --coverage and --coverage-mode
configure.py: support coverage profiles on standard build modes
Currently, if the topology coordinator gets stuck in a CI test run, it's hard to debug (e.g. scylladb/scylladb#16708). We can add a lot of logging inside the topology coordinator code to aid debugging, without spamming the logs -- these are relatively rare control plane events.
Closes scylladb/scylladb#16749
* github.com:scylladb/scylladb:
test/pylib: scylla_cluster: enable raft_topology=debug level by default
raft topology: increase level of some TRACE messages
raft topology: log when entering transition states
raft topology: don't include null ID in exclude_nodes
raft topology: INFO log when executing global commands and updating topology state
storage_service: separate logger for raft topology
Add `--experimental-features=tablets` to both `test/cql-pytest/suite.yaml` and `test/cql-pytest/run.py`, so tablets are enabled. Detect tablet support in `conftest.py` and add an xfail and skip marker to mark tests that fail/crash with tablets. These are expected to be fixed soon.
Some tests checking things around alter-keyspace, had to force-disable tablets on the created keyspace, because tablets interfere with the test (a keyspace with tablets cannot have simple strategy for example).
Tablets were also interfering with `test_keyspace.py:test_storage_options_local`, because it is expecting `system_schema.scylla_keyspaces` to not have any entries for local storage keyspace, but they have it if tablets are enabled. Adjust the test to account for this.
Closes scylladb/scylladb#16840
* github.com:scylladb/scylladb:
test/cql-pytest: run.py,suite.yaml: enable tablets by default
test/cql-pytest: sprinkle xfail_tablets and skip_with_tablets as needed
test/cql-pytest: disable tablets for some keyspace-altering tests
test/cql-pytest: test_keyspace.py: test_storage_options_local(): fix for tablets
test/cql-pytest: fix test_tablets.py to set initial_tablets correctly
test/cql-pytest: add tablet detection logic and fixtures
test/cql-pytest: extract is_scylla check into util.py
* tools/cqlsh 426fa0ea...b8d86b76 (8):
> Make cqlsh work with unix domain sockets
Fixes scylladb/scylladb#16489
> Bump python-driver version
> dist/debian: add trailer line
> dist/debian: wrap long line
> Draft: explicit build-time package dependencies
> stop returning status_code=2 on schema disagreement
> Fix minor typos in the code
> Dockerfile: apt-get update and apt-get upgrade to get latest OS packages
For tests that cover functionality which doesn't yet work with tablets.
These tests, and the respective functionality they test, are expected to
be fixed soon, and then these fixtures will be removed.
When tablets are enabled on a keyspace, they cannot be altered to simple
replication strategy anymore.
These keyspaces are testing exactly that, so disable tablets on the
initial keyspace create statements.
This test expects a keyspace with the local storage option to not have a
row in system_schema.scylla_keyspaces. With tablets enabled by default,
this won't be the case. Adjust the test to check for the specific
storage-related columns instead.
Recently, in commit 49026dc319, the
way to choose the number of tablets in a new keyspace changed.
This broke the test we had for a memory leak when many tablets were
used, which saw the old syntax wasn't recognized and assumed Scylla
is running without tablet support - so the test was skipped.
Let's fix the syntax. After this patch the test passes if the tablets
experimental feature is enabled, and only skipped if it isn't.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Add keyspace_has_tablets() utility function, which, given a keyspace,
returns whether it is using tablets or not.
In addition, 3 new fixtures are added:
* has_tablets - does Scylla have tablets by default?
* xfail_tablets - the test is marked xfail, when tablets are enabled by
default.
* skip_with_tablets - the test is skipped when tablets are enabled by
default, because it might crash with tablets.
We expect the latter two to be removed soon(ish), as we make all tests,
and the functionality they test, work with tablets.
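A minimal sketch of what such fixtures could look like (hypothetical Python/pytest; the query used by `keyspace_has_tablets()` and the fixture names `cql`/`test_keyspace` are assumptions, not the actual test-suite code):

```python
import pytest

def keyspace_has_tablets(cql, keyspace):
    # Hypothetical check: assume a tablets-enabled keyspace has a row in
    # system_schema.scylla_keyspaces with initial_tablets set.
    row = cql.execute(
        "SELECT initial_tablets FROM system_schema.scylla_keyspaces "
        "WHERE keyspace_name = %s", (keyspace,)).one()
    return row is not None and row.initial_tablets is not None

@pytest.fixture(scope="session")
def has_tablets(cql, test_keyspace):
    # Does Scylla use tablets by default for a newly created keyspace?
    return keyspace_has_tablets(cql, test_keyspace)

@pytest.fixture(scope="function")
def xfail_tablets(request, has_tablets):
    # Mark the test as expected-to-fail when tablets are enabled.
    if has_tablets:
        request.node.add_marker(pytest.mark.xfail(reason="fails with tablets"))

@pytest.fixture(scope="function")
def skip_with_tablets(has_tablets):
    # Skip the test entirely; it might crash with tablets.
    if has_tablets:
        pytest.skip("might crash with tablets")
```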
This is a test case for the problem, described in the
previous commit. Before that fix the second replace
failed since it couldn't resolve an IP for the new host_id.
Before the patch we called gossiper.remove_endpoint for IP-s
of the left nodes. The problem is that in replace-with-same-ip
scenario we called gossiper.remove_endpoint for IP which is
used by the new, replacing node. The gossiper.remove_endpoint
method puts the IP into quarantine, which means gossiper will
ignore all events about this IP for quarantine_delay (one minute by
default). If we immediately replace just replaced node with
the same IP again, the bootstrap will fail since the gossiper
events are blocked for this IP, and we won't be able to
resolve an IP for the new host_id.
Another problem was that we called gossiper.remove_endpoint
method, which doesn't remove an endpoint from _endpoint_state_map,
only from live and unreachable lists. This means the IP
will keep circulating in the gossiper message exchange between cluster
nodes until full cluster restart.
This patch fixes both of these problems. First, we rely on
the fact that when topology coordinator moves the being_replaced
node to the left state, the IP of the replacing node is known to all nodes.
This means before removing an IP from the gossiper we can check if
this IP is currently used by another node in the current raft topology.
This is done by constructing the used_ips map based on normal and
transition nodes. This map is cached to avoid quadratic behaviour.
Second, we call gossiper.force_remove_endpoint, not
gossiper.remove_endpoint. This function removes an IP from
_endpoint_state_map, as well as from live and unreachable lists.
The tests for both of these improvements will be added in subsequent
commits.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for db::operation_type, and
remove their operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16832
this change should silence the warning like
```
/home/kefu/dev/scylladb/repair/repair.cc:222:23: error: comparison of integers of different signs: 'int' and 'size_type' (aka 'unsigned long') [-Werror,-Wsign-compare]
222 | for (int i = 0; i < all.size(); i++) {
| ~ ^ ~~~~~~~~~~
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16867
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we replace operator<< with format_as() for
unimplemented::cause, so that we don't rely on the deprecated behavior,
and neither do we create a fully blown fmt::formatter. as in
fmt v10, format_as() can be used in place of fmt::formatter,
while in fmt v9, format_as() is only allowed to return an integer.
so, to be future-proof, and to be simpler, format_as() is used.
we can even replace `format_as(c)` with `c`, once fmt v10 is
available in future.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16866
these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning.
Closes scylladb/scylladb#16868
* github.com:scylladb/scylladb:
auth: do not include unused headers
locator: Handle replication factor of 0 for initial_tablets calculations
table: add_sstable_and_update_cache: trigger compaction only in compaction group
compaction_manager: perform_task_on_all_files: return early when there are no sstables to compact
compaction_manager: perform_cleanup: use compaction_manager::eligible_for_compaction
This short series prevents the creation of compaction tasks when we know in advance that they have nothing to do.
This is possible in the clean path by:
- improve the detection of candidates for cleanup by skipping sstables that require cleanup but are already being compacted
- checking that list of sstables selected for cleanup isn't empty before creating the cleanup task
For upgrade sstables, and generally when rewriting all sstables: launch the task only if the list of candidate sstables isn't empty.
For regular compaction, when triggered via `table::add_sstable_and_update_cache`, we currently trigger compaction (by calling `submit`) on all compaction groups while the sstable is added only to one of them.
Also, it is typically called for maintenance sstables that are awaiting offstrategy compaction, in which case we can skip calling `submit` entirely since the caller triggers offstrategy compaction at a later stage.
Refs scylladb/scylladb#15673
Refs scylladb/scylladb#16694
Fixes scylladb/scylladb#16803
Closes scylladb/scylladb#16808
* github.com:scylladb/scylladb:
table: add_sstable_and_update_cache: trigger compaction only in compaction group
compaction_manager: perform_task_on_all_files: return early when there are no sstables to compact
compaction_manager: perform_cleanup: use compaction_manager::eligible_for_compaction
When calculating per-DC tablets the formula is shards_in_dc / rf_in_dc,
but the denominator in it can be configured to be literally zero and the
division doesn't work.
Fix by assuming zero tablets for dcs with zero rf
fixes: #16844
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#16861
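The guarded per-DC calculation can be sketched like this (Python for illustration; the function name is an assumption, the real code is in locator):

```python
def initial_tablets_for_dc(shards_in_dc: int, rf_in_dc: int) -> int:
    """Per-DC tablet count: shards_in_dc / rf_in_dc, except that an RF of
    zero is a legal configuration and must not cause a division by zero;
    assume zero tablets for such DCs instead."""
    if rf_in_dc == 0:
        return 0
    return shards_in_dc // rf_in_dc
```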
`server` was the only user of this function and it can now be
implemented using `fsm`'s public interface.
In later commits we'll extend the logic of `io_fiber` to also subscribe
to other events, triggered by `server` API calls, not only to outputs
from `fsm`.
In later commits we will use it to wake up `io_fiber` directly from
`raft::server` based on events generated by `raft::server` itself -- not
only from events generated by `raft::fsm`.
`raft::fsm` still obtains a reference to the condition variable so it
can keep signaling it.
This constructor does not provide persisted commit index. It was only
used in tests, so move it there, to the helper `fsm_debug` which
inherits from `fsm`.
Test cases which used `fsm` directly instead of `fsm_debug` were
modified to use `fsm_debug` so they can access the constructor.
`fsm_debug` doesn't change the behavior of `fsm`, only adds some helper
members. This will be useful in following commits too.
In a later commit we'll move `poll_output` out of `fsm` and it won't
have access to internals logged by this message (`_log.stable_idx()`).
Besides, having it in `has_output` gives a more detailed trace. In
particular we can now see values such as `stable_idx` and `last_idx`
from the moment of returning a new fsm output, not only when poll
started waiting for it (a lot of time can pass between these two
events).
This parameter says how many entries at most should be left trailing
before the snapshot index. There are multiple places where this
decision is made:
- in `applier_fiber` when the server locally decides to take a snapshot
due to log size pressure; this applies to the in-memory log
- in `fsm::step` when the server received an `install_snapshot` message
from the leader; this also applies to the in-memory log
- and in `io_fiber` when calling `store_snapshot_descriptor`; this
applies to the on-disk log.
The logic of how many entries should be left trailing is calculated
twice:
- first, in `applier_fiber` or in `fsm::step` when truncating the
in-memory log
- and then again as the snapshot descriptor is being persisted.
The logic is to take `_config.snapshot_trailing` for locally generated
snapshots (coming from `applier_fiber`) and `0` for remote snapshots
(from `fsm::step`).
But there is already an error injection that changes the behavior of
`applier_fiber` to leave `0` trailing entries. However, this doesn't
affect the following `store_snapshot_descriptor` call which still uses
`_config.snapshot_trailing`. So if the server got restarted, the entries
which were truncated in-memory would get "revived" from disk.
Fortunately, this is test-only code.
However in future commits we'd like to change the logic of
`applier_fiber` even further. So instead of having a separate
calculation of trailing entries inside `io_fiber`, it's better for it to
use the number that was already calculated once. This number is passed to
`fsm::apply_snapshot` (by `applier_fiber` or `fsm::step`) and can then
be received by `io_fiber` from `fsm_output` to use it inside
`store_snapshot_descriptor`.
This looks like a minor oversight: in `server_impl::abort` there are
multiple calls to `set_exception` on the different promises, and only one
of them would not receive `*_aborted`.
before this change, we always reference the return value of
`make_reader()`, and the return value's type `flat_mutation_reader_v2`
is movable, so we can just pass it by moving away from it.
in this change, instead of using a lambda, let's just use its
return value directly. simpler this way.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16835
Raft rebuild is broken because the session id is not set.
The following was seen when running rebuild
stream_session - [Stream #8cfca940-afc9-11ee-b6f1-30b8f78c1451]
stream_transfer_task: Fail to send to 127.0.70.1:0:
seastar::rpc::remote_verb_error (Session not found:
00000000-0000-0000-0000-000000000000)
with raft topology, e.g.,
scylla --enable-repair-based-node-ops 0 --consistent-cluster-management true --experimental-features consistent-topology-changes
Fix by setting the session id.
Fixes #16741
Closes scylladb/scylladb#16814
Increased them to DEBUG level, and in one case to WARN (inside an
exception handler).
The selected messages are still relatively rare (per-node per-transition
control plane events, plus events such as fibers sleeping and waking up)
although more low level. They are also small messages. Messages that are
large such as those which print all tokens of nodes or large mutations
are left on TRACE level.
The plan is to enable DEBUG level logging in test.py tests for
raft_topology, while not spamming the logs completely such as by
printing large mutations.
Allows selectively enabling higher logging levels for just raft-topology
related things, without doing it for the entire storage_service (which
includes things like gossiper callbacks).
Also gets rid of the redundant "raft topology:" prefix which was also
not included everywhere.
Add `docs/dev/code-coverage.md` with explanations about how to work with
the different tools added for coverage reporting and cli options added
to `configure.py` and `test.py`
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
test.py already supports the routing of coverage data into a
predetermined folder under the `tmpdir` logs folder. This patch extends
on that and leverages the code coverage processing libraries to produce
test coverage lcov files and a coverage summary at the end of the run.
The reason for not generating the full report (which can be achieved
with a one liner through the `coverage_utils.py` cli) is that it is
assumed that unit testing is not necessarily the "last stop" in the
testing process and it might need to be joined with other coverage
information that is created at other testing stages (for example dtest).
The result of this patch is that when running test.py with one of the
coverage options (`--coverage` / `--mode-coverage`) it will perform
another step of processing and aggregating the profiling information
created.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Coverage handling is divided into 3 steps:
1. Generation of profiling data from a run of an instrumented file
(which this patch doesn't cover)
2. Processing of profiling data, which involves indexing the profile and
producing the data in some format that can be manipulated and
unified.
3. Generate some reporting based on this data.
This patch aims to deal with the last two steps by providing a
cli and a library for this end.
This patch adds two libraries:
1. `coverage_utils.py` which is a library for manipulating coverage
data, it also contains a cli for the (assumed) most common operations
that are needed in order to eventually generate coverage reporting.
2. `lcov_utils.py` - which is a library to deal with lcov format data,
which is a textual form containing source-dependent coverage data.
An example of such manipulation is the `coverage diff` operation,
which produces a set-like difference: cov_a - cov_b = diff,
where diff is an lcov-formatted file containing coverage data for code
cov_a that is not covered at all in cov_b.
The libraries and cli main goal is to provide a unified way to handle
coverage data in a way that can be easily scriptable and extensible.
This will pave the way for automating the coverage reporting and
processing in test.py and in jenkins pipelines (for example to also
process dtest or sct coverage reporting)
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
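The `coverage diff` semantics described above can be sketched like this (simplified Python; the real `lcov_utils.py` API is richer and these function names are assumptions — only the `SF:`/`DA:` record parsing follows the actual lcov tracefile format):

```python
def lcov_line_hits(lcov_text):
    """Parse 'DA:<line>,<hits>' records per 'SF:<source-file>' section
    from an lcov tracefile, into {source_file: {line: hits}}."""
    hits, current = {}, None
    for line in lcov_text.splitlines():
        if line.startswith("SF:"):
            current = line[3:]
            hits[current] = {}
        elif line.startswith("DA:") and current is not None:
            lineno, count = line[3:].split(",")[:2]
            hits[current][int(lineno)] = int(count)
    return hits

def coverage_diff(cov_a, cov_b):
    """Set-like difference: lines covered in cov_a (hits > 0) that are
    not covered at all in cov_b."""
    diff = {}
    for sf, lines in cov_a.items():
        b_lines = cov_b.get(sf, {})
        only_in_a = {ln: c for ln, c in lines.items()
                     if c > 0 and b_lines.get(ln, 0) == 0}
        if only_in_a:
            diff[sf] = only_in_a
    return diff
```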
We aim to support code coverage reporting as part of our development
process, to this end, we will need the ability to "route" the dumped
profiles from scylla and unit test to a predetermined location.
We can consider profile data as logged data that should persist after
tests have been run.
For this we add two supported options to test.py:
--coverage - which means that all suites in all modes will participate in
coverage.
--coverage-mode - which can be used to "turn on" coverage support only
for some of the modes in this run.
The strategy chosen is to save the profile data in
`tmpdir`/mode/coverage/%m.profraw (ref:
https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program)
This means that for every suite the profiling data of each object is
going to be merged into the same file (llvm claims to lock the file so
concurrency is fine).
More resolution than the suite level seems to not give us anything
useful (at least not at the moment). Moreover, it can also be achieved
by running a single test.
Data at the suite level will help us to detect suites that don't generate
coverage data at all and to fix this or to skip generating the profiles
for them.
Also added support for a 'coverage' parameter in the `suite.yaml` file,
which can be used to disable coverage for a specific suite. This
parameter defaults to True, but if a suite is known to not generate
profiles, or the suite's profile data is not needed or obfuscates the
results, it can be set to false in order to cancel profile routing and
processing for this suite.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
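The routing strategy above can be sketched as follows (hedged Python sketch; the helper name and exact path layout are assumptions, though `LLVM_PROFILE_FILE` and the `%m` pattern are standard clang source-based coverage behavior, per the linked docs):

```python
import os

def coverage_env(tmpdir: str, mode: str, enabled: bool) -> dict:
    """Environment for a test subprocess so clang-instrumented binaries
    dump their profiles under <tmpdir>/<mode>/coverage/."""
    env = dict(os.environ)
    if enabled:
        profdir = os.path.join(tmpdir, mode, "coverage")
        os.makedirs(profdir, exist_ok=True)
        # %m expands to a per-binary signature, so every object's
        # profiling data merges into the same file; llvm locks the file,
        # so concurrent writers are fine.
        env["LLVM_PROFILE_FILE"] = os.path.join(profdir, "%m.profraw")
    return env
```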
We already have a dedicated coverage build, however, this build is
dedicated mostly for coverage in boost and standalone unit tests.
This added configuration option will compile every configured
build mode with coverage profiling support (excluding 'coverage' mode).
It also does targeted profiling that is narrowed down only to ScyllaDB
code and doesn't instrument seastar and testing code, this should give
a more accurate coverage reporting and also impact performance less, as
one example, the reactor loop in seastar will not be profiled (along
with everything else).
The targeted profiling is done with the help of the newly added
`coverage_sources.list` file which excludes all seastar sub directories
from the profiling.
Also an extra measure is taken to make sure that the seastar
library will not be linked with the coverage framework
(so it will not dump confusing empty profiles).
Some of the seastar headers are still going to be included in the
profile since they are indirectly included by profiled source files in
order to remove them from the final report a processing step on the
resulting profile will need to take place.
A note about expected performance impact:
It is expected to have minimal impact on performance since the
instrumentation adds counter increments without locking.
Ref: https://clang.llvm.org/docs/UsersManual.html#cmdoption-fprofile-update
This means that the numbers themselves are less reliable but all covered
lines are guaranteed to have at least a non-zero value.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
The test verifies repair brings the missing rows to the owner.
- Shutdown part of the nodes in the cluster
- Insert data
- Start all nodes
- Run repair
- Shutdown part of the nodes
- Check all data is present
Since a given tablet belongs to a single shard on both repair master and repair
followers, row level repair code needs to be changed to work on a single
shard for a given tablet. In order to tell the repair followers which
shard to work on, a dst_cpu_id value is passed over rpc from the repair
master.
A helper to get the dst shard id on the repair follower.
If the repair master specifies the shard id for the follower, use it.
Otherwise, the follower chooses one itself.
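The selection logic amounts to the following (illustrative Python; the sentinel value and names are assumptions, the real helper is C++ in the repair code):

```python
UNKNOWN_DST_CPU_ID = -1  # hypothetical sentinel: master did not specify a shard

def get_dst_cpu_id(master_dst_cpu_id: int, locally_chosen_shard: int) -> int:
    """Shard the repair follower works on for a given tablet: honor the
    dst_cpu_id sent over rpc by the repair master when present, otherwise
    fall back to a shard the follower chooses itself."""
    if master_dst_cpu_id != UNKNOWN_DST_CPU_ID:
        return master_dst_cpu_id
    return locally_chosen_shard
```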
If the joining node fails while handling the response from the
topology coordinator, it hangs even though it knows the join
operation has failed. Therefore, we ensure it shuts down in
this patch.
Additionally, we ensure that if the first join request response
was a rejection or the node failed while handling it, the
following acceptances by the (possibly different) coordinator
don't succeed. The node considers the join operation as failed.
We shouldn't add it to the cluster.
Fixes scylladb/scylladb#16333
Closes scylladb/scylladb#16650
* github.com:scylladb/scylladb:
topology_coordinator: clarify warnings
raft topology: join: allow only the first response to be a successful acceptance
storage_service: join_node_response_handler: fix indentation
raft topology: join: shut down a node on error in response handler
The service level controller updates itself at an interval. However, the interval is hardcoded in main to 10 seconds, which leads to long sleeps in some of the tests.
This patch moves this value to the `service_levels_interval_ms` command line option and sets it to 0.5s in cql-pytest.
Closes scylladb/scylladb#16394
* github.com:scylladb/scylladb:
test:cql-pytest: change service levels intervals in tests
configure service levels interval
There is no need to trigger compaction in all compaction
groups when an sstable is added to only one of them.
And with that level of control, if the caller passes
sstables::offstrategy::yes, we know it will
trigger offstrategy compaction later on so there
is no need to trigger compaction at all
for this sstable at this time.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
3b424e391b introduced a loop
in `perform_cleanup` that waits until all sstables that require
cleanup are cleaned up.
However, with f1bbf705f9,
an sstable that is_eligible_for_compaction (i.e. it
is not in staging, awaiting view update generation),
may already be compacted by e.g. regular compaction.
And so perform_cleanup should interrupt that
by calling try_perform_cleanup, since the latter
reevaluates `update_sstable_cleanup_state` with
compaction disabled - that stops ongoing compactions.
Refs scylladb/scylladb#15673
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Refs #16757
Allows waiting for all previous and pending segment deletes to finish.
Useful if a caller of `discard_completed_segments` (i.e. a memtable
flush target) not only wants to ensure segments are clean and released,
but thoroughly deleted/recycled, and hence pose no threat of resurrecting
data on crash+restart.
Test included.
Closes scylladb/scylladb#16801
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for auth::role_or_anonymous,
and remove their operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16812
Currently, to figure out if a topology request is complete, a submitter
checks the topology state and tries to figure out from that the status
of the request. This is not exact. Lets look at rebuild handling for
instance. To figure out if request is completed the code waits for
request object to disappear from the topology, but if another rebuild
starts between the end of the previous one and the code noticing that
it completed the code will continue waiting for the next rebuild.
Another problem is that in case of operation failure there is no way to
pass an error back to the initiator.
This series solves those problems by assigning an id for each request and
tracking the status of each request in a separate table. The initiator
can query the request status from the table and see if the request was
completed successfully or if it failed with an error, which is also
available from the table.
The schema for the table is:
CREATE TABLE system.topology_requests (
id timeuuid PRIMARY KEY,
initiating_host uuid,
start_time timestamp,
done boolean,
error text,
end_time timestamp,
);
and all entries have TTL of one month.
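With per-request status rows, an initiator can poll by request id instead of watching the topology state. A minimal sketch of interpreting such a row (hypothetical row shape as a dict, assuming only the schema above):

```python
def request_outcome(row):
    """Interpret a system.topology_requests row given as a dict.

    Returns None while the request is still in flight (or the row has
    expired via TTL), otherwise ("ok", None) or ("failed", <error text>).
    """
    if row is None or not row.get("done"):
        return None  # request still running, keep polling
    error = row.get("error")
    if error:
        return ("failed", error)
    return ("ok", None)

# Example rows mirroring the schema: done/error columns drive the outcome.
assert request_outcome({"done": False}) is None
assert request_outcome({"done": True, "error": None}) == ("ok", None)
assert request_outcome({"done": True, "error": "node down"}) == ("failed", "node down")
```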
The sstables replay_position in stats_metadata is
valid only on the originating node and shard.
Therefore, validate the originating host and shard
before using it in compaction or table truncate.
Fixes #10080
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#16550
And call on_internal_error if process_upload_dir
is called for tablets-enabled keyspace as it isn't
supported at the moment (maybe it could be in the future
if we make sure that the sstables are confined to tablets
boundaries).
Refs #12775
Fixes #16743
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#16788
Now that we have explicit status for each request we may use it to
replace shutdown notification rpc. During a decommission, in
left_token_ring state, we set done to true after metadata barrier
that waits for all requests to the decommissioning node to complete
and notify the decommissioning node with a regular barrier. At this
point the node will see that the request is complete and exit.
Instead of trying to guess whether a request completed by looking into the
topology state (which can sometimes be error prone), look at the
request status in the new topology_requests table. If the request failed,
report the reason for the failure from the table.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for replica::memtable and
replica::memtable_entry, and remove their operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16793
The goal of this PR is to fix Scylla so that the dtest test_mvs_populating_from_existing_data, which starts to fail when enabling tablets, will pass.
The main fix (the second patch) is reverting code which doesn't work with tablets, and I explain why I think this code was not necessary in the first place.
Fixes #16598
Closes scylladb/scylladb#16670
* github.com:scylladb/scylladb:
view: revert cleanup filter that doesn't work with tablets
mv: sleep a bit before view-update-generator restart
Local keyspaces do not need cleanup, and
keyspaces configured with tablets, where their
replication strategy is per-table do not support
cleanup.
In both cases, just skip their cleanup via the api.
Fixes#16738
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#16785
Provide a unique ID for each topology request and store it in the topology
state machine. It will be used to index new topology requests table in
order to retrieve request status.
The table has the following schema and will be managed by raft:
CREATE TABLE system.topology_requests (
id timeuuid PRIMARY KEY,
initiating_host uuid,
start_time timestamp,
done boolean,
error text,
end_time timestamp,
);
In case of a request completing with an error, the "error" field will be non-empty when "done" is set to true.
To enable tablets replication one needs to turn on the (experimental) feature and specify the `initial_tablets: N` option when creating a keyspace. We want tablets to become the default in the future and allow users to explicitly opt out of it if they want to.
This PR solves this by changing the CREATE KEYSPACE syntax wrt tablets options. Now there's a new TABLETS options map and the usage is
* `CREATE KEYSPACE ...` will turn tablets on or off based on the cluster feature being enabled/disabled
* `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }` will turn tablets off regardless of the cluster feature
* `CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }` will try to enable tablets with default configuration
* `CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }` is now the replacement for `REPLICATION = { ... 'initial_tablets': <int> }` thing
fixes: #16319
Closes scylladb/scylladb#16364
* github.com:scylladb/scylladb:
code: Enable tablets if cluster feature is enabled
test: Turn off tablets feature by default
test: Move test_tablet_drain_failure_during_decommission to another suite
test/tablets: Enable tables for real on test keyspace
test/tablets: Make timestamp local
cql3: Add feature service to as_ks_metadata_update()
cql3: Add feature service to ks_prop_defs::as_ks_metadata()
cql3: Add feature service to get_keyspace_metadata()
cql: Add tablets on/off switch to CREATE KEYSPACE
cql: Move initial_tablets from REPLICATION to TABLETS in DDL
network_topology_strategy: Estimate initial_tablets if 0 is set
keyspace objects are heavyweight and copies are immediately out-of-date,
so copying them is bad.
Fix by deleting the copy constructor and copy assignment operator. One
call site is fixed. This call site is safe since it's only used
for accessing a few attributes (introduced in f70c4127c6).
Closes scylladb/scylladb#16782
When the reader is currently paused, it is resumed, fast-forwarded, then
paused again. The fast forwarding part can throw and this will lead to
destroying the reader without it being closed first.
Add a try-catch surrounding this part in the code. Also mark
`maybe_pause()` and `do_pause()` as noexcept, to make it clear why
that part doesn't need to be in the try-catch.
Fixes: #16606
Closes scylladb/scylladb#16630
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for service::cleanup_status,
and remove its operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16778
This commit removes support for CentOS 7
from the docs.
The change applies to version 5.4, so it
must be backported to branch-5.4.
Refs https://github.com/scylladb/scylla-enterprise/issues/3502
In addition, this commit removes the information
about Amazon Linux and Oracle Linux, unnecessarily added
without request, and there's no clarity over which versions
should be documented.
Closes scylladb/scylladb#16279
Tablet keyspaces have per-table range ownership, which cannot currently
be expressed in a DESC CLUSTER statement, which describes range
ownership in the current keyspace (if set). Until we figure out how to
represent range ownership (tablets) of all tables of a keyspace, we
disable range ownership for tablet keyspaces.
Fixes: #16483
Closes scylladb/scylladb#16713
This change is intended to remove the dependency on
operator<<(std::ostream&, const std::unordered_set<seastar::sstring>&)
from test/boost/cql_auth_query_test.cc.
It prepares the test for removal of the templated helpers.
Such removal is one of the goals of the referenced issue linked below.
Refs: #13245
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#16758
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we
* define a formatter for `auth::resource` and friends,
* update their callers of `operator<<` to use `fmt::print()`.
* drop `operator<<`, as they are not used anymore.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16765
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for data_value, but its
operator<<() is preserved as we are still using the generic
homebrew formatter for formatting std::vector, which in turn uses
operator<< of the element type.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16767
the CMake-generated build.ninja is located under build/,
and it puts the `scylla` executable at build/$CMAKE_BUILD_TYPE/scylla
instead of at build/$scylla_build_mode/scylla, so let's adapt to this
change accordingly.
we will promote this change to a shared place if we have similar
needs in other tests as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16775
If the TABLETS map is missing in the CREATE KEYSPACE statement, tablets
are enabled anyway if the respective cluster feature is enabled.
To opt out a keyspace, one may use the TABLETS = { 'enabled': false } syntax.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patches will make the per-keyspace initial_tablets option really
optional and turn tablets ON when the feature is ON. This will break all
other tests' assumption that they are testing vnodes replication. So
turn the feature off by default; tests that do need tablets will need to
explicitly enable this feature on their own
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In its current location it will be started with 3 pre-created scylla
nodes with default features ON. Next patch will exclude `tablets` from
the default list, so the test needs to create servers on its own
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When started cql_test_env creates a test keyspace. Some tablets test
cases create a table in this keyspace, but misuse the whole feature. The
thing is that while tablets feature is ON in those test cases, the
keyspace itself does _not_ have the initial_tablets option and thus
tablets are not enabled for the ks' table for real. Currently test cases
work just because this table is only used as a transparent table ID
placeholder. If turning on tablets for the keyspace, several test cases
would get broken for two reasons.
First, the tables map will no longer be empty on test start.
Second, applying changes to tablet metadata may not be visible, because
the test case uses a "random" timestamp, which can be less than the initial
metadata mutations' timestamp.
This patch fixes all three places:
1. enables tables for the test keyspace
2. removes assumption that the initial metadata is empty
3. uses large enough timestamp for subsequent mutations
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now the user can do
CREATE KEYSPACE ... WITH TABLETS = { 'enabled': false }
to turn tablets off. It will be useful in the future to opt-out keyspace
from tablets when they will be turned on by default based on cluster
features only.
Also one can do just
CREATE KEYSPACE ... WITH TABLETS = { 'enabled': true }
and let Scylla select the initial tablets value on its own
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This patch changes the syntax of enabling tablets from
CREATE KEYSPACE ... WITH REPLICATION = { ..., 'initial_tablets': <int> }
to be
CREATE KEYSPACE ... WITH TABLETS = { 'initial': <int> }
and updates all tests accordingly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
If the user configured zero initial tablets (spoiler: or this value was set
automagically when enabling tablets behind the scenes) we still need
some value to start with, and this patch calculates one.
The math is based on topology and RF so that all shards are covered:
initial_tablets = max(nr_shards_in(dc) / RF_in(dc) for dc in datacenters)
The estimation is done when a table is created, not when the keyspace is
created. For that, the keyspace is configured with zero initial tablets,
and at table-creation time the zero is converted into an auto-estimated value.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
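The estimation formula above can be sketched as follows (illustrative helper, not the actual implementation; whether the division rounds up is an assumption):

```python
def estimate_initial_tablets(datacenters):
    """datacenters: list of (total_shards_in_dc, rf_in_dc) tuples.

    initial_tablets = max(nr_shards_in(dc) / RF_in(dc) for dc in datacenters),
    rounded up here so that every shard in every DC can own at least one
    tablet replica (the rounding direction is an assumption of this sketch).
    """
    return max(-(-shards // rf) for shards, rf in datacenters)

# Two DCs: 48 shards with RF=3, and 10 shards with RF=2.
assert estimate_initial_tablets([(48, 3), (10, 2)]) == 16
```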
For correctness sstable cleanup has to run between (some) topology
changes. Sometimes even a failed topology change may require running
the cleanup. The series introduces automatic sstable cleanup step to the
topology change coordinator. Unlike other operations it is not represented
as a global transition state, but done by each node independently which
allows cleanup to run without locking the topology state machine so
tablet code can run in parallel with the cleanup.
It is done by having a cleanup state flag for each node in the
topology. The flag is a tri state: "clean" - the node is clean, "needed"
- cleanup is needed (but not running), "running" - cleanup is running. No
topology operation can proceed if there is a node in "running" state, but
some operations can proceed even if there are nodes in "needed" state. If
the coordinator needs to perform a topology operation that cannot run while
there are nodes that need cleanup the coordinator will start one
automatically and continue only after cleanup completes. There is also a
possibility to kick cleanup manually through the new REST API call.
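The tri-state flag and the gating rule described above can be sketched like this (a toy model, not the actual coordinator code):

```python
from enum import Enum

class CleanupState(Enum):
    CLEAN = "clean"      # node has no data it does not own
    NEEDED = "needed"    # cleanup required, but not running
    RUNNING = "running"  # cleanup in progress on the node

def may_start_operation(node_states, op_tolerates_dirty_nodes):
    """Gate a topology operation on per-node cleanup state.

    No operation may proceed while any node is RUNNING; operations
    that cannot tolerate dirty nodes must also wait for NEEDED nodes
    (the coordinator would kick off cleanup for them first).
    """
    if any(s is CleanupState.RUNNING for s in node_states):
        return False
    if not op_tolerates_dirty_nodes and any(s is CleanupState.NEEDED for s in node_states):
        return False
    return True

assert may_start_operation([CleanupState.CLEAN, CleanupState.NEEDED], True)
assert not may_start_operation([CleanupState.CLEAN, CleanupState.NEEDED], False)
assert not may_start_operation([CleanupState.RUNNING], True)
```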
* 'cleanup-needed-v8' of https://github.com/gleb-cloudius/scylla:
test: add test for automatic cleanup procedure
test: add test for topology requests queue management
storage_service: topology coordinator: add error injection point to be able to pause the topology coordinator
storage_service: topology coordinator: add logging to removenode and decommission
storage_service: topology_coordinator: introduce cleanup REST API integrated with the topology coordinator
storage_service: topology coordinator: manage cluster cleanup as part of the topology management
storage_service: topology coordinator: provide a version of get_excluded_nodes that does not need node_to_work_on as a parameter
test: use servers_see_each_other when needed
test: add servers_see_each_other helper
storage_service: topology coordinator: make topology coordinator lifecycle subscriber
system_keyspace: raft topology: load ignore nodes parameter together with removenode topology request
storage_service: topology coordinator: introduce sstable cleanup fiber
storage_proxy: allow to wait for all ongoing writes
storage_service: topology coordinator: mark nodes as needing cleanup when required
storage_service: add mark_nodes_as_cleanup_needed function
vnode_effective_replication_map: add get_all_pending_nodes() function
vnode_effective_replication_map: pre calculate dirty endpoints during topology change
raft topology: add cleanup state to the topology state machine
The test runs two bootstraps and checks that there is no cleanup
in between. Then it runs a decommission and checks that cleanup runs
automatically, and then it runs one more decommission and checks that no
cleanup runs again. The second part checks manual cleanup triggering. It
adds a node, triggers cleanup through the REST API, checks that it runs,
decommissions a node and checks that the cleanup did not run again.
This test creates a 5 node cluster with 2 down nodes (A and B). After
that it creates a queue of 3 topology operations: bootstrap, removenode
A and removenode B with ignore_nodes=A. It checks that all operations
manage to complete. Then it downs one node and creates a queue with
two requests: bootstrap and decommission. Since neither can proceed, both
should be canceled.
Introduce new REST API "/storage_service/cleanup_all"
that, when triggered, instructs the topology coordinator to initiate
cluster wide cleanup on all dirty nodes. It is done by introducing new
global command "global_topology_request::cleanup".
Sometimes it is unsafe to start a new topology operation before cleanup
runs on dirty nodes. This patch detects the situation when the topology
operation to be executed cannot be run safely until all dirty nodes do
cleanup and initiates the cleanup automatically. It also waits for
cleanup to complete before proceeding with the topology operation.
There can be a situation where a node that needs cleanup dies and will
never clear the flag. In this case, if a topology operation that wants to
run next does not have this node in its ignore node list, it may get stuck
forever. To fix this, the patch also introduces "liveness aware"
request queue management: we do not simply choose _a_ request to run next,
but go over the queue and find requests that can proceed considering
the node liveness situation. If there are multiple requests eligible to
run, the patch introduces an order based on the operation type: replace,
join, remove, leave, rebuild. The order is chosen so as to not trigger
cleanup needlessly.
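The liveness-aware selection described above can be sketched as a scan over the whole queue plus a type-based tiebreak (a simplified model; names and queue shape are assumptions):

```python
# Preference order from the description above: requests earlier in this
# list are picked first when several can proceed.
ORDER = ["replace", "join", "remove", "leave", "rebuild"]

def pick_next_request(queue, can_proceed):
    """Scan the whole queue (not just its head) for runnable requests,
    then break ties by operation type so cleanup isn't triggered
    needlessly. can_proceed(req) encapsulates the node-liveness check."""
    runnable = [r for r in queue if can_proceed(r)]
    if not runnable:
        return None  # nothing can run; pending requests may be canceled
    return min(runnable, key=lambda r: ORDER.index(r["op"]))

queue = [{"op": "rebuild"}, {"op": "join"}, {"op": "replace"}]
# If the replace request is blocked by a dead node, join wins over rebuild.
assert pick_next_request(queue, lambda r: r["op"] != "replace") == {"op": "join"}
```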
* seastar 0ffed835...8b9ae36b (4):
> net/posix: Track ap-server ports conflict
Fixes #16720
> include/seastar/core: do not include unused header
> build: expose flag like -std=c++20 via seastar.pc
> src: include used headers for C++ modules build
Closes scylladb/scylladb#16769
In the next patch we want to abort topology operations if there are not
enough live nodes to perform them. This will break tests that do a
topology operation right after restarting a node, since the topology
coordinator may still not see the restarted node as alive. Fix all those
tests to wait between the restart and the topology operation until the UP
state propagates.
We want to change the coordinator to consider node liveness when
processing the topology operation queue. If there are not enough live
nodes to process any of the ops we want to cancel them. For that to work
we need to be able to kick the coordinator if the liveness situation
changes.
Introduce a fiber that waits on a topology event and when it sees that
the node it runs on needs to perform sstable cleanup, it initiates one
for each non-tablet, non-local table and resets the "cleanup" flag back to
"clean" in the topology.
We want to be able to wait for all writes started through the storage
proxy before a fence is advanced. Add phased_barrier that is entered
on each local write operation before checking the fence to do so. A
write will be either tracked by the phased_barrier or fenced. This will
be needed to wait for all non fenced local writes to complete before
starting a cleanup.
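The "tracked or fenced" idea can be illustrated with a toy phase counter (this is not Seastar's actual phased_barrier API, just a sketch of the invariant):

```python
class PhasedBarrier:
    """Toy analogue of a phased barrier: each write enters the current
    phase; advancing the fence starts a new phase, and cleanup may begin
    once no operations from earlier phases remain outstanding."""
    def __init__(self):
        self.phase = 0
        self.pending = {0: 0}

    def enter(self):
        ph = self.phase
        self.pending[ph] += 1
        return ph  # the caller holds this token until the write finishes

    def leave(self, ph):
        self.pending[ph] -= 1

    def advance(self):
        self.phase += 1
        self.pending[self.phase] = 0
        return self.phase

    def outstanding_before(self, phase):
        return sum(n for ph, n in self.pending.items() if ph < phase)

b = PhasedBarrier()
w1 = b.enter()            # a local write enters before the fence moves
fence = b.advance()
w2 = b.enter()            # this write belongs to the new phase (it is fenced)
assert b.outstanding_before(fence) == 1   # cleanup must wait for w1
b.leave(w1)
assert b.outstanding_before(fence) == 0   # safe to start cleanup
```

Every write is thus either counted in a pre-fence phase (and waited for) or started after the fence (and subject to fencing), which is exactly the dichotomy the patch relies on.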
A cleanup needs to run when a node loses ownership of a range (during
bootstrap) or if a range movement to a normal node failed (removenode,
decommission failure). Mark all dirty nodes as "cleanup needed" in those cases.
The function creates a mutation that sets cleanup to "needed" for each
normal node that, according to the erm, has data it does not own after
successful or unsuccessful topology operation.
Add a function that returns all nodes that have had vnodes moved to them
during a topology change operation. It is needed to know which nodes need
to do cleanup in case of a failed topology change operation.
Some topology change operations cause some nodes to lose ranges. This
information is needed to know which nodes need to do cleanup after a
topology operation completes. Pre-calculate it during erm creation.
The patch adds cleanup state to the persistent and in memory state and
handles the loading. The state can be "clean" which means no cleanup
needed, "needed" which means the node is dirty and needs to run cleanup
at some point, and "running" which means that cleanup is being run by the
node right now; when it completes the state will be reset to "clean".
This patch reverts commit 10f8f13b90 from
November 2022. That commit added to the "view update generator", the code
which builds view updates for staging sstables, a filter that ignores
ranges that do not belong to this node. However,
1. I believe this filter was never necessary, because the view update
code already silently ignores base updates which do not belong to
this replica (see get_view_natural_endpoint()). After all, the view
update needs to know that this replica is the Nth owner of the base
update to send its update to the Nth view replica, but if no such
N exists, no view update is sent.
2. The code introduced for that filter used a per-keyspace replication
map, which was ok for vnodes but no longer works for tablets, and
causes the operation using it to fail.
3. The filter was used every time the "view update generator" was used,
regardless of whether any cleanup is necessary or not, so every
such operation would fail with tablets. So for example the dtest
test_mvs_populating_from_existing_data fails with tablets:
* This test has view building in parallel with automatic tablet
movement.
* Tablet movement is streaming.
* When streaming happens before view building has finished, the
streamed sstables get "view update generator" run on them.
This causes the problematic code to be called.
Before this patch, the dtest test_mvs_populating_from_existing_data
fails when tablets are enabled. After this patch, it passes.
Fixes #16598
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The "view update generator" is responsible for generating view updates
for staging sstables (such as coming from repair). If the processing
fails, the code retries - immediately. If there is some persistent bug,
such as issue #16598, we will have a tight loop of error messages,
potentially a gigabyte of identical messages every second.
In this patch we simply add a sleep of one second after view update
generation fails before retrying. We can still get many identical
error messages if there is some bug, but not more than one per second.
Refs #16598.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The loop in the `id2ip` lambda causes problems if we are applying an old
raft log that contains long-gone nodes. In this case, we may never receive
the `IP` for a node and get stuck in the loop forever. In this series we
replace the loop with an if - we just don't update the `host_id <-> ip`
mapping in the `token_metadata.topology` if we don't have an `IP` yet.
The PR moves `host_id -> IP` resolution to the data plane, now it
happens each time the IP-based methods of `erm` are called. We need this
because IPs may not be known at the time the erm is built. The overhead
of `raft_address_map` lookup is added to each data plane request, but it
should be negligible. In this PR `erm/resolve_endpoints` continues to
treat missing IP for `host_id` as `internal_error`, but we plan to relax
this in the follow-up (see this PR first comment).
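The data-plane resolution described above can be sketched as a lookup performed per request rather than at erm-build time (a simplified model; names are assumptions, and a missing mapping is treated as an internal error as the PR text says):

```python
def resolve_endpoints(replica_host_ids, address_map):
    """Resolve host_id -> IP lazily, at request time (data plane),
    rather than when the erm is built. A missing mapping is treated
    as an internal error, mirroring the behaviour described above."""
    ips = []
    for host_id in replica_host_ids:
        ip = address_map.get(host_id)
        if ip is None:
            raise RuntimeError(f"internal error: no IP for host_id {host_id}")
        ips.append(ip)
    return ips

addr_map = {"h1": "10.0.0.1", "h2": "10.0.0.2"}
assert resolve_endpoints(["h1", "h2"], addr_map) == ["10.0.0.1", "10.0.0.2"]
```

The design tradeoff is that the lookup cost moves onto every read/write path, but it is a hash-map access and should be negligible, while the erm no longer needs every IP to be known at build time.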
Closes scylladb/scylladb#16639
* github.com:scylladb/scylladb:
raft ips: rename gossiper_state_change_subscriber_proxy -> raft_ip_address_updater
gossiper_state_change_subscriber_proxy: call sync_raft_topology_nodes
storage_service: topology_state_load: remove IP waiting loop
storage_service: sync_raft_topology_nodes: add target_node parameter
storage_service: sync_raft_topology_nodes: move loops to the end
storage_service: sync_raft_topology_nodes: rename extract process_left_node and process_transition_node
storage_service: sync_raft_topology_nodes: rename add_normal_node -> process_normal_node
storage_service: sync_raft_topology_nodes: move update_topology up
storage_service: topology_state_load: remove clone_async/clear_gently overhead
storage_service: fix indentation
storage_service: extract sync_raft_topology_nodes
storage_service: topology_state_load: move remove_endpoint into mutate_token_metadata
address_map: move gossiper subscription logic into storage_service
topology_coordinator: exec_global_command: small refactor, use contains + reformat
storage_service: wait_for_ip for new nodes
storage_service.idl.hh: fix raft_topology_cmd.command declaration
erm: for_each_natural_endpoint_until: use is_vnode == true
erm: switch the internal data structures to host_id-s
erm: has_pending_ranges: switch to host_id
When a node changes its IP we need to store the mapping in
system.peers and update token_metadata.topology and erm
in-memory data structures.
The test_change_ip was improved to verify this new
behaviour. Before this patch the test didn't check
that IPs used for data requests are updated on
IP change. In this commit we add the read/write check.
It fails on insert with 'node unavailable'
error without the fix.
The loop causes problems if we are applying an old
raft log that contains long-gone nodes. In this case, we may
never receive the IP for a node and get stuck in the loop forever.
The idea of the patch is to replace the loop with an
if - we just don't update the host_id <-> ip mapping
in the token_metadata.topology if we don't have an IP yet.
When we get the mapping later, we'll call
sync_raft_topology_nodes again from
gossiper_state_change_subscriber_proxy.
If it's set, instead of going over all the nodes in raft topology,
the function will update only the specified node. This parameter
will be used in the next commit, in the call to sync_raft_topology_nodes
from gossiper_state_change_subscriber_proxy.
In the following commits we need part of the
topology_state_load logic to be applied
from gossiper_state_change_subscriber_proxy.
In this commit we extract this logic into a
new function sync_raft_topology_nodes.
In the next commit we extract the loops by nodes into
a new function, in this commit we just move them
closer to each other.
Now the remove_endpoint function might be called under
token_metadata_lock (mutate_token_metadata takes it).
It's not a problem since gossiper event handlers in
raft_topology mode don't modify token_metadata, so
we won't get a deadlock.
We are going to remove the IP waiting loop from topology_state_load
in subsequent commits. An IP for a given host_id may change
after this function has been called by raft. This means we need
to subscribe to the gossiper notifications and call it later
with a new id<->ip mapping.
In this preparatory commit we move the existing address_map
update logic into storage_service so that in later commits
we can enhance it with topology_state_load call.
When a new node joins the cluster we need to be sure that its IP
is known to all other nodes. In this patch we do this by waiting
for the IP to appear in raft_address_map.
A new raft_topology_cmd::command::wait_for_ip command is added.
It's run on all nodes of the cluster before we put the topology
into transition state. This applies both to new and replacing nodes.
It's important to run wait_for_ip before moving to
topology::transition_state::join_group0 since in this state
node IPs are already used to populate pending nodes in erm.
So far the service levels interval, responsible for updating SL configuration,
was hardcoded in main.
Now it's extracted to the `service_levels_interval_ms` option.
In a longevity test reported in scylladb/scylladb#16668 we observed that
NORMAL state is not being properly handled for a node that replaced
another node. Either handle_state_normal is not being called, or it is
but getting stuck in the middle. Which is the case couldn't be
determined from the logs, and attempts at creating a local reproducer
failed.
Thus the plan is to continue debugging using the longevity test, but we need
more logs. To check whether `handle_state_normal` was called and which branches
were taken, include some INFO level logs there. Also, detect deadlocks inside
`gossiper::lock_endpoint` by reporting an error message if `lock_endpoint`
waits for the lock for too long.
Ref: scylladb/scylladb#16668
Closes scylladb/scylladb#16733
* github.com:scylladb/scylladb:
gossiper: report error when waiting too long for endpoint lock
gossiper: store source_location instead of string in endpoint_permit
storage_service: more verbose logging in handle_state_normal
Compilation fails with recent boost versions (>=1.79.0) due to an
ambiguity with the align_up function call. Fix that by adding type
inference to the function call.
Fixes #16746
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#16747
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we
* define a formatter for `db::consistency_level`
* drop its `operator<<`, as it is not used anymore
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16755
This change is intended to remove the dependency on
operator<<(std::ostream&, const std::unordered_set<T>&)
from auth_resource_test.cc.
It prepares the test for removal of the templated helpers
from utils/to_string.hh, which is one of goals of the
referenced issue that is linked below.
Refs: #13245
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#16754
This is an optimisation: for_each_natural_endpoint_until is
called only for vnode tokens, so we don't need to run the
binary search for it in tm.first_token.
Also the function is made private since it's only used
in erm itself.
Before this patch the host_id -> IP mapping was done
in calculate_effective_replication_map. This function
is called from mutate_token_metadata, which means we
have to have an IP for each host_id in topology_state_load,
otherwise we get an error. We are going to remove
the IP waiting loop from topology_state_load, so
we need to get rid of IPs resolution from
calculate_effective_replication_map.
In this patch we move the host_id -> IP resolution to
the data plane. When a write or read request is sent
the target endpoints are requested from erm through
get_natural_endpoints_without_node_being_replaced,
get_pending_endpoints and get_endpoints_for_reading
methods and this is where the IP resolution
will now occur.
In the next patches we are going to change erm data structures
(replication_map and ring_mapping) from IP to host_id. Having
locator::host_id instead of IP in has_pending_ranges arguments
makes this transition easier.
This patch adds a reproducer test for the memory leak described in
issue #16493: If a table is repeatedly created and dropped, memory
is leaked by task tracking. Although this "leak" can be temporary
if task_ttl_in_seconds is properly configured, it may still use too
much memory if tables are too frequently created and dropped.
The test here shows that (before #16493 was fixed) as little as
100 tables created and deleted can cause Scylla to run out of
memory.
The problem is severely exacerbated when tablets are used, which is
why the test here uses tablets. Before the fix for #16493 (a Seastar
patch, scylladb/seastar#2023), this test of 100 iterations always
failed (with test/cql-pytest/run's default memory allowance).
After the fix, the test doesn't fail in 100 iterations - and even
if increased manually to 10,000 iterations it doesn't fail.
The new test uses the initial_tablets feature, so requires Scylla to be
run with the "tablets" experimental option turned on. This is not
currently the default of test.py or test/cql-pytest/run, so I turned
it on manually to check this test. I also checked that the test is
correctly skipped if tablets are not turned on.
Refs #16493
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16717
The `me` sstable format includes an important feature of storing the `host_id` of the local node when writing sstables.
This is crucial for validating the sstable's `replay_position` in stats metadata, as it is valid only on the originating node and shard (#10080); therefore we would like to make the `me` format mandatory.
in this series, the `sstable_format` option is deprecated, and the default sstable format is bumped up from `mc` to `md`, so that a cluster composed of nodes with this change should always use `me` as the sstable format. if a node with this change joins a 5.x cluster which is still using `md` because it is configured as such, this node will also be using `md`, unless the other node(s) change their `sstable_format` setting to `me`.
Fixes #16551
Closes scylladb/scylladb#16716
* github.com:scylladb/scylladb:
db/config.cc: do not respect sstable_format option
feature_service: abort if sstable_format < md
db, sstable: bump up default sstable format to "md"
In a longevity test reported in scylladb/scylladb#16668 we observed that
NORMAL state is not being properly handled for a node that replaced
another node. Either handle_state_normal is not being called, or it is
but getting stuck in the middle. Which is the case couldn't be
determined from the logs, and attempts at creating a local reproducer
failed.
One hypothesis is that `gossiper` is stuck on `lock_endpoint`. We dealt
with gossiper deadlocks in the past (e.g. scylladb/scylladb#7127).
Modify the code so it reports an error if `lock_endpoint` waits for the
lock for more than a minute. When the issue reproduces again in
longevity, we will see if `lock_endpoint` got stuck.
"me" sstable format includes an important feature of storing the
`host_id` of the local node when writing sstables. The is crucial
for validating the sstable's `replay_position` in stats metadata as
it is valid only on the originating node and shard (#10080), therefor
we would like to make the `me` format mandatory.
before making `me` mandatory, we need to stop honoring the
`sstable_format` option if it is "md".
in this change
- gms/feature_service: do not disable `ME_SSTABLE_FORMAT` even if
  `sstable_format` is configured with "md"; in that case, instead,
  a warning is logged to note that this setting is not valid anymore.
- docs/architecture/sstable: note that "me" is used by default now.
after this change, "sstable_format" will only accept "me" if it's
explicitly configured, and when a server with this change joins a
cluster, it uses "md" if any of the nodes in the cluster still has
`sstable_format` configured to "md". practically, this change makes
"me" mandatory in a 6.x cluster, assuming this change will be
included in 6.x releases.
Fixes#16551
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
sstable_format comes from scylla.yaml or from the command line
arguments. we prevent scylla from using disallowed sstable formats
lower than `md` when parsing the configuration, and scylla bails out
upon seeing a disallowed sstable format like:
```
terminate called after throwing an instance of 'std::invalid_argument'
what(): Invalid value for sstable_format: got ka which is not inside the set of allowed values md, me
Aborted (core dumped)
```
scylla errors out way before `feature_config_from_db_config()`
gets called -- it throws in `bpo::notify(configuration)`,
way before `func` is evaluated in `app_template::run_deprecated()`.
so, in this change, we do not handle these values anymore, and
consider it a bug if we run into any of them.
Refs #16551
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we default to the "mc" sstable format, and
switch to "md" if the cluster agrees on using it, and to
"me" if the cluster agrees on that. the cluster feature
is used to reach consensus across the members of the cluster:
if any of the existing nodes in the cluster has its `sstable_format`
configured to, for instance, "mc", then the cluster is stuck with
"mc".
but we disabled the "mc" sstable format back in 3d345609; the first
LTS release including that change was scylla v5.2.0. which means, a
cluster of the last major Scylla version should be using "md" or
"me". per our upgrade documentation, see docs/upgrade/index.rst,
> You should perform the upgrades consecutively - to each
> successive X.Y version, without skipping any major or minor version.
>
> Before you upgrade to the next version, the whole cluster (each
> node) must be upgraded to the previous version.
we can assume that a 6.x node will only join a cluster
with 5.x or 6.x nodes (joining a 7.x cluster should work, but
this is not relevant to this change). in both cases, since
5.x and up scylla can only be configured with the "md" `sstable_format`,
there is no need to switch from "mc" to "md" anymore, so we can
ditch the code supporting it.
Refs #16551
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
We depend on the crypto++ library (see utils/hashers.hh) but don't list
it in install-dependencies.sh. Currently this works because Seastar's
install-dependencies.sh installs it, but that's going away in [1]. List
crypto++ directly to keep install-dependencies.sh working.
Regenerating the frozen toolchain is unnecessary since we're re-adding
an existing dependency.
[1] 6bdef1e431
Closes scylladb/scylladb#16563
The joining node might receive more than one join response (see
the comment at the beginning of `join_node_response_handler`).
If the first response was a rejection or it was an acceptance but
the joining node failed while handling it, the following
acceptances by the coordinator shouldn't succeed. The joining
node considers the join operation as failed.
Currently, we always immediately return from non-first response
handler calls. However, if the response is an acceptance, and the
first response wasn't a successfully handled acceptance, we need
to throw an exception to ensure the topology coordinator moves
the node to the left state. We do it in this patch. We throw the
exception set while handling the first response. It explains why
we are failing the current acceptance.
We don't want to throw the exception on rejection. The topology
coordinator will move the node to the left state anyway. Also,
failing the rejection with an error message containing "the
topology coordinator rejected request to join the cluster" (from
the previous rejection) would be very confusing.
If the joining node fails while handling the response from the
topology coordinator, it hangs even though it knows the join
operation has failed. Therefore, we ensure it shuts down in
this patch.
We rethrow the caught exception to ensure the topology coordinator
knows the RPC has failed. In case of rejection, it does not matter
because the coordinator behaves the same way in both cases: RPC
success and RPC failure. It transitions the rejected node to the
left state. However, in case of acceptance, this only happens if
the RPC fails. Otherwise, the coordinator continues handling the
request.
On abort, one of the two events happens first:
- the new catch statement catches `abort_requested_exception` and
sets it on `_join_node_response_done`,
- `co_await _ss._join_node_response_done.get_shared_future(as);`
in `join_node_rpc_handshaker::post_server_start` resolves with
`abort_requested_exception` after triggering `as`. In both cases,
`join_node_rpc_handshaker::post_server_start` throws
`abort_requested_exception`. Therefore, we don't need a separate
catch statement for `abort_requested_exception` in
`join_node_response_handler`.
Make compaction tasks internal. Drop all internal tasks without parents
immediately after they are done.
Fixes: #16735
Refs: #16694.
Closes scylladb/scylladb#16698
* github.com:scylladb/scylladb:
compaction: make regular compaction tasks internal
tasks: don't keep internal root tasks after they complete
The supervisor::notify() function expects a single string - not a
format and parameters. Calls we have in main.cc like
supervisor::notify("starting {}", what);
end up printing the silly message "starting {}". The second parameter
"what" is converted to a bool, also having an unintended consequence
for telling notify we're "ready".
This patch fixes it to call fmt::format, as intended.
Fixes #16728
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16729
this is to mimic the formatting of `human_readable_value`, and to prepare for consolidating these two formatters, so we don't have two pretty printers in the tree.
Closes scylladb/scylladb#16726
* github.com:scylladb/scylladb:
utils/pretty_printers: add "I" specifier support
utils/pretty_printers: use the formatting of to_hr_size()
before this change, "{:d}" is used for formatting `test_data` by
bptree_stress_test.cc. but the "d" specifier is only for
formatting integers, not for formatting `test_data` or generic
data types, so this fails when the test is compiled with {fmt} v10,
like:
```
In file included from /home/kefu/dev/scylladb/test/unit/bptree_stress_test.cc:20:
/home/kefu/dev/scylladb/test/unit/bptree_validation.hh:294:35: error: call to consteval function 'fmt::basic_format_string<char, test_data &, test_data &>::basic_format_string<char[31], 0>' is not a constant expression
294 | fmt::print(std::cout, "Iterator broken, {:d} != {:d}\n", val, *_fwd);
| ^
/home/kefu/dev/scylladb/test/unit/bptree_validation.hh:267:20: note: in instantiation of member function 'bplus::iterator_checker<tree_test_key_base, test_data, test_key_compare, 16>::forward_check' requested here
267 | return forward_check();
| ^
/home/kefu/dev/scylladb/test/unit/bptree_stress_test.cc:92:35: note: in instantiation of member function 'bplus::iterator_checker<tree_test_key_base, test_data, test_key_compare, 16>::step' requested here
92 | if (!itc->step()) {
| ^
/usr/include/fmt/core.h:2322:31: note: non-constexpr function 'throw_format_error' cannot be used in a constant expression
2322 | if (!in(arg_type, set)) throw_format_error("invalid format specifier");
| ^
```
in this change, instead of specifying "{:d}", let's just use "{}",
which works for both integer and `test_data`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16727
in the same spirit of 724a6e26, format_as() is defined for
cql3::cql3_type. despite that this is not used yet by fmt v9,
where we still have FMT_DEPRECATED_OSTREAM, this prepares us for
fmt v10.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16232
Store schema_ptr in reader permit instead of storing a const pointer to
schema to ensure that the schema doesn't get changed elsewhere when the
permit is holding on to it. Also update the constructors and all the
relevant callers to pass down schema_ptr instead of a raw pointer.
Fixes #16180
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#16658
this is to mimic the formatting of `human_readable_value`, and
to prepare for consolidating these two formatters, so we don't have
two pretty printers in the tree.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This change introduces a specialization of fmt::formatter
for cql3::expr::oper_t. This enables the usage of this
type with FMTv10, which dropped the default generated formatter.
Usage of cql3::expr::oper_t without the defined formatter
resulted in compilation error when compiled with FMTv10.
Refs: #13245
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#16719
keep the precision of 4 digits, so that we format
"8191" as "8191" instead of as "8 Ki". this is modeled after
the behavior of `to_hr_size()`, for better user experience,
and also prepares for consolidating these two formatters.
tests are updated to exercise both IEC and SI notations.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
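The 4-digit rule described above can be sketched as follows; `hr_size` is a hypothetical stand-in for the consolidated formatter, not the actual `to_hr_size()` code:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical sketch of the 4-digit rule: only move to the next unit once
// the value no longer fits in 4 digits, so 8191 stays "8191", not "8 Ki".
std::string hr_size(uint64_t v) {
    static const char* units[] = {"", " Ki", " Mi", " Gi", " Ti", " Pi"};
    std::size_t i = 0;
    while (v > 9999 && i + 1 < sizeof(units) / sizeof(units[0])) {
        v /= 1024;  // IEC notation: powers of 1024
        ++i;
    }
    return std::to_string(v) + units[i];
}
```

With this rule, values up to 9999 print unscaled, and 16384 becomes "16 Ki".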
this change is mostly about the documentation of the RESTful API of
storage_service. as we define the API using the Swagger 2.0 format and
generate the API documentation from the definitions, it would be
great if the documentation matched the API.
in this change, since the keyspace is not queried but mutated, the
description is changed to a more accurate one.
from the code perspective, it is purely cosmetic, as we don't read the
description fields or verify them in our tests.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16637
The original code extracted only the function_name from the
source_location for logging. We'll use more information from the
source_location in later commits.
In a longevity test reported in scylladb/scylladb#16668 we observed that
NORMAL state is not being properly handled for a node that replaced
another node. Either handle_state_normal is not being called, or it is
but getting stuck in the middle. Which is the case couldn't be
determined from the logs, and attempts at creating a local reproducer
failed.
Improve the INFO level logging in handle_state_normal to aid debugging
in the future.
The amount of logs is still constant per-node. Even though some log
messages report all tokens owned by a node, handle_state_normal calls
are still rare. The most "spammy" situation is when a node starts and
calls handle_state_normal for every other node in the cluster, but it is
a once-per-startup event.
This change introduces a specialization of fmt::formatter
for utils::tagged_integer. This enables the usage of this
type with FMTv10, which dropped the default generated formatter.
Usage of utils::tagged_integer without the defined formatter
resulted in compilation error when compiled with FMTv10.
Refs: #13245
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closes scylladb/scylladb#16715
* seastar 70349b74...0ffed835 (15):
> http/client: include used header files
> treewide: s/format/fmt::format/ when appropriate
> shared_future: shared_state::run_and_dispose(): release reserve of _peers
Fixes #16493
> metrics_tester - A demo app to test metrics
> build: silence the waring of -Winclude-angled-in-module-purview
> estimated_histogram.hh: Support native histograms
> prometheus.cc: Clean the pick representation code
> prometheus.cc add native histogram
> memory: fix the indentation.
> metrics_types.hh: add optional native histogram information
> memory: include used header
> prometheus.cc: Add filter, aggregate by label and skip_when_empty
> src/proto/metrics2.proto: newer proto buf definition
> print: deprecate format_separated()
> reactor: use fmt::join() when appropriate
Closes scylladb/scylladb#16712
This is a translation of Cassandra's CQL unit test source file
validation/operations/InsertUpdateIfConditionStaticsTest.java into our
cql-pytest framework.
This test file checks various LWT conditional updates which involve
static columns or UDTs (there is a separate test file for LWT conditional
updates that do not involve static columns).
This test did not uncover any new bugs, but demonstrates yet again
several places where we intentionally deviated from Cassandra's behavior,
forcing me to add "is_scylla" checks in many of the checks to allow
them to pass on both Scylla and Cassandra. These deviations are known,
intentional and some are documented in docs/kb/lwt-differences.rst but
not all, so it's worth listing here the ones re-discovered by this test:
1. On a successful conditional write, Cassandra returns just True, Scylla
also returns the old contents of the row. This difference is officially
documented in docs/kb/lwt-differences.rst.
2. On a batch request, Scylla always returns a row per statement,
Cassandra doesn't - it often returns just a single failed row,
or just True if the whole batch succeeded. This difference is
officially documented in docs/kb/lwt-differences.rst.
3. In a DELETE statement with a condition, in the returned row
Cassandra lists the deleted column first - while Scylla lists
the static column first (as in any other row). This difference
is probably inconsequential, because columns also have names
so their order in the response usually doesn't matter.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16643
The recently-added test test_fromjson_timestamp_submilli demonstrated a
difference between Scylla's and Cassandra's parsing timestamps in JSON:
Trying to use too many (more than 3) digits of precision is forbidden
in Scylla, but ignored in Cassandra. So we marked the test "xfail",
suggesting we think it's a Scylla bug that should be fixed in the future.
However, it turns out that we already had a different test,
test_type_timestamp_from_string_overprecise, which showed the same
difference in a different context (without JSON). In that older test,
the decision was to consider this a Cassandra bug, not Scylla bug -
because Cassandra seemingly allows the sub-millisecond timestamp but
in reality drops the extra precision.
So we need to be consistent in the tests - this is either a Scylla bug
or a Cassandra bug; we can't make one choice in one test and another
in a different test :-) So let's accept our older decision, and consider
Scylla's behavior the correct one in this case.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16586
when stopping the ManagerClient, it would be better to close
all connected connectors, otherwise aiohttp complains like:
```
13:57:53.763 ERROR> Unclosed connector
connections: ['[(<aiohttp.client_proto.ResponseHandler object at 0x7f939d2ca5f0>, 96672.211256817)]']
connector: <aiohttp.connector.UnixConnector object at 0x7f939d2da890>
```
this warning message is printed to the console, and it is distracting
when testing manually.
so, in this change, let's close the client connecting to unix domain
socket.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16675
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for gms::endpoint_state, and
update the callers of `operator<<` to use `fmt::print()`.
but we cannot drop `operator<<` yet, as we are still using the
templated operator<< and templated fmt::formatter to print containers
in scylla and in seastar -- they are still using `operator<<`
under the hood.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16705
this target is used by test.py for enumerating unit tests
* test/CMakeLists.txt: append the executable's full path to
`scylla_tests`, and add a `unit_test_list` target printing
`scylla_tests`. please note that `cmake -E echo` does not
support the `-e` option of `echo`, and ninja does not
support command lines with newlines in them, so we have to
use `echo` to print the list of tests.
* test/{boost,raft,unit}/CMakeLists.txt: set scylla_tests
only if $PWD/suite.yaml exists. we could hardwire this
logic in these files, as it is known that this file
exists in these directories, but it is written this way
so that it serves as a comment explaining that the reason
we update scylla_tests here, and not elsewhere where the
`add_scylla_test()` function is also used, is simply that
suite.yaml exists here.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16702
in this series, we adapt to the CMake build system by mapping the scylla build mode to `CMAKE_BUILD_TYPE` and by using `build/build.ninja` if it exists, as `configure.py` generates `build.ninja` under `build` when using CMake to create `build.ninja`.
Closes scylladb/scylladb#16703
* github.com:scylladb/scylladb:
test.py: build using build/build.ninja when it exists
test.py: extract ninja()
test.py: extract path_to()
test.py: define all_modes as a dict of mode:CMAKE_BUILD_TYPE
CMake puts `build.ninja` under `build`, so use it if it exists, and
fall back to current directory otherwise.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
use ninja() to build target using `ninja`. since CMake puts
`build.ninja` under "build", while `configure.py` puts it under
the root source directory, this change prepares us for a follow-up
change to build with build/build.ninja.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
use path_to() to find the path to a directory under the build
directory. this change helps to find the executables built using
CMake as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
because the scylla build mode and CMAKE_BUILD_TYPE are not identical,
let's define `all_modes` as a dict so we can look it up.
this change prepares for a follow-up commit which adds a path
resolver that supports both build-system generators: the plain
`configure.py` and CMake driven by `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Said method has a check on `_lb` not being null, before accessing it.
However, since 0e5754a, there was an unconditional access, adding an
entry for the local node. Move this inside the if, so it is covered by
the null-check. The only caller is the api (probably nodetool); the
worst that can happen is that they get a completely empty load-map if
they call too early during startup.
Fixes: #16617
Closes scylladb/scylladb#16659
Regular compaction tasks are internal.
Adjust test_compaction_task accordingly: modify test_regular_compaction_task,
and delete test_running_compaction_task_abort (which relied on regular
compaction), whose checks are already achieved by
test_not_created_compaction_task_abort. Rename the latter.
The default error handler throws an exception, which means scylla-sstable will exit with exception if there is any problem in the configuration. Not even ScyllaDB itself is this harsh -- it will just log a warning for most errors. A tool should be much more lenient. So this patch passes an error handler which just logs all errors with debug level.
If reading an sstable fails, the user is expected to investigate turning debug-level logging on. When they do so, they will see any problems while reading the configuration (if it is relevant, e.g. when using EAR).
Fixes: #16538
Closes scylladb/scylladb#16657
* github.com:scylladb/scylladb:
tools/scylla-sstable: pass error handler to utils::config_file::read_from_file()
tools/scylla-sstable: allow always passing --scylla-yaml-file option
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for gms::heart_beat_state, and
remove its operator<<(). the only call site of its operator<< is
updated to use `fmt::print()`.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16652
before this change, `std::invalid_argument` is thrown by
`bpo::notify(configuration)` in `app_template::run_deprecated()` when
an invalid option is passed in via the command line. `utils::named_value`
throws `std::invalid_argument` if the given value is not listed in
`_allowed_values`, but we don't handle `std::invalid_argument` in
`app_template::run_deprecated()`, so the application aborts with an
unhandled exception if the specified argument is not allowed.
in this change, we convert the `std::invalid_argument` to a
derived class of `bpo::error` in the customized notify handler,
so that it can be handled in `app_template::run_deprecated()`.
because `named_value::operator()` is also used elsewhere, we
should not throw a bpo::error there, so its exception type
is preserved.
Fixes #16687
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16688
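The conversion described above can be sketched without boost: a stand-in class plays the role of `bpo::error`, a setter plays the role of `named_value::operator()` (keeping its `std::invalid_argument`), and a notify-level wrapper rethrows as the options error. All names here are illustrative, not Scylla's actual API:

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Stand-in for bpo::error (boost::program_options), so the sketch is
// self-contained.
struct options_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Plays the role of named_value::operator(): keeps std::invalid_argument,
// since it is also called outside option parsing.
void set_value(const std::set<std::string>& allowed, const std::string& v) {
    if (!allowed.count(v)) {
        throw std::invalid_argument("invalid value: " + v);
    }
}

// Plays the role of the customized notify handler: converts the exception
// so the app_template-level handler for options errors can catch it.
void notify(const std::set<std::string>& allowed, const std::string& v) {
    try {
        set_value(allowed, v);
    } catch (const std::invalid_argument& e) {
        throw options_error(e.what());
    }
}
```

The setter's exception type is unchanged; only the notify path translates it, mirroring the split the commit describes.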
The strategy constructor prints the dc:rf pairs at the end, building
the sstring for them by hand. The modern fmt-based logger can format
unordered_map-s on its own. The message will look slightly different though:
Configured datacenter replicas are: foo:1 bar:2
into
Configured datacenter replicas are: {"foo": 1, "bar": 2}
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#16443
We've observed errors during shutdown like the following:
```
ERROR 2023-12-26 17:36:17,413 [shard 0:main] raft - [088f01a3-a18b-4821-b027-9f49e55c1926] applier fiber stopped because of the error: std::_Nested_exception<raft::state_machine_error> (State machine error at raft/server.cc:1230): std::runtime_error (forward_service is shutting down)
INFO 2023-12-26 17:36:17,413 [shard 0:strm] storage_service - raft_state_monitor_fiber aborted with raft::stopped_error (Raft instance is stopped)
ERROR 2023-12-26 17:36:17,413 [shard 0:strm] storage_service - raft topology: failed to fence previous coordinator raft::stopped_error (Raft instance is stopped, reason: "background error, std::_Nested_exception<raft::state_machine_error> (State machine error at raft/server.cc:1230): std::runtime_error (forward_service is shutting down)")
```
some CQL statement execution was trying to use `forward_service` during
shutdown.
It turns out that the statement is in
`system_keyspace::load_topology_state`:
```
auto gen_rows = co_await execute_cql(
format("SELECT count(range_end) as cnt FROM {}.{} WHERE key = '{}' AND id = ?",
NAME, CDC_GENERATIONS_V3, cdc::CDC_GENERATIONS_V3_KEY),
gen_uuid);
```
It's querying a table in the `system` keyspace.
Pushing local table queries through `forward_service` doesn't make sense
as the data is not distributed. Excluding local tables from this logic
also fixes the shutdown error.
Fixes scylladb/scylladb#16570
Closes scylladb/scylladb#16662
The removenode operation is defined to succeed only if the node
being removed is dead. Currently, we reject this operation on the
initiator side (in `storage_service::raft_removenode`) when the
failure detector considers the node being removed alive. However,
it is possible that even if the initiator considers the node dead,
the topology coordinator will consider it alive when handling the
topology request. For example, the topology coordinator can use
a bigger failure detector timeout, or the node being removed can
suddenly resurrect.
This PR makes the topology coordinator reject removenode if the
node being removed is considered alive. It also adds
`test_remove_alive_node` that verifies this change.
Fixes scylladb/scylladb#16109
Closes scylladb/scylladb#16584
* github.com:scylladb/scylladb:
test: add test_remove_alive_node
topology_coordinator: reject removenode if the removed node is alive
test: ManagerClient: remove unused wait_for_host_down
test: remove_node: wait until the node being removed is dead
During a shutdown, we call `storage_service::stop_transport` first.
We may try to apply a Raft command after that, or still be in
the process of applying a command. In such a case, the shutdown
process will hang because Raft retries replicating a command until
it succeeds even in the case of a network error. It will stop when
a corresponding abort source is set. However, if we pass `nullptr`
to a function like `add_entry`, it won't stop. The shutdown
process will hang forever.
We fix all places that incorrectly pass `nullptr`. These shutdown
hangs are not only theoretical. The incorrect `add_entry` call in
`update_topology_state` caused scylladb/scylladb#16435.
Additionally, we remove the default `nullptr` values in all member
functions of `server` and `raft_group0_client` to avoid similar bugs
in the future.
Fixes scylladb/scylladb#16435
Closes scylladb/scylladb#16663
* github.com:scylladb/scylladb:
server, raft_group0_client: remove the default nullptr values
storage_service: make all Raft-based operations abortable
The default error handler throws an exception, which means
scylla-sstable will exit with exception if there is any problem in the
configuration. Not even ScyllaDB itself is this harsh -- it will just
log a warning for most errors. A tool should be much more lenient. So
this patch passes an error handler which just logs all errors with debug
level.
If reading an sstable fails, the user is expected to investigate turning
debug-level logging on. When they do so, they will see any problems
while reading the configuration (if it is relevant, e.g. when using EAR).
Fixes: #16538
Currently, if multiple schema sources are provided, the tool complains
about the ambiguity of which one to consider. One of these options is
--scylla-yaml-file. However, we want to allow passing this option at any
time, otherwise encrypted sstables cannot be read. So relax the multiple
schema source check to allow this option to be used even when e.g.
--schema-file was used as the schema source.
The previous commit has fixed 5 bugs of the same type - incorrectly
passing the default nullptr to one of the changed functions. At
least some of these bugs wouldn't appear if there was no default
value. It's much harder to make this kind of bug if you have to
write "nullptr". It's also much easier to detect it in review.
Moreover, these default values are rarely used outside tests.
Keeping them is just not worth the time spent on debugging.
During a shutdown, we call `storage_service::stop_transport` first.
We may try to apply a Raft command after that, or still be in
the process of applying a command. In such a case, the shutdown
process will hang because Raft retries replicating a command until
it succeeds even in the case of a network error. It will stop when
a corresponding abort source is set. However, if we pass `nullptr`
to a function like `add_entry`, it won't stop. The shutdown
process will hang forever.
We fix all places that incorrectly pass `nullptr`. These shutdown
hangs are not only theoretical. The incorrect `add_entry` call in
`update_topology_state` caused scylladb/scylladb#16435.
The test test_filter_expression.py::test_filter_expression_precedence
is flaky - and can fail very rarely (so far we've only actually seen it
fail once). The problem is that the test generates items with random
clustering keys, chosen as an integer between 1 and 1 million, and there
is a chance (roughly 2/10,000) that two of the 20 items happen to have the
same key, so one of the items is "lost" and the comparison we do to the
expected truth fails.
The solution is to just use sequential keys, not random keys.
There is nothing to gain in this test by using random keys.
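The "roughly 2/10,000" figure is the classic birthday-problem calculation; a quick self-contained check of the estimate (illustrative, not part of the test):

```cpp
#include <cassert>

// Probability that n keys drawn uniformly (with replacement) from a space
// of `space` values contain at least one duplicate:
//   1 - prod_{i=0}^{n-1} (space - i) / space
double collision_probability(int n, double space) {
    double p_distinct = 1.0;
    for (int i = 0; i < n; ++i) {
        p_distinct *= (space - i) / space;
    }
    return 1.0 - p_distinct;
}
```

For 20 keys drawn from 1..1,000,000 this gives about 1.9e-4, matching the 2/10,000 quoted above.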
To make this test bug easy to reproduce, I temporarily changed
random_i()'s range from 1,000,000 to 3, and saw the test failing every
single run before this patch. After this patch - no longer using
random_i() for the keys - the test doesn't fail any more.
Fixes#16647
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16649
Bootstrap cannot proceed if cdc generation propagation to all nodes
fails, so the patch series handles the error by rolling the ongoing
topology operation back.
* 'gleb/raft-cdc-failure' of github.com:scylladb/scylla-dev:
test: add test to check failure handling in cdc generation commit
storage_service: topology coordinator: rollback on failure to commit cdc generation
Currently, `add_saved_endpoint` is called from two paths: one is when
loading states from system.peers in the join path (join_cluster,
join_token_ring), when `_raft_topology_change_enabled` is false, and the
other is from `storage_service::topology_state_load` when raft topology
changes are enabled.
In the latter path, from `topology_state_load`, `add_saved_endpoint` is
called only if the endpoint_state does not exist yet. However, this is
checked without acquiring the endpoint_lock and so it races with the
gossiper, and once `add_saved_endpoint` acquires the lock, the endpoint
state may already be populated.
Since `add_saved_endpoint` applies local information about the endpoint
state (e.g. tokens, dc, rack), it uses the local heart_beat_version,
with generation=0 to update the endpoint states, and that is
incompatible with changes applied via gossip that will carry the
endpoint's generation and version, determining the state's update order.
This change makes sure that the endpoint state is never updated in
`add_saved_endpoint` if it has non-zero generation. An internal error
exception is thrown if non-zero generation is found, and in the only
call site that might reach that state, in
`storage_service::topology_state_load`, the caller acquires the
endpoint_lock for checking for the existence of the endpoint_state,
calling `add_saved_endpoint` under the lock only if the endpoint_state
does not exist.
Fixes #16429
Closes scylladb/scylladb#16432
* github.com:scylladb/scylladb:
gossiper: add_saved_endpoint: keep heart_beat_state if ep_state is found
storage_service: topology_state_load: lock endpoint for add_saved_endpoint
raft_group_registry: move on_alive error injection to gossiper
a209ae15 addresses that last -Wimplicit-int-float-conversion warning
in the tree, so we now have the luxury of enabling this warning.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16640
Namely, the fixture for preparing an sstable and the fixture for
producing a reference dump (from an sstable). In the next patch we will
add more similar fixtures, this patch enables them to share their core
logic, without repeating code.
In the next patch, we want to add schema-load tests specific to views
and indexes. Best to place these into a separate class, so extract the
to-be-shared parts into a common base-class.
The table information of MVs (either user-created, or those backing a
secondary index) is stored in system_schema.views, not
system_schema.tables. So load this table when system_schema.tables has
no entries for the looked-up table. Base table schema is not loaded.
The underlying infrastructure (`load_schemas()`) already supports
loading views and indexes, extend this to said method.
When loading a view/index, expect `load_schemas()` to return two
schemas. The first is the base schema, the second is the view/index
schema (this is validated). Only the latter is returned.
Add support for processing cql3::statement::create_view_statement and
cql3::statement::create_index_statement statements. The CQL text
(usually a file) has to provide the definition of the base table,
before the definition of the views/indexes.
To standalone functions in index/secondary_index_manager.{hh,cc}. This
way, alternative data dictionary implementations (in
tools/schema_loader.cc), can also re-use this code without having to
instantiate a database or resorting to copy-paste.
The functions are slightly changed: there are some additional
parameters added to cover for things not internally available in the database
object. const sstring& is converted to std::string_view.
* seastar e0d515b6...70349b74 (33):
> util/log: drop unused function
> util/log, rpc, core: use compile-time formatting with fmtlib >= 8.0
> Fix edge case in memory sampler at OOM
> exp/geo distribution benchmark
> Additional allocation tests
> Remove null pointer check on free hot path
> Optimize final part of allocation hot path
> Optimize zero size checking in allocator
> memory: Optimize free fast path
> memory: Optimize small alloc alloation path
> memory: Limit alloc_sites size
> memory: Add general comment about sampling strategy
> memory: Use probabilistic sampler
> util: Adapt memory sampler to seastar
> util: Import Android Memory Sampler
> memory: Use separate small pool for tracking sampled allocations
> memory: Support enabling memory profiling at runtime
> util/source_location-compat: mark `source_location::current()` consteval
> build: use new behavior defined by CMP0155 when building C++ modules
> circleci: build with C++20 modules enabled
> seastar.cc: replace cryptopp with gnutls when building seastar modules
> alien: include used header
> seastar.cc: include used headers in the global purview
> docker: install clang-tools-17
> net/tcp: generate a random src_port hashed to current shard if smp::count > 1
> net, websocket: replace Crypto++ calls with GnuTLS
> README-DPDK.md: point user to DPDK's quick start guide
> reactor: print fatal error using logger as well
> Avoid ping-pong in spinlock::lock
> memory: Add allocator perf tests
> memory: Add a basic sized deletion test
> Prometheus: Disable Prometheus protobuf with a configuration
> treewide: bring back prometheus protobuf support
* test/manual/sstable_scan_footprint_test: update to adapt to the
breaking change of "memory: Use probabilistic sampler" in seastar
Closes scylladb/scylladb#16610
Currently, we have `real_db.tables` and `schemas`, the former containing
system tables needed to parse statements, and the latter accumulating
user tables parsed from CQL. This will be error-prone to maintain with
view/index support, so ditch `schemas` and instead add a `user` flag to
`table` and accumulate all tables in `real_db.tables`.
At the end, just return the schemas of all user tables.
Scylla's schema tables code determines which index was added, by diffing
index definitions with previous ones. This is clunky to use in
tools/schema_loader.cc, so also return the index metadata for the newly
created index.
The method `validate_while_executing()` and its only caller,
`build_index_schema()`, only use the query processor to get db from it.
So replace qp parameter with db one, relaxing requirements w.r.t.
callers.
Boost::dynamic_linking was introduced as a compatibility target
which adds the "BOOST_ALL_DYN_LINK" macro on the Win32 platform. but since
Scylla only runs on Linux, there is no need to link against this
library.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16544
If for some reason an exception is thrown in compaction_manager::remove,
it might leave behind stale table pointers in _compaction_state. Fix
that by setting up a deferred action to perform the cleanup.
Fixes#16635
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#16632
Refer to the added comment for details.
This problem was found by a compiler warning, and I'm fixing
it mainly to silence the warning. I didn't give any thought
to its effects in practice.
Fixes #13077
Closes scylladb/scylladb#16625
[avi: changed Refs to Fixes]
we format `std::variant<std::monostate, seastar::timed_out_error,
raft::not_a_leader, raft::dropped_entry, raft::commit_status_unknown,
raft::conf_change_in_progress, raft::stopped_error, raft::not_a_member>`
in this source file. and currently, we format `std::variant<..>` using
the default-generated `fmt::formatter` from `operator<<`, so in order to
format it with {fmt}'s compile-time checks enabled, we have to make the
`operator<<` overload for `std::variant<...>` visible from the call
sites which format `std::variant<...>` using {fmt}.
in this change, the `operator<<` for `std::variant<...>` is moved
from the middle of the source file to the top of it, so that it can
be found when the compiler looks for a matching `fmt::formatter`
for `std::variant<...>`.
please note, we cannot use the `fmt::formatter` provided by `fmt/std.h`,
as its specialization for `std::variant` requires that all the types
of the variant are `is_formattable`. but the default-generated formatter
for a type `T` is not considered proof that `T` is formattable.
this should address the FTBFS with the latest seastar like:
```
/usr/include/fmt/core.h:2743:12: error: call to deleted constructor of 'conditional_t<has_formatter<mapped_type, context>::value, formatter<mapped_type, char_type>, fallback_formatter<stripped_type, char_type>>' (aka 'fmt::detail::fallback_formatter<std::variant<std::monostate, seastar::timed_out_error, raft::not_a_leader, raft::dropped_entry, raft::commit_status_unknown, raft::conf_change_in_progress, raft::stopped_error, raft::not_a_member>>')
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16616
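the lookup problem above can be reduced to a small stdlib-only sketch (hypothetical names, no {fmt} dependency): the `operator<<` for `std::variant<...>` lives in the global namespace, so argument-dependent lookup on the variant's associated namespace (`std`) will not find it; only overloads already declared before the formatting template are found, which is why the real overload had to move to the top of the file.

```cpp
#include <sstream>
#include <string>
#include <variant>

// This overload is in the global namespace, so ADL (which searches
// namespace std for std::variant) cannot find it; it must be declared
// *before* any template that streams the variant.
std::ostream& operator<<(std::ostream& os,
                         const std::variant<int, std::string>& v) {
    std::visit([&os](const auto& x) { os << x; }, v);
    return os;
}

// Stand-in for a formatting entry point (e.g. an ostream-based
// fmt::formatter): the unqualified `<<` below is resolved against
// declarations visible at this point.
template <typename T>
std::string format_value(const T& v) {
    std::ostringstream oss;
    oss << v;
    return oss.str();
}
```

if the `operator<<` were declared after `format_value`, the dependent `oss << v` call would fail to resolve, mirroring the FTBFS quoted above.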
we are using `seastar::format()` to format `append_entry` in
`append_reg_model`, so we have to provide a `fmt::formatter` for
these callers which format `append_entry`.
despite that, with FMT_DEPRECATED_OSTREAM, such a formatter is still
defined by fmt v9, we won't have it as of fmt v10. so this change
prepares us for fmt v10.
Refs https://github.com/scylladb/scylladb/issues/13245
Closes scylladb/scylladb#16614
* github.com:scylladb/scylladb:
test: randomized_nemesis_test: add formatter for append_entry
test: randomized_nemesis_test: move append_reg_model::entry out
it was a copy-paste error introduced by 2508d339. the copyright
blurb was copied from C++ source code, but the CMake language
defines block comments differently from C++.
let's use CMake's line comments instead.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16615
In 7d5e22b43b ("replica: memtable: don't forget memtable
memory allocation statistics") we taught memtable_list to remember
learned memory allocation reserves so a new memtable inherits these
statistics from an older memtable. Share it now further across tablets
that belong to the same table as well. This helps the statistics be more
accurate for tablets that are migrated in, as they can share existing
tablet's memory allocation history.
Closes scylladb/scylladb#16571
* github.com:scylladb/scylladb:
table, memtable: share log-structured allocator statistics across all memtables in a table
memtable: consolidate _read_section, _allocating_section in a struct
Change the mutate_live_and_unreachable_endpoints procedure
so that the called `func` would mutate a cloned
`live_and_unreachable_endpoints` object in place.
Those are replicated to temporary copies on all shards
using `foreign<unique_ptr<>>` so that they would be
automatically freed on exception.
Only after all copies are made are they applied
on all gossiper shards in a noexcept loop,
and finally an `on_success` function is called
to apply further side effects if everything else
was replicated successfully.
The latter is still susceptible to exceptions,
but we can live with those as long as `_live_endpoints`
and `_unreachable_endpoints` are synchronized on all shards.
With that, the read-only methods:
`get_live_members_synchronized` and
`get_unreachable_members_synchronized`
become trivial and they just return the required data
from shard 0.
Fixes#15089
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#16597
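a minimal single-shard sketch of the pattern above (hypothetical simplified types, not the gossiper's real API): all throwing work happens on a clone, and publication is a noexcept swap, so an exception can never leave the shared state half-mutated.

```cpp
#include <functional>
#include <set>
#include <string>
#include <utility>

// Simplified stand-in for the gossiper's live/unreachable sets.
struct endpoint_sets {
    std::set<std::string> live;
    std::set<std::string> unreachable;
};

// Mutate-a-clone pattern: `func` runs on a copy, so an exception leaves
// the shared state untouched; the final std::swap is the noexcept
// publication point, mirroring the "apply in a noexcept loop" step.
void mutate_endpoints(endpoint_sets& shared,
                      const std::function<void(endpoint_sets&)>& func) {
    endpoint_sets copy = shared;  // may throw (allocation only)
    func(copy);                   // may throw; only the clone is touched
    std::swap(shared, copy);      // noexcept publication
}
```

with this shape, the read-only accessors can simply return the data, since every observer only ever sees a fully-applied state.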
remove the unused #include headers from repair.hh, as they are not
directly used. after this change, task_manager_module.hh fails to
have access to stream_reason, so include it where it is used.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16618
it's observed that the mock server could return something not decodable
as JSON. so let's print out the response in the logging message in this case.
this should help us to understand the test failure better if it surfaces again.
Refs #16542
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16543
we are using `seastar::format()` to format `append_entry` in
`append_reg_model`, so we have to provide a `fmt::formatter` for
these callers which format `append_entry`.
despite that, with FMT_DEPRECATED_OSTREAM, such a formatter is still
defined by fmt v9, we won't have it as of fmt v10. so this change
prepares us for fmt v10.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change prepares for adding a fmt::formatter for append_entry,
as we are using its formatter in the inline member functions of
`append_reg_model`. but its `fmt::formatter` can only be specialized
outside of this class. and we don't have access to `format_as()` yet in {fmt} 9.1.0
which is shipped along with fedora38, which is in turn used for
our base build image.
so, in this change, `append_reg_model::entry` is extracted and renamed
to `append_entry`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Tablets metadata is quite expensive to generate (each data_value is
an allocation), so an old driver (without support for tablets) will
generate huge amounts of such notifications. This commit adds a way
to negotiate generation of the notification: a new driver will ask
for them, and an old driver won't get them. It uses the
OPTIONS/SUPPORTED/STARTUP protocol described in native_protocol_v4.spec.
Closes scylladb/scylladb#16611
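the negotiation above can be sketched with hypothetical simplified types (the option key below reuses the `tablets-routing-v1` name from this series for illustration; the real protocol exchange follows OPTIONS/SUPPORTED/STARTUP): the server only generates the expensive tablet notifications for connections whose STARTUP options asked for them, so old drivers never trigger them.

```cpp
#include <set>
#include <string>

// Illustrative option key; the real negotiation goes through the
// OPTIONS/SUPPORTED/STARTUP messages of native_protocol_v4.spec.
const std::string k_tablets_routing = "tablets-routing-v1";

struct connection {
    std::set<std::string> negotiated;  // options accepted at STARTUP
};

// Only clients that explicitly negotiated the feature get the
// (expensive-to-generate) tablet metadata notifications.
bool wants_tablet_notifications(const connection& c) {
    return c.negotiated.count(k_tablets_routing) > 0;
}
```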
seastar dropped its dependency on Crypto++, and it also removed
Findcryptopp.cmake from its `cmake` directory. but scylladb still
depends on this library, and it has been using the `Findcryptopp.cmake`
in the seastar submodule for finding it.
after the removal of this file, scylladb would not be able to
use it anymore. so, we have to provide our own `Findcryptopp.cmake`.
Findcryptopp.cmake is copied from the Seastar project, so its
copyright date is preserved, and it was licensed under Apache 2.0.
since we are creating a derivative work from it, let's relicense
it under Apache 2.0 and AGPL 3.0 or later.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16601
seastar::logger uses compile-time format checking by default if
compiled using {fmt} 8.0 and up, and it requires the format string to be
a consteval string, otherwise we have to use `fmt::runtime()` explicitly.
so, to adapt to this change, let's use consteval strings when formatting
logging messages.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16612
in `alternator/auth.cc`, none of the symbols in the "query" namespace
provided by the removed headers is used, so there is no
need to include these header files.
the same applies to the other removed header files.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16603
In the Raft-based topology, we should never update token metadata
through gossip notifications. `storage_service::on_alive` and
`storage_service::on_remove` do it, so we ignore their parts that
touch token metadata.
Additionally, we improve some logs in other places where we ignore
the function because of using the Raft-based topology.
Fixes scylladb/scylladb#15732
Closes scylladb/scylladb#16528
* github.com:scylladb/scylladb:
storage_service: handle_state_left, handle_state_normal: improve logs
raft topology: do not update token metadata in on_alive and on_remove
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for `atomic_cell_view::printer`
and `atomic_cell::printer` respectively, and remove their operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16602
In #16102, we added a test for concurrent bootstrap in the raft-based
topology. This test was running in CI for some time and
never failed. Now, we can believe that concurrent bootstrap is not
bugged or at least the probability of a failure is very low. Therefore,
we can safely make use of it in all tests using the raft-based topology.
This PR:
- makes all initial servers start concurrently in topology tests,
- replaces all multiple `server_add` calls with a single `servers_add`
call in tests using the raft-based topology,
- removes no longer needed `test_concurrent_bootstrap`.
The changes listed above:
- make running tests a bit faster due to concurrent bootstraps,
- make multiple tests test concurrent bootstrap previously tested by
a single test.
Fixes scylladb/scylladb#15423
Closes scylladb/scylladb#16384
* github.com:scylladb/scylladb:
test: test_different_group0_ids: fix comments
test: remove test_concurrent_bootstrap
test: replace multiple server_add calls with servers_add
test: ScyllaCluster: start all initial servers concurrently
test: ManagerClient: servers_add: specify consistent-topology-changes assumption
Previously, the tablet information was sent to the drivers
in two pieces within the custom_payload. We had information
about the replicas under the `tablet_replicas` key and token range
information under `token_range`. These names were quite generic
and might have caused problems for other custom_payload users.
Additionally, dividing the information into two pieces raised
the question of what to do if one key is present while the other
is missing.
This commit changes the serialization mechanism to pack all information
under one specific name, `tablets-routing-v1`.
From: Sylwia Szunejko <sylwia.szunejko@scylladb.com>
Closes scylladb/scylladb#16148
This test only adds 3 nodes concurrently to the empty cluster.
After making many other tests use ManagerClient.servers_add, it
serves no purpose.
We had added this test before we decided to use
ManagerClient.servers_add in many tests to avoid multiple failures
in CI if it turned out that the concurrent bootstrap is flaky with
high frequency there. This test was running in CI for some time and
never failed. Now, we can believe that concurrent bootstrap is not
bugged or at least the probability of a failure is very low.
ManagerClient.servers_add can be used in every test that uses
consistent topology changes. We replace all multiple server_add
calls in such tests with a single servers_add call to make these
tests faster and simplify their code. Additionally, these
servers_add calls will test concurrent bootstraps for free.
Starting all initial servers concurrently makes tests in suites
with initial_size > 1 run a bit faster. Additionally, these tests
test concurrent bootstraps for free.
add_servers can be called only if the cluster uses consistent
topology changes. We can use this function unconditionally in
install_and_start because every suite uses consistent topology
changes by default. The only way to not use it is by adding all
servers with a config that contains experimental_features without
consistent-topology-changes.
we create a default `scylla.yaml` on the fly in `install.sh`. but
the path to the temporary file holding the default yaml file is
hardwired to `/tmp/scylla.yaml`. this works fine if we only have a
single `install.sh` at a certain time point. but if we have multiple
`install.sh` process running in parallel, these packaging jobs could
step on each other when they create and remove the `scylla.yaml`.
in this change, because of a limitation of `installconfig`, which always
considers the "dest" parameter a directory, `mktemp` is used for creating
a parent directory for the temporary file.
Fixes#16591
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16592
we use "ue" as shorthand for "update_expressions". until we change
our minds and use a more readable name, let's add "ue" to the
"ignore_word_list" option of codespell.
also, use absolute paths in the "skip" option, as absolute paths
are also used by codespell's own github workflow. and we are still
observing the codespell github workflow showing misspelling errors
in our "test/" directory even though we have it listed in "skip", so this
change should silence them as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16593
The HOST_ID is already written to system.peers since inception pretty much (See https://github.com/scylladb/scylladb/pull/16376#discussion_r1429248185 for details).
However, it is written to the table using an individual CQL query and so it is not set atomically with other columns.
If scylla crashes or even hits an exception before updating the host_id, then system.peers might be left in an inconsistent state, and in particular without a HOST_ID value.
This series makes sure that HOST_ID is written to system.peers and uses it to "seal" the record by upserting it in a single CQL BATCH query when adding the state for new nodes.
On the read side, skip rows that have no HOST_ID state in system.peers, assuming they are incomplete, i.e. scylla got an exception or crashed while writing them, so they can't be trusted.
With that change we can assume that endpoint state loaded from system.peers will always have a valid host_id.
Refs https://github.com/scylladb/scylladb/pull/15903
Closes scylladb/scylladb#16376
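the read side of the sealing scheme can be sketched with hypothetical simplified types (not the real system_keyspace API): since HOST_ID is only written in the same batch that completes a row, a missing host_id marks a row that was partially written before a crash, and such rows are skipped.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical simplified system.peers row: host_id is upserted in the
// same batch as all other columns, so its absence means the row was
// never "sealed".
struct peer_row {
    std::string endpoint;
    std::optional<std::string> host_id;
};

// Skip unsealed rows, so every loaded peer has a valid host_id --
// the invariant the series establishes.
std::vector<peer_row> load_sealed_peers(const std::vector<peer_row>& raw) {
    std::vector<peer_row> trusted;
    for (const auto& row : raw) {
        if (row.host_id) {
            trusted.push_back(row);
        }
    }
    return trusted;
}
```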
* github.com:scylladb/scylladb:
gms: endpoint_state: change application_state_map to std::unordered_map
system_keyspace: update_peer_info: drop single-column overloads
storage_service: drop do_update_system_peers_table
storage_service: on_change: fixup indentation
endpoint_state subscriptions: batch on_change notification
everywhere: drop before_change subscription
system_keyspace: load_tokens/peers/host_ids: enforce presence of host_id
system_keyspace: drop update_tokens(endpoint, tokens) overload
storage_service: seal peer info with host_id
storage_service: update_peer_info: pass peer_info to sys_ks
gms: endpoint_state: define application_state_map
system_keyspace: update_peer_info: use struct peer_info for all optional values
query_processor: execute_internal: support unset values
types: add data_value_list
system_keyspace: get rid of update_cached_values
storage_service: do not update peer info for this node
State changes are processed as a batch and
there is no reason to maintain them as an ordered map.
Instead, use a std::unordered_map that is more efficient.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than calling on_change for each particular
application_state, pass an endpoint_state::map_type
with all changed states, to be processed as a batch.
In particular, this allows storage_service::on_change
to call update_peer_info once for all changed states.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
None of the subscribers is doing anything before_change.
This is done before changing `on_change` in the following patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When adding a peer via update_peer_info,
insert all columns in a single query
using system_keyspace::peer_info.
This ensures that `host_id` is inserted along with all
other app states, so we can rely on it
when loading the peer info after restart.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Use the newly added system_keyspace::peer_info
to pass a struct of all optional system.peers members
to system_keyspace::update_peer_info.
Add `get_peer_info_for_update` to construct said struct
from the endpoint state.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Have a central definition for the map held
in the endpoint_state (before changing it to
std::unordered_map).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Define struct peer_info holding optional values
for all system.peers columns, allowing the caller to
update any column.
Pass the values as std::vector<std::optional<data_value>>
to query_processor::execute_internal.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Add overloads for execute_internal and friends
accepting a vector of optional<data_value>.
The caller can pass nullopt for any unset value.
The vector of optionals is translated internally to
`cql3::raw_value_vector_with_unset` by `make_internal_options`.
This path will be called by system_keyspace::update_peer_info
for updating a subset of the system.peers columns.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
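the translation described above can be sketched with hypothetical simplified types (string values standing in for data_value, and a plain struct standing in for `cql3::raw_value_vector_with_unset`): a vector of optionals becomes a value vector plus a parallel "unset" flag vector, so a caller can pass nullopt for any column it does not want to touch.

```cpp
#include <optional>
#include <string>
#include <vector>

// Simplified stand-in for cql3::raw_value_vector_with_unset.
struct values_with_unset {
    std::vector<std::string> values;
    std::vector<bool> unset;
};

// Hypothetical simplified version of the make_internal_options
// translation: nullopt entries become "unset" markers instead of
// written values.
values_with_unset make_options(
        const std::vector<std::optional<std::string>>& in) {
    values_with_unset out;
    for (const auto& v : in) {
        out.values.push_back(v.value_or(""));  // placeholder when unset
        out.unset.push_back(!v.has_value());
    }
    return out;
}
```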
data_value_list is a wrapper around std::initializer_list<data_value>.
Use it for passing values to `cql3::query_processor::execute_internal`
and friends.
A following patch will add a std::variant for data_value_or_unset
and extend data_value_list to support unset values.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, when loading peers' endpoint state from system.peers,
add_saved_endpoint is called.
The first instance of the endpoint state is created with the default
heart_beat_state, with both generation and version set to zero.
However, if add_saved_endpoint finds an existing instance of the
endpoint state, it reuses it, but it updates its heart_beat_state
with the local heart_beat_state() rather than keeping the existing
heart_beat_state, as it should.
This is a problem since it may confuse updates over gossip
later on via do_apply_state_locally that compares the remote
generation vs. the local generation, so they must stem from
the same root that is the endpoint itself.
Fixes#16429
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
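the fix can be sketched with hypothetical simplified types (not the real gossiper API): when the endpoint state already exists, its heart_beat_state is intentionally left untouched, so generation comparisons in do_apply_state_locally keep stemming from the endpoint itself.

```cpp
#include <map>
#include <string>

// Hypothetical simplified heart_beat_state and endpoint state.
struct heart_beat { int generation = 0; int version = 0; };
struct endpoint_state { heart_beat hb; };

// try_emplace inserts a default state (generation/version zero) only if
// the endpoint is not already known; an existing entry -- and its
// heart_beat_state -- is kept as-is, which is the fix above.
endpoint_state& add_saved_endpoint(std::map<std::string, endpoint_state>& eps,
                                   const std::string& ep) {
    return eps.try_emplace(ep).first->second;
}
```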
`topology_state_load` currently calls `add_saved_endpoint`
only if it finds no endpoint_state_ptr for the endpoint.
However, this is done before locking the endpoint
and the endpoint state could be inserted concurrently.
To prevent that, a permit_id parameter was added to
`add_saved_endpoint` allowing the caller to call it
while the endpoint is locked. With that, `topology_state_load`
locks the endpoint and checks the existence of the endpoint state
under the lock, before calling `add_saved_endpoint`.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Move the `raft_group_registry::on_alive` error injection point
to `gossiper::real_mark_alive` so it can delay marking the endpoint as
alive, and calling the `on_alive` callback, but without holding
the endpoint_lock.
Note that the entry for this endpoint in `_pending_mark_alive_endpoints`
still blocks marking it as alive until real_mark_alive completes.
Fixes#16506
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
test.py inherits its env from the user, which is the right thing:
some python modules, e.g. logging, do accept env-based configuration.
However, test.py also starts subprocesses, i.e. tests, which start
scylladb instances. And when the instance is started without an explicit
configuration file, SCYLLA_CONF from user environment can be used.
If this scylla.conf contains funny parameters, e.g. unsupported
configuration options, the tests may break in an unexpected way.
Avoid this by resetting the respective env keys in test.py.
Fixes gh-16583
Closesscylladb/scylladb#16577
system_keyspace had a hack to skip update_peer_info
for the local node, and then to remove an entry for
the local node in system.peers if `update_tokens(endpoint, ...)`
was called for this node.
This change unhacks system_keyspace by considering
update of system.peers with the local address as
an internal error and fixing the call sites that do that.
Fixes#16425
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We add a test for the Raft-based topology's new feature - rejecting
the removenode operation on the topology coordinator side if the
node being removed is considered alive by the failure detector.
Additionally, the test covers a case where the removenode operation
is rejected on the initiator side.
The removenode operation is defined to succeed only if the node
being removed is dead. Currently, we reject this operation on the
initiator side (in storage_service::raft_removenode) when the
failure detector considers the node being removed alive. However,
it is possible that even if the initiator considers the node dead,
the topology coordinator will consider it alive when handling the
topology request. For example, the topology coordinator can use
a bigger failure detector timeout, or the node being removed can
suddenly resurrect. This patch adds a check on the topology
coordinator side.
Note that the only goal of this change is to improve the user
experience. The topology coordinator does not rely on the gossiper
to ensure correctness.
The previous commit removed the only call to wait_for_host_down.
Moreover, this function is identical to server_not_sees_other_server.
We can safely remove it.
In the following commits, we make the topology coordinator reject
removenode requests if the node being removed is considered alive
by the gossiper. Before making this change, we need to adapt the
testing framework so that we don't have flaky removenode operations
that fail because the node being removed hasn't been marked as dead
yet. We achieve this by waiting until all other running nodes see
the node being removed as dead in all removenode operations.
Some tests are simplified after this change because they don't have
to call server_not_sees_other_server anymore.
We log the information about ignoring the `handle_state_left`
function after logging the general entry information. It is better
to know what exactly is being ignored during debugging.
We also add the `permit_id` info to the log. All other functions
called through gossip notifications log it.
In the Raft-based topology, we should never update token metadata
through gossip notifications. `storage_service::on_alive` and
`storage_service::on_remove` do it, so we ignore their parts that
touch token metadata.
There are other functions in storage_service called through gossip
notifications that are not ignored in the Raft-based topology.
However, we don't have to or cannot ignore them. We cannot ignore
`on_join` and `on_change` because they update the PEERS table used
by drivers. The rest of those functions don't have to be ignored.
These are:
- `before_change` - it does nothing,
- `on_dead` and `on_restart` - they only remove the RPC client and
send notifications,
- `handle_state_bootstrap` and `handle_state_removed` - they are
never called in the Raft-based topology.
Fencing is necessary only for reads and writes to non-local tables.
Moreover, fencing a read or write to a local table can cause an
error on the bootstrapping node. It is explained in the comment
in storage_proxy::get_fence.
A scenario described in the comment has been reported in
scylladb/scylladb#16423. A write to the local RAFT table failed
because of fencing, and it killed server_impl::io_fiber.
Fixes scylladb/scylladb#16423
Closes scylladb/scylladb#16525
Scylla refuses the timestamp format "2014-01-01 12:15:45.0000000Z" that
has 6 digits of precision for the fractional second, and only allows
3 digits of precision. This restriction makes sense - after all CQL
timestamp columns (note - this is NOT "using timestamp"!) only have
millisecond precision. Nevertheless, Cassandra does not have this
restriction and does allow these over-precise timestamps. In this patch
we add a test that demonstrates this difference.
Curiously, in the past Scylla *generated* this forbidden timestamp
format when outputting the timestamp to a string (e.g. toJson()),
which it then couldn't read back! This was issue #16575.
Today Scylla no longer generates this forbidden timestamp format.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16576
apt_install() / apt_uninstall() may fail if a background process is
running an apt operation, such as unattended-upgrades.
To avoid this, we need to add two things:
1. For apt-get install / remove, we need the option "DPkg::Lock::Timeout=-1"
to wait for the dpkg lock.
2. For apt-get update, there is no option to wait for the cache lock.
Therefore, we need to implement a retry loop to wait for apt-get update
to succeed.
Fixes #16537
Closes scylladb/scylladb#16561
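the retry loop from point 2 has a simple shape; the actual fix lives in a shell script, so this is just a generic, language-neutral sketch of the idea: keep re-running the operation until it succeeds or the attempt budget runs out.

```cpp
#include <functional>

// Generic retry loop: run `op` (think "apt-get update") until it
// reports success, giving up after max_attempts tries.
bool retry(const std::function<bool()>& op, int max_attempts) {
    for (int i = 0; i < max_attempts; ++i) {
        if (op()) {
            return true;
        }
    }
    return false;
}
```

in the real script a sleep between attempts would also make sense, to avoid hammering the lock holder.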
In 3da346a86d, we moved
AmbientCapabilities to scylla-server.service, but it causes "Operation
not permitted" errors in nonroot mode.
This is because the nonroot user does not have enough privilege to set
capabilities, so we need to disable the parameter in nonroot mode.
Closes scylladb/scylladb#16574
This small series improves two things in the multi-node tests for tablet support in materialized views:
1. The test for Alternator LSI, which "sometimes" could reproduce the bug by creating a 10-node cluster with a random tablet distribution, is replaced by a reliable 2-node cluster which controls the tablet distribution. The new test also confirms that tablets are actually enabled in Alternator (reviewers of the original test noted it would be easy to pass the test if tablets were accidentally not enabled... :-)).
2. Simplify the tablet lookup code in the test to not go through a "table id", and look up the table's (or view's) name directly (this requires a full-table scan of the tablets table, but that's entirely reasonable in a test).
The third patch in this series also fixes a comment typo discovered in a previous review.
Closes scylladb/scylladb#16440
* github.com:scylladb/scylladb:
materialized views: fix typo in comment
test_mv_tablets: simplify lookup of tablets
alternator, tablets: improve Alternator LSI tablets test
When metadata barrier fails a guard is released and node becomes
outdated. Failure handling path needs to re-take the guard and re-create
the node before continuing.
Fixes: #16568
Message-ID: <ZYxEm+SaBeFcRT8E@scylladb.com>
The number of arguments needed to create a ks metadata object is pretty large, and it is created in many different ways all over the code. This set simplifies it for the most typical patterns.
closes: #16447
closes: #16449
Closes scylladb/scylladb#16565
* github.com:scylladb/scylladb:
schema_tables: Use new_keyspace() sugar
keyspace_metadata: Drop vector-of-schemas argument from new_keyspace()
keyspace_metadata: Add default value for new_keyspace's durable_writes
keyspace_metadata: Pack constructors with default arguments
So that a single centrally managed db::config instance can be shared by
all code requiring it, instead of creating local instances where needed.
This is required to load schema from encrypted schema-tables, and it
also helps memory consumption a bit (db::config consumes a lot of
memory).
Fixes: #16480
Closes scylladb/scylladb#16495
This change is motivated by wanting to have code coverage reporting support.
Currently the only way to get a profile dump in ScyllaDB is stopping it with SIGTERM; however, this doesn't
suit all cases, more specifically:
1. In dtest, when some of the tests intentionally abruptly kill a node
2. In test.py, where we would like to distinguish (at least for now) graceful shutdown of ScyllaDB from testing and
teardown procedures (which currently kill the nodes).
This mini series adds two changes:
1. It adds the support for profile dumping in ScyllaDB with rest api ('/system/dump_profile')
2. It adds the support for this API in test.py and also adds a call for it as part of the node stop procedure, in a permissive way that will not fail the teardown or the test if the call doesn't succeed for whatever reason. After this change, all current
test.py suites except for pylib_test (expected) dump profiles if instrumented and will be able to participate in coverage
reporting.
Refs #16323
Closes scylladb/scylladb#16557
* github.com:scylladb/scylladb:
test.py: Dump coverage profile before killing a node
rest api: Add an api for profile dumping
Up until now the only way to get a coverage profile was to shut down the
ScyllaDB nodes gracefully (using SIGTERM), this means that the coverage
profile was lost for every node that was killed abruptly (SIGKILL).
This in turn would have required us to shut down all nodes
gracefully, which is not something we set out to do.
Here we use the rest API for dumping the coverage profile which will
cause the most minimal impact possible on the test runs.
If the dumping fails (because the node doesn't support the API, or due to
a real error in dumping), we ignore it, as it is not part of the system we
would like to test.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
As part of code coverage support we need to work with dumped profiles
for ScyllaDB executables.
Those profiles are created on two occasions:
1. When an application exits normally (which triggers
__llvm_dump_profile, registered in the exit hooks).
2. For ScyllaDB, commit d7b524cf10 introduced a manual call to
__llvm_dump_profile upon receiving a SIGTERM signal.
This commit adds a third option: a REST API to dump the profile.
In addition, the target file is logged and the counters are reset, which
enables incremental dumping of the profile.
Except for logging, if the executable is not instrumented, this API call
becomes a no-op, so it bears minimal risk to keep in our releases.
Specifically for code coverage, the gain is that we will not be
required to change the entire test run to shut down clusters gracefully,
and this will have minimal effect on the actual test behavior.
The change was tested by manually triggering the API both with and
without instrumentation, as well as re-triggering it with write
permissions for the profile file disabled (to test fault tolerance).
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
The log-structured allocator collects allocation statistics (which it
uses to manage memory reserves) in some objects kept in
memtable_table_shared_data. Right now, this object is local to memtable_list,
which itself is local to a tablet replica. Move it to table scope so
different tablets in the shard share the statistics. This helps a
newly-migrated tablet adjust more quickly.
Those two members are passed from memtable_list to memtable. Since we
wish to pass them from table, it becomes awkward to pass them as two
separate variables as their contents are specific to memtable internals.
Wrap them in a name that indicates their role (being table-wide shared
data for memtables) and pass them as a unit.
Commit 62458b8e4f introduced the enforcement of EXECUTE permissions for functions in CQL SELECT. However, according to the reference in #12869, the permissions should be enforced only on UDFs and UDAs.
The code does not distinguish between the two, so the permissions were also unintentionally enforced on native functions. This commit introduces the distinction and only enforces the permissions on non-native functions.
Fixes #16526
Manually verified (before and after the change) with the reproducer supplied in #16526, and also with the `min` and `max` native functions.
Also added a test that checks for regression on native function execution, and verified that it fails on authorization before
the fix and passes after it.
Closes scylladb/scylladb#16556
* github.com:scylladb/scylladb:
test.py: Add test for native functions permissions
select statement: verify EXECUTE permissions only for non native functions
If the coordinator fails to notify all nodes about a new cdc generation
during bootstrap, it cannot proceed booting, since that can cause data
loss with cdc. Roll back the topology operation if a failure happens
during this state.
The create_keyspace_from_schema_partition code creates ks metadata
without schemas and user-types. There's a new_keyspace() convenience
helper for such cases.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's only testing code that wants to call new_keyspace with existing
schemas; all the other callers either construct the ks metadata
directly, or use the convenience new_keyspace with explicitly empty schemas.
By and large it's nicer if new_keyspace() doesn't require this
argument.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Almost all callers call new_keyspace with durable writes ON, so it's
worth having a default value for it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a cascade of keyspace_metadata constructors, each adding one
default argument to the previous one. All this can be expressed more
concisely with the help of native default arguments.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Commit 62458b8e4f introduced the
enforcement of EXECUTE permissions of functions in cql select. However,
according to the reference in #12869, the permissions should be enforced
only on UDFs and UDAs.
The code does not distinguish between the two, so the permissions are
also unintentionally enforced on native functions.
This commit introduces the distinction and only enforces the permissions
on non-native functions.
Fixes #16526
Manually verified (before and after the change) with the reproducer
supplied in #16526, and also with the `min` and `max` native
functions.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
This short series fixes a regression from Scylla 5.2 to Scylla 5.4 in "SELECT * GROUP BY" - this query was supposed to return just a single row from each partition (the first one in clustering order), but after the expression rewrite it started to wrongly return all rows.
The series also includes a regression test that verifies that this query doesn't work correctly before this series, but works with this patch - and also works as expected in Scylla 5.2 and in Cassandra.
Fixes #16531.
Closes scylladb/scylladb#16559
* github.com:scylladb/scylladb:
test/cql-pytest: check that most aggregators don't take "*"
cql-pytest: add reproducer for GROUP BY regression
cql: fix regression in SELECT * GROUP BY
Since we decided to drop CentOS7 support from the latest version of Scylla, we can now drop CentOS7-specific code from the packaging and setup scripts.
Related scylladb/scylla-enterprise#3502
Closes scylladb/scylladb#16365
* github.com:scylladb/scylladb:
scylla-server.service: switch deprecated PermissionsStartsOnly to ExecStartPre=+
dist: drop legacy control group parameters
scylla-server.slice: Drop workaround for MemorySwapMax=0 bug
dist: move AmbientCapabilities to scylla-server.service
Revert "scylla_setup: add warning for CentOS7 default kernel"
[avi: CentOS 7 reached EOL on June 2024]
`--static-boost` is an option provided by `configure.py`. this option is
not used by our CI or building scripts, but in order to be compatible
with the existing behavior of `configure.py`, let's support this option
when building with CMake.
`Boost_USE_STATIC_LIBS` is a cmake variable supported by CMake's
FindBoost and Boost's own `BoostConfig.cmake`. see
https://cmake.org/cmake/help/latest/module/FindBoost.html#other-variables
by default boost is linked via its shared libraries. by setting
this variable, we link boost's static libraries.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16545
scylla uses build mode names like "debug" and "release", while we
intend to use the typical build configurations / build types used by
CMake, like "Debug" and "RelWithDebInfo", for naming
CMAKE_CONFIGURATION_TYPES and CMAKE_BUILD_TYPE. the former is used for
naming the build directory and for the preprocessor macro named
"SCYLLA_BUILD_MODE".
`test.py` and scylladb's CI are designed based on the naming of the build
directory: `test.py` lists the build modes using the dedicated
build target named `list_modes`, which is added by `configure.py`.
so, in this change, the target is added to CMake as well. the variables
of "scylla_build_mode" defined by the per-mode configuration are
collected and printed by `list_modes`.
by default, CMake generates a target for each build
configuration when a multi-config generator is used, but we only want to
print the build modes a single time when "list_modes" is built. so
a "BYPRODUCTS" is deliberately added for the target, and the path of
this "BYPRODUCTS" is named without "$<CONFIG>" in it.
Closes scylladb/scylladb#16532
* github.com:scylladb/scylladb:
build: cmake: add "mode_list" target
build: cmake: define scylla_build_mode
when compiling with clang-18 in "release" mode, `assert()` is optimized
out, so `i` is not used, and clang complains like:
```
/home/kefu/dev/scylladb/data_dictionary/user_types_metadata.hh:29:14: error: unused variable 'i' [-Werror,-Wunused-variable]
29 | auto i = _user_types.find(type->_name);
| ^
```
in this change, we use `i` as the hint for the insertion, for two
reasons:
- silence the warning.
- avoid looking up the unordered_map twice with the same
key.
`type` is not moved away when being passed to `insert_or_assign()`,
because otherwise `type->_name` could be referencing a moved-away
shared_ptr, since the order of evaluating a function's parameters
is unspecified. since `type` is a shared_ptr, the overhead of copying
it is negligible.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16530
Although you can "SELECT COUNT(*)", this has special handling in the CQL
parser (it is converted into a special row-counting request) and you can't
give "*" to other aggregators - e.g., "SELECT SUM(*)". This patch includes
a simple test that confirms this.
I wanted to check this in relation to the previous patch, which did,
sort of, a "SELECT first(*)" - a syntax which this test shows
wouldn't have actually worked if we tried it.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
test/cql-pytest/test_group_by.py has tests that verify that requests
like
SELECT p,c1,c2,v FROM tbl WHERE p=0 GROUP BY p
work as expected - the "GROUP BY p" means in this case that we should
only return the first row in the p=0 partition.
As a user discovered, it turns out that the almost identical request:
SELECT * FROM tbl WHERE p=0 GROUP BY p
doesn't work the same - before the fix in the previous patch, it
erroneously returned all rows in p=0, not just the first one.
The test in this patch demonstrates this - it fails on Scylla 5.4,
passes on Scylla 5.2 and on Cassandra - and passes when the fix
from the previous patch is used.
This patch includes another tiny test, to check the interaction of GROUP BY
with filtering. This second test passes on Scylla - but I want it in
anyway because it is yet another interaction that might break (the
user that reported #16531 also had filtering, and I was worried it might
have been related).
Refs #16531
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Recently, the expression-rewrite effort changed the way that GROUP BY is
implemented. Usually GROUP BY involves an aggregation function (e.g., if
you want a separate SUM per partition). But there's also a query like
SELECT p, c1, c2, v FROM tbl GROUP BY p
This query is supposed to return one row - the *first* row in clustering
order - per group (in this case, partition). The expression rewrite
re-implemented this feature by introducing a new internal aggregator,
first(), which returns the first aggregated value. The above query is
rewritten into:
SELECT first(p), first(c1), first(c2), first(v) FROM tbl GROUP BY p
This case works correctly, and we even have a regression test for it.
But unfortunately the rewrite broke the following query:
SELECT * FROM tbl GROUP BY p
Note the "*" instead of the explicit list of columns.
In our implementation, a selection of "*" looks like an empty
selection, so it didn't get the "first()" treatment and remained
a "SELECT *" - and wrongly returned all rows instead of just the first
one in each partition. This was a regression - it worked correctly in
Scylla 5.2 (and also in Cassandra) - see the next patch for a
regression test.
In this patch we fix this regression. When there is a GROUP BY, the "*"
is rewritten to the appropriate list of all visible columns and then
gets the first() treatment, so it will return only the first row as
expected. The next patch will be a test that confirms the bug and its
fix.
Fixes #16531
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Right now the initial_tablets is kept as a replication strategy option in the legacy system_schema.keyspaces table. However, r.s. options are all considered to be replication factors, not anything else. Other than being confusing, this also makes it impossible to extend keyspace configuration with non-integer tablets-related values.
This PR moves the initial_tablets into the scylla-specific part of the schema. This opens the way to more ~~ugly~~ flexible ways of configuring tablets for a keyspace; in particular it should be possible to use a boolean on/off switch in CREATE KEYSPACE or some other trick we find appropriate.
Most of what this PR does is extend the arguments passed around keyspace_metadata and abstract_replication_strategy. The essence of the change is in the last patches:
* schema_tables: Relax extract_scylla_specific_ks_info() check
* locator,schema: Move initial tablets from r.s. options to params
refs: #16319
refs: #16364
Closes scylladb/scylladb#16555
* github.com:scylladb/scylladb:
test: Add sanity tests for tablets initialization and altering
locator,schema: Move initial tablets from r.s. options to params
schema_tables: Relax extract_scylla_specific_ks_info() check
locator: Keep optional initial_tablets on r.s. params
ks_prop_defs: Add initial_tablets& arg to prepare_options()
keyspace_metadata: Carry optional<initial_tablets> on board
locator: Pass abstract_replication_strategy& into validate_tablet_options()
locator: Carry r.s. params into process_tablet_options()
locator: Call create_replication_strategy() with r.s. params
locator: Wrap replication_strategy_config_options into replication_strategy_params
locator: Use local members in ..._replication_strategy constructors
Check that the initial_tablets appears in system_schema.scylla_keyspaces
if turned on explicitly
Check that it's possible to change initial_tablets with ALTER KEYSPACE
Check that changing r.s. from simple to network-topology doesn't
activate tablets
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The option is kept in DDL, but is _not_ stored in
system_schema.keyspaces. Instead, it's removed from the provided options
and kept in the scylla_keyspaces table in its own column. All the places
that had optional initial_tablets disengaged now set this value up the
way they find appropriate.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Nowadays reading scylla-specific info from schema happens under the
respective schema feature. However (at least in the raft case), when a new
node joins the cluster, merging schema for the first time may happen
_before_ features are merged and enabled. Thus merging schema can go the
wrong way by erroneously skipping the scylla-specific info.
On the other hand, if the system_schema.scylla_keyspaces data is there,
there's no reason _not_ to pick it up in that case.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now all the callers have it at hand (spoiler: not yet initialized, but
still), so the params can also have it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The prepare_options() method is in charge of pre-tuning the replication
strategy CQL parameters so that real keyspace and r.s. creation code
doesn't see some of those. The "initial_tablets" option is going to be
removed from the real options and be placed into scylla-specific part of
the schema. So the prepare_options() will need to modify both -- the
legacy options _and_ the (soon to be separate) initial_tablets thing.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The object in question fully describes the keyspace to be created and,
among other things, contains replication strategy options. Next patches
move the "initial_tablets" option out of those options and keep it
separately, so the ks metadata should also carry this option separately.
This patch is _just_ extending the metadata creation API, in fact the
new field is unused (write-only) so all the places that need to provide
this data keep it disengaged and are explicitly marked with FIXME
comment. Next patches will fix that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The latter method is the one that will need extended params in next
patches. It's called from network_topology_strategy() constructor which
already has params at hand.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The previous patch added params to r.s. classes' constructors, but callers
don't construct those directly; instead they use the create_r.s.()
wrapper. This patch adds params to the wrapper too.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When a replication strategy class is created, the caller passes a const
reference to the config options, which is, in turn, a map<string, string>.
In the future, r.s. classes will need to get "scylla specific" info along
with the legacy options, and this patch prepares for that by passing a
more generic params argument into the constructor. Currently the only
inhabitant of the new params is the legacy options.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The `config_options` arg had been used to initialize the `_config_options`
field of the base abstract_replication_strategy class, so it's more
idiomatic to use the latter. Also, it makes the next patches simpler.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When altering a keyspace, several keyspace_metadata objects are created
along the way. The last one, which is then kept on the keyspace_metadata
object, forgets to get its copy of storage options, thus transparently
converting to LOCAL type.
The bug surfaces itself when altering replication strategy class for
S3-backed storage -- the 2nd attempt fails, because after the 1st one
the keyspace_metadata gets LOCAL storage options and changing storage
options is not allowed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#16524
b815aa021c added a yield before
the trace point, causing the moved `frozen_mutation_and_schema`
(and `inet_address_vector_topology_change`) to drop out of scope
and be destroyed, as the rvalue-referenced objects aren't moved
onto the coroutine frame.
This change passes them by value rather than by rvalue-reference
so they will be stored in the coroutine frame.
Fixes #16540
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#16541
The reader used to read the sstables was not closed. This could
sometimes trigger an abort(), because the reader was destroyed without
being closed first.
Why only sometimes? This is due to two factors:
* read_mutation_from_flat_mutation_reader() - the method used to extract
a mutation from the reader - uses consume(), which does not trigger
`set_close_is_required()` (#16520). Due to this, the top-level
combined reader did not complain when destroyed without close.
* The combined reader closes underlying readers which have no more data
for the current range. If the circumstances are just right, all
underlying readers are closed before the combined reader is
destroyed. Looks like this is what happens most of the time.
This bug was discovered in SCT testing. After fixing #16520, all
invocations of `scylla-sstable` which use this code would trigger the
abort without this patch. So no further testing is required.
Fixes: #16519
Closes scylladb/scylladb#16521
The tests looked up a table's tablets in an elaborate two-stage search -
first find the table's "id", and then look up this id in the list of
tablets. It is much simpler to just look up the table's name directly
in the list of tablets - although this name is not a key, an ALLOW
FILTERING search is good enough for a test.
As a bonus, with the new technique we don't care if the given name
is the name of a table or a view, further simplifying the test.
This is just a test code cleanup - there is no functional change in
the test.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The test test_tablet_alternator_lsi_consistency, which checks that Alternator
LSI allows strongly-consistent reads even with tablets, used a large
cluster (10 nodes) to improve the chance of reaching an "unlucky" tablet
placement - and even then it only failed in about half the runs without
the code fixed.
In this patch, we rewrite the test using a much more reliable approach:
We start only two nodes, and force the base's tablet onto one node, and
the view's tablet onto the second node. This ensures with 100% certainty
that the view update is remote, and the new test fails every single time
before the code fix (I reverted the fix to verify) - and passes after it.
The new test is not only more reliable, it's also significantly faster
because it doesn't need to start a 10-node cluster.
We can also remove the tag that excluded this test from debug build
mode tests because the 10-node boot was too slow.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
scylla uses build mode names like "debug" and "release", while we
intend to use the typical build configurations / build types used by
CMake, like "Debug" and "RelWithDebInfo", for naming
CMAKE_CONFIGURATION_TYPES and CMAKE_BUILD_TYPE. the former is used for
naming the build directory and for the preprocessor macro named
"SCYLLA_BUILD_MODE".
`test.py` and scylladb's CI are designed based on the naming of the build
directory: `test.py` lists the build modes using the dedicated
build target named `list_modes`, which is added by `configure.py`.
so, in this change, the target is added to CMake as well. the variables
of "scylla_build_mode" defined by the per-mode configuration are
collected and printed by `list_modes`.
by default, CMake generates a target for each build
configuration when a multi-config generator is used, but we only want to
print the build modes a single time when "list_modes" is built. so
a "BYPRODUCTS" is deliberately added for the target, and the path of
this "BYPRODUCTS" is named without "$<CONFIG>" in it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
scylla uses build mode names like "debug" and "release", while we
intend to use the typical build configurations / build types used by
CMake, like "Debug" and "RelWithDebInfo", for naming
CMAKE_CONFIGURATION_TYPES and CMAKE_BUILD_TYPE. the former is used for
naming the build directory and for the preprocessor macro named
"SCYLLA_BUILD_MODE".
`test.py` and scylladb's CI are designed based on the naming of the build
directory: `test.py` lists the build modes using the dedicated
build target named "list_modes", which is added by `configure.py`.
so, in this change, to prepare for adding the target,
"scylla_build_mode" is defined, so we can reuse it in a follow-up
change.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This switch is currently possible, but results in an unsupported keyspace state.
Closes scylladb/scylladb#16513
* github.com:scylladb/scylladb:
test: Add a test that switching between vnodes and tablets is banned
cql3/statements: Don't allow switching between vnode and per-table replication strategies
cql3/statements: Keep local keyspace variable in alter_keyspace_statement::validate
This is a regression after #15903. Before these changes,
del_leaving_endpoint took an IP as a parameter and did nothing
if it was called with a non-existent IP.
The problem was revealed by the dtest test_remove_garbage_members_from_group0_after_abort_decommission[Announcing_that_I_have_left_the_ring-]. The test was
flaky, as in most cases the node died before the
gossiper notification reached all the other nodes. To make
it fail consistently and reproduce the problem, one
can move the info log 'Announcing that I have' after
the sleep and add an additional sleep after it in
the storage_service::leave_ring function.
Fixes #16466
Closes scylladb/scylladb#16508
* seastar ae8449e04f...e0d515b6cf (18):
> reactor: poll less frequently in debug mode
> build: s/exec_program/execute_process/
> Merge 'httpd: support temporary redirect from inside async reply' from Noah Watkins
> Merge 'core: enable seastar to run multiple times in a single process' from Kefu Chai
> rpc/rpc_types: add formatter for rpc::optional<T>
> memory: do not set_reclaim_hook if cpu_mem_ptr is not set
> circleci: do not set disable dpdk explicitly
> fair_queue: Do not pop unplugged class immediately
> build: install Finducontext.cmake and FindSystem-SDT.cmake
> treewide: include used headers
> build: define SEASTAR_COROUTINES_ENABLED for Seastar module
> seastar.cc: include "core/prefault.hh"
> build: enable build C++20 modules with GCC 14
> build: replace seastar_supports_flag() with check_cxx_compiler_flag()
> Merge 'build: cleanups configure.py to be more PEP8 compatible' from Kefu Chai
> circleci: build with dpdk enabled
> build: add "--enable-cxx-modules" option to configure.py
> build: use a different *_CMAKE_API for CMake 3.27
Closes scylladb/scylladb#16500
Before this series, materialized views already worked correctly on keyspaces with tablets, but secondary indexes did not. The goal of this series is to make CQL secondary indexes fully supported on tablets:
1. First we need to make CREATE INDEX work with tablets (it didn't before this series). Fixes #16396.
2. Then we need to keep the promise that our documentation makes - that **local** secondary indexes should be synchronously updated. Fixes #16371.
As you can see in the patches below, and as was expected already in the design phase, the code changes needed to make indexes support tablets were minimal; writing reliable tests for these issues was the biggest effort that went into this series.
Closes scylladb/scylladb#16436
* github.com:scylladb/scylladb:
secondary-index, tablets: ensure that LSI are synchronous
test: add missing "tags" schema extension to cql_test_env
mv, test: fix delay_before_remote_view_update injection point
secondary index: fix view creation when using tablets
The test test_many_partitions is very slow, as it tests a slow scan over
a lot of partitions. This was observed to time out on the slower ARM
machines, making the test flaky. To prevent this, create an
extra-patient CQL connection with a 10-minute timeout for the scan
itself.
This is a follow-up to fb9379edf1, which
attempted to fix this, but didn't patch all the places doing slow scans.
This patch fixes the other scan, the one actually observed to time-out
in CI.
Fixes: #16145
Closes scylladb/scylladb#16370
When ALTER-ing a keyspace one may as well change its vnode/tablet
flavor, which is not currently supported, so prohibit this change
explicitly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Both virtual tables and the schema registry contain thread_local caches that are destroyed
at thread exit. After a Seastar change[1], these destructions can happen after the reactor
is destroyed, triggering a use-after-free.
Fix by scoping the destruction so it takes place earlier.
[1] 101b245ed7
Closes scylladb/scylladb#16510
* github.com:scylladb/scylladb:
schema_registry, database: flush entries when no longer in use
virtual_tables: scope virtual tables registry in system_keyspace
The schema registry disarms internal timers when it is destroyed.
This accesses the Seastar reactor. However, after [1] we don't have ordering
between the reactor destruction and the thread_local registry destruction.
Fix this by flushing all entries when the database is destroyed. The
database object is fundamental so it's unlikely we'll have anything
using the registry after it's gone.
[1] 101b245ed7
Scylla skips exit hooks, so we have to manually trigger the data dump to disk
from the LLVM profiling instrumentation runtime, which we need in order
to support code coverage.
We use a weak symbol to get the address of the profile dump function. This
is legal: the function is a public interface of the instrumentation runtime.
Closes scylladb/scylladb#16430
Virtual tables are kept in a thread_local registry for deduplication
purposes. The problem is that thread_local variables are destroyed late,
possibly after the schema registry and the reactor are destroyed.
Currently this isn't a problem, but after a seastar change to
destroy the reactor after termination [1], things break.
Fix by moving the registry to system_keyspace. system_keyspace was chosen
since it was the birthplace of virtual tables.
Pimpl is used to avoid increasing dependencies.
[1] 101b245ed7
In other words, print more user-friendly messages, and avoid crashing.
Specifically:
* Don't crash when attempting to load schema tables from the configured data-dir, while the configuration does not have any configured data directories.
* Detect the case where schema mutations have no rows for the current table -- the keyspace exists, but the table doesn't.
* Add negative tests for schema-loading.
Fixes: https://github.com/scylladb/scylladb/issues/16459
Closes scylladb/scylladb#16494
* github.com:scylladb/scylladb:
test/cql-pytest: test_tools.py: add test for failed schema loading
tools/scylla-sstable: use at() instead of operator [] when obtaining data dirs
tools/schema_loader: also check for empty table/column mutations
tools/schema_loader: log more details when loading schema from schema tables
truncating is an unusual operation, and we write a log message
at INFO level when the truncate op starts. it would be great if
we could have a matching log message indicating the end of truncate
on the server side. this would help with investigating the TRUNCATE
timeouts spotted on the client: at least we can rule out the problem
happening while the server is performing the truncate.
Refs #15610
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16247
Consider this:
1) file streaming takes a storage snapshot = a list of sstables
2) concurrent compaction unlinks some of those sstables from the file system
3) file streaming tries to send the unlinked sstables, but files other
than data and index cannot be read, as only data and index have file
descriptors opened
To fix it, the snapshot now returns a set of files, one per sstable
component, for each sstable.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#16476
CQL Local Secondary Index is a Scylla-only extension to Cassandra's
secondary index API where the index is separate per partition.
Scylla's documentation guarantees that:
"As of Scylla Open Source 4.0, updates for local secondary indexes are
performed synchronously. When updates are synchronous, the client
acknowledges the write operation only after both the base table
modification and the view update are written."
This happened automatically with vnodes, because the base table and the
view have the same partition key, so base and view replicas are co-located,
and the view update is always local and therefore done synchronously.
But with tablets, this does NOT happen automatically - the base and view
tablets may be located on different nodes, and the view update may be
remote, and NOT synchronous.
So in this patch we explicitly mark the view as synchronous_update when
building the view for an LSI.
The bigger part of this patch is to add a test which reliably fails
before this patch, and passes after it. The test creates a two-node
cluster and a table with LSI, and pins the base's tablets to one node
and the view's to the second node, forcing the view updates to be
remote. It also uses an injection point to make the view update slower.
The test then writes to the base and immediately tries to use the index
to read. Before this patch, the read doesn't find the new data (contrary
to the guarantee in the documentation). After this patch, the read
does find the new data - because the write waited for the index to
be updated.
Fixes#16371
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
One of the unfortunate anti-features of cql_test_env (the framework used
in our CQL tests that are written in C++) is that it needs to repeat
various bizarre initialization steps done in main.cc, otherwise various
requests work incorrectly. One of these steps done in main.cc is initializing
various "schema extensions" which some of the Scylla features need to work
correctly.
We remembered to initialize some schema extensions in cql_test_env, but
forgot others. The one I will need in the following patch is the "tags"
extension, which we need to mark materialized views used by local
secondary indexes as "synchronous_updates" - without this patch the LSI
tests in secondary_index_test.cc will crash.
In addition to adding the missing extension, this patch also replaces
the segmentation-fault crash when it's missing (caused by a dynamic
cast failure) with a clearer on_internal_error() - so if we ever have
this bug again, it will be easier to debug.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The "delay_before_remote_view_update" is a recently-added injection
point which should add a delay before remote view updates, but NOT
force the writer to wait for it (whether the writer waits for it or
not depends on whether the view is configured as synchronous or not).
Unfortunately, the delay was added at the WRONG place, which caused
it to sometimes be done even on asynchronous views, breaking (with
false-negative) the tests that need this delay to reproduce bugs of
missing synchronous updates (Refs #16371).
The fix here is even simpler than the (wrong) old code - we just add
the sleep to the existing function apply_to_remote_endpoints() instead
of making the caller even more complex.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In commit 88a5ddabce, we fixed materialized
view creation to support tablets. We added to prepare_new_view_announcement(),
the function called to create materialized views in CQL,
a missing call to the on_before_create_column_family() notifier that
creates tablets for this new view.
Unfortunately, we have the same problem when creating a secondary index,
because it does not use prepare_new_view_announcement(), and instead uses
a generic function to "update" the base table, which in some cases ends
up creating new views when a new index is requested. In this path, the
notifier did not get called, so we must add the call here too.
Unfortunately, the notifiers must run in a Seastar thread, which means
that yet another function now needs to run in a Seastar thread.
Before this patch, creating a secondary index in a table using tablets
fails with "Tablet map not found for table <uuid>". With this patch,
it works.
The patch also includes tests for creating a regular and local secondary
index. Both tests fail (with the aforementioned error) before this
patch, and pass with it.
Fixes #16396
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The observer, which references table_for_test, must of course not
outlive table_for_test. The observer can be called later, after the
last input sstable is removed from the sstable manager.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#16428
The interface is fragile because the user may incorrectly use the
wrong "gc before". Given that sstable knows how to properly calculate
"gc before", let's do it in estimate__d__t__r(), leaving no room
for mistakes.
sstable_run's variant was also changed to conform to new interface,
allowing ICS to properly estimate droppable ratio, using GC before
that is calculated using each sstable's range. That's important for
upcoming tablets, as we want to query only the range that belongs
to a particular tablet in the repair history table.
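The idea can be sketched with a toy Python model (this is not the actual C++ API; `GC_GRACE_SECONDS`, the helper names, and the tombstone-list representation are illustrative assumptions):

```python
GC_GRACE_SECONDS = 864000  # common default of 10 days; assumption for illustration


def gc_before(now: float, gc_grace_seconds: int = GC_GRACE_SECONDS) -> float:
    # Tombstones whose deletion time is older than this cutoff are droppable.
    return now - gc_grace_seconds


def estimate_droppable_ratio(tombstone_times: list[float], now: float) -> float:
    # Hypothetical stand-in for the sstable-side estimate: the sstable itself
    # computes "gc before", so callers cannot pass a wrong value.
    if not tombstone_times:
        return 0.0
    cutoff = gc_before(now)
    droppable = sum(1 for t in tombstone_times if t < cutoff)
    return droppable / len(tombstone_times)
```

Keeping the cutoff computation inside the estimating function is the point of the interface change: the caller can no longer supply an inconsistent "gc before".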
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#15931
this preserves the existing behavior of `configure.py` in the CMake
generated `build.ninja`.
* configure.py: map 'release' to 'RelWithDebInfo'
* cmake: rename cmake/mode.Release.cmake to cmake/mode.RelWithDebInfo.cmake
* CMakeLists.txt: s/Release/RelWithDebInfo/
Closes scylladb/scylladb#16479
* github.com:scylladb/scylladb:
build: cmake: map 'release' to 'RelWithDebInfo'
build: define BuildType for enclosing build_by_default
Currently, `tool_app_template::run_async()` crashes when invoked with empty argv (with just `argv[0]` populated). This can happen if the tool app is invoked without any further args, e.g. just invoking `scylla nodetool`. The crash happens because of unconditional dereferencing of `argv[1]` to get the current operation.
To fix, add an early-exit for this case, just printing a usage message and exiting with exit code 2.
Fixes: #16451
Closes scylladb/scylladb#16456
* github.com:scylladb/scylladb:
test: add regression tests for invoking tools with no args
tools/utils: tool_app_template: handle the case of no args
tools/utils: tool_app_template: remove "scylla-" prefix from app name
It enables interaction with the node through CQL protocol without authentication. It gives full-permission access.
The maintenance socket is exposed as a Unix domain socket with file permissions `755`, so it is not accessible from outside the node nor from other POSIX groups on the node.
It is created before the node joins the cluster.
To set up the maintenance socket, use the `maintenance-socket` option when starting the node.
* If set to `ignore`, the maintenance socket will not be created.
* If set to `workdir`, the maintenance socket will be created in `<node's workdir>/cql.m`.
* Otherwise, the maintenance socket will be created at the specified path.
The default value is `ignore`.
* With python driver
```python
from cassandra.cluster import Cluster
from cassandra.connection import UnixSocketEndPoint
from cassandra.policies import HostFilterPolicy, RoundRobinPolicy
socket = "<node's workdir>/cql.m"
cluster = Cluster([UnixSocketEndPoint(socket)],
# Driver tries to connect to other nodes in the cluster, so we need to filter them out.
load_balancing_policy=HostFilterPolicy(RoundRobinPolicy(), lambda h: h.address == socket))
session = cluster.connect()
```
Merge note: apparently cqlsh does not support unix domain sockets; it
will have to be fixed in a follow-up.
Closes scylladb/scylladb#16172
* github.com:scylladb/scylladb:
test.py: add maintenance socket test
test.py: enable maintenance socket in tests by default
docs: add maintenance socket documentation
main: add maintenance socket
main: refactor initialization of cql controller and auth service
auth/service: don't create system_auth keyspace when used by maintenance socket
cql_controller: maintenance socket: fix indentation
cql_controller: add option to start maintenance socket
db/config: add maintenance_socket_enabled bool class
auth: add maintenance_socket_role_manager
db/config: add maintenance_socket variable
system_schema.tables and system_schema.columns must have content for
every existing table. To detect a failed load of a table, before
attempting to invoke `db::schema_tables::create_table_from_mutations()`,
we check for the mutations read from these two tables, to not be
disengaged. There is another failure scenario, however: the mutations are
not null, but do not have any clustering rows. This currently results in
a cryptic error message about failing to look up a row in a result-set.
It happens when the looked-up keyspace exists, but the table doesn't.
Extend the check to also cover this case, so we get a human-readable
error message when this happens.
Currently, there is no visibility at all into what happens when
attempting to load schema from schema tables. If it fails, we are left
guessing on what went wrong.
Add a logger and add various debug/trace logs to help following the
process and identify what went wrong.
We do not yet support enabling CDC in a keyspace that uses tablets
(Refs #16317). But the problem is that today, if this is attempted,
we get a nasty failure: the CDC code creates the extra CDC log table,
it doesn't get tablets, and Raft gets surprised and croaks with a
message like:
Raft instance is stopped, reason: "background error,
std::_Nested_exception<raft::state_machine_error> (State machine error at
raft/server.cc:1230): std::runtime_error (Tablet map not found for
table 48ca1620-9ea5-11ee-bd7c-22730ed96b85)
After Raft croaks, Scylla never recovers until it is rebooted.
In this patch, we replace this disaster with a graceful error - a CREATE
TABLE or ALTER TABLE operation with CDC enabled will fail in a clear way,
allowing Scylla to continue operating normally after the failed request.
This fix is important for allowing us to run tests on Scylla with
tablets, and although CDC tests will fail as expected, they won't
fail the other tests that follow (Refs #16473).
Fixes #16318
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16474
The pull request adds support for tablets in Alternator, and particularly focuses on getting Alternator's GSI and LSI (i.e., materialized views) to work.
After this series, support for tablets in Alternator _mostly_ works, but not completely:
1. CDC doesn't yet work with tablets, and Alternator needs CDC (known as "DynamoDB Streams").
2. Alternator's TTL feature was not tested with tablets, and probably doesn't work because it assumes the replication map belongs to a keyspace.
For these reasons, Alternator does not yet use tablets by default; they need to be enabled explicitly by adding an experimental tag to the new table. This will allow us to test Alternator with tablets even before it is ready for the limelight.
Fixes #16203
Fixes #16313
Closes scylladb/scylladb#16353
* github.com:scylladb/scylladb:
mv, tablets, alternator: test for Alternator LSI with tablets
mv: coroutinize wait code for remote view updates
mv, test: add injection point to delay remove view update
alternator: explicitly request synchronous updates for LSI
alternator: fix view creation when using tablets
alternator: add experimental method to create a table with tablets
The observed crash was in the following piece, on the "cf" access:
```
if (*table_is_dropped) {
    sslog.info("[Stream #{}] Skipped streaming the dropped table {}.{}", si->plan_id, si->cf.schema()->ks_name(), si->cf.schema()->cf_name());
```
Fixes #16181
Also, add a test case which reproduces the problem by doing table drop during tablet migration. But note that the problem is not tablet-specific.
Closes scylladb/scylladb#16341
* github.com:scylladb/scylladb:
test: tablets: Add test case which tests table drop concurrent with migration
tests: tablets: Do read barrier in get_tablet_replicas()
streaming: Keep table by shared ptr to avoid crash on table drop
this preserves the existing behavior of `configure.py` in the CMake
generated `build.ninja`.
* configure.py: map 'release' to 'RelWithDebInfo'
* cmake: rename cmake/mode.Release.cmake to cmake/mode.RelWithDebInfo.cmake
* CMakeLists.txt: s/Release/RelWithDebInfo/
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
in existing `modes` defined in `configure.py`, "release" is mapped to
"RelWithDebInfo". this behavior matches that of seastar's
`configure.py`, where we also map "release" build mode to
"RelWithDebInfo" CMAKE_BUILD_TYPE.
but in scylladb's existing cmake settings, it maps "release" to
"Release", despite "Release" being listed as one of the typical
CMAKE_BUILD_TYPE values.
so, in this change, to prepare for the mapping, `BuildType` is
introduced to map a build mode to its related settings. the
build settings are still kept in `cmake.${CMAKE_BUILD_TYPE}.cmake`,
but the other settings, like if a build type should be enabled or
its mappings, are stored in `BuildType` in `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This patch adds a test (in the topology test framework) for issue #16313 -
the bug where Alternator LSI must use synchronous view updates but didn't.
This test fails with high probability (around 50%) before the previous patch,
which fixed this bug - and passes consistently after the patch (I ran it
100 times and it didn't fail even once).
This is the first test in the topology framework that uses the DynamoDB
API and not CQL. This required a couple of tiny convenience functions,
which are introduced in the only test file that uses them - but if we
want we can later move them out to a library file.
Unfortunately, the standard AWS SDK for Python - boto3 - is *not*
asynchronous, so this test is also not really asynchronous, and will
block the event loop while making requests to Alternator. However,
for now it doesn't matter (we do NOT run multiple tests in the same
event loop), and if it ever matters, I mentioned a couple of options
what we can do in a comment.
Because this test uses a 10-node cluster, it is skipped in debug-mode
runs. In a later patch we will replace it by a more efficient - and
more reliable - 2-node test.
Refs #16313
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Fixes #16312
This test replays a segment before it might be closed or even fully flushed, thus it can (with the new semantics) generate a segment_truncation exception when hitting eof earlier than expected. (Note: the test does not use pre-allocated segments.)
(The first patch coroutinizes the test to make for a nicer, easier fix.)
Closes scylladb/scylladb#16368
* github.com:scylladb/scylladb:
commitlog_test::test_commitlog_reader: handle segment_truncation
commitlog_test: coroutinize test_commitlog_reader
This was recently found to produce a crash. Add a simple regression
test, to make sure future changes don't re-introduce problems with this
rarely used code-path.
Currently, tool_app_template::run_async() crashes when invoked with
empty argv (with just argv[0] populated). This can happen if the tool
app is invoked without any further args, e.g. just invoking `scylla
nodetool`. The crash happens because of unconditional dereferencing of
argv[1] to get the current operation.
To fix, add an early-exit for this case, just printing a usage message
and exiting with exit code 2.
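The shape of the fix can be sketched in Python (the real tool is C++; `run_async` and `USAGE` here are illustrative stand-ins, not the actual code):

```python
import sys

USAGE = "usage: scylla nodetool OPERATION [OPTIONS] ..."


def run_async(argv: list[str]) -> int:
    # Sketch of the fix: bail out before touching argv[1] when the tool
    # is invoked with no operation at all.
    if len(argv) < 2:
        print(USAGE, file=sys.stderr)
        return 2  # exit code 2, as described above
    operation = argv[1]  # safe now: argv[1] is known to exist
    # ... dispatch to the requested operation ...
    return 0
```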
In other words, have all tools pass their name without the "scylla-"
prefix to `tool_app_template::config::name`. E.g., replace
"scylla-nodetool" with just "nodetool".
Patch all usages to re-add the prefix if needed.
The app name is just more flexible this way, some users might want the
name without the "scylla-" prefix (in the next patch).
Add initialization of maintenance_auth_service and cql_maintenance_server_ctl.
Create maintenance socket which enables interaction with the node through
CQL protocol without authentication. The maintenance socket is available
via a Unix domain socket. It gives full-permission access.
It is created before the node joins the cluster.
Move initialization of cql controller and auth service to functions.
It will make it easier to create a new cql controller with a separate auth service,
for example for the maintenance socket.
Make it possible to initialize new services before joining group0.
The maintenance socket is created before joining the cluster. When the maintenance auth service
is started, it creates the system_auth keyspace if it's missing. This is not synchronized
with other nodes, because the node hasn't joined group0 yet. Thus the node ends up with
a mismatched schema and is unable to join the cluster.
The maintenance socket doesn't use role management, so the problem is solved
by not creating the system_auth keyspace when the maintenance auth service is created.
The logic of the regular CQL port's auth service won't be changed. A new, separate
auth service will be created for the maintenance socket.
Add an option to listen on the maintenance socket. It is set up on a Unix domain socket
and the metrics are disabled.
This enables having an independent authentication mechanism for this socket.
To start the maintenance socket, a new cql_controller has to be created
with
`db::maintenance_socket_enabled::yes` argument.
Creating the maintenance socket will raise an exception if
* the path is longer than 107 chars (due to Linux limits),
* a file or a directory already exists at the path.
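These two validations can be sketched as follows; this is an illustrative Python model, not the actual implementation (the 107-byte limit comes from Linux's 108-byte `sun_path` field, which includes the NUL terminator):

```python
import os

MAX_UNIX_PATH = 107  # Linux sun_path is 108 bytes including the trailing NUL


def validate_maintenance_socket_path(path: str) -> None:
    # Hypothetical sketch of the checks described above.
    if len(path.encode()) > MAX_UNIX_PATH:
        raise ValueError(f"socket path too long (>{MAX_UNIX_PATH} chars): {path}")
    if os.path.exists(path):
        raise ValueError(f"path already exists: {path}")
```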
The indentation is fixed in the next commit.
After restarting each node, we should wait for other nodes to notice
the node is UP before restarting the next server. Otherwise, the next
node we restart may not send the shutdown notification to the
previously restarted node, if it still sees it as down when we
initiate its shutdown. In this case, the node will learn about the
restart from gossip later, possibly when we have already started serving CQL
requests. When a node learns that some node restarted while it
considers it as UP, it will close connections to that node. This will
fail RPC sent to that node, which will cause CQL request to time-out.
Fixes #14746
Closes scylladb/scylladb#16010
asyncio.get_event_loop() returns the current event loop; if there
is none, the result of `get_event_loop_policy().get_event_loop()` is
returned. but this behavior is deprecated since Python 3.12, so let's
use asyncio.run() as recommended by
https://docs.python.org/3/library/asyncio-eventloop.html.
asyncio.run() was introduced in Python 3.7, so we should be able to
use it.
this change silences the warning when running this script
as a stand-alone script with Python 3.12.
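the migration described above looks roughly like this (the `main` coroutine is a made-up placeholder for the script's actual entry point):

```python
import asyncio


async def main() -> str:
    # ... the script's async entry point ...
    await asyncio.sleep(0)
    return "done"


# deprecated pattern (warns under Python 3.12 when no loop is running):
#   loop = asyncio.get_event_loop()
#   loop.run_until_complete(main())
# recommended replacement, available since Python 3.7:
result = asyncio.run(main())
```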
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16385
Add `maintenance_socket_role_manager`, which disables all operations
associated with roles, so as not to depend on the system_auth keyspace, which may
not yet be created when the maintenance socket starts listening.
If set to "ignore", the maintenance socket will be disabled.
If set to "workdir", the maintenance socket will be opened at <scylla's
workdir>/cql.m.
Otherwise it will be opened at the path provided by the maintenance_socket
variable.
It is set to 'ignore' by default.
We make `consistent_cluster_management` mandatory in 5.5. This
option will always be unused and assumed to be true.
Additionally, we make `override_decommission` deprecated, as this option
has been supported only with `consistent_cluster_management=false`.
Making `consistent_cluster_management` mandatory also simplifies
the code. Branches that execute only with
`consistent_cluster_management` disabled are removed.
We also update documentation by removing information irrelevant in 5.5.
Fixes scylladb/scylladb#15854
Note about upgrades: this PR does not introduce any more limitations
to the upgrade procedure than there are already. As in
scylladb/scylladb#16254, we can upgrade from the first version of Scylla
that supports the schema commitlog feature, i.e. from 5.1 (or
corresponding Enterprise release) or later. Assuming this PR ends up in
5.5, the documented upgrade support is from 5.4. For corresponding
Enterprise release, it's from 2023.x (based on 5.2), so all requirements
are met.
Closes scylladb/scylladb#16334
* github.com:scylladb/scylladb:
docs: update after making consistent_cluster_management mandatory
system_keyspace, main, cql_test_env: fix indendations
db: config: make consistent_cluster_management mandatory
test: boost: schema_change_test: replace disable_raft_schema_config
db: config: make override_decommission deprecated
db: config: make force_schema_commit_log deprecated
The test case that validates that upload-sink works does this by getting several random ranges from the uploaded object and checking that the content is what it should be. The range boundaries are generated like this:
```
uint64_t len = random(1, chunk_size);
uint64_t offset = random(file_size) - len;
```
The 2nd line is not correct: if the random number happens to be less than len, the offset becomes "negative", i.e. -- a very large 64-bit unsigned value.
Next, this offset:len gets into the s3 client's get_object_contiguous() helper, which in turn converts them into the http Range header's bytes-specifier format, which is the "first_byte-last_byte" one. The math here is
```
first_byte = offset;
last_byte = offset + len - 1;
```
Here the overflow of the offset results in underflow of the last_byte -- it becomes less than the first_byte. According to the RFC, this range-specifier is invalid and (!) can be ignored by the server. This is what minio does -- it ignores the invalid range and returns the full object.
But that's not all. When returning an object portion, the http response status code is PartialContent, but when the range is ignored and the full object is returned, the status is OK. This makes the s3 client's request fail with unexpected_status_error in the middle of the test. Then the object is removed by a deferred action and the actual error is printed into the logs. At the end of the day, the logs look as if deletion of an object failed with OK status %)
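The wraparound and the obvious correction can be reproduced in Python by masking arithmetic to 64 bits to mimic `uint64_t` (this sketch only models the range generation, not the actual test code):

```python
import random

MASK64 = (1 << 64) - 1


def buggy_range(file_size: int, chunk_size: int, rng: random.Random):
    # Mimics the original C++: unsigned 64-bit subtraction wraps around
    # when the random position is smaller than the chosen length.
    length = rng.randint(1, chunk_size)
    offset = (rng.randrange(file_size) - length) & MASK64
    return offset, length


def fixed_range(file_size: int, chunk_size: int, rng: random.Random):
    # Pick the length first, then an offset that keeps the range in bounds.
    length = rng.randint(1, min(chunk_size, file_size))
    offset = rng.randrange(file_size - length + 1)
    return offset, length
```

With the buggy version, a wrapped offset makes last_byte = offset + len - 1 fall below first_byte, producing exactly the invalid bytes-specifier described above.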
fixes: #16133
Closes scylladb/scylladb#16324
* github.com:scylladb/scylladb:
test/s3: Avoid object range overflow
s3/client: Handle GET-with-Range overflows correctly
Support for splitting tablet storage is added.
Until now, tablet storage was composed of a single compaction group, i.e. a group of sstables eligible to be compacted together.
For splitting, tablet storage can now be composed of multiple compaction groups: main, left and right.
The main group stores sstables that require splitting, whereas the left and right groups store sstables that were already split according to the tablet's token range.
After the table storage is put in splitting mode, new writes will only go to either the left or the right group, depending on the token.
When all main groups have completed splitting their sstables, the coordinator can proceed with tablet metadata changes.
The coordination part is not implemented yet. Only the storage part. The former will come next and will be wired into the latter.
Missing:
- splitting monitor (verify whether coordinator asked for splitting and acts accordingly) (will come next)
Closes scylladb/scylladb#16158
* github.com:scylladb/scylladb:
replica: Introduce storage group splitting
replica: Add storage_group::memtable_count()
replica: Add compaction_group::empty()
replica: Rename compaction_group_manager to storage_group_manager
replica: Introduce concept of storage group
compaction: Add splitting compaction task to manager
compaction: Prepare rewrite_sstables_compaction_task_executor to be reused for splitting
compaction: remove scrub-specific code from rewrite_sstables_compaction_task_executor
replica: Allow uncompacted SSTables to be moved into a new set
compaction: Add splitting compaction
flat_mutation_reader: Allow interposer consumers to be stacked
mutation_writer: Introduce token-group-based mutation segregator
locator: Introduce tablet_map::get_tablet_id_and_range_side(token)
In the previous patch we added a delay injection point (for testing)
in the view update code. Because the code was using continuation style,
this resulted in increased indentation and ugly repetition of captures.
So in this patch we coroutinize the code that waits for remote view
updates, making it simpler, shorter, and less indented.
Note that this function still uses continuations in one place:
The remote view update is still composed of two steps that need
to happen one after another, but we don't necessarily need to wait
for them to happen. This is easiest to do with chaining continuations,
and then either waiting or not waiting for the resulting future.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
It's difficult to write a test (as we plan to do in the next patch)
that verifies that synchronous view updates are indeed synchronous, i.e.,
that write with CL=QUORUM on the base-table write returns only after
CL=QUORUM was also achieved in the view table. The difficulty is that in a
fast test machine, even if the synchronous-view-update is completely buggy,
it's likely that by the time the test reads from the view, all view updates
will have been completed anyway.
So in this patch we introduce an injection point, for testing, named
"delay_before_remote_view_update", which adds a delay before the base
replica sends its update to the remote view replica (in case the view
replica is indeed remote). As usual, this injection point isn't
configurable - when enabled it adds a fixed (0.5 second) delay, on all
view updates on all tables.
The existing code used continuation-style Seastar programming, and the
addition of the injection point in this patch made it even uglier, so
in the next patch we will coroutine-ize this code.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
DynamoDB's *local* secondary index (LSI) allows strongly-consistent
reads from the materialized view, which must be able to read what was
previously written to the base. To support this, we need the view to
use the "synchronous_updates" option.
Previously, with vnodes, there was no need for using this option
explicitly, because an LSI has the same partition key as the base table
so the base and view replicas are the same, and the local writes are
done synchronously. But with tablets, this changes - there is no longer
a guarantee that the base and view tablets are located on the same node.
So to restore the strong consistency of LSIs when tablets are enabled,
this patch explicitly adds the "synchronous_updates" option to views
created by Alternator LSIs. We do *not* add this option for GSIs - those
do not support strongly-consistent reads.
This fix was tested by a test that will be introduced in the following
patches. The test showed that before this patch, it was possible that
reading with ConsistentRead=True from an LSI right after the base was
written would miss the new changes, but after this patch, it always
sees the new data in the LSI.
Fixes #16313.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In commit 88a5ddabce, we fixed materialized
view creation to support tablets. We added to prepare_new_view_announcement(),
the function called to create materialized views in CQL,
a missing call to the on_before_create_column_family() notifier that
creates tablets for this new view.
We have the same problem in Alternator when creating a view (GSI or LSI).
The Alternator code does not use prepare_new_view_announcement(), and
instead uses the lower-level function add_table_or_view_to_schema_mutation()
so it didn't get the call to the notifier, so we must add it here too.
Before this patch, creating an Alternator table with tablets (which has
become possible after the previous patch) fails with "Tablet map not found
for table <uuid>". With this patch, it works.
A test for materialized views in Alternator will come in a following
patch, and will test everything together - the CreateTable tag to use
tablets (from the previous patch), the LSI/GSI creation (fixed in this patch)
and the correct consistency of the LSI (fixed in the next patch).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
As explained in issue #16203, we cannot yet enable tablets on Alternator
keyspaces by default, because support for some of the features that
Alternator needs, such as CDC, is not yet available.
Nevertheless, to start testing Alternator integration with tablets,
we want to provide a way to enable tablets in Alternator for tests.
In this patch we add support for a tag, 'experimental:initial_tablets',
which if added on a table during creation, uses tablets for its keyspace.
The value of this tag is a numeric string, and it is exactly analogous
to the 'initial_tablets' property we have in CQL's NetworkTopologyStrategy.
We name this tag with the "experimental:" prefix to emphasize that it
is experimental, and the way to enable or disable tablets will probably
change later.
The new tag only has effect when added while *creating* a table.
Adding, deleting or changing it later on an existing table will have
no effect.
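A rough illustration of how such a tag might be interpreted at table-creation time (the helper name and the handling of absent or non-numeric values are assumptions for illustration, not the actual Alternator code):

```python
TABLETS_TAG = "experimental:initial_tablets"


def initial_tablets_from_tags(tags: dict) -> "int | None":
    # Hypothetical sketch: a numeric string value requests tablets for the
    # new table's keyspace, analogous to CQL's 'initial_tablets' option;
    # absence means the keyspace is created without tablets, as before.
    value = tags.get(TABLETS_TAG)
    if value is None or not value.isdigit():
        return None
    return int(value)
```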
A later patch will have tests that use this tag to test Alternator with
tablets.
Refs #16203.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
before this change, we used the format string
"Can't replace node {} with itself", but failed to include the host id in seastar::format()'s arguments. this fails the compile-time check of {fmt}, which is not yet merged; currently, seastar::log formats the logging messages at runtime, so if we really ran into this problem, {fmt} would throw before the intended runtime_error was raised -- this is not intended.
in this change, we pass `existing_node`, so it can be formatted, and the
intended error message can be printed in log.
Refs 11a4908683
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16342
Scylla can be configured to use different IPs for the internode communication
and client connections. This test allocates and configures unique IP addresses
for the client connections (`rpc_address`) for a 2-node cluster.
Two scenarios tested:
1) Change RPC IPs sequentially
2) Change RPC IPs simultaneously
Closes scylladb/scylladb#15965
This introduces the ability to split a storage group.
The main compaction group is split into left and right groups.
set_split() is used to set the storage group to splitting mode, which
will create left and right compaction groups. Incoming writes will
now be placed into memtable of either left or right groups.
split() is used to complete the splitting of a group. It only
returns when all preexisting data is split. That means main
compaction group will be empty and all the data will be stored
in either left or right group.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Storage group is the storage of a tablet. This new concept is helpful
for tablet splitting, where the storage of a tablet will be split
into multiple compaction groups, each of which can be compacted
independently.
The reason for not going with arena concept is that it added
complexity, and it felt much more elegant to keep compaction
group unchanged which at the end of the day abstracts the concept
of a set of sstables that can be compacted and operated
independently.
When splitting, the storage group for a tablet may therefore own
multiple compaction groups, left, right, and main, where main
keeps the data that needs splitting. When splitting completes,
only left and right compaction groups will be populated.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The task for splitting compaction will run until all sstables
in the main set are split. The only exceptions are shutdown,
or the user explicitly asking for abort.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
With off-strategy, we allow sstables to be moved into a new sstable
set even if they didn't undergo reshape compaction.
That's done by specifying that an sstable is present both in input and
output, with the completion desc.
We want to do the same with other compaction types.
Think for example of split compaction: compaction manager may decide
a sstable doesn't need splitting, yet it wants that sstable to be
moved into a new sstable set.
Theoretically, we could introduce new code to do this movement,
but more code means increased maintenance burden and higher chances
of bugs. It makes sense to reuse the compaction completion path,
as we do today with off-strategy.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
reader_consumer_v2 being a noncopyable_function imposes a restriction
when stacking one interposer consumer on top of another.
Think for example of a token-based segregator on top of a timestamp
based one.
To achieve that, the interposer consumer creator must be reentrant,
such that the consumer can be created on each "channel", but today
the creator becomes unusable after first usage.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Token group is an abstraction that allows us to easily segregate a
mutation stream into buckets. Groups share the same properties as
compaction groups: groups follow the ring order and they don't
overlap each other. Groups are defined according to a classifier,
which returns an id given a token. It's expected that the classifier
returns ids in monotonically increasing order.
The reasons for this abstraction are:
1) we don't want to make segregator aware of compaction groups
2) splitting happens before tablet metadata is changed, so the
segregator will have to classify based on whether the token
belongs to the left (group id 0) or right (group id 1) side of
the range to be split.
The reason for not extending the sstable writer instead is that,
today, the writer consumer can only tell the producer to switch to a
new writer when consuming the end of a partition, but that
would be too late for us, as we have to decide to move to
a new writer at partition start instead.
It will be wired into compaction when it happens in split mode.
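The classifier/segregator idea can be sketched in Python (this is a conceptual model with made-up names, not the actual C++ mutation_writer code):

```python
def make_split_classifier(split_token: int):
    # Hypothetical classifier: tokens on the left side of the split point
    # get group id 0, tokens on the right get id 1. When partitions arrive
    # in ring (token) order, the returned ids are monotonically non-decreasing.
    def classify(token: int) -> int:
        return 0 if token <= split_token else 1
    return classify


def segregate(stream, classify):
    # Route each (token, partition) pair to a per-group bucket, deciding
    # at partition start -- not at partition end, as noted above.
    buckets = {}
    for token, partition in stream:
        buckets.setdefault(classify(token), []).append(partition)
    return buckets
```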
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
when compiling with Clang-18 + libstdc++-13, the tree fails to build:
```
/home/kefu/dev/scylladb/tasks/task_manager.hh:45:36: error: no template named 'list' in namespace 'std'
45 | using foreign_task_list = std::list<foreign_task_ptr>;
| ~~~~~^
```
so let's include the used header
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16433
The currently used versions of the "time" and "rustix" dependencies
had minor security vulnerabilities.
In this patch:
- the "rustix" crate is updated
- the "chrono" crate that we depend on was not compatible
with the version of the "time" crate that had fixes, so
we updated the "chrono" crate, which actually removed the
dependency on "time" completely.
Both updates were performed using "cargo update" on the
relevant package and the corresponding version.
Fixes #15772
Closes scylladb/scylladb#16378
before this change, we used the format string
"Can't replace node {} with itself", but failed to include the host id in seastar::format()'s arguments. This fails the compile-time check of {fmt}, which is not yet merged. So if we really run into this problem, {fmt} would throw before the intended runtime_error is raised -- currently, seastar::log formats the logging messages at runtime, which is not intended.
in this change, we pass `existing_node`, so it can be formatted, and the
intended error message can be printed in log.
Refs 11a4908683
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16422
when migrating to the uuid-based identifiers, the mapping from the
integer-based generation to the shard-id is preserved. we used to have
"gen % smp_count" for calculating the shard which is responsible to host
a given sstable. although this is not documented behavior, it is
handy when we try to correlate an sstable to a shard, typically when
looking at a performance issue.
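As a sketch of that legacy convention (`owner_shard` is a hypothetical helper for illustration, not the actual subcommand's code):

```python
def owner_shard(generation: int, smp_count: int) -> int:
    """Recover the shard that owns an sstable from its integer-based
    generation: the legacy convention is simply "gen % smp_count"."""
    return generation % smp_count
```

For example, with 8 shards, generation 42 maps to shard 2.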
in this change, a new subcommand is added to expose the connection
between the sstable and its "owner" shards.
Fixes#16343
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16345
In effb9fb3cb migration request handler
(called when a node requests schema pull) was extended with a
`system.scylla_local` mutation:
```
cm.emplace_back(co_await self._sys_ks.local().get_group0_schema_version());
```
This mutation is empty if the GROUP0_SCHEMA_VERSIONING feature is
disabled.
Nevertheless, it turned out to cause problems during upgrades.
The following scenario shows the problem:
We upgrade from 5.2 to enterprise version with the aforementioned patch.
In 5.2, `system.scylla_local` does not use schema commitlog.
After the first node upgrades to the enterprise version, it immediately
on boot creates a new enterprise-only table
(`system_replicated_keys.encrypted_keys`) -- the specific table is not
important, only the fact that a schema change is performed.
This happens before the restarting node notices other nodes being UP, so
the schema change is not immediately pushed to the other nodes.
Instead, soon after boot, the other non-upgraded nodes pull the schema
from the upgraded node.
The upgraded node attaches a `system.scylla_local` mutation to the
vector of returned mutations.
The non-upgraded nodes try to apply this vector of mutations. Because
some of these mutations are for tables that already use schema
commitlog, while the `system.scylla_local` table does not use schema
commitlog, this triggers the following error (even though the mutation
is empty):
```
Cannot apply atomically across commitlog domains: system.scylla_local, system_schema.keyspaces
```
Fortunately, the fix is simple -- instead of attaching an empty
mutation, do not attach a mutation at all if the handler of migration
request notices that group0_schema_version is not present.
Note that group0_schema_version is only present if the
GROUP0_SCHEMA_VERSIONING feature is enabled, which happens only after
the whole upgrade finishes.
Refs: scylladb/scylladb#16414
Not using "Fixes" because the issue will only be fixed once this PR is
merged to `master` and the commit is cherry-picked onto next-enterprise.
Closes scylladb/scylladb#16416
Reduce code duplication by defining each metric just once, instead of three times, by having the semaphore register metrics by itself. This also makes the lifecycle of metrics contained in that of the semaphore. This is important on enterprise where semaphores are added and removed, together with service levels.
We don't want all semaphores to export metrics, so a new parameter is introduced and all call-sites make a call whether they opt-in or not.
Fixes: https://github.com/scylladb/scylladb/issues/16402
Closes scylladb/scylladb#16383
* github.com:scylladb/scylladb:
database, reader_concurrency_semaphore: deduplicate reader_concurrency_semaphore metrics
reader_concurrency_semaphore: add register_metrics constructor parameter
sstables: name sstables_manager
We remove Raft documentation irrelevant in 5.5.
One of the changes is removing a part of the "Enabling Raft" section
in raft.rst. Since Raft is mandatory in 5.5, the only way to enable
it in this version is by performing a rolling upgrade from 5.4. We
only need to have this case well-documented. In particular, we
remove information that also appears in the upgrade guides like
verifying schema synchronization.
Similarly, we remove a sentence from the "Manual Recovery Procedure"
section in handling-node-failures.rst because it mentions enabling
Raft manually, which is impossible in 5.5.
The rest of the changes are just removing information about
checking or setting consistent_cluster_management, which has become
unused.
Code that executed only when consistent_cluster_management=false is
removed. In particular, after this patch:
- raft_group0 and raft_group_registry are always enabled,
- raft_group0::status_for_monitoring::disabled becomes unused,
- topology tests can only run with consistent_cluster_management.
In the following commits, we make consistent cluster management
mandatory. This will make disable_raft_schema_config unusable,
so we need to get rid of it. However, we don't want to remove
tests that use it.
The idea is to use the Raft RECOVERY mode instead of disabling
consistent cluster management directly.
The override_decommission option is supported only when
consistent_cluster_management is disabled. In the following commit,
we make consistent_cluster_management mandatory, which makes
override_decommission unusable.
In scylladb/scylladb#16254, we made force_schema_commit_log unused.
After this change, if someone passes this option as a command-line
argument, the boot fails. This behavior is undesired. We only want
this option to be ignored. We can achieve this effect by making it
deprecated.
The goal is to make the available defaults safe for future use, as they
are often taken from existing config files or documentation verbatim.
Referenced issue: #14290
Closes scylladb/scylladb#15947
std::source_location is broken on some versions of clang. In order
to be able to use its functionality in code, seastar defines
seastar::compat::source_location, which is a typedef over
std::source_location if the latter works, or a custom, dummy
implementation if the std type doesn't work. Therefore, sometimes
seastar::compat::source_location == std::source_location, but not
always.
In service/raft/raft_rpc.cc, both std source location and compat source
location are used, and std source location is sometimes passed as an
argument to compat source location, breaking builds on older toolchains.
Fix this by switching the code there to only use compat source location.
Fixes: scylladb/scylladb#16336
Closes scylladb/scylladb#16337
we use "\w" to represent a character class in a Python regular
expression, see https://docs.python.org/3/library/re.html. but in a
regular string literal "\" should be escaped as well. CPython accepts
"\w" after failing to recognize it as an escape sequence and leaves it
as-is, but it complains with a SyntaxWarning.
in this change, we use a raw string to avoid escaping "\" in
the regular expression.
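A minimal standalone illustration of the raw-string fix (not the actual script code):

```python
import re

# r"\w" is a raw string: the backslash reaches the regex engine
# untouched, so no string-escape SyntaxWarning is emitted at compile
# time, unlike "\w" written inside a regular string literal
words = re.findall(r"\w+", "scylla cqlsh config")
```

Here `words` is `["scylla", "cqlsh", "config"]`.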
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16405
we have always been putting the cqlsh configuration into `~/.cqlshrc`.
according to a commit from 8 years ago [1], this path is deprecated,
and [2] actually removed it from the cqlsh code.
as part of moving to scylla-cqlsh, we picked up [2], and didn't
notice until the first release with it.
this change writes the configuration into `~/.cassandra/cqlshrc`,
as this is the default place cqlsh looks at.
[1]: 13ea8a6669/bin/cqlsh.py (L264)
[2]: 2024ea4796
Fixes: scylladb/scylladb#16329
Closes scylladb/scylladb#16340
In this PR we refactor `token_metadata` to use `locator::host_id` instead of `gms::inet_address` for node identification in its internal data structures. Main motivation for these changes is to make raft state machine deterministic. The use of IPs is a problem since they are distributed through gossiper and can't be used reliably. One specific scenario is outlined [in this comment](https://github.com/scylladb/scylladb/pull/13655#issuecomment-1521389804) - `storage_service::topology_state_load` can't resolve host_id to IP when we are applying old raft log entries, containing host_id-s of the long-gone nodes.
The refactoring is structured as follows:
* Turn `token_metadata` into a template so that it can be used with host_id or inet_address as the node key. The version with inet_address (the current one) provides a `get_new()` method, which can be used to access the new version.
* Go over all places which write to the old version and make the corresponding writes to the new version through `get_new()`. When this stage is finished we can use any version of the `token_metadata` for reading.
* Go over all the places which read `token_metadata` and switch them to the new version.
* Make `host_id`-based `token_metadata` default, drop `inet_address`-based version, change `token_metadata` back to non-template.
This series [depends](1745a1551a) on RPC sender `host_id` being present in RPC `client_info` for `bootstrap` and `replace` node_ops commands. This feature was added in [this commit](95c726a8df) and released in `5.4`. It is generally recommended not to skip versions when upgrading, so users who upgrade sequentially first to `5.4` (or the corresponding Enterprise version) then to the version with these changes (`5.5` or `6.0`) should be fine. If for some reason they upgrade from a version without `host_id` in RPC `client_info` to the version with these changes and they run bootstrap or replace commands during the upgrade procedure itself, these commands may fail with an error `Coordinator host_id not found` if some nodes are already upgraded and the node which started the node_ops command is not yet upgraded. In this case the user can finish the upgrade first to version 5.4 or later, or start bootstrap/replace with an upgraded node. Note that removenode and decommission do not depend on coordinator host_id so they can be started in the middle of upgrade from any node.
Closes scylladb/scylladb#15903
* github.com:scylladb/scylladb:
topology: remove_endpoint: remove inet_address overload
token_metadata: topology: cleanup add_or_update_endpoint
token_metadata: add_replacing_endpoint: forbid replacing node with itself
topology: drop key_kind, host_id is now the primary key
dc_rack_fn: make it non-template
token_metadata: drop the template
shared_token_metadata: switch to the new token_metadata
gossiper: use new token_metadata
database: get_token_metadata -> new token_metadata
erm: switch to the new token_metadata
storage_service: get_token_metadata -> token_metadata2
storage_service: get_token_to_endpoint_map: use new token_metadata
api/token_metadata: switch to new version
storage_service::on_change: switch to new token_metadata
cdc: switch to token_metadata2
calculate_natural_endpoints: fix indentation
calculate_natural_endpoints: switch to token_metadata2
storage_service: get_changed_ranges_for_leaving: use new token_metadata
decommission_with_repair, removenode_with_repair -> new token_metadata
rebuild_with_repair, replace_with_repair: use new token_metadata
bootstrap: use new token_metadata
tablets: switch to token_metadata2
calculate_effective_replication_map: use new token_metadata
calculate_natural_endpoints: fix formatting
abstract_replication_strategy: calculate_natural_endpoints: make it work with both versions of token_metadata
network_topology_strategy_test: update new token_metadata
storage_service: on_alive: update new token_metadata
storage_service: handle_state_bootstrap: update new token_metadata
storage_service: snitch_reconfigured: update new token_metadata
storage_service: leave_ring: update new token_metadata
storage_service: node_ops_cmd_handler: update new token_metadata
storage_service: node_ops_cmd_handler: add coordinator_host_id
storage_service: bootstrap: update new token_metadata
storage_service: join_token_ring: update new token_metadata
storage_service: excise: update new token_metadata
storage_service: join_cluster: update new token_metadata
storage_service: on_remove: update new token_metadata
storage_service: handle_state_normal: fill new token_metadata
storage_service: topology_state_load: fill new token_metadata
storage_service: adjust update_topology_change_info to update new token_metadata
topology: set self host_id on the new topology
locator::topology: allow being_replaced and replacing nodes to have the same IP
token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known
token_metadata: get_host_id: exception -> on_internal_error
token_metadata: add get_all_ips method
token_metadata: support host_id-based version
token_metadata: make it a template with NodeId=inet_address/host_id NodeId is used in all internal token_metadata data structures, that previously used inet_address. We choose topology::key_kind based on the value of the template parameter.
locator: make dc_rack_fn a template
locator/topology: add key_kind parameter
token_metadata: topology_change_info: change field types to token_metadata_ptr
token_metadata: drop unused method get_endpoint_to_token_map_for_reading
reader_concurrency_semaphore metrics are triplicated: each metric is registered
for the streaming, user, and system classes.
To fix, just move the metrics registration from database to
reader_concurrency_semaphore, so each reader_concurrency_semaphore
instance will register its own metrics (if its creator asked for it).
Adjust the names given to reader_concurrency_semaphore so we don't
change the labels.
scylla-gdb is adjusted to support the new names.
The document docs/cql/cql-extensions.md documents Scylla's extension
of *synchronous* view updates, and mentioned a few cases where view
updates are synchronous even if synchronous updates are not requested
explicitly. But with tablets, these statements and examples are no
longer correct - with tablets, base and view tablets may find
themselves migrated to entirely different nodes. So in this patch
we correct the statements that are no longer accurate.
Note that after this patch we still have in this document, and in
other documents, similar promises about CQL *local secondary indexes*.
Either the documentation or the implementation needs to change in
that case too, but we'll do it in a separate patch.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16369
To be used in the next patch to control whether the semaphore registers
and exports metrics or not. We want to move metric registration to the
semaphore but we don't want all semaphores to export metrics. The
decision on whether a semaphore should or shouldn't export metrics
should be made on a case-by-case basis so this new parameter has no
default value (except for the for_tests constructor).
Soon, the reader_concurrency_semaphore will require a unique
and meaningful name in order to label its metrics. To prepare
for that, name sstable_manager instances. This will be used
to generate a name for sstable_manager's reader_concurrency_semaphore.
to identify misspelling in the code.
The GitHub actions in this workflow run codespell when a new pull
request is created targeting the master or enterprise branch. Errors
will be annotated in the pull request. A new entry along with the
existing tests like build, unit test and dtest will be added to the
"checks" shown in the github PR web UI. One can follow the "Details" to
find the details of the errors.
unfortunately, this check checks all text files unless they
are explicitly skipped, not just the new ones added / changed in the
PR under test. in other words, if there are 42 misspelling
errors in master, and you are adding a new one in your PR,
this workflow shows all of the 43 errors -- both the old
and new ones.
the misspellings in the code hurt the user experience and sometimes
the developer's experience. but the text files under test/cql
can be sensitive to the exact text -- sometimes a tiny edit could
break the test, so that directory is added to the skip list.
So far, there are lots of errors identified by the tool. until we
address all of them, the identified problems are only
annotated; they are not considered errors, so they don't
fail the check.
in this change `only_warn` is set, so the check does not
fail even if there are misspellings. this prevents distractions
before all problems are addressed. we can remove this setting in
the future, once we either fix all the misspellings or add the
ignore-words or skip-files entries. either way, the check is not
considered a blocker for merging the tested PR, even if it fails --
the check failure is presented for information purposes only, unless
we make it required in the github settings for the target
branch.
if we want to change this, we can configure it in github's Branch
protection rule on a per-branch basis, to make this check a
must-pass.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16285
If a std::vector is resized, its iterators and references may
get invalidated. While task_manager::task::impl::_children's
iterators are avoided throughout the code, references to its
elements are being used.
Since the children vector does not need random access to its
elements, change its type to std::list<foreign_task_ptr>, whose
iterators and references aren't invalidated on element insertion.
Fixes: #16380.
Closes scylladb/scylladb#16381
various cleanups in `scripts/coverage.py`. they do not change the behavior of this script in the happy path.
Closes scylladb/scylladb#16399
* github.com:scylladb/scylladb:
scripts/coverage.py: s/exit/sys.exit/
scripts/coverage.py: do not inherit Value from argparse.Action
scripts/coverage.py: use `is not None`
scripts/coverage.py: correct the formatted string in error message
scripts/coverage.py: do not use f-string when nothing to format
scripts/coverage.py: use raw string to avoid escaping "\"
as Value is not an argparse.Action: it is not passed as the argument
of the "action" parameter, nor does it implement the `__call__`
method. so just derive it from object.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
`is not None` is the more idiomatic Python way to check that an
expression does not evaluate to None, and it is more readable.
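A small illustration of why the explicit check also matters for correctness: a bare truthiness test conflates None with other falsy values.

```python
values = [0, "", None, 42]

# explicit: keeps falsy-but-meaningful values such as 0 and ""
non_none = [v for v in values if v is not None]

# truthiness: silently drops 0 and "" along with None
truthy = [v for v in values if v]
```

`non_none` is `[0, "", 42]` while `truthy` is only `[42]`.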
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
we use "\." to escape "." in a regular expression. but in a regular
string literal "\" should be escaped as well. CPython accepts "\."
after failing to recognize it as an escape sequence and leaves it
as-is, but it complains:
```
/home/kefu/dev/scylladb/scripts/coverage.py:107: SyntaxWarning: invalid escape sequence '\.'
input_file_re_str = f"(.+)\.profraw(\.{__DISTINCT_ID_RE})?"
```
in this change, we use a raw string to avoid escaping "\" in
the regular expression.
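For illustration, an `rf`-string keeps the f-string interpolation while leaving backslashes untouched; this is a simplified stand-in for the coverage.py pattern, with `DISTINCT_ID_RE` as a placeholder:

```python
import re

DISTINCT_ID_RE = r"\d+"  # assumption: placeholder for the real sub-pattern

# rf"" combines interpolation with raw-string semantics, so "\." no
# longer triggers "SyntaxWarning: invalid escape sequence '\.'"
input_file_re = re.compile(rf"(.+)\.profraw(\.{DISTINCT_ID_RE})?")

m = input_file_re.fullmatch("scylla.profraw.42")
```

Here `m.group(1)` is `"scylla"` and `m.group(2)` is `".42"`; the distinct-id suffix group is optional.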
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
do not include unused header
Closes scylladb/scylladb#16386
* github.com:scylladb/scylladb:
utils: bit_cast: drop unused #includes
sstables: writer: do not include unused header
On top of the capabilities of the java-nodetool command, the following additional functionality is implemented:
* Expose quarantine-mode option of the scrub_keyspace REST API
* Exit with error and print a message, when scrub finishes with abort or validation_errors return code
The command comes with tests and all tests pass with both the new and the current nodetool implementations.
Refs: #15588
Refs: #16208
Closes scylladb/scylladb#16391
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the scrub command
test/nodetool: rest_api_mock.py: add missing "f" to error message f string
api: extract scrub_status into its own header
Make host_id parameter non-optional and
move it to the beginning of the arguments list.
Delete unused overloads of add_or_update_endpoint.
Delete unused overload of token_metadata::update_topology
with inet_address argument.
This used to work before in replace-with-same-ip scenario, but
with host_id-s it's no longer relevant.
base_token_metadata has been removed from topology_change_info
because the conditions needed for its creation
are no longer met.
database::get_token_metadata() is switched to token_metadata2.
get_all_ips method is added to the host_id-based token_metadata, since
it's convenient and will be used in several places. It returns all current
nodes converted to inet_address by means of the topology
contained within token_metadata.
hint_sender::can_send: if the node has already left the
cluster we may not find its host_id. This case is handled
in the same way as if it's not a normal token owner - we
simply send a hint to all replicas.
In this commit we replace token_metadata with token_metadata2
in the erm interface and field types. To accommodate the change
some of strategy-related methods are also updated.
All the boost and topology tests pass with this change.
In this commit we change the return type of
storage_service::get_token_metadata_ptr() to
token_metadata2_ptr and fix whatever breaks.
All the boost and topology tests pass with this change.
The token_metadata::get_normal_and_bootstrapping_token_to_endpoint_map
method was used only here. It's inlined in this
commit since it's too specific and incurs the overhead
of creating an intermediate map.
The check *ep == endpoint is needed when a node
changes its IP - on_change can be called by the
gossiper for old IP as part of its removal, after
handle_state_normal has already been called for
the new one. Without the check, the
do_update_system_peers_table call overwrites the IP
back to its old value.
Previously token_metadata used endpoint as the key
and the *ep == endpoint condition was followed from the
is_normal_token_owner check. Now with host_id-s we have
an additional layer of indirection, and we need
*ep == endpoint check to get the same end condition.
This case was revealed by the dtest
update_cluster_layout_tests.py::TestUpdateClusterLayout::test_change_node_ip
Change the token_metadata type to token_metadata2 in
the signatures of CDC-related methods in
storage_service and cdc/generation. Use
get_new_strong to get a pointer to the new host_id-based
token_metadata from the inet_address-based one,
living in the shared_token_metadata.
The starting point of the patch is in
storage_service::handle_global_request. We change the
tmptr type to token_metadata2 and propagate the change
down the call chains. This includes token-related methods
of the boot_strapper class.
locator_topology_test, network_topology_strategy_test and
tablets_test are fully switched to the host_id-based token_metadata,
meaning they no longer populate the old token_metadata.
All the boost and topology tests pass with this change.
In this commit we switch the function
calculate_effective_replication_map to use the new
token_metadata. We do this by employing our new helper
calculate_natural_ips function. We can't use this helper for
current_endpoints/target_endpoints though,
since in that case we won't add the IP to the
pending_endpoints in the replace-with-same-ip scenario.
The token_metadata_test is migrated to host_ids in the same
commit to make it pass. Other tests work because they fill
both versions of the token_metadata, but for this test it was
simpler to just migrate it straight away. The test constructs
the old token_metadata over the new token_metadata,
this means only the get_new() method will work on it. That's
why we also need to switch some other functions
(maybe_remove_node_being_replaced, do_get_natural_endpoints,
get_replication_factor) to the new version in the same commit.
All the boost and topology tests pass with this change.
We've updated all the places where token_metadata
is mutated, and now we can progress to the next stage
of the refactoring - gradually switching the read
code paths.
The calculate_natural_endpoints function
is at the core of all of them. It decides to what nodes
the given token should be replicated to for the given
token_metadata. It has a lot of usages in various contexts,
we can't switch them all in one commit, so instead we
allowed the function to behave in both ways. If
use_host_id parameter is false, the function uses the provided
token_metadata as is and returns endpoint_set as a result.
If it's true, it uses get_new() on the provided token_metadata
and returns host_id_set as a result.
The scope of the whole refactoring is limited to the erm data
structure, its interface will be kept inet_address based for now.
This means we'll often need to resolve host_ids to inet_address-es
as soon as we get a result from calculate_natural_endpoints.
A new calculate_natural_ips function is added for convenience.
It uses the new token_metadata and immediately resolves
returned host_id-s to inet_address-es.
The auxiliary declarations natural_ep_type, set_type, vector_type,
get_self_id, select_tm are introduced only for the sake of
migration, they will be removed later.
We'll need it in the next commits to refer to
replacing and bootstrapping nodes by id.
We assume this change will be shipped in 6.0 with upgrade
from 5.4, where host_id already exists in client_info.
We don't support upgrade between non-adjacent versions.
On top of the capabilities of the java-nodetool command, the following
additional functionality is implemented:
* Expose quarantine-mode option of the scrub_keyspace REST API
* Exit with error and print a message, when scrub finishes with abort or
validation_errors return code
excise is called from handle_state_left, the endpoint
may have already been removed from tm by then -
test_raft_upgrade_majority_loss fails if we use
unconditional tmptr->get_new()->get_host_id
instead of get_host_id_if_known
In order for the call to see all prior changes to group0. Also, we
should query on the host on which we executed the barrier.
I hope this will reduce flakiness observed in CI runs on
https://github.com/scylladb/scylladb/pull/16341 where the expected
tablet replica didn't match the one returned by get_tablet_replica()
after tablet movement, possibly because the node is still behind
group0 changes.
Currently, if a compaction function enters the table
or compaction_group async_gate, we can't stop it
on the table/compaction_group stop path as they co_await
their respective async_gate.close().
This series introduces a table_ptr smart pointer that guards
the table object by entering its async_gate, and
it also defers awaiting the gate.close future
till after stopping ongoing compaction so that
closing the gate will prevent starting new compactions
while ongoing compaction can be stopped and finally
awaiting the close() future will wait for them to
unwind and exit the gate after being stopped.
Fixes #16305
Closes scylladb/scylladb#16351
* github.com:scylladb/scylladb:
compaction: run_on_table: skip compaction also on gate_closed_exception
compaction: run_on_table: hold table
table: add table_holder and hold method
table: stop: allow compactions to be stopped while closing async_gate
For all compaction types which can be started via the API, add an asynchronous version of the API, which returns the task_id of the corresponding task manager task. With the task_id a user can check the task status, abort it, or wait for it, using the task manager API.
Closes scylladb/scylladb#15092
* github.com:scylladb/scylladb:
test: use async api in test_not_created_compaction_task_abort
test: test compaction task started asynchronously
api: tasks: api for starting async compaction
api: compaction: pass pointer to top level compaction tasks
If an option is not supported, reject the request instead of silently
ignoring the unsupported options.
This prevents the user from thinking that an option is supported
while it is actually ignored by the scylla core.
Fixes #16299
Closes scylladb/scylladb#16300
Similar to the no_such_column_family error,
gate_closed_exception indicates that the table
is stopped and we should skip compaction on it
gracefully.
Fixes#16305
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
To make sure a table object is kept valid throughout the lifetime
of compaction, a following patch will enter the table's
_async_gate when the compaction task starts.
This change defers awaiting the gate.close future
till after stopping ongoing compaction so that
closing the gate will prevent starting new compactions
while ongoing compaction can be stopped and finally
awaiting the close() future will wait for them to
unwind and exit the gate after being stopped.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This commit adds the upgrade guide from version
5.4 to 5.5.
Also, it removes all previous OSS guides not related
to version 5.5.
The guide includes the required Raft-related
information.
NOTE: The content of the guide must be further
verified closer to the release. I'm making
these updates now to avoid errors and warnings
related to outdated upgrade guides in other PRs,
and to include the Raft information.
Closes scylladb/scylladb#16350
* ./tools/java 26f5f71c...29fe44da (3):
> tools: catch and print UnsupportedOperationException
> tools/SSTableMetadataViewer: continue if sstable does not exist
> throw more informative error when fail to parse sstable generation
Fixes: scylladb/scylla-tools-java#360
The observed crash was in the following piece on "cf" access:
```
if (*table_is_dropped) {
    sslog.info("[Stream #{}] Skipped streaming the dropped table {}.{}", si->plan_id, si->cf.schema()->ks_name(), si->cf.schema()->cf_name());
```
Fixes#16181
Fixes#16312
This test replays a segment before it might be closed or even fully flushed,
thus it can (with the new semantics) generate a segment_truncation exception
if hitting eof earlier than expected. (Note: test does not use pre-allocated
segments).
In dffadabb94 we mistakenly added
"if args.overwrite_unit_file", but the option comes from an unmerged
patch.
So we need to drop it to fix the script error.
Fixes #16331
Closes scylladb/scylladb#16358
When performing a schema change through group 0, extend the schema mutations with a version that's persisted and then used by the nodes in the cluster in place of the old schema digest, which becomes horribly slow as we perform more and more schema changes (#7620).
If the change is a table create or alter, also extend the mutations with a version for this table to be used for `schema::version()`s instead of having each node calculate a hash which is susceptible to bugs (#13957).
When performing a schema change in Raft RECOVERY mode we also extend schema mutations which forces nodes to revert to the old way of calculating schema versions when necessary.
We can only introduce these extensions if all of the cluster understands them, so protect this code by a new cluster/schema feature, `GROUP0_SCHEMA_VERSIONING`.
Fixes: #7620
Fixes: #13957
---
This is a reincarnation of PR scylladb/scylladb#15331. The previous PR was reverted due to a bug it unmasked; the bug has now been fixed (scylladb/scylladb#16139). Some refactors from the previous PR were already merged separately, so this one is a bit smaller.
I have checked with @Lorak-mmk's reproducer (https://github.com/Lorak-mmk/udt_schema_change_reproducer -- many thanks for it!) that the originally exposed bug is no longer reproducing on this PR, and that it can still be reproduced if I revert the aforementioned fix on top of this PR.
Closes scylladb/scylladb#16242
* github.com:scylladb/scylladb:
docs: describe group 0 schema versioning in raft docs
test: add test for group 0 schema versioning
feature_service: enable `GROUP0_SCHEMA_VERSIONING` in Raft mode
schema_tables: don't delete `version` cell from `scylla_tables` mutations from group 0
migration_manager: add `committed_by_group0` flag to `system.scylla_tables` mutations
schema_tables: use schema version from group 0 if present
migration_manager: store `group0_schema_version` in `scylla_local` during schema changes
system_keyspace: make `get/set_scylla_local_param` public
feature_service: add `GROUP0_SCHEMA_VERSIONING` feature
As part of code coverage we need some additional packages in order to
be able to process the code coverage data and provide some meaningful
information in logs.
Here we add the following packages:
fedora packages:
----------------
lcov - A package of utilities to manipulate lcov traces and generate
coverage html reports
fedora python3 packages:
------------------------
The following packages are added to fedora_packages and not to
python3_packages since we don't need them packaged into the
scylla-python3 package; we only require them in the build
environment.
python3-unidiff - A python library for working with patch files, this is
required in order to generate "patch coverage" reports.
python3-humanfriendly - A python library to format some quantities into
                        human-readable strings (time spans, sizes, etc...)
we use it to print meaningful logs that tracks
the volume and time it takes to process coverage
data so we can better debug and optimize it in the
future.
python3-jinja2 - This is a template-based generator that will eventually
allow us to consolidate and rearrange several reports into one so we
can publish a single report "site" for all of the coverage information.
For example, include both, coverage report as well as
patch report in a tab based site.
pip packages:
-------------
treelib - A tree data structure that also supports pretty-printing of
          the tree data. We use it to log the coverage processing steps in
          order to have debugging capabilities in the future.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Closes scylladb/scylladb#16330
[avi: regenerate toolchain]
Closes scylladb/scylladb#16357
For all compaction types which can be started with the api, add an
asynchronous version of the api, which returns the task_id of the
corresponding task manager task. With the task_id a user can check task
status, abort it, or wait for it, using the task manager api.
Since we dropped CentOS 7 support, we can now always use AmbientCapabilities
without a systemd version check, so we can move it from capabilities.conf
to scylla-server.service.
However, we still cannot hardcode CAP_PERFMON since it is too new;
only newer kernels support it, so keep it in scylla_post_install.sh.
As a preparation for asynchronous compaction api, from which we
cannot take values by reference, top level compaction tasks get
pointers which need to be set to nullptr when they are not needed
(like in async api).
When a table is truncated or dropped it can be auto-snapshotted if the respective config option is set (by default it is). Non local storages don't implement snapshotting yet and emit on_internal_error() in that case aborting the whole process. It's better to skip snapshot with a warning instead.
Closes scylladb/scylladb#16220
* github.com:scylladb/scylladb:
database: Do not auto snapshot non-local storages' tables
database: Simplify snapshot booleans in truncate_table_on_all_shards()
For each inet_address-based modification of token_metadata we
make a corresponding host_id-based change in token_metadata->get_new().
The _gossiper.add_saved_endpoint logic is switched to the new token_metadata.
Both versions of the token_metadata need to be updated. For
the new version we provide a dc_rack_fn function which looks
for dc_rack by host_id in topology_state_machine if raft
topology is on. Otherwise, it looks for IP for the given
host_id and falls back to the gossiper-based function
get_dc_rack_for.
With this commit, we begin the next stage of the
refactoring - updating the new version of the token_metadata
in all places where the old version is currently being updated.
In this commit we assign host_id of this node, both in main.cc
and in boost tests.
When we're replacing a node with the same IP address, we want
the following behavior:
* host_id -> IP mapping should work and return the same IP address for two
different host_ids - old and new.
* the IP -> host_id mapping should return the host_id of the old (replaced)
host.
This variant is most convenient for preserving the current behavior
of the code, especially the functions maybe_remove_node_being_replaced,
erm::get_natural_endpoints_without_node_being_replaced,
erm::get_pending_endpoints. The 'being_replaced' node will be properly removed in
maybe_remove_node_being_replaced and 'replacing' node will be added to
the pending_endpoints.
This commit fixes an inconsistency in method names:
get_host_id (internal_error) and get_host_id_if_known
(returns null) exist for one direction, but there was only
one method for the opposite conversion - get_endpoint_for_host_id,
and it returns null. In this commit we change it to on_internal_error
if it can't find the argument and add another method,
get_endpoint_for_host_id_if_known, which returns null in this case.
We can't use get_endpoint_for_host_id/get_host_id
in host_id_or_endpoint::resolve since it's called
from storage_service::parse_node_list
-> token_metadata::parse_host_id_and_endpoint,
and exceptions are caught and handled in
`storage_service::parse_node_list`.
It's a bug to use get_host_id on a non-existent endpoint,
so on_internal_error is more appropriate. Also, it's
easier to debug since it provides a backtrace.
If a missing inet_address is expected, get_host_id_if_known
should be used instead. We update one such case in
storage_service::force_remove_completion. Other
usages of get_host_id are correct.
In this commit we enhance token_metadata with a pointer to the
new host_id-based generic_token_metadata specialisation (token_metadata2).
The idea is that in the following commits we'll go over all token_metadata
modifications and make the corresponding modifications to its new
host_id-based alternative.
The pointer to token_metadata2 is stored in the
generic_token_metadata::_new_value field. The pointer can be
mutable, immutable, or absent altogether (std::monostate).
It's mutable if this generic_token_metadata owns it, meaning
it was created using the generic_token_metadata(config cfg)
constructor. It's immutable if the
generic_token_metadata(lw_shared_ptr<const token_metadata2> new_value);
constructor was used. This means this old token_metadata is a wrapper for
new token_metadata and we can only use the get_new() method on it. The field
_new_value is empty for the new host_id-based token_metadata version.
The generic_token_metadata(std::unique_ptr<token_metadata_impl<NodeId>> impl, token_metadata2 new_value);
constructor is used for clone methods. We clone both versions,
and we need to pass a cloned token_metadata2 into constructor.
There are two overloads of get_new, for mutable and immutable
generic_token_metadata. Both of them throw an exception if
they can't get the appropriate pointer. There is also a
get_new_strong method, which returns an immutable owning
pointer. This is convenient since a lot of APIs want an
owning pointer. We can't make the get_new/get_new_strong API
simpler and use get_new_strong everywhere, since it mutates the
original generic_token_metadata by incrementing the reference
counter, and this causes races when it's passed between
shards in replicate_to_all_cores.
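A minimal sketch of the _new_value scheme described above, under simplifying assumptions: std::shared_ptr stands in for lw_shared_ptr, and token_metadata2/old_token_metadata are illustrative stand-ins for the real generic_token_metadata types, not Scylla's actual classes:

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <variant>

// Toy model of the new-version pointer held by the old token_metadata.
struct token_metadata2 {
    int version = 0;
};

class old_token_metadata {
    // monostate: this object *is* the new version (no back-pointer);
    // mutable ptr: owned, created together with this object;
    // const ptr: this object is just a wrapper for an existing new version.
    std::variant<std::monostate,
                 std::shared_ptr<token_metadata2>,
                 std::shared_ptr<const token_metadata2>> _new_value;
public:
    old_token_metadata() : _new_value(std::make_shared<token_metadata2>()) {}
    explicit old_token_metadata(std::shared_ptr<const token_metadata2> nv)
        : _new_value(std::move(nv)) {}

    // Mutable overload: only valid when this object owns the new version.
    token_metadata2& get_new() {
        if (auto p = std::get_if<std::shared_ptr<token_metadata2>>(&_new_value)) {
            return **p;
        }
        throw std::runtime_error("no mutable new token_metadata");
    }
    // Const overload: works for both owned and wrapped new versions.
    const token_metadata2& get_new() const {
        if (auto p = std::get_if<std::shared_ptr<token_metadata2>>(&_new_value)) {
            return **p;
        }
        if (auto p = std::get_if<std::shared_ptr<const token_metadata2>>(&_new_value)) {
            return **p;
        }
        throw std::runtime_error("no new token_metadata");
    }
};
```

The wrapper constructor models the "constructed solely from the new version" case: only get_new() is usable on such an object.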
NodeId is used in all internal token_metadata data structures that
previously used inet_address. We choose topology::key_kind based
on the value of the template parameter.
A generic_token_metadata::update_topology overload with a host_id
parameter is added to make update_topology_change_info work;
it now uses NodeId as a parameter type.
topology::remove_endpoint(host_id) is added to make
generic_token_metadata::remove_endpoint(NodeId) work.
pending_endpoints_for and endpoints_for_reading are just removed - they
are not used and not implemented. The declarations were left by mistake
from a refactoring in which these methods were moved to erm.
generic_token_metadata_base is extracted to contain declarations, common
to both token_metadata versions.
Templates are explicitly instantiated inside token_metadata.cc, since
implementation part is also a template and it's not exposed to the header.
There are no other behavioral changes in this commit, just syntax
fixes to make token_metadata a template.
In the next commits token_metadata will be
made a template with NodeId=inet_address|host_id
parameter. This parameter will be passed to dc_rack_fn
function, so it also should be made a template.
For the host_id-based token_metadata we want host_id
to be the main node key, meaning it should be used
in add_or_update_endpoint to find the node to update.
For the inet_address-based token_metadata version
we want to retain the old behaviour during transition period.
In this commit we introduce key_kind parameter and use
key_kind::inet_address in all current topology usages.
Later we'll use key_kind::host_id for the new token_metadata.
In the last commits of the series, when the new token_metadata
version is used everywhere, we will remove key_kind enum.
In subsequent commits we'll need the following api for token_metadata:
token_metadata(token_metadata2_ptr);
get_new() -> token_metadata2*
where token_metadata2 is the new version of token_metadata,
based on host_id.
In other words:
* token_metadata knows the new version of itself and returns a pointer
to it through get_new()
* token_metadata can be constructed based solely on the new version,
without its own implementation. In this case the only method we can
use on it is get_new.
This allows passing token_metadata2 to APIs with token_metadata in the
method signature, if these APIs are known to only use the get_new method
on the passed token_metadata.
And back to topology_change_info - if we got it from the new token_metadata
we want to be able to construct token_metadata from token_metadata2 contained
in it, and this requires it to be a ptr, not value.
Reject an ALTER KEYSPACE request for NetworkTopologyStrategy when
replication options are missing.
Also reject CREATE KEYSPACE with no replication factor options.
Cassandra has a default_keyspace_rf configuration that may allow such
CREATE KEYSPACE commands, but Scylla doesn't have this option (refs #16028).
Fixes #10036
Closes scylladb/scylladb#16221
this source file was added in d3d83869. so let's update cmake
as well.
sessions_tests was added in the same commit, so add it as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16344
This PR implements the following new nodetool commands:
* decommission
* rebuild
* removenode
* getlogginglevels
* setlogginglevel
* move
* refresh
All commands come with tests and all tests pass with both the new and the current nodetool implementations.
Refs: https://github.com/scylladb/scylladb/issues/15588
Closes scylladb/scylladb#16348
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the refresh command
tools/scylla-nodetool: implement the move command
tools/scylla-nodetool: implement setlogginglevel command
tools/scylla-nodetool: implement the getlogginglevels command
tools/scylla-nodetool: implement the removenode command
tools/scylla-nodetool: implement the rebuild command
tools/scylla-nodetool: implement the decommission command
When checking replication strategy options the code assumes (and it's
stated in the preceding code comment) that all options are replication
factors. Nowadays this is no longer so: initial_tablets is not a
replication factor and should be skipped.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#16335
Perform schema changes while mixing nodes in RECOVERY mode with nodes in
group 0 mode:
- schema changes originating from RECOVERY node use
digest-based schema versioning.
- schema changes originating from group 0
nodes use persisted versions committed through group 0.
Verify that schema versions are in sync after each schema change, and
that each schema change results in a different version.
Also add a simple upgrade test, performing a schema change before we
enable Raft (which also enables the new versioning feature) in the
entire cluster, then once upgrade is finished.
One important upgrade test is missing, which we should add to dtest:
create a cluster in Raft mode but in a Scylla version that doesn't
understand GROUP0_SCHEMA_VERSIONING. Then start upgrading to a version
that has this patchset. Perform schema changes while the cluster is
mixed, both on non-upgraded and on upgraded nodes. Such a test is
especially important because we're adding a new column to the
`system.scylla_local` table (which we then redact from the schema
definition when we see that the feature is disabled).
As promised in earlier commits:
Fixes: #7620
Fixes: #13957
Also modify two test cases in `schema_change_test` which depend on
the digest calculation method in their checks. Details are explained in
the comments.
As explained in the previous commit, we use the new
`committed_by_group0` flag attached to each row of a `scylla_tables`
mutation to decide whether the `version` cell needs to be deleted or
not.
The rest of #13957 is solved by pre-existing code -- if the `version`
column is present in the mutation, we don't calculate a hash for
`schema::version()`, but take the value from the column:
```
table_schema_version schema_mutations::digest(db::schema_features sf)
const {
if (_scylla_tables) {
auto rs = query::result_set(*_scylla_tables);
if (!rs.empty()) {
auto&& row = rs.row(0);
auto val = row.get<utils::UUID>("version");
if (val) {
return table_schema_version(*val);
}
}
}
...
```
The issue will therefore be fixed once we enable
`GROUP0_SCHEMA_VERSIONING`.
As described in #13957, when creating or altering a table in group 0
mode, we don't want each node to calculate `schema::version()`s
independently using a hash algorithm. Instead, we want all nodes to
use a single version for that table, committed by the group 0 command.
There's even a column ready for this in `system.scylla_tables` --
`version`. This column is currently being set for system tables, but
it's not being used for user tables.
Similarly to what we did with global schema version in earlier commits,
the obvious thing to do would be to include a live cell for the `version`
column in the `system.scylla_tables` mutation when we perform the schema
change in Raft mode, and to include a tombstone when performing it
outside of Raft mode, for the RECOVERY case.
But it's not that simple because as it turns out, we're *already*
sending a `version` live cell (and also a tombstone, with timestamp
decremented by 1) in all `system.scylla_tables` mutations. But then we
delete that cell when doing schema merge (which begs the question
of why we were sending it in the first place, but I digress):
```
// We must force recalculation of schema version after the merge, since the resulting
// schema may be a mix of the old and new schemas.
delete_schema_version(mutation);
```
the above function removes the `version` cell from the mutation.
So we need another way of distinguishing the cases of schema change
originating from group 0 vs outside group 0 (e.g. RECOVERY).
The method I chose is to extend `system.scylla_tables` with a boolean
column, `committed_by_group0`, and extend schema mutations to set
this column.
In the next commit we'll decide whether or not the `version` cell should
be deleted based on the value of this new column.
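The decision the two commits describe can be sketched like this (a simplified model, not the real schema_tables code; the flag is modeled as optional because rows written by old nodes or in RECOVERY mode may not carry it, and names here are illustrative):

```cpp
#include <cassert>
#include <optional>

// Toy model of a system.scylla_tables row during schema merge.
struct scylla_tables_row {
    std::optional<bool> committed_by_group0; // absent on old/RECOVERY writers
    bool has_version_cell = true;            // the `version` cell
};

// Keep the persisted `version` cell only for rows committed through
// group 0; otherwise delete it, forcing the node to fall back to
// digest-based schema version calculation.
inline void maybe_delete_schema_version(scylla_tables_row& row) {
    if (!row.committed_by_group0.value_or(false)) {
        row.has_version_cell = false;
    }
}
```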
As promised in the previous commit, if we persisted a schema version
through a group 0 command, use it after a schema merge instead of
calculating a digest.
Ref: #7620
The above issue will be fixed once we enable the
`GROUP0_SCHEMA_VERSIONING` feature.
We extend schema mutations with an additional mutation to the
`system.scylla_local` table which:
- in Raft mode, stores a UUID under the `group0_schema_version` key.
- outside Raft mode, stores a tombstone under that key.
As we will see in later commits, nodes will use this after applying
schema mutations. If the key is absent or has a tombstone, they'll
calculate the global schema digest on their own -- using the old way. If
the key is present, they'll take the schema version from there.
The Raft-mode schema version is equal to the group 0 state ID of this
schema command.
The tombstone is necessary for the case of performing a schema change in
RECOVERY mode. It will force a revert to the old digest-based way.
Note that extending schema mutations with a `system.scylla_local`
mutation is possible thanks to earlier commits which moved
`system.scylla_local` to schema commitlog, so all mutations in the
schema mutations vector still go to the same commitlog domain.
Also, since we introduce a replicated tombstone to
`system.scylla_local`, we need to set GC grace to nonzero. We set it to
`schema_gc_grace`, which makes sense given the use case.
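The rule nodes apply after a schema merge can be sketched as follows (illustrative names only; the real code reads the `group0_schema_version` key from `system.scylla_local`, where an absent key or tombstone is modeled here as an empty optional):

```cpp
#include <cassert>
#include <optional>
#include <string>

// If group 0 persisted a schema version, use it; otherwise fall back
// to the locally computed digest (the old, RECOVERY-compatible way).
inline std::string pick_schema_version(
        const std::optional<std::string>& group0_schema_version,
        const std::string& locally_computed_digest) {
    return group0_schema_version.value_or(locally_computed_digest);
}
```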
In the java nodetool, this command ends up calling an API endpoint which
just throws an exception saying moving tokens is not supported. So in
the native implementation we just throw an exception to the same effect
in scylla-nodetool itself.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, in order to enable the code in the header to
access the formatter without being moved down after the full specialization's
definition, we
* move the enum definition out of the class and before the
class,
* rename the enum's name from state to index_consume_entry_context_state
* define a formatter for index_consume_entry_context_state
* remove its operator<<().
as fmt v10 is able to use `format_as()` as a fallback, the formatter
full specialization is guarded with `#if FMT_VERSION < 10'00'00`. we
will remove it after we start building with fmt v10.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16204
Tablet streaming involves asynchronous RPCs to other replicas which transfer writes. We want side-effects from streaming only within the migration stage in which the streaming was started. This is currently not guaranteed on failure. When streaming master fails (e.g. due to RPC failing), it can be that some streaming work is still alive somewhere (e.g. RPC on wire) and will have side-effects at some point later.
This PR implements tracking of all operations involved in streaming which may have side-effects, which allows the topology change coordinator to fence them and wait for them to complete if they were already admitted.
The tracking and fencing is implemented by using global "sessions", created for streaming of a single tablet. Session is globally identified by UUID. The identifier is assigned by the topology change coordinator, and stored in system.tablets. Sessions are created and closed based on group0 state (tablet metadata) by the barrier command sent to each replica, which we already do on transitions between stages. Also, each barrier waits for sessions which have been closed to be drained.
The barrier is blocked only if there is some session with work which was left behind by unsuccessful streaming. In which case it should not be blocked for long, because streaming process checks often if the guard was left behind and stops if it was.
This mechanism of tracking is fault-tolerant: session id is stored in group0, so coordinator can make progress on failover. The barriers guarantee that session exists on all replicas, and that it will be closed on all replicas.
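The session lifecycle above can be sketched with a toy registry (all names hypothetical; the real implementation lives in storage_service and drains asynchronously via gates rather than a counter):

```cpp
#include <cassert>
#include <map>
#include <string>

// Each replica keeps sessions keyed by id. Streaming work enters a
// session guard before having side-effects; the barrier closes sessions
// no longer present in tablet metadata and waits for them to drain.
class session_registry {
    std::map<std::string, int> _guards; // session id -> live guard count
public:
    void open(const std::string& id) { _guards.emplace(id, 0); }
    // Returns false if the session was closed: late work is fenced out.
    bool try_enter(const std::string& id) {
        auto it = _guards.find(id);
        if (it == _guards.end()) {
            return false;
        }
        ++it->second;
        return true;
    }
    void leave(const std::string& id) { --_guards.at(id); }
    // Returns true once the session is fully drained and removed;
    // false means the barrier must keep waiting.
    bool close(const std::string& id) {
        auto it = _guards.find(id);
        if (it == _guards.end()) {
            return true;
        }
        if (it->second > 0) {
            return false;
        }
        _guards.erase(it);
        return true;
    }
};
```

try_enter() returning false models the fencing: work left behind by an unsuccessful streaming attempt notices the guard is gone and stops.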
Closes scylladb/scylladb#15847
* github.com:scylladb/scylladb:
test: tablets: Add test for failed streaming being fenced away
error_injection: Introduce poll_for_message()
error_injection: Make is_enabled() public
api: Add API to kill connection to a particular host
range_streamer: Do not block topology change barriers around streaming
range_streamer, tablets: Do not keep token metadata around streaming
tablets: Fail gracefully when migrating tablet has no pending replica
storage_service, api: Add API to disable tablet balancing
storage_service, api: Add API to migrate a tablet
storage_service, raft topology: Run streaming under session topology guard
storage_service, tablets: Use session to guard tablet streaming
tablets: Add per-tablet session id field to tablet metadata
service: range_streamer: Propagate topology_guard to receivers
streaming: Always close the rpc::sink
storage_service: Introduce concept of a topology_guard
storage_service: Introduce session concept
tablets: Fix topology_metadata_guard holding on to the old erm
docs: Document the topology_guard mechanism
This series adds preparation patches for the file stream tablet implementation in the enterprise branch. It minimizes the differences between the two branches.
Closes scylladb/scylladb#16297
* github.com:scylladb/scylladb:
messaging_service: Introduce STREAM_BLOB and TABLET_STREAM_FILES verb
compaction_group_for_token: Handle minimum_token and maximum_token token
serializer: Add temporary_buffer support
cql_test_env: Allow messaging_service to start listen
Snapshotting is not yet supported for those (see #13025) and
auto-snapshot would step on internal error. Skip it and print a warning
into logs
Fixes #16078
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Fixes #16298
The adjusted buffer position calculation in buffer_position(), introduced in https://github.com/scylladb/scylladb/pull/15494
was in fact broken. It calculated (like previously) a "position" based on diff between
underlying buffer size and ostream size() (i.e. avail), then adjusted this according to
sector overhead rules.
However, the underlying buffer size is in unadjusted terms, and the ostream is adjusted.
The two cannot be compared as such, which means the "positions" we get here are borked.
Luckily for us (sarcasm), the position calculation in the replayer made a similar error,
in that it adjusts the current position up by one sector overhead too much, leading to us
more or less getting the same, erroneous results on both ends.
However, when/if one needs to adjust the segment file format further, one might very
quickly realize that this does not work well if, say, one needs to be able to safely
read some extra bytes before the first chunk in a segment. Conversely, trying to adjust
this also exposes a latent potential error in the skip mechanism, manifesting here.
Issue fixed by keeping track of the initial ostream capacity for the segment buffer, and
using this for position calculation, and in the case of the replayer, moving the file pos
adjustment from read_data() to a subroutine (shared with skipping) that better handles
data stream position vs. file position adjustment. In implementation terms, we first
increment the "data stream" pos (i.e. pos in data without overhead), then adjust for overhead.
Also fix replayer::skip, so that we handle the buffer/pos relation correctly now.
Added a test for initial entry position, as well as data replay consistency for single
entry_writer paths.
Fixes #16301
The calculation of whether data may be added is based on position vs. size of incoming data.
However, it did not take sector overhead into account, which led us to writing past the allowed
segment end, which in turn also leads to metrics overflows.
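As an illustration of the kind of check the fix introduces (the constants and helper names here are invented for the sketch, not Scylla's actual commitlog parameters): an entry that fits by raw size alone can still overflow the segment once per-sector overhead is charged:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical segment geometry: each sector reserves a few bytes of
// overhead (e.g. integrity data), leaving less room for payload.
constexpr size_t sector_size = 512;
constexpr size_t sector_overhead = 8;

// On-disk size of `size` payload bytes written at payload position `pos`:
// the payload spans some number of sectors, each paying the overhead.
inline size_t with_overhead(size_t pos, size_t size) {
    size_t payload_per_sector = sector_size - sector_overhead;
    size_t sectors = (pos + size + payload_per_sector - 1) / payload_per_sector;
    return pos + size + sectors * sector_overhead;
}

// Overhead-aware admission check: does the entry still fit the segment?
inline bool fits(size_t pos, size_t size, size_t segment_size) {
    return with_overhead(pos, size) <= segment_size;
}
```

A naive `pos + size <= segment_size` check would admit an entry of 505 bytes into a 512-byte segment; the overhead-aware check correctly rejects it.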
Closes scylladb/scylladb#16302
* github.com:scylladb/scylladb:
commitlog: Fix allocation size check to take sector overhead into account.
commitlog: Fix commitlog_segment::buffer_position() calculation and replay counterpart
There are three of them in this function -- the with_snapshot argument,
the auto_snapshot local copy of the db::config option, and the should_snapshot
local variable that's the && of the above two. The code can go with just one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test test_many_partitions is very slow, as it tests a slow scan over
a lot of partitions. This was observed to time out on the slower ARM
machines, making the test flaky. To prevent this, create an
extra-patient cql connection with a 10-minute timeout for the scan
itself.
Fixes: #16145
Closes scylladb/scylladb#16303
This commit updates the configuration for
ScyllaDB documentation so that:
- 5.4 is the latest version.
- 5.4 is removed from the list of unstable versions.
It must be merged when ScyllaDB 5.4 is released.
No backport is required.
Closes scylladb/scylladb#16308
There's a test case that validates the upload sink by getting random
portions of the uploaded object. The portions are generated as
len = random % chunk_size
off = random % file_size - len
The latter may render a negative value, which will translate into a
huge 64-bit offset which, in turn, results in an invalid
http range specifier, and getting the object part fails with status OK.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
get_object_contiguous() accepts an optional range argument in the form of
offset:length and then converts it into a first_byte:last_byte pair to
satisfy http's Range header range-specifier.
If the last_byte, which is offset + length - 1, overflows 64 bits, the
range specifier becomes invalid. According to RFC 9110 servers may ignore
invalid ranges if they want to, and this is what minio does.
The result is pretty interesting. Since the range is specified, the client
expects a PartialContent response, but since the range is ignored by the server
the result is OK, as if the full object was requested. So instead of
some sane "overflow" error, get_object_contiguous() fails with
status "success".
The fix is pre-checking provided ranges and failing early.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
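The pre-check can be sketched as follows (a hypothetical helper, not the actual S3 client code): reject any range whose last_byte = offset + length - 1 would wrap around 64 bits:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// A range is representable as first_byte:last_byte only when
// offset + length - 1 does not overflow uint64_t. Checking with
// subtraction avoids the overflow we are trying to detect.
inline bool valid_range(uint64_t offset, uint64_t length) {
    if (length == 0) {
        return false; // empty range has no last byte
    }
    return offset <= std::numeric_limits<uint64_t>::max() - (length - 1);
}
```

With such a check the client can fail early with a clear error instead of receiving a surprising 200 OK for a full-object response.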
The following error was seen:
[shard 0] table - compaction_group_for_token: compaction_group idx=0 range=(minimum
token,-6917529027641081857] does not contain token=minimum token
Since minimum_token and maximum_token will never be inside a token range,
skip the in-token-range check.
This is needed for rpc calls to work in the tests. With this patch, by
default, messaging_service does not listen, as before.
This is useful for the file stream for tablets test.
This patch fixes an error check and speeds up swap allocation.
The following patches are included:
- scylla_swap_setup: run error check before allocating swap
  avoid creating the swapfile before running the error check
- scylla_swap_setup: use fallocate on ext4
  this increases swap allocation speed on ext4
Closes scylladb/scylladb#12668
* github.com:scylladb/scylladb:
scylla_swap_setup: use fallocate on ext4
scylla_swap_setup: run error check before allocating swap
The current implementation starts in sstables_manager, which gets the deletion function from storage which, in turn, should atomically do sst.unlink() over a list of sstables (the s3 driver is still not atomic though, #13567).
This PR generalizes the atomic deletion inside an sstables_manager method and removes the atomic deleter function that nobody liked when it was introduced (#13562).
Closes scylladb/scylladb#16290
* github.com:scylladb/scylladb:
sstables/storage: Drop atomic deleter
sstables/storage: Reimplement atomic deletion in sstables_manager
sstables/storage: Add prepare/complete scaffold for atomic deletion
Streaming was keeping effective_replication_map_ptr around the whole
process, which blocks topology change barriers.
This will inhibit progress of tablet load balancer or concurrent
migrations, resulting in worse performance.
Fix by switching to the most recent erm on sharder
calls. multishard_writer calls shard_of() for each new partition.
A better way would be to switch immediately when topology version
changes, but this is left for later.
Load balancing needs to be disabled before making a series of manual
migrations so that we don't fight with the load balancer.
Also will be used in tests to ensure tablets stick to expected locations.
Prevents stale streaming operations from running beyond the topology
operation they were started in. After the session field is cleared, or
changed to something else, the old topology_guard used by streaming is
interrupted and fenced and the next barrier will join with any
remaining work.
rpc::sink::~sink aborts if not closed. There is a try/catch clause
which ensures that close() is called, but there was code after sink is
created which is not covered by it. Move sink construction past that
code.
A write to a base table can generate one or more writes to a materialized
view. The write to RF base replicas need to cause writes to RF view
replicas. Our MV implementation, based on Cassandra's implementation,
does this via "pairing": Each one of the base replicas involved in this
write sends each view update to exactly one view replica. The function
get_view_natural_endpoint() tells a base replica which of the view
replicas it should send the update to.
The standard pairing is based on the ring order: The first owner of the
base token sends to the first owner of the view token, the second to the
second, and so on. However, the existing code also uses an optimization
we call self-pairing: If a single node is both a base replica and a view
replica, the pairing is modified so this node sends the update to itself.
This patch *disables* the self-pairing optimization in keyspaces that
use tablets:
The self-pairing optimization can cause the pairing to change after
token ranges are moved between nodes, so it can break base-view consistency
in some edge cases, leading to "ghost rows". With tablets, these range
movements become even more frequent - they can happen even if the
cluster doesn't grow. This is why we want to solve this problem for tablets.
For backward compatibility and to avoid sudden inconsistencies emerging
during upgrades, we decided to continue using the self-pairing optimization
for keyspaces that are *not* using tablets (i.e., using vnodes).
Currently, we don't introduce a "CREATE MATERIALIZED VIEW" option to
override these defaults - i.e., we don't provide a way to disable
self-pairing with vnodes or to enable them with tablets. We could introduce
such a schema flag later, if we ever want to (and I'm not sure we want to).
It's important to note that in the tablets case, this change has
implications on when view updates become synchronous.
For example:
* If we have 3 nodes and RF=3, with the self-pairing optimization each
node is paired with itself, the view update is local, and is
implicitly synchronous (without requiring a "synchronous_updates"
flag).
* In the same setup with tablets, without the self-pairing optimization
(due to this patch), this is not guaranteed. Some view updates may not
be synchronous, i.e., the base write will not wait for the view
write. If the user really wants synchronous updates, they should
be requested explicitly, with the "synchronous_updates" view option.
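The pairing rules above can be sketched as follows (a minimal illustration, not the actual get_view_natural_endpoint() code; in the real implementation self-pairing also re-pairs the remaining replicas, which this sketch omits):

```python
def pair_view_replica(this_node, base_replicas, view_replicas, self_pairing):
    # Self-pairing (kept for vnode keyspaces): a node that is both a base
    # and a view replica sends the update to itself.
    if self_pairing and this_node in view_replicas:
        return this_node
    # Standard ring-order pairing (used for tablets after this patch):
    # the i-th owner of the base token sends to the i-th owner of the
    # view token.
    return view_replicas[base_replicas.index(this_node)]
```

For example, with base replicas [n1, n2, n3] and view replicas [n2, n3, n1], n1 pairs with n2 under ring-order pairing, but with itself when self-pairing is enabled.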
Fixes #16260.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16272
run_on_existing_tables() is not used at all, and we have two copies of it.
In this change, let's drop them.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16304
Add CAP_PERFMON to AmbientCapabilities in capabilities.conf, to enable
perf_event based stall detector in Seastar.
However, on Debian/Ubuntu CAP_PERFMON with a non-root user does not work
because Ubuntu sets kernel.perf_event_paranoid=4, which disallows all
non-root user access.
(On Debian it is kernel.perf_event_paranoid=3.)
So we need to configure kernel.perf_event_paranoid=2 on these distros.
see: https://askubuntu.com/questions/1400874/what-does-perf-paranoia-level-four-do
Also, CAP_PERFMON is only available on Linux 5.8+; older kernels do not
have this capability.
To support older kernel environments such as CentOS 7, we need to configure
kernel.perf_event_paranoid=1 to allow non-root user access even without
the capability.
Fixes #15743
Closes scylladb/scylladb#16070
* seastar 55a821524d...ae8449e04f (22):
> Revert "Merge 'reactor: merge pollfn on I/O paths into a single one' from Kefu Chai"
> http/exception: Make unexpected status message more informative
> docker: bump up to clang {16,17} and gcc {12,13}
> doc: replace space (0xA0) in unicode with ASCII space (0x20)
> file: Remove reactor class friendship
> dpdk: adjust for poller in internal namespace
> http: make_requests accept optional expected
> Merge 'future: future_state_base: assert owner shard in debug mode' from Benny Halevy
> Merge 'Keep pollers in internal/poll.hh' from Pavel Emelyanov
> sharded: access instance promise only on instance shard
> test: network_interface_test: add tests for format and parse
> Merge 'reactor: merge pollfn on I/O paths into a single one' from Kefu Chai
> reactor/scheduling_group: Handle at_destroy queue special in init_new_scheduling_group_key etc (v2)
> reactor: set local_engine after it is fully initialized
> build: do not error when running into GCC BZ-1017852
> Merge 'shared_future: make available() immediate after set_value()' from Piotr Dulikowski
> tls: add format_as(subject_alt_name_type) overload
> tls: linearize small packets on send
> shared_future: remove unused #include
> shared_ptr: add fmt::formatter for shared_ptr types
> lazy: add fmt::formatter for lazy_eval types
> Merge 'file: use unbuffered generator in experimental_list_directory()' from Kefu Chai
Closes scylladb/scylladb#16274
This PR removes the incorrect information that the ScyllaDB Rust Driver is not GA.
In addition, it replaces "Scylla" with "ScyllaDB".
Fixes https://github.com/scylladb/scylladb/issues/16178
(nobackport)
Closes scylladb/scylladb#16199
* github.com:scylladb/scylladb:
doc: remove the "preview" label from Rust driver
doc: fix Rust Driver release information
Fixes some more typos as found by codespell run on the code. In this commit, there are more user-visible errors.
Refs: https://github.com/scylladb/scylladb/issues/16255
Closes scylladb/scylladb#16289
* github.com:scylladb/scylladb:
Update unified/build_unified.sh
Update main.cc
Update dist/common/scripts/scylla-housekeeping
Typos: fix typos in code
utils::fb_utilities is a global in-memory registry for storing and retrieving broadcast_address and broadcast_rpc_address.
As part of the effort to get rid of all global state, this series gets rid of fb_utilities.
This will eventually allow e.g. cql_test_env to instantiate multiple scylla server nodes, each serving on its own address.
Closes scylladb/scylladb#16250
* github.com:scylladb/scylladb:
treewide: get rid of now unused fb_utilities
tracing: use locator::topology rather than fb_utilities
streaming: use locator::topology rather than fb_utilities
raft: use locator::topology/messaging rather than fb_utilities
storage_service: use locator::topology rather than fb_utilities
storage_proxy: use locator::topology rather than fb_utilities
service_level_controller: use locator::topology rather than fb_utilities
misc_services: use locator::topology rather than fb_utilities
migration_manager: use messaging rather than fb_utilities
forward_service: use messaging rather than fb_utilities
messaging_service: accept broadcast_addr in config rather than via fb_utilities
messaging_service: move listen_address and port getters inline
test: manual: modernize message test
table: use gossiper rather than fb_utilities
repair: use locator::topology rather than fb_utilities
dht/range_streamer: use locator::topology rather than fb_utilities
db/view: use locator::topology rather than fb_utilities
database: use locator::topology rather than fb_utilities
db/system_keyspace: use topology via db rather than fb_utilities
db/system_keyspace: save_local_info: get broadcast addresses from caller
db/hints/manager: use locator::topology rather than fb_utilities
db/consistency_level: use locator::topology rather than fb_utilities
api: use locator::topology rather than fb_utilities
alternator: ttl: use locator::topology rather than fb_utilities
gossiper: use locator::topology rather than fb_utilities
gossiper: add get_this_endpoint_state_ptr
test: lib: cql_test_env: pass broadcast_address in cql_test_config
init: get_seeds_from_db_config: accept broadcast_address
locator: replication strategies: use locator::topology rather than fb_utilities
locator: topology: add helpers to retrieve this host_id and address
snitch: pass broadcast_address in snitch_config
snitch: add optional get_broadcast_address method
locator: ec2_multi_region_snitch: keep local public address as member
ec2_multi_region_snitch: reindent load_config
ec2_multi_region_snitch: coroutinize load_config
ec2_snitch: reindent load_config
ec2_snitch: coroutinize load_config
thrift: thrift_validation: use std::numeric_limits rather than fb_utilities
install-dependencies.sh includes a list of pip packages that the build
environment requires.
This functionality was added in
729d0feef0, however, the actual use of the
list is missing and instead the `pip install` commands are hard coded
into the logic.
This change completes the transition to the pip-packages list.
It also modifies the `pip_packages` array to include a
constraint (if needed) for every package.
Fixes #16269
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Closes scylladb/scylladb#16282
Get my_address via query_processor->proxy and pass it
to all static make_ methods, instead of getting it from
utils::fb_utilities.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This commit adds a short paragraph to the Raft
page to explain how to enable consistent
topology updates with Raft - an experimental
feature in version 5.4.
The paragraph should satisfy the requirements
for version 5.4. The Raft page will be
rewritten in the next release when consistent
topology changes with Raft will be GA.
Fixes https://github.com/scylladb/scylladb/issues/15080
Requires backport to branch-5.4.
Closes scylladb/scylladb#16273
Right now the atomic deletion is called on manager, but it gets the
actual deletion function from storage and off-loads the deletion to it.
This patch makes the manager fully responsible for the deletion by
implementing the sequence of
auto ctx = storage.prepare()
for sst in sstables:
sst.unlink()
storage.complete(ctx)
Storage implementations provide the prepare/complete methods. The
filesystem storage does it via deletion log and the s3 storage is still
not atomic :(
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The atomic deletion is going to look like
auto ctx = storage.prepare()
for sst in sstables:
sst.unlink()
storage.complete(ctx)
and this patch prepares the class storage for that by extending it with
prepare and complete methods. The opaque ctx object is also here
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
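The prepare/complete contract above can be modeled like this (an illustrative Python sketch with invented names, not the actual C++ storage API; the filesystem storage would record a deletion log in prepare() and drop it in complete()):

```python
class SSTable:
    # Minimal stand-in for a shared sstable.
    def __init__(self, name):
        self.name = name
        self.unlinked = False

    def unlink(self):
        self.unlinked = True

class FsStorage:
    # Model of the filesystem storage: prepare() writes the "deletion
    # log" (here, just the names), complete() discards it once every
    # unlink succeeded.
    def __init__(self):
        self.calls = []

    def prepare_atomic_delete(self, sstables):
        ctx = [sst.name for sst in sstables]  # stands in for the deletion log
        self.calls.append("prepare")
        return ctx

    def complete_atomic_delete(self, ctx):
        self.calls.append("complete")

def delete_atomically(storage, sstables):
    # The manager-owned sequence from the commit message.
    ctx = storage.prepare_atomic_delete(sstables)
    for sst in sstables:
        sst.unlink()
    storage.complete_atomic_delete(ctx)
```

If a crash happens between prepare and complete, the surviving deletion log lets replay finish the batch, which is what makes the whole deletion atomic.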
topology_guard is used to track distributed operations started by the
topology change coordinator, e.g. streaming, to make sure that those
operations have no side effects after topology change coordinator
moved to the next migration stage, of a given tablet or of the whole
ring.
topology_guard can be sent over the wire in the form of
frozen_topology_guard. It can be materialized again on the other
side. While in transit, it doesn't block the coordinator barriers. But
if the coordinator moved on, materialization of the guard will
fail. So tracking safety is preserved.
In this patch, the guard implementation is based on tracking work
under global sessions, but the concept is flexible and other
mechanisms can be used without changing user code.
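A rough model of the session-based guard (illustrative Python with invented names; the real code is C++ and more involved):

```python
class SessionExpired(Exception):
    pass

class SessionTracker:
    """Holds the coordinator's current session; clearing or replacing it
    fences any guard frozen under the old session."""
    def __init__(self):
        self._session = None

    def start_session(self, session_id):
        self._session = session_id

    def clear_session(self):
        self._session = None

    def materialize(self, frozen_guard):
        # A frozen guard is just the session ID it was created under.
        # If the coordinator has moved on, materialization fails, so no
        # new work can start under a stale session.
        if frozen_guard is None or frozen_guard != self._session:
            raise SessionExpired(frozen_guard)
        return frozen_guard
```

While the frozen guard is in transit it holds nothing, so barriers are not blocked; materialization on the receiving side is the point where staleness is detected.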
Since abort callbacks are fired synchronously, we must change the
table's erm before we do that so that the callbacks obtain the new
erm.
Otherwise, we will block barriers.
Currently, scylla.yaml is read conditionally, if either the user
provided `--scylla-yaml-file` command line parameter, or if deducing the
data dir location from the sstable path failed.
We want the scylla.yaml file to be always read, so that when working
with encrypted files (enterprise), scylla-sstable can pick up the
configuration for the encryption.
This patch makes scylla-sstable always attempt to read the scylla.yaml
file, whether the user provided a location for it or not. When not, the
default location is used (also considering the `SCYLLA_CONF` and
`SCYLLA_HOME` environment variables).
Failing to find the scylla.yaml file is not considered an error. The
rationale is that the user will discover this if they attempt to do an
operation that requires it anyway.
There is a debug-level log about whether it was successfully read or
not.
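The default-location lookup could look roughly like this (a sketch; the exact search order is an assumption, not taken from the code):

```python
import os

def find_scylla_yaml(explicit_path=None):
    # Candidate locations, most specific first. The ordering here is an
    # assumption for illustration.
    candidates = []
    if explicit_path:
        candidates.append(explicit_path)
    if "SCYLLA_CONF" in os.environ:
        candidates.append(os.path.join(os.environ["SCYLLA_CONF"], "scylla.yaml"))
    if "SCYLLA_HOME" in os.environ:
        candidates.append(os.path.join(os.environ["SCYLLA_HOME"], "conf", "scylla.yaml"))
    candidates.append(os.path.join("conf", "scylla.yaml"))
    for path in candidates:
        if os.path.exists(path):
            return path
    return None  # not finding the file is not an error
```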
Fixes: #16132
Closes scylladb/scylladb#16174
Even though the "value_status_count" is not rendered/used yet,
it'd be better to keep it in sync with the code.
Since 5fd30578d7 added
"Deprecated" to the `value_status` enum, let's update the sphinx
extension accordingly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16236
The table::discard_sstables() method removes sstables attached to a table. For that it tries to atomically delete _each_ suitable sstable, which is a bit heavyweight -- each atomic deletion operation results in a deletion log file being written. This PR deletes all of a table's sstables in one atomic batch. While at it, the body of discard_sstables() is simplified not to allocate the "pruner" object. The latter is possible after the method became a coroutine.
Closes scylladb/scylladb#16202
* github.com:scylladb/scylladb:
discard_sstables: Atomically delete all sstables
discard_sstables: Indentation and formatting fix after previous patch
discard_sstable: Open-code local prune() lambda
discard_sstables: Do not allocate pruner
This feature, when enabled, will modify how schema versions
are calculated and stored.
- In group 0 mode, schema versions are persisted by the group 0 command
that performs the schema change, then reused by each node instead of
being calculated as a digest (hash) by each node independently.
- In RECOVERY mode or before Raft upgrade procedure finishes, when we
perform a schema change, we revert to the old digest-based way, taking
into account the possibility of having performed group0-mode schema
changes (that used persistent versions). As we will see in future
commits, this will be done by storing additional flags and tombstones
in system tables.
By "schema versions" we mean both the UUIDs returned from
`schema::version()` and the "global" schema version (the one we gossip
as `application_state::SCHEMA`).
For now, in this commit, the feature is always disabled. Once all
necessary code is set up in the following commits, we will enable it
together with Raft.
Storage service uses group0 internally, but group0 is created long after
storage service is initialized and is passed to it using the ss::set_group0()
function. This means that during shutdown group0 is destroyed
before ss::stop() is called and thus storage service is left with a
dangling reference. Fix it by introducing a function that cancels all
group0 operations and waits for background fibers to complete. For that
we need separate abort source for group0 operation which the patch
series also introduces.
* 'gleb/group0-ss-shutdown' of github.com:scylladb/scylla-dev:
storage_service: topology coordinator: ignore abort_requested_exception in background fibers
storage_service: fix de-initialization order between storage service and group0_service
Less overhead this way. The caller of lookup() always passes
an rvalue reference, and seastar::dns::get_host_by_name() actually
moves out of the parameter, so let's pass by std::move() for
slightly better performance, and to match the expectation of
the underlying seastar API.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16280
Expose cql3::query_processor in auth::service
to get to the topology via storage_proxy.replica::database
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When collected sstables are deleted each is passed into
sstables_manager.delete_atomically(). For on-disk sstables this creates
a deletion log for each removed stable, which is quite an overkill. The
atomic deletion callback already accepts vector of shared sstables, so
it's simpler (and a bit faster) to remove them all in a batch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
By "formatting" fix I mean -- remove the temporary on-stack references
that were left for the ease of patching
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The lambda in question was the struct pruner method and was left there
for the ease of patching. Now, when this lambda is only called once
inside the function it is declared in, it can be open-coded into the
place where it's called
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This allocation remained from the pre-coroutine times of the method. Now
the contents of pruner -- the reference to the table, the vector and the
replay_position -- can reside on the coroutine frame.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Returns this node's endpoint_state_ptr.
With this entry point, the caller doesn't need to
get_broadcast_address.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
For getting rid of fb_utilities.
In the future, that could be used to instantiate
multiple scylla node instances.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Pass the broadcast_address from main to get_seeds_from_db_config
rather than getting it from fb_utilities.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
and set broadcast_address / broadcast_rpc_address in main
to remove this dependency of snitch on fb_utilities.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Now that ec2_snitch::load_config is a coroutine
there's no need for a seastar thread here either.
Refs #16241
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
UUID v1 uses an epoch derived from the Gregorian calendar, but
base36-uuid.py interprets the timestamp with the UNIX epoch time.
That's why it prints a UUID like
```console
$ ./scripts/base36-uuid.py -d 3gbi_0mhs_4sjf42oac6rxqdsnyx
date = 2411-02-16 16:05:52
decimicro_seconds = 0x7ad550
lsb = 0xafe141a195fe0d59
```
even though this UUID was generated on Nov 30, 2023. So in this change,
we shift the time by the offset of the UNIX epoch from
the Gregorian calendar's day 0. After this change, we have:
```console
$ ./scripts/base36-uuid.py -d 3gbi_0mhs_4sjf42oac6rxqdsnyx
date = 2023-11-30 16:05:52
decimicro_seconds = 0x7ad550
lsb = 0xafe141a195fe0d59
```
see https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.4
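The fix boils down to subtracting the offset between the two epochs; a minimal sketch of the conversion (not the actual script code):

```python
import datetime

# 100-ns intervals between the Gregorian epoch (1582-10-15, UUID v1 day 0)
# and the UNIX epoch (1970-01-01), per RFC 4122.
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000

def uuid1_timestamp_to_datetime(ts_100ns):
    # Shift into UNIX time, then scale 100-ns ticks to seconds.
    unix_100ns = ts_100ns - GREGORIAN_TO_UNIX_100NS
    return datetime.datetime.fromtimestamp(unix_100ns / 1e7,
                                           tz=datetime.timezone.utc)
```

Without the subtraction, a 2023 timestamp lands roughly 387 years in the future, matching the 2411 date in the broken output above.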
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16235
Neither <iomanip> nor "utils/to_string.hh" is used in
`gms/inet_address.cc`, so let's remove their "#include"s.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16281
This commit fixes:
1. The error message will be specific about what type of keys
exceeds the limit (e.g., clustering keys or partition keys).
2. The error message will be more general about what causes it: a cartesian product
or a simple list.
3. The error message will advise using the --max-partition-key-restrictions-per-query
or --max-clustering-key-restrictions-per-query configuration options to
override the current (100) limit.
Fixes #15627
Closes scylladb/scylladb#16226
In 7a1fbb38, a new test was added to an existing test for
comparing UUIDs with different timestamps, but we should tighten
the test a little bit to reflect the intention of the test:
the timestamp of "2023-11-24 23:41:56" should be less than
"2023-11-24 23:41:57".
In this change, we replace LE with LT to correct it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16245
This commit fixes the rollback procedure in
the 4.6-to-5.0 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
is fixed.
- The "Gracefully shutdown ScyllaDB" command
is fixed.
In addition, there are the following updates
to be in sync with the tests:
- The "Backup the configuration file" step is
extended to include a command to backup
the packages.
- The Rollback procedure is extended to restore
the backup packages.
- The Reinstallation section is fixed for RHEL.
Refs https://github.com/scylladb/scylladb/issues/11907
This commit must be backported to branch-5.4, branch-5.2, and branch-5.1
Closes scylladb/scylladb#16155
This commit fixes the rollback procedure in
the 5.1-to-5.2 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
is fixed.
- The "Gracefully shutdown ScyllaDB" command
is fixed.
In addition, there are the following updates
to be in sync with the tests:
- The "Backup the configuration file" step is
extended to include a command to backup
the packages.
- The Rollback procedure is extended to restore
the backup packages.
- The Reinstallation section is fixed for RHEL.
Also, I've removed the rollback
section for images, as it's not correct or
relevant.
Refs https://github.com/scylladb/scylladb/issues/11907
This commit must be backported to branch-5.4 and branch-5.2.
Closes scylladb/scylladb#16152
This commit fixes the rollback procedure in
the 5.0-to-5.1 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
is fixed.
- The "Gracefully shutdown ScyllaDB" command
is fixed.
In addition, there are the following updates
to be in sync with the tests:
- The "Backup the configuration file" step is
extended to include a command to backup
the packages.
- The Rollback procedure is extended to restore
the backup packages.
- The Reinstallation section is fixed for RHEL.
Also, I've removed the rollback
section for images, as it's not correct or
relevant.
Refs https://github.com/scylladb/scylladb/issues/11907
This commit must be backported to branch-5.4, branch-5.2, and branch-5.1
Closes scylladb/scylladb#16154
Using consistent cluster management without schema commitlog
results in a bad configuration error being thrown during bootstrap. Soon, we
will make consistent cluster management mandatory. This forces us
to also make schema commitlog mandatory, which we do in this patch.
A booting node decides to use schema commitlog if at least one of
the two statements below is true:
- the node has `force_schema_commitlog=true` config,
- the node knows that the cluster supports the `SCHEMA_COMMITLOG`
cluster feature.
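That decision is a simple disjunction; as a sketch (the names mirror the text above, not the actual code):

```python
def use_schema_commitlog(force_schema_commitlog: bool,
                         cluster_has_schema_commitlog_feature: bool) -> bool:
    # A booting node enables schema commitlog if either condition holds.
    return force_schema_commitlog or cluster_has_schema_commitlog_feature
```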
The `SCHEMA_COMMITLOG` cluster feature has been added in version
5.1. This patch is supposed to be a part of version 6.0. We don't
support a direct upgrade from 5.1 to 6.0 because it skips two
versions - 5.2 and 5.4. So, in a supported upgrade we can assume
that the version which we upgrade from has schema commitlog. This
means that we don't need to check the `SCHEMA_COMMITLOG` feature
during an upgrade.
The reasoning above also applies to Scylla Enterprise. Version
2024.2 will be based on 6.0. Probably, we will only support
an upgrade to 2024.2 from 2024.1, which is based on 5.4. But even
if we support an upgrade from 2023.x, this patch won't break
anything because 2023.1 is based on 5.2, which has schema
commitlog. Upgrades from 2022.x definitely won't be supported.
When we populate a new cluster, we can use the
`force_schema_commitlog=true` config to use schema commitlog
unconditionally. Then, the cluster feature check is irrelevant.
This check could fail because we initiate schema commitlog before
we learn about the features. The `force_schema_commitlog=true`
config is especially useful when we want to use consistent cluster
management. Failing feature checks would lead to crashes during
initial bootstraps. Moreover, there is no point in creating a new
cluster with `consistent_cluster_management=true` and
`force_schema_commitlog=false`. It would just cause some initial
bootstraps to fail, and after successful restarts, the result would
be the same as if we used `force_schema_commitlog=true` from the
start.
In conclusion, we can unconditionally use schema commitlog without
any checks in 6.0 because we can always safely upgrade a cluster
and start a new cluster.
Apart from making schema commitlog mandatory, this patch adds two
changes that are its consequences:
- making the unneeded `force_schema_commitlog` config unused,
- deprecating the `SCHEMA_COMMITLOG` feature, which is always
assumed to be true.
Closes scylladb/scylladb#16254
Fixes #16277
When the PR for 'tagged pages' was submitted for RFC, it was assumed that PR #12849
(compression) would be committed first. The latter introduced v3 format, and the
format in #12849 (tagged pages) was assumed to have to be bumped to 4.
This ended up not being the case, and I missed that the code went in with the file format
tag's numeric value being '4' (and the constant named v3).
While not detrimental, it is confusing, and should be changed asap (before anything
depends on files with the tag applied).
Closes scylladb/scylladb#16278
Refs #15269
Unit test to check that trying to skip past EOF in a borked segment
will not crash the process. file_data_input_impl asserts iff caller
tries this.
In /usr/lib/sysctl.d/99-scylla-sched.conf, we have some sysctl settings to
tune the scheduler for lower latency.
This is mostly to prevent softirq threads processing tcp and reactor threads
from injecting latency into each other.
However, these parameters were moved to debugfs in linux-5.13+, so we lost
scheduler tuning on recent kernels.
To support tuning recent kernels, let's add a new service which supports
configuring both sysctl and debugfs.
The service is named scylla-tune-sched.service.
The service is unconditionally enabled when installed; on older kernels
it will tune via sysctl, on recent kernels via debugfs.
Fixes #16077
Closes scylladb/scylladb#16122
In 4ea6e06c, we specialized fmt::formatter<gms::inet_address> using
the formatter of bytes if the underlying address is an IPv6 address.
This breaks the tests with JMX which expected the shortened form of
the text representation of the IPv6 address.
In this change, instead of reinventing the wheel, let's reuse the
existing formatter of net::inet_address, which is able to handle
both IPv4 and IPv6 addresses; it also follows
https://datatracker.ietf.org/doc/html/rfc5952 by compressing the
consecutive zeros.
Since this new formatter is a thin wrapper of seastar::net::inet_address,
the corresponding unit test will be added to Seastar.
Refs #16039
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16267
Before this change, we relied on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
In this change, we define a formatter for
row_level_diff_detect_algorithm, but its operator<<() is preserved,
as we are still using our homebrew generic formatter for
std::vector, and this formatter still uses operator<< for formatting
the elements in the vector.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16248
As part of the efforts to migrate to the CMake-based build system,
this change enables `configure.py` to optionally create
`build.ninja` with CMake.
In this change, we add a new option named `--use-cmake` to
`configure.py` so we can create `build.ninja`. Please note,
instead of using the "Ninja" generator used by Seastar's
`configure.py` script, we use the "Ninja Multi-Config" generator
along with the `CMAKE_CROSS_CONFIGS` setting in this project,
so that we can generate a `build.ninja` which is capable of
building the same artifacts with multiple configurations.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15916
* github.com:scylladb/scylladb:
build: cmake: add compatibility target of dev-headers
build: add an option to use CMake as the build system
Before this change, we relied on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
In this change, we
* define a formatter for logical_clock::time_point, as fmt does not
provide a formatter for this time_point, since it is not a part of the
standard library
* remove operator<<() for logical_clock::time_point, as its sole
purpose was to generate the corresponding fmt::formatter when
FMT_DEPRECATED_OSTREAM is defined
* remove operator<<() for logical_clock::duration, as fmt already
provides a default implementation for formatting
std::chrono::nanoseconds, which uses `int64_t` as its rep
template parameter as well
* include "fmt/chrono.h" so that the source files including this
header can access the formatter without including it by
themselves; this preserves the existing behavior which we had
before the removal of "operator<<()".
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16263
In the view update code, the function get_view_natural_endpoint()
determines which view replica this base replica should send an update
to. It currently gets the *view* table's replication map (i.e., the map
from view tokens to lists of replicas holding the token), but assumes
that this is also the *base* table's replication map.
This assumption was true with vnodes, but is no longer true with
tablets - the base table's replication map can be completely different
from the view table's. By looking at the wrong mapping,
get_view_natural_endpoint() can believe that this node isn't really
a base-replica and drop the view update. Alternatively, it can think
it is a base replica - but use the wrong base-view pairing and create
base-view inconsistencies.
This patch solves this bug - get_view_natural_endpoint() now gets two
separate replication maps - the base's and the view's. The callers
need to remember what the base table was (in some cases they didn't
care at the point of the call), and pass it to the function call.
This patch also includes a simple test that reproduces the bug, and
confirms it is fixed: The test has a 6-node cluster using tablets
and a base table with RF=1, and writes one row to it. Before this
patch, the code usually gets confused, thinking the base replica
isn't a replica and loses the view update. With this patch, the
view update works.
Fixes #16227.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16228
Prototype implementation of format suggested/requested by @avikivity:
Divides segments into disk-write-alignment sized pages, each tagged with segment ID + CRC of data content.
When read, we both verify sector integrity (CRC) to detect corruption, as well as matching ID read with expected one.
If the latter mismatches we have a prematurely terminated segment (read truncation), which, depending on whether the CL is
written in batch or periodic mode, as well as explicit sync, can mean data loss.
Note: all-zero pages are treated as kosher, both to align with newly allocated segments, as well as fully terminated (zero-page) ones.
Note: This is a preview/RFC - the rest of the file format is not modified. At least parts of entry CRC could probably be removed, but I have not done so yet (needs some thinking).
Note: Some slight abstraction breaks in impl. and probably less than maximal efficiency.
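The read-side check described above can be modeled like this (illustrative Python with an arbitrary CRC choice, not the actual commitlog code; the all-zero-page special case is elided):

```python
import zlib

def check_sector(payload, stored_crc, stored_segment_id, expected_segment_id):
    # CRC mismatch: on-disk corruption (or a torn write).
    if zlib.crc32(payload) != stored_crc:
        return "corrupt"
    # CRC is fine but the sector carries another (e.g. recycled)
    # segment's ID: the current segment was truncated here.
    if stored_segment_id != expected_segment_id:
        return "truncated"
    return "ok"
```

Tagging each sector with its segment ID is what lets replay tell a prematurely terminated segment apart from genuinely corrupt data.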
v2:
* Removed entry CRC:s in file format.
* Added docs on format v3
* Added one more test for recycling-truncation
v3:
* Fixed typos in size calc and docs
* Changed sect metadata order
* Explicit iter type
Closes scylladb/scylladb#15494
* github.com:scylladb/scylladb:
commitlog_test: Add test for replaying large-ish mutation
commitlog_test: Add additional test for segment truncation
docs: Add docs on commitlog format 3
commitlog: Remove entry CRC from file format
commitlog: Implement new format using CRC:ed sectors
commitlog: Add iterator adaptor for doing buffer splitting into sub-page ranges
fragmented_temporary_buffer: Add const iterator access to underlying buffers
commitlog_replayer: differentiate between truncated file and corrupt entries
The helper in question complicates the logic of sstable_directory::process() by handling garbage collection differently for sstables deleted "atomically" and those deleted "one-by-one". Also, the code that deletes sstables one-by-one via remove_by_toc_name() does excessive TOC file reading, because there's an sstable object at hand that already has all_components() ready for use.
Surprisingly, there was no test for the deletion-log functionality. This PR adds one. The test passes before the g.c. and regular unlink fix, and (of course) continues passing after it.
Closes scylladb/scylladb#16240
* github.com:scylladb/scylladb:
sstables: Drop remove_by_name()
sstables/fs_storage: Wipe by recognized+unrecognized components
sstable_directory: Enlight deletion log replay
sstables: Split remove_by_toc_name()
test: Add test case to validate deletion log work
sstable_directory: Close dir on exception
sstable_directory: Fix indentation after previous patch
sstable_directory: Coroutinize delete_with_pending_deletion_log()
test: Sstable on_delete() is not necessarily in a thread
sstable_directory: Split delete_with_pending_deletion_log()
Currently, the max size of commitlog is obtained either from the
config parameter commitlog_total_space_in_mb or, when the parameter
is -1, from the total memory allocated for Scylla.
To facilitate testing of the behavior of commitlog hard limit,
expose the value of commitlog max_disk_size in a dedicated API.
Closes scylladb/scylladb#16020
rpmlint complains about "mixed-use-of-spaces-and-tabs", and it
does not look good in the editor, so let's replace tabs with spaces.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16246
Fixes some typos as found by codespell run on the code. In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc. Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Closes scylladb/scylladb#16257
* github.com:scylladb/scylladb:
Update service/topology_state_machine.hh
Update raft/tracker.hh
Update db/view/view.cc
Typos: fix typos in comments
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
This PR is a necessary step to fix #15854 -- making consistent
cluster management mandatory on master.
Before making consistent cluster management mandatory, we have
to get rid of all tests that depend on the
`consistent_cluster_management=false` config. These are the tests
in the `topology_raft_disabled` suite.
There's the internal Raft upgrade procedure, which is the bulk of the
upgrade logic. Then, there are two thin "layers" around it that
invoke it underneath: recovery procedure and
enable-raft-in-the-cluster procedure. We're getting rid of the
second one by making Raft always enabled, so we naturally have to
get rid of tests that depend on it. The idea is to replace every
necessary enable-raft-in-the-cluster procedure in these tests with
the recovery procedure. Then, we will still be testing the internal
Raft upgrade procedure in the in-tree tests. The
enable-raft-in-the-cluster procedure is already tested by QA tests,
so we don't need to worry about these changes.
Unfortunately, we cannot adapt `test_raft_upgrade_no_schema`.
After making consistent cluster management mandatory on master,
schema commitlog will also become mandatory because
`consistent_cluster_management: True`,
`force_schema_commit_log: False`
is considered a bad configuration. These changes will make
`test_raft_upgrade_no_schema` unimplementable in the Scylla repo.
Therefore, we remove this test. If we want to keep it, we must
rewrite it as an upgrade dtest.
After making all tests in `topology_raft_disabled` use consistent
cluster management, there is no point in keeping this suite.
Therefore, we delete it and move all the tests to `topology_custom`.
Closes scylladb/scylladb#16192
* github.com:scylladb/scylladb:
test: delete topology_raft_disabled suite
test: topology_raft_disabled: move tests to topology_custom suite
test: topology_raft_disabled: move utils to topology suite
test: topology_raft_disabled: use consistent cluster management
test: topology_raft_disabled: add new util functions
test: topology_raft_disabled: delete test_raft_upgrade_no_schema
Currently, wiping an fs-backed sstable happens via reading and parsing
its TOC file back. Then the three-step process goes:
- move TOC -> TOC.tmp
- remove components (obtained from TOC.tmp)
- remove TOC.tmp
However, wiping an sstable happens in one of two cases -- either the
sstable was loaded from the TOC file, or the sstable had evaluated the
needed components and generated the TOC file. Either way, the 2nd step
can be done without reading the TOC file, just by looking at all
components sitting on the sstable.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
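Since the component list is already in memory in both cases, the three-step wipe can run without parsing the TOC back. A minimal sketch (Python; `wipe_sstable` is a hypothetical name for the fs_storage logic):

```python
import os

def wipe_sstable(dirname: str, toc_name: str, components: list[str]) -> None:
    """Atomically delete an sstable whose component list is already
    known, so the TOC file never has to be read back."""
    toc = os.path.join(dirname, toc_name)
    tmp = toc + ".tmp"
    os.rename(toc, tmp)                  # step 1: commit the intent to delete
    for comp in components:              # step 2: drop the data components
        if comp != toc_name:
            os.unlink(os.path.join(dirname, comp))
    os.unlink(tmp)                       # step 3: finish the deletion
```

The rename in step 1 is what makes the deletion crash-safe: a leftover TOC.tmp marks a partially unlinked sstable for the regular scan to clean up.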
Garbage collection of sstables is scattered between two stages -- g.c.
per se and the regular processing.
The former stage collects deletion logs and for each log found goes
ahead and deletes the full sstable with the standard sequence:
- move TOC -> TOC.tmp
- remove components
- remove TOC.tmp
The latter stage picks up partially unlinked sstables that didn't go via
atomic deletion with the log. This comes as
- collect all components
- keep TOC's and TOC.tmp's in separate lists
- attach other components to TOC/TOC.tmp by generation value
- for all TOC.tmp's get all attached components and remove them
- continue loading TOC's with attached components
That said, replaying the deletion log can be as light as just the first
step of the above sequence -- just move TOC to TOC.tmp. After that, the
regular processing picks up the remaining components and cleans them up.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
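The lightened replay then reduces to the rename step only; the regular directory scan treats the sstable as partially unlinked and removes the rest. A sketch (hypothetical helper name):

```python
import os

def replay_deletion_log(dirname: str, toc_names: list[str]) -> None:
    """Replay a deletion log by performing only the first step of the
    wipe sequence: rename each listed TOC to TOC.tmp.  The regular
    directory processing later attaches the remaining components to the
    TOC.tmp by generation value and removes them."""
    for toc in toc_names:
        path = os.path.join(dirname, toc)
        if os.path.exists(path):
            os.rename(path, path + ".tmp")
```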
The helper consists of three phases:
- move TOC -> TOC.tmp
- remove components listed in TOC
- remove TOC.tmp
The first step is needed separately by the next patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test sequence is
- create several sstables
- create deletion log for a sub-set of them
- partially unlink smaller sub-sub-set
- make sstable directory do the processing with g.c.
- check that the sstables loaded do NOT include the deleted ones
Setting the .throw_on_missing_toc bit additionally validates that the
directory doesn't contain garbage not attached to any other TOCs.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When committing the deletion log creation, its containing directory is
synced via an opened file. This place is not exception-safe, and the
directory can be left unclosed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
One of the test cases injects an observer into the sstable->unlink()
method via its _on_delete() callback. The test's callback assumes that
it runs in an async context, but that is a happy coincidence: deletion
via the deletion log happens to run in one. The next patch changes
that, and the test case will no longer work. But since it's a test
case, it can just call a libc function directly for its needs.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The helper consists of three parts -- prepare the deletion log, unlink
sstables, and drop the deletion log. For testing, the first part is
needed as a separate step, hence this split.
It results in two nested async contexts, but that will change soon.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The ".github/CODEOWNERS" file is used by GitHub to recommend reviewers
for pull requests depending on the directories touched in the pull
request. GitHub ignores entries in that file that are not
**maintainers**. Since Jan is no longer a Scylla maintainer, I remove
his entries from the list.
Additionally, I am removing *myself* from *some* of the directories.
For many years, it was an (unwritten) policy that experienced Scylla
developers are expected to help in reviewing pieces of the code they
are familiar with - even if they no longer work on that code today.
But as ScyllaDB the company grew, this is no longer true; the policy
is now that experienced developers are requested to review only code in
their own or their team's area of responsibility -- experienced developers
should help review *designs* of other parts, but not the actual code.
For this reason I'm removing my name from various directories.
I can still help review such code if asked specifically - but I will no
longer be the "default" reviewer for such code.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16239
Add a `LIST EFFECTIVE SERVICE LEVEL` statement to be able to display which service level each service level option comes from.
Example:
There are 2 roles: role1 and role2. role1 is assigned sl1 (timeout = 2s, workload_type = interactive) and role2 is assigned sl2 (timeout = 10s, workload_type = batch).
Then, if we grant role1 to role2, a user with role2 will have a 2s timeout (from sl1) and a batch workload type (from sl2).
```
> LIST EFFECTIVE SERVICE LEVEL OF role2;
service_level_option | effective_service_level | value
----------------------+-------------------------+-------------
workload_type | sl2 | batch
timeout | sl1 | 2s
```
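The merge the example illustrates can be sketched as follows (Python; the merge rules here -- the smallest timeout wins, `batch` beats `interactive` -- are simplified assumptions for illustration, not the exact service-level semantics):

```python
def effective_service_level(levels: dict) -> dict:
    """For each option, return (service level it comes from, value).

    `levels` maps service level name -> {option: value}.  Assumed merge
    rules: smallest timeout wins; 'batch' beats 'interactive'.
    """
    result = {}
    timeouts = {name: o["timeout"] for name, o in levels.items() if "timeout" in o}
    if timeouts:
        winner = min(timeouts, key=timeouts.get)   # most restrictive timeout
        result["timeout"] = (winner, timeouts[winner])
    workloads = {name: o["workload_type"] for name, o in levels.items()
                 if "workload_type" in o}
    if workloads:
        # pick the first level declaring 'batch', else any declaring level
        winner = next((n for n, w in workloads.items() if w == "batch"),
                      next(iter(workloads)))
        result["workload_type"] = (winner, workloads[winner])
    return result
```

Feeding the example's sl1/sl2 into this sketch reproduces the table above: timeout from sl1, workload_type from sl2.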
Fixes: https://github.com/scylladb/scylladb/issues/15604
Closes scylladb/scylladb#14431
* github.com:scylladb/scylladb:
cql-pytest: add `LIST EFFECTIVE SERVICE LEVEL OF` test
docs: add `LIST EFFECTIVE SERVICE LEVEL` statement docs
cql3:statements: add `LIST EFFECTIVE SERVICE LEVEL` statement
service:qos: add option to include effective names to SLO
Storage service uses group0 internally, but group0 is created long after
storage service is initialized, and is passed to it using the
ss::set_group0() function. This means that during shutdown group0 is
destroyed before ss::stop() is called, and thus storage service is left
with a dangling reference. Fix it by introducing a function that cancels
all group0 operations and waits for background fibers to complete. For
that we need a separate abort source for group0 operations, which the
patch also introduces.
We move the remaining tests in topology_raft_disabled to
topology_custom. We choose topology_custom because these tests
cannot use consistent topology changes.
We need to modify these tests a bit because we cannot pass
RandomTables to a test case function if the initial cluster size
equals 0. RandomTables.__init__ requires manager.cql to be present.
We move all used util functions from topology_raft_disabled to
topology before we remove topology_raft_disabled. After this
change, util.py in topology will be the single util file for all
topology tests.
Some util functions in topology_raft_disabled aren't used anymore.
We don't move such functions and remove them instead.
Soon, we will make consistent cluster management mandatory on
master. Before this, we have to change all tests in the
topology_raft_disabled suite so that they do not depend on the
consistent_cluster_management=false config.
Adapting test_raft_upgrade_majority_loss is simple. We only have
to get rid of the initial upgrade. This initial upgrade didn't
test anything. Every test in topology_raft_disabled had to do it
at the beginning because of consistent_cluster_management=false.
Adapting test_raft_upgrade_basic and test_raft_upgrade_stuck is
more difficult. It requires changing the initial upgrade to
clearing Raft data in RECOVERY mode on all servers and restarting
them. Then, the servers will run the same upgrade procedure as
before.
After changing the tests, we also update their names appropriately.
test_raft_upgrade_stuck becomes a bit slower, so we remove the
comment about running time. Also, one TODO was fixed in the process
of rewriting the test. This fix forced us to skip the test in the
release mode since we cannot update the list of error injections
through manager.server_update_config in this mode.
After making consistent cluster management mandatory on master,
schema commitlog will also become mandatory because
consistent_cluster_management: True,
force_schema_commit_log: False
is considered a bad configuration. These changes will make
test_raft_upgrade_no_schema unimplementable in the Scylla repo, so
we remove it.
If we want to keep this test, we must rewrite it as an upgrade
dtest.
under most circumstances, we don't care about the ordering of the sstable
identifiers, as they are just identifiers. so, as long as they can be
compared, we are good. but we have tests which expect that the sstables
can be ordered by the time they are created. for instance,
sstable_run_based_compaction_test has this expectation.
before this change, we compared two UUID-based generations by their
(MSB, LSB) lexicographically. but UUID v1 puts the lower bits of
the timestamp at the higher bits of the MSB, so the "time" ordering
in a timeuuid is not preserved when comparing the UUID-based
generations. this breaks sstable_run_based_compaction_test,
which feeds the sstables to be compacted in a set, and the set is
ordered by the generation of the sstables.
after this change, we consider the UUID-based generation as
a timeuuid when comparing them.
Fixes #16215
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16238
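The ordering problem can be demonstrated with Python's `uuid` module, whose `UUID.time` attribute reassembles the v1 timestamp fields into the full 60-bit value (`timeuuid_key` is an illustrative name for the comparison the fix performs):

```python
import uuid

def timeuuid_key(u: uuid.UUID):
    """Sort key that orders UUID v1 generations by creation time.

    UUID v1 stores the 60-bit timestamp split as (time_low, time_mid,
    time_hi), with the LOW bits of the timestamp occupying the HIGH bits
    of the MSB -- so comparing raw (MSB, LSB) does not preserve time
    order.  u.time reassembles the full timestamp; raw bytes break ties.
    """
    return (u.time, u.bytes)
```

Two v1 UUIDs one clock tick apart can compare in the wrong order byte-wise while `timeuuid_key` orders them correctly.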
Allow including `slo_effective_names` in `service_level_options`
to be able to determine which service level each specific option value comes from.
When scanning our latest docker image using `trivy` (command: `trivy
image docker.io/scylladb/scylla-nightly:latest`), it shows we have OS
packages which are out of date.
Also remove `openssh-server` and `openssh-client`, since we don't use
them in our docker images.
Fixes: https://github.com/scylladb/scylladb/issues/16222
Closes scylladb/scylladb#16224
Said commands print errors as they validate the sstables. Currently this
intermingles with the regular JSON output of these commands, resulting
in ugly and confusing output.
This is not a problem for scripted use, as logs go to stderr while the
JSON goes to stdout, but it is a problem for human users.
Solve this by outputting the JSON into a std::stringstream and printing
it in one go at the very end. This means JSON is accumulated in a memory
buffer, but these commands don't output a lot of JSON, so this shouldn't
be a problem.
Closes scylladb/scylladb#16216
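A sketch of the buffering approach (Python; the real code uses a C++ `std::stringstream`, and the function and data shapes here are illustrative):

```python
import io
import json
import sys

def run_validation(sstables) -> None:
    """Validate sstables, logging errors to stderr as they occur, but
    accumulate the JSON report in a memory buffer and print it in one go
    at the very end -- so error logs can never break up the JSON."""
    out = io.StringIO()
    report = {}
    for name, ok in sstables:
        if not ok:
            print(f"error validating {name}", file=sys.stderr)
        report[name] = "ok" if ok else "invalid"
    json.dump(report, out, indent=2)
    print(out.getvalue())        # single write to stdout at the very end
```

The report stays entirely in memory until the end, which is fine here because these commands don't output a lot of JSON.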
The current mechanism to deprecate config options is implemented in a hacky
way in `main.cpp` and doesn't account for the existing
`db::config/boost::po` API controlling the lifetime of config options.
Hence it's being replaced in this PR by adding yet another `value_status`
enumerator, `Deprecated`, so that deprecation of config options is
controlled in one place in `config.cc`, i.e. when specifying config options.
Motivation: https://docs.google.com/document/d/18urPG7qeb7z7WPpMYI2V_lCOkM5YGKsEU78SDJmt8bM/edit?usp=sharing
With this change, if a `Deprecated` config option is specified as
1. a command line parameter, scylla will run and log:
```
WARN 2023-11-25 23:37:22,623 [shard 0:main] init - background-writer-scheduling-quota option ignored (deprecated)
```
(Previously it was only a message printed to standard output, not a
scylla log of warn level).
2. an option in `scylla.yaml`, scylla will run and log:
```
WARN 2023-11-27 23:55:13,534 [shard 0:main] init - Option is deprecated : background_writer_scheduling_quota
```
Fixes #15887
Incorporates dropped https://github.com/scylladb/scylladb/pull/15928
Closes scylladb/scylladb#16184
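The enumerator-based approach can be sketched like this (Python; only `Deprecated` is taken from the PR description, the other enumerator names and the handler are illustrative, not the exact `config.cc` contents):

```python
from enum import Enum

class value_status(Enum):
    """Sketch of the extended enumeration; Deprecated is the new member."""
    Used = 1
    Unused = 2
    Invalid = 3
    Deprecated = 4

def apply_option(name: str, status: value_status, warnings: list) -> bool:
    """Accept a Deprecated option but emit the warning the log shows;
    handling lives in one place instead of being special-cased in
    main.cpp.  Returns whether the option is accepted."""
    if status is value_status.Deprecated:
        warnings.append(f"Option is deprecated : {name}")
        return True
    return status is not value_status.Invalid
```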
This miniset completes the prerequisites for enabling the commitlog hard limit by default.
Namely, start flushing and evacuating segments halfway to the limit in order to never hit it under normal circumstances.
It is worth mentioning that hitting the limit is an exceptional condition whose root cause needs to be resolved; however, once we do hit the limit, the performance impact inflicted by this enforcement is irrelevant.
Tests: unit tests.
LWT write test (#9331)
Whitebox testing was performed by @wmitros. The test aimed at putting as much pressure as possible on the commitlog segments by using a write pattern that rewrites the partitions in the memtable, keeping it at ~85% occupancy so the dirty memory manager does not kick in. The test compared 3 configurations:
1. The default configuration
2. Hard limit on (without changing the flush threshold)
3. The changes in this PR applied.
The last exhibited the "best" behavior in terms of metrics: the graphs were the flattest and least jagged of the three.
Closes scylladb/scylladb#10974
* github.com:scylladb/scylladb:
commitlog: enforce commitlog size hard limit by default
commitlog: set flush threshold to half of the limit size
commitlog: unfold flush threshold assignment
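The "flush at half the limit" policy from the commit list above can be sketched as (illustrative function names):

```python
def flush_threshold(max_disk_size: int) -> int:
    """Start flushing and evacuating segments halfway to the commitlog
    size limit, so the hard limit is normally never reached."""
    return max_disk_size // 2

def should_flush(current_footprint: int, max_disk_size: int) -> bool:
    """Trigger flushing once the footprint crosses the threshold."""
    return current_footprint >= flush_threshold(max_disk_size)
```

With the hard limit enforced by default, the halved threshold gives the flusher enough headroom that the limit only bites in genuinely exceptional conditions.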
This series adds handling for more failures during a topology operation
(we already handle a failure during streaming). Here we add handling of
tablet draining errors by aborting the operation and handling of errors
after streaming where an operation cannot be aborted any longer. If the
error happens when rollback is no longer possible we wait for ring delay
and proceed to the next step. Each individual patch that adds the sleep
has an explanation what the consequences of the patch are.
* 'gleb/topology-coordinator-failures' of github.com:scylladb/scylla-dev:
test: add test to check error handling during tablet draining
test: fix test_topology_streaming_failure test to not grep the whole file
storage_service: add error injection into the tablet migration code
storage_service: topology coordinator: rollback on handle_tablet_migration failure during tablet_draining stage
storage_service: topology coordinator: do not retry the metadata barrier forever in write_both_read_new state
storage_service: topology coordinator: do not retry the metadata barrier forever in left_token_ring state
storage_service: topology coordinator: return a node that is being removed from get_excluded_nodes
storage_service: topology_coordinator: use new rollback_to_normal state in the rollback procedure
storage_service: topology coordinator: add rollback_to_normal node state
storage_service: topology coordinator: put fence version into the raft state
storage_service: topology coordinator: do fencing even if draining failed
The TOC file is read and parsed in several places in the code. All do it differently, and it's worth generalizing.
To make this happen, also fix the S3 readable_file so that it can be used inside a file_input_stream.
Closes scylladb/scylladb#16175
* github.com:scylladb/scylladb:
sstable: Generalize toc file read and parse
s3/client: Don't GET object contents on out-of-bound reads
s3/client: Cache stats on readable_file
The situation before this patch is that when tablets are enabled for
a keyspace, we can create a materialized view but later any write to
the base table fails with an on_internal_error(), saying that:
"Tried to obtain per-keyspace effective replication map of test
but it's per-table."
Indeed, with tablets, the replication is different for each table - it's
not the same for the entire keyspace.
So this patch changes the view update code to take the replication
map from the specific base table, not the keyspace.
This is good enough to get materialized-views reads and writes working
in a simple single-node case, as the included test demonstrates (the
test fails with on_internal_error() before this patch, and passes
afterwards).
But this fix is not perfect - the base-view pairing code really needs
to consider not only the base table's replication map, but also the
view table's replication map - as those can be different. We'll fix
this remaining problem as a followup in a separate patch - it will
require a substantially more elaborate test to reproduce the need
for the different mapping and to verify that fix.
Fixes #16209.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16211
* seastar 830ce8673...55a821524 (34):
> Revert "reactor/scheduling_group: Handle at_destroy queue special in init_new_scheduling_group_key etc"
> epoll: Avoid spinning on aborted connections
Fixes #12774 Fixes #7753 Fixes #13337
> Merge 'Sanitize test-only reactor facilities' from Pavel Emelyanov
> test/unit: fix fmt version check
> reactor/scheduling_group: Handle at_destroy queue special in init_new_scheduling_group_key etc
> build: add spaces before () and after commands
> reactor: use zero-initialization to initialize io_uring_params
> Merge 'build: do not return a non-false condition if the option is off ' from Kefu Chai
> memory: do not use variable length array
> build: use tri_state_option() to link against Sanitizers
> build: do not define SEASTAR_TYPE_ERASE_MORE on all builds
> Revert "shared_future: make available() immediate after set_value()"
> test_runner: do not throw when seastar.app fails to start
> Merge 'Address issue where Seastar faults in toeplitz hash when reassembling fragment' from John Hester
> defer, closeable: do not use [[nodiscard(str)]]
> Merge 'build: generate config-specific rules using generator expressions' from Kefu Chai
> treewide: use *_v and *_t for better readability
> build: use different names for .pc files for each build mode
> perftune.py: skip discovering IRQs for iSCSI disks
> io-tester: explicit use uint64_t for boost::irange(...)
> gate: correct the typo in doxygen comment
> shared_future: make available() immediate after set_value()
> smp: drop unused templates
> include fmt/ostream.h to make headers self-sufficient
> Support ccache in ./configure.py
> rpc_tester: Disable -Wuninitialized when including boost.accumulators
> file: construct directory_entry with aggregated ctor
> file: s/ino64_t/ino_t/, s/off64_t/off_t/
> sstring_test: include fmt/std.h only if fmtlib >= 10.0.0
> file: do not include coroutine headers if coroutine is disabled
> fair_queue::unregister_priority_class:fix assertion
> Merge 'Generalize `net::udp_channel` into `net::datagram_channel`' from Michał Sala
> Merge 'Add file::list_directory() that co_yields entries' from Pavel Emelyanov
> http/file_handler: remove unnecessary cast
Closes scylladb/scylladb#16201
There are several places where TOC file is parsed into a vector of
components -- sstable::read_toc(), remove_by_toc_name() and
remove_by_registry_entry(). All three deserve some generalization.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
If the S3 readable file is used inside a file input stream, the latter
may call its read methods with a position that is above the file size.
In that case the server replies with a generic http error, and the fact
that the range was invalid is encoded in the reply body's xml.
Catching this via a wrong-reply-status exception and XML parsing is not
great, all the more so because we can know in advance that the read is
out of bounds.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
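The guard can be sketched like this (Python; `do_get` stands in for the ranged GET the real client would issue):

```python
def read_range(file_size: int, pos: int, length: int, do_get):
    """Serve a read request against an object of known size, avoiding an
    HTTP GET for reads known to be past the end.  The server would
    answer such a request with a generic error whose real cause is
    buried in the XML reply body, so short-circuit it locally instead."""
    if pos >= file_size:
        return b""                         # out-of-bound read: nothing to fetch
    length = min(length, file_size - pos)  # clamp reads crossing the end
    return do_get(pos, length)
```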
S3-based sstable components are immutable, so there's no need to ping
the server again every time stat is called.
But the main intention of this patch is to provide stats for read calls
in the next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Fixes #16207
commitlog::delete_segments deletes (or recycles) replayed segments.
The actual file size here is added to the footprint so the actual delete
can then determine whether things should be recycled or removed.
However, we build a pending-delete list of named_files, and the files
we added did not have their size set. Bad. Actual deletion then treated
the files as zero-byte sized, i.e. footprint calculations were borked.
The simple fix is just filling in the size of the objects when adding them.
Added a unit test for the problem.
Closes scylladb/scylladb#16210
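A sketch of the fix (Python; this `named_file` only mirrors the idea of carrying the size with each pending-delete entry, it is not the actual Seastar class):

```python
class named_file:
    """A pending-delete entry must carry the real file size; defaulting
    it to 0 is exactly the bug -- footprint accounting then sees
    zero-byte files and the recycle-vs-remove decision goes wrong."""
    def __init__(self, name: str, size: int = 0):
        self.name = name
        self.size = size

def pending_delete_footprint(files) -> int:
    """Total bytes the pending deletes contribute to the footprint."""
    return sum(f.size for f in files)
```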
Major compaction already flushes each table to make
sure it considers any mutations that are present in the
memtable for the purpose of tombstone purging.
See 64ec1c6ec6
However, tombstone purging may be inhibited by data
in commitlog segments based on `gc_time_min` in the
`tombstone_gc_state` (See f42eb4d1ce).
Flushing all sstables in the database releases
all references to commitlog segments and thereby
maximizes the potential for tombstone purging,
which is typically the reason for running major compaction.
However, flushing all tables too frequently might
result in tiny sstables. Since, when flushing all
keyspaces using `nodetool flush`, the `force_keyspace_compaction`
api is invoked for each keyspace successively, we need a mechanism
to prevent too-frequent flushes by major compaction.
Hence a `compaction_flush_all_tables_before_major_seconds` interval
configuration option is added (defaults to 24 hours).
In the case that not all tables are flushed prior
to major compaction, we revert to the old behavior of
flushing each table in the keyspace before major-compacting it.
Fixes scylladb/scylladb#15777
Closes scylladb/scylladb#15820
* github.com:scylladb/scylladb:
docs: nodetool: flush: enrich examples
docs: nodetool: compact: fix example
api: add /storage_service/compact
api: add /storage_service/flush
compaction_manager: flush_all_tables before major compaction
database: add flush_all_tables
api: compaction: add flush_memtables option
test/nodetool: jmx: fix path to scripts/scylla-jmx
scylla-nodetool, docs: improve optional params documentation
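The time-gating mechanism can be sketched as follows (Python; `flush_gate` is a hypothetical name for the logic guarded by `compaction_flush_all_tables_before_major_seconds`):

```python
import time

class flush_gate:
    """Rate-limit database-wide flushes before major compaction: allow a
    flush of all tables at most once per interval_seconds (24h by
    default); otherwise the caller falls back to flushing only the
    tables of the keyspace being major-compacted."""
    def __init__(self, interval_seconds: float = 24 * 3600, clock=time.monotonic):
        self._interval = interval_seconds
        self._clock = clock       # injectable for testing
        self._last = None

    def try_flush_all(self) -> bool:
        now = self._clock()
        if self._last is None or now - self._last >= self._interval:
            self._last = now
            return True           # caller flushes all tables
        return False              # caller flushes only its own keyspace
```

Making the clock injectable lets a unit test exercise the 24-hour policy without waiting.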
our CI builds "dev-headers" as a gating check, but the target names
generated by CMake's Ninja Multi-Config generator do not follow
this naming convention. we could have headers:Dev, but still, it's
different from what we are using. before completely switching to
CMake, let's keep backward compatibility by adding a target
with the same name.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
as part of the efforts to migrate to the CMake-based build system,
this change enables `configure.py` to optionally create
`build.ninja` with CMake.
in this change, we add a new option named `--use-cmake` to
`configure.py` so we can create `build.ninja`. please note,
instead of using the "Ninja" generator used by Seastar's
`configure.py` script, we use the "Ninja Multi-Config" generator
along with the `CMAKE_CROSS_CONFIGS` setting in this project,
so that we can generate a `build.ninja` which is capable of
building the same artifacts with multiple configurations.
Fixes #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Before this patch, trying to create a materialized view when tablets
are enabled for a keyspace results in a failure: "Tablet map not found
for table <uuid>", with uuid referring to the new view.
When a table schema is created, the handler on_before_create_column_family()
is called - and this function creates the tablet map for the new table.
The bug was that we forgot to do the same when creating a materialized
view - which is also a bona-fide table.
In this patch we call on_before_create_column_family() also when
creating the materialized view. I decided *not* to create a new
callback (e.g., on_before_create_view()) and rather call the existing
on_before_create_column_family() callback - after all, a view is
a column family too.
This patch also includes a test for this issue, which fails to create
the view before this patch, and passes with the patch. The test is
in the test/topology_experimental_raft suite, which runs Scylla with
the tablets experimental feature, and will also allow me to create
tests that need multiple nodes. However, the first test added here
only needs a single node to reproduce the bug and validate its fix.
Fixes #16194.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16205
When a view schema is changed, the schema change command also includes
mutations for the corresponding base table; these mutations don't modify
the base schema but are included in case the receiver of the view
mutations somehow didn't receive the base mutations yet (this may in theory
happen outside Raft mode).
There are situations where the schema change command contains both
mutations that describe the current state of the base table -- included
by a view update, as explained above -- and mutations that want to
modify the base table. Such situation arises, for example, when we
update a user-defined type which is referenced by both a view and its
corresponding base table. This triggers a schema change of the view,
which generates mutations to modify the view and includes mutations of
the current base schema, and at the same time it triggers a schema
change of the base, which generates mutations to modify the base.
These two sets of mutations are conflicting with each other. One set
wants to preserve the current state of the base table while the other
wants to modify it. And the two sets of mutations are generated using
the same timestamp, which means that conflict resolution between them is
made on a per-mutation-cell basis, comparing the values in each cell and
taking the "larger" one (meaning of "larger" depends on the type of each
cell).
Fortunately, this conflict is currently benign -- or at least there is
no known situation where it causes problems.
Unfortunately, it started causing problems when I attempted to implement
group 0 schema versioning (PR scylladb/scylladb#15331), where instead of
calculating table versions as hashes of schema mutations, we would send
versions as part of schema change command. These versions would be
stored inside the `system_schema.scylla_tables` table, `version` column,
and sent as part of schema change mutations.
And then the conflict showed. One set of mutations wanted to preserve
the old value of `version` column while the other wanted to update it.
It turned out that sometimes the old `version` prevailed, because the
`version` column in `system_schema.scylla_tables` uses UUID-based
comparison (not timeuuid-based comparison). This manifested as issue
scylladb/scylladb#15530.
To prevent this, the idea in this commit is simple: when generating
mutations for the base table as part of corresponding view update, do
not use the provided timestamp directly -- instead, decrement it by one.
This way, if the schema change command contains mutations that want to
modify the base table, these modifying mutations will win all conflicts
based on the timestamp alone (they are using the same provided
timestamp, but not decremented).
One could argue that the choice of this timestamp is anyway arbitrary.
The original purpose of including base mutations during view update was
to ensure that a node which somehow missed the base mutations, gets them
when applying the view. But in that case, the "most correct" solution
should have been to use the *original* base mutations -- i.e. the ones
that we have on disk -- instead of generating new mutations for the base
with a refreshed timestamp. The base mutations that we have on disk have
smaller timestamps already (since these mutations are from the past,
when the base was last modified or created), so the conflict would also
not happen in this case.
But that solution would require doing a disk read, and we can avoid the
read while still fixing the conflict by using an intermediate solution:
regenerating the mutations but with `timestamp - 1`.
Ref: scylladb/scylladb#15530
Closes scylladb/scylladb#16139
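The timestamp rule itself is simple enough to state in code (sketch):

```python
def base_mutation_timestamp(schema_change_timestamp: int) -> int:
    """Mutations that merely carry the current base-table state
    (included alongside a view update) get timestamp - 1.  Mutations
    that actually modify the base keep the full timestamp, so they win
    every conflict on the timestamp alone, never falling back to
    per-cell value comparison (which is what broke the UUID-compared
    `version` column)."""
    return schema_change_timestamp - 1
```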
It looks like `nodetool compact standard1` is meant
to show how to compact a specified table, not a keyspace.
Note that the previous example line is for a keyspace.
So fix the table compaction example to:
`nodetool compact keyspace1 standard1`
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
For major compacting all tables in the database.
The advantage of this api is that `commitlog->force_new_active_segment`
happens only once in `database::flush_all_tables` rather than
once per keyspace (when `nodetool compact` translates to
a sequence of `/storage_service/keyspace_compaction` calls).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
For flushing all tables in the database.
The advantage of this api is that `commitlog->force_new_active_segment`
happens only once in `database::flush_all_tables` rather than
once per keyspace (when `nodetool flush` translates to
a sequence of `/storage_service/keyspace_flush` calls).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Major compaction already flushes each table to make
sure it considers any mutations that are present in the
memtable for the purpose of tombstone purging.
See 64ec1c6ec6
However, tombstone purging may be inhibited by data
in commitlog segments based on `gc_time_min` in the
`tombstone_gc_state` (See f42eb4d1ce).
Flushing all sstables in the database releases
all references to commitlog segments and thereby
maximizes the potential for tombstone purging,
which is typically the reason for running major compaction.
However, flushing all tables too frequently might
result in tiny sstables. Since, when flushing all
keyspaces using `nodetool flush`, the `force_keyspace_compaction`
api is invoked for each keyspace successively, we need a mechanism
to prevent too-frequent flushes by major compaction.
Hence a `compaction_flush_all_tables_before_major_seconds` interval
configuration option is added (defaults to 24 hours).
In the case that not all tables are flushed prior
to major compaction, we revert to the old behavior of
flushing each table in the keyspace before major-compacting it.
Fixes scylladb/scylladb#15777
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Flushes all tables after invoking force_new_active_segment
on the commitlog, to make sure all commitlog segments can
get recycled.
Otherwise, due to "false sharing", rarely-written tables
might inhibit recycling of the commitlog segments they reference.
After f42eb4d1ce, that would prevent compaction
from purging some tombstones based on min_gc_time.
To be used in the next patch by major compaction.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When flushing is done externally, e.g. by running
`nodetool flush` prior to `nodetool compact`,
flush_memtables=false can be passed to skip flushing
of tables right before they are major-compacted.
This is useful to prevent creation of small sstables
due to excessive memtable flushing.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The current implementation makes no sense.
Like `nodetool_path`, base the default `jmx_path`
on the assumption that the test is run using, e.g.
```
(cd test/nodetool; pytest --nodetool=cassandra test_compact.py)
```
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Document the behavior if no keyspace is specified
or no table(s) are specified for a given keyspace.
Fixes scylladb/scylladb#16032
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This commit adds information on how to enable
object storage for a keyspace.
The "Keyspace storage options" section already
existed in the doc, but it was not valid, as
support for it was only added in version 5.4.
The scope of this commit:
- Update the "Keyspace storage options" section.
- Add the information about object storage support
to the Data Definition > CREATE KEYSPACE section
* Marked as "Experimental".
* Excluded from the Enterprise docs with the
.. only:: opensource directive.
This commit must be backported to branch-5.4,
as support for object storage was added
in version 5.4.
Closes scylladb/scylladb#16081
Nowadays, if a memtable gets flushed into misconfigured S3 storage, the flush fails and aborts the whole scylla process. That's not very elegant. First, because upon restart, garbage-collecting non-sealed sstables would fail again. Second, because re-configuring an endpoint can be done at runtime: scylla re-reads this config upon a HUP signal.
Flushing a memtable restarts when it sees ENOSPC/EDQUOT errors from on-disk sstables. This PR extends this to handle misconfigured S3 endpoints as well.
Fixes: #13745
Closes scylladb/scylladb#15635
* github.com:scylladb/scylladb:
test: Add object_store test to validate config reloading works
test: Add config update facility to test cluster
test: Make S3_Server export config file as pathlib.Path
config: Make object storage config updateable_value_source
memtable: Extend list of checking codes
sstables/storage/s3: Fix missing TOC status check
s3/client: Map http exceptions into storage_io_error
exceptions: Extend storage_io_error construction options
Compaction tasks which do not have a parent are abortable
through task manager. Their children are aborted recursively.
Compaction tasks of the lowest level are aborted using existing
compaction task executors stopping mechanism.
Closes scylladb/scylladb#16177
* github.com:scylladb/scylladb:
test: test abort of compaction task that isn't started yet
test: test running compaction task abort
tasks: fail if a task was aborted
compaction: abort task manager compaction tasks
When sealing an sstable on local storage, the storage driver performs
several flushes on a file that is directly opened via checked-file.
Flush calls are wrapped with sstable_write_io_check, but that's
excessive: the checked file wraps flushes with io-checks on its own.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#16173
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
to define a formatter which can be used by a raw class and its derived
classes, we have to put the full template specialization before the
call sites. also, please note, a forward declaration is not enough,
as the compile-time formatter check of fmt requires the definition of
the formatter. fmt v10 also enables us to use `format_as()` to format
a certain type via the return value of `format_as()`;
this fulfills our needs.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16125
fmt v10 does not cast unaligned<T> to T when formatting it;
instead it insists on finding a matching fmt::formatter<> specialization for it.
that's why we have FTBFS when printing
these packed<T> variables with fmtlib v10.
in this change, we just cast them to the underlying types before
formatting them. because seastar::unaligned<T> provides neither
a method for accessing the raw value nor
a type alias for the underlying raw value's type, we have
to cast to the type explicitly instead of deducing it from the printed value.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16167
This bug manifested as delays in DDL statement execution, which had to
wait until streaming finished so that the topology change
coordinator released the guard.
The reason is that the topology change coordinator didn't release the
group0 guard when there was no work to do with active migrations, and
awaited the condition variable without leaving the scope.
Fixes #16182
Closes scylladb/scylladb#16183
During remove or decommission, as a first step, tablets are drained from
the leaving node. Theoretically this step may fail. Roll back the
topology operation if it happens. Since some tablets may stay in migration
state, the topology needs to go to the tablet_migration state. Let's do it
always, since it should be safe to do even if there are no ongoing
tablet migrations.
The implementation of "SELECT TOJSON(t)" or "SELECT JSON t" for a column
of type "time" forgot to put the time string in quotes. The result was
invalid JSON. This patch is a one-liner fixing this bug.
This patch also removes the "xfail" marker from one xfailing test
for this issue which now starts to pass. We also add a second test for
this issue - the existing test was for "SELECT TOJSON(t)", and the second
test shows that "SELECT JSON t" had exactly the same bug - and both are
fixed by the same patch.
We also had a test translated from Cassandra which exposed this bug,
but that test continues to fail because of other bugs, so we just
need to update the xfail string.
The patch also fixes one C++ test, test/boost/json_cql_query_test.cc,
which enshrined the *wrong* behavior - JSON output that isn't even
valid JSON - and had to be fixed. Unlike the Python tests, the C++ test
can't be run against Cassandra, and doesn't even run a JSON parser
on the output, which explains how it came to enshrine wrong output
instead of helping to discover the bug.
Fixes #7988
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#16121
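The nature of the bug can be demonstrated in a few lines of Python (the time value is illustrative):

```python
import json

# What used to be emitted for a "time" column vs. what is emitted
# after the fix: a bare time string is not valid JSON.
unquoted = '{"t": 08:12:54.123}'
quoted = '{"t": "08:12:54.123"}'

def is_valid_json(s):
    try:
        json.loads(s)
        return True
    except ValueError:
        return False

assert not is_valid_json(unquoted)  # unquoted time breaks the JSON
assert is_valid_json(quoted)        # quoting it yields valid JSON
```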
This commit replaces the link to the OSS-only page
(the 5.2-to-5.4 upgrade guide not present in
the Enterprise docs) on the Raft page.
While providing the link to the specific upgrade
guide is more user-friendly, it causes build failures
of the Enterprise documentation. I've replaced
it with the link to the general Upgrade section.
The ".. only:: opensource" directive used to wrap
the OSS-only content correctly excludes the content
from the Enterprise docs - but it doesn't prevent
build warnings.
This commit must be backported to branch-5.4 to
prevent errors in all versions.
Closes scylladb/scylladb#16176
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for
reconcilable_result::printer, and remove its operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16186
* ./tools/jmx 05bb7b68...80ce5996 (4):
> StorageService: Normalize endpoint inetaddress strings to java form
Fixes #16039
> ColumnFamilyStore: only quote table names if necessary
> APIBuilder: allow quoted scope names
> ColumnFamilyStore: don't fail if there is a table with ":" in its name
Fixes #16153
* ./tools/java 10480342...26f5f71c (1):
> NodeProbe: allow addressing table name with colon in it
Also needed for #16153
Closes scylladb/scylladb#16146
in Python, a raw string is created using the 'r' or 'R' prefix. when
creating a regex using a Python string, sometimes we have to use
"\" to escape a parenthesis so that tools like "sed" can consider
the parenthesis a capture group. but "\" is also used to escape
strings in Python; in order to put "\" in as it is, we use raw strings
instead of escaping "\" with "\\", which is obscure. when generating rules,
we use multi-line strings and do not want to have an empty line
at the beginning of the string, so we added a "\" continuation mark.
but we failed to escape some of the "\" in the string, and just put
"\(". Python accepts it after failing to find a matching
escape char for it, and interprets it as "\\(", but this should still
be considered a misuse or an oversight. with Python's warnings enabled,
one is able to see its complaints.
in this change, we escape the "\" properly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16179
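A minimal illustration of the string equivalences involved (not the actual rule-generator code):

```python
import re

# A raw string and a double-escaped string denote the same characters:
pattern_raw = r"\(word\)"        # explicit: backslash, '(', ..., backslash, ')'
pattern_escaped = "\\(word\\)"   # same characters, escaped the verbose way
assert pattern_raw == pattern_escaped

# "\(" in a plain (non-raw) string happens to produce the same two
# characters, but only because Python falls back when the escape is
# unrecognized -- with warnings enabled it is flagged as an invalid
# escape sequence, which is what this change cleans up.
assert re.fullmatch(pattern_raw, "(word)")  # matches a literal "(word)"
```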
this change addresses a regression introduced by
f4626f6b8e, which stopped notifying
systemd with the status that scylla is READY. without the
notification, systemd would wait in vain for the readiness of
scylla.
Refs f4626f6b8e
Fixes #16159
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16166
- remove some code that is obsolete in newer Scylla versions,
- fix some minor bugs. These bugs appear to be benign; there are no known issues caused by them, but fixing them is a good idea nevertheless,
- refactor some code for better maintainability.
Parts of this PR were extracted from https://github.com/scylladb/scylladb/pull/15331 (which was merged but later reverted), parts of it are new.
Closes scylladb/scylladb#16162
* github.com:scylladb/scylladb:
test/pylib: log_browsing: fix type hint
migration_manager: take `abort_source&` in get_schema_for_read/write
migration_manager: inline merge_schema_in_background
migration_manager: remove unused merge_schema_from overload
migration_manager: assume `canonical_mutation` support
migration_manager: add `std::move` to avoid a copy
schema_tables: refactor `scylla_tables(schema_features)`
schema_tables: pass `reload` flag when calling `merge_schema` cross-shard
system_keyspace: fix outdated comment
`run_first` lists in `suite.yaml` files provide a simple way to
shorten the tests' average running time by running the slowest
tests first.
We update these lists, since they got outdated over time:
- `test_topology_ip` was renamed to `test_replace`
and changed suite,
- `test_tablets` changed suite,
- new slow tests were added:
- `test_cluster_features`,
- `test_raft_cluster_features`,
- `test_raft_ignore_nodes`,
- `test_read_repair`.
Closes scylladb/scylladb#16104
The run() method of task_manager::task::impl does not have to throw when
a task is aborted via the task manager api. In that case, a user will see
that the task finished successfully, which is inconsistent.
Finish a task with a failure if it was aborted via the task manager api.
Set top-level compaction tasks as abortable.
Compaction tasks which have no children, i.e. compaction task
executors, have their abort method overridden to stop the compaction.
before we support incremental repair, there is no point in having the
code path setting / getting it. and even worse, it incurs confusion.
so, in this change, we
* just set the field to 0,
* drop the corresponding field in metadata_collector, as we never
update it,
* add a comment to explain why this variable is initialized to 0.
Fixes #16098
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16169
repair: Introduce small table optimization
*) Problem:
We have seen in the field that it takes longer than expected to repair system tables
like system_auth, which has a tiny amount of data but is replicated to all nodes
in the cluster. The cluster has multiple DCs. Each DC has multiple nodes. The
main reason for the slowness is that even if the amount of data is small,
repair has to walk through all the token ranges, that is num_tokens *
number_of_nodes_in_the_cluster. The overhead of the repair protocol for each
token range dominates due to the small amount of data per token range. Another
reason is that the high network latency between DCs makes the RPC calls used for
repair consume more time.
*) Solution:
To solve this problem, a small table optimization for repair is introduced in
this patch. A new repair option is added to turn on this optimization.
- No token range to repair is needed by the user. It will repair all token
ranges automatically.
- Users only need to send the repair rest api request to one of the nodes in the
cluster. It can be any of the nodes in the cluster.
- It does not require the RF to be configured to replicate to all nodes in the
cluster. This means it can work with any tables as long as the amount of data
is low, e.g., less than 100MiB per node.
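A back-of-the-envelope comparison of the token-range counts involved (the numbers are illustrative; 256 vnodes per node is a common default, and the 6-node measurement below reports ranges_nr=1537, roughly num_tokens * nodes):

```python
num_tokens = 256  # vnodes per node (illustrative default)
nodes = 6

# Without the optimization, the per-range protocol overhead repeats
# once per token range across the whole cluster.
ranges_without_opt = num_tokens * nodes
# With the small table optimization, a single range covers everything.
ranges_with_opt = 1

print(ranges_without_opt)  # 1536
```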
*) Performance:
1)
3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2}
Before:
```
repair - repair[744cd573-2621-45e4-9b27-00634963d0bd]: stats:
repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes,
role_members}, ranges_nr=1537, round_nr=4612,
round_nr_fast_path_already_synced=4611,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1,
rpc_call_nr=115289, tx_hashes_nr=0, rx_hashes_nr=5, duration=1.5648403 seconds,
tx_row_nr=2, rx_row_nr=0, tx_row_bytes=356, rx_row_bytes=0,
row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 0},
{127.0.14.4, 0}, {127.0.14.5, 178}, {127.0.14.6, 178}},
row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 0},
{127.0.14.4, 0}, {127.0.14.5, 1}, {127.0.14.6, 1}},
row_from_disk_bytes_per_sec={{127.0.14.1, 0.00010848}, {127.0.14.2,
0.00010848}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.00010848},
{127.0.14.6, 0.00010848}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1,
0.639043}, {127.0.14.2, 0.639043}, {127.0.14.3, 0}, {127.0.14.4, 0},
{127.0.14.5, 0.639043}, {127.0.14.6, 0.639043}} Rows/s,
tx_row_nr_peer={{127.0.14.3, 1}, {127.0.14.4, 1}}, rx_row_nr_peer={}
```
After:
```
repair - repair[d6e544ba-cb68-4465-ab91-6980bcbb46a9]: stats:
repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes,
role_members}, ranges_nr=1, round_nr=4, round_nr_fast_path_already_synced=4,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0,
rpc_call_nr=80, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.001459798 seconds,
tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0,
row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 178},
{127.0.14.4, 178}, {127.0.14.5, 178}, {127.0.14.6, 178}},
row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 1},
{127.0.14.4, 1}, {127.0.14.5, 1}, {127.0.14.6, 1}},
row_from_disk_bytes_per_sec={{127.0.14.1, 0.116286}, {127.0.14.2, 0.116286},
{127.0.14.3, 0.116286}, {127.0.14.4, 0.116286}, {127.0.14.5, 0.116286},
{127.0.14.6, 0.116286}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1,
685.026}, {127.0.14.2, 685.026}, {127.0.14.3, 685.026}, {127.0.14.4, 685.026},
{127.0.14.5, 685.026}, {127.0.14.6, 685.026}} Rows/s, tx_row_nr_peer={},
rx_row_nr_peer={}
```
The repair-time ratio = 1.5648403 seconds / 0.001459798 seconds ≈ 1072X
2)
3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2}
Same test as above except 5ms delay is added to simulate multiple dc
network latency:
The time to repair is reduced from 333s to 0.2s.
333.26758 s / 0.22625381s = 1472.98
3)
3 DCs, each DC has 3 nodes, 9 nodes in the cluster. RF = {dc1: 3, dc2: 3, dc3: 3}
, 10 ms network latency
Before:
```
repair - repair[86124a4a-fd26-42ea-a078-437ca9e372df]: stats:
repair_reason=repair, keyspace=system_auth, tables={role_attributes,
role_members, roles}, ranges_nr=2305, round_nr=6916,
round_nr_fast_path_already_synced=6915,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1,
rpc_call_nr=276630, tx_hashes_nr=0, rx_hashes_nr=8, duration=986.34015
seconds, tx_row_nr=7, rx_row_nr=0, tx_row_bytes=1246, rx_row_bytes=0,
row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3,
0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0},
{127.0.57.8, 0}, {127.0.57.9, 0}}, row_from_disk_nr={{127.0.57.1, 1},
{127.0.57.2, 1}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0},
{127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}},
row_from_disk_bytes_per_sec={{127.0.57.1, 1.72105e-07}, {127.0.57.2,
1.72105e-07}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0},
{127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}}
MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 0.00101385},
{127.0.57.2, 0.00101385}, {127.0.57.3, 0}, {127.0.57.4, 0},
{127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0},
{127.0.57.9, 0}} Rows/s, tx_row_nr_peer={{127.0.57.3, 1},
{127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1},
{127.0.57.8, 1}, {127.0.57.9, 1}}, rx_row_nr_peer={}
```
After:
```
repair - repair[07ebd571-63cb-4ef6-9465-6e5f1e98f04f]: stats:
repair_reason=repair, keyspace=system_auth, tables={role_attributes,
role_members, roles}, ranges_nr=1, round_nr=4,
round_nr_fast_path_already_synced=4,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0,
rpc_call_nr=128, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.6052915
seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0,
row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3,
178}, {127.0.57.4, 178}, {127.0.57.5, 178}, {127.0.57.6, 178},
{127.0.57.7, 178}, {127.0.57.8, 178}, {127.0.57.9, 178}},
row_from_disk_nr={{127.0.57.1, 1}, {127.0.57.2, 1}, {127.0.57.3, 1},
{127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1},
{127.0.57.8, 1}, {127.0.57.9, 1}},
row_from_disk_bytes_per_sec={{127.0.57.1, 0.00037793}, {127.0.57.2,
0.00037793}, {127.0.57.3, 0.00037793}, {127.0.57.4, 0.00037793},
{127.0.57.5, 0.00037793}, {127.0.57.6, 0.00037793}, {127.0.57.7,
0.00037793}, {127.0.57.8, 0.00037793}, {127.0.57.9, 0.00037793}}
MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 2.22634},
{127.0.57.2, 2.22634}, {127.0.57.3, 2.22634}, {127.0.57.4,
2.22634}, {127.0.57.5, 2.22634}, {127.0.57.6, 2.22634},
{127.0.57.7, 2.22634}, {127.0.57.8, 2.22634}, {127.0.57.9,
2.22634}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
```
The time to repair is reduced from 986s (16 minutes) to 1.6s
*) Summary
So, a more than 1000X difference is observed for this common usage of
system table repair procedure.
Fixes #16011
Refs #15159
Closes scylladb/scylladb#15974
* github.com:scylladb/scylladb:
repair: Introduce small table optimization
repair: Convert put_row_diff_with_rpc_stream to use coroutine
We add a test for concurrent bootstrap in the raft-based topology.
Additionally, we extend the testing framework with a new function -
`ManagerClient.servers_add`. It allows adding multiple servers
concurrently to a cluster.
This PR is the first step to fix #15423. After merging it, if the new test
doesn't fail for some time in CI, we can:
- use `ManagerClient.servers_add` in other tests wherever possible,
- start initial servers concurrently in all suites with
`initial_size > 0`.
Closes scylladb/scylladb#16102
* github.com:scylladb/scylladb:
test: topology: add test_concurrent_bootstrap
test: ManagerClient: introduce servers_add
test: ManagerClient: introduce _create_server_add_data
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for
map<timestamp_type, vector<shared_sstable>>. since the operator<<
for this type is only used in the .cc file, and its only use case
is to provide the formatter for fmt, the operator<<-based
formatter is removed in this change.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16163
In raft-based topology mode, the events "new node starts its group0 server"
and "new node is added to group0 configuration" are not synchronized
with each other. Therefore it might happen that the cluster starts
sending commands to the new node before the node starts its server. This
might lead to harmless, but ugly messages like:
INFO 2023-09-27 15:42:42,611 [shard 0:stat] rpc - client
127.0.0.1:56352 msg_id 2: exception "Raft group
b8542540-5d3b-11ee-99b8-1052801f2975 not found" in no_wait handler
ignored
In order to solve this, the failure detector verb is extended to report
information about whether group0 is alive. The raft rpc layer will drop
messages to nodes whose group0 is not seen as alive.
Tested by adding a delay before group0 is started on the joining node,
running all topology tests and grepping for the aforementioned log
messages.
Fixes: scylladb/scylladb#15853
Fixes: scylladb/scylladb#15167
Closes scylladb/scylladb#16071
* github.com:scylladb/scylladb:
raft: rpc: introduce destination_not_alive_error
raft: rpc: drop RPCs if the destination is not alive
raft: pass raft::failure_detector to raft_rpc
raft: transfer information about group0 liveness in direct_fd_ping
raft: add server::is_alive
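The guard added at the rpc layer can be sketched like this (a hypothetical Python model; the real code lives in Scylla's C++ raft rpc):

```python
class DestinationNotAliveError(Exception):
    """Raised instead of issuing an RPC to a dead group0 member."""

class FakeFailureDetector:
    def __init__(self, alive_group0_members):
        self.alive = alive_group0_members

    def is_group0_alive(self, dest):
        return dest in self.alive

class RaftRpc:
    def __init__(self, failure_detector):
        self.fd = failure_detector

    def send(self, dest, msg):
        # Drop the message instead of letting the destination log a
        # "Raft group ... not found" error before its server starts.
        if not self.fd.is_group0_alive(dest):
            raise DestinationNotAliveError(dest)
        return ("sent", dest, msg)

rpc = RaftRpc(FakeFailureDetector({"n1"}))
assert rpc.send("n1", "ping")[0] == "sent"
try:
    rpc.send("n2", "ping")   # n2's group0 is not alive yet
    dropped = False
except DestinationNotAliveError:
    dropped = True
assert dropped
```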
We add a test for concurrent bootstrap support in the raft-based
topology.
The plan is to make this test temporary. In the future, we will:
- use ManagerClient.servers_add in other tests wherever possible,
- start initial servers concurrently in all suites with
initial_size > 0.
So, this test will not test anything unique.
We could make the changes proposed above now instead of adding
this small test. However, if we did that and it turned out that
concurrent bootstrap is flaky in CI, we would make almost every CI
run fail with many failures. We want to avoid such a situation.
Running only this test for some time in CI will reduce the risk
and make investigating any potential failures easier.
We add a new function - servers_add - that allows adding multiple
servers concurrently to a cluster. It makes use of concurrent
bootstrap, now supported in the raft-based topology.
servers_add doesn't have the replace_cfg parameter. The reason is
that we don't support concurrent replace operations, at least for
now.
There is an implementation detail in ScyllaCluster.add_servers. We
cannot simply do multiple calls to add_server concurrently. If we
did that in an empty cluster, every node would take itself as the
only seed and start a new cluster. To solve this, we introduce a
new field - initial_seed. It is used to choose one of the servers
as a seed for all servers added concurrently to an empty cluster.
Note that the add_server calls in asyncio.gather in add_servers
cannot race with each other when setting initial_seed because
there is only one thread.
In the future, we will also start all initial servers concurrently
in ScyllaCluster.install_and_start. The changes in this commit were
designed in a way that will make changing install_and_start easy.
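The seed-selection idea can be sketched roughly as follows (a simplified, hypothetical model; `Cluster`, `add_server` and `initial_seed` here are stand-ins, not the real pylib API):

```python
import asyncio

class Cluster:
    def __init__(self):
        self.servers = []
        self.initial_seed = None

    async def add_server(self, name):
        # In an empty cluster, the first concurrent call claims the seed
        # role. Single-threaded asyncio makes this check-and-set race-free.
        if self.initial_seed is None and not self.servers:
            self.initial_seed = name
        seed = self.initial_seed
        await asyncio.sleep(0)  # yield, as a real server start would
        self.servers.append(name)
        return seed

async def main():
    c = Cluster()
    # Concurrent bootstrap: all three nodes use the same seed instead of
    # each starting its own one-node cluster.
    return await asyncio.gather(*(c.add_server(f"n{i}") for i in range(3)))

seeds = asyncio.run(main())
assert set(seeds) == {"n0"}  # all nodes agreed on a single seed
```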
before this change, in sstable_run_based_compaction_test, we check
every 4th sstable, to verify that we close the sstables to be replaced
in batches of 4.
since the integer-based generation identifier is monotonically
increasing, we can assume that the identifiers of sstables are like
0, 1, 2, 3, .... so if the compaction consumes sstables in
batches of 4, the identifier of the first one in a batch should
always be a multiple of 4. unfortunately, this test does not work
if we use uuid-based identifiers.
but if we take a closer look at how we create the dataset, we can
establish the following facts:
1. the `compaction_descriptor` returned by
`sstable_run_based_compaction_strategy_for_tests` never
sets `owned_ranges` in the returned descriptor
2. in `compaction::setup_sstable_reader`, `mutation_reader::forward::no`
is used if `_owned_ranges_checker` is empty
3. `mutation_reader_merger` respects the `fwd_mr` passed to its
ctor, so it closes the current sstable immediately when the underlying
mutation reader reaches the end of stream.
in other words, we close every sstable once it is fully consumed in
sstable_run_based_compaction_test. and the reason why the existing test passes
is that we just sample the sstables whose generation id is a multiple
of 4. what happens when we perform compaction in this test is:
1. replace 5 with 33, closing 5
2. replace 6 with 34, closing 6
3. replace 7 with 35, closing 7
4. replace 8 with 36, closing 8 << let's check here.. good, go on!
5. replace 13 with 37, closing 13
...
8. replace 16 with 40, closing 16 << let's check here.. also, good, go on!
so, in this change, we just check all old sstables, to verify that
we close each of them once it is fully consumed.
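The sampling flaw can be seen with a toy model (the generation numbers are illustrative, not the test's actual ones):

```python
# Old check: with integer generations handed out sequentially, sampling
# every 4th replaced sstable happens to see generations that are
# multiples of 4, so the "batch of 4" assertion passes even though each
# sstable is in fact closed individually as soon as it is consumed.
replaced = list(range(5, 33))         # generations 5..32, in replacement order
batch = 4
sampled = replaced[batch - 1::batch]  # every 4th replacement: 8, 12, 16, ...
assert all(g % batch == 0 for g in sampled)  # passes, but proves nothing
```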
Fixes https://github.com/scylladb/scylladb/issues/16073
Closes scylladb/scylladb#16074
* github.com:scylladb/scylladb:
test/sstable_compaction_test: check every replaced sstable
test/sstable_compaction_test: s/old_sstables.front()/old_sstable/
Support for `canonical_mutation`s was added way back in Scylla 3.2. A
lot of code in `migration_manager` is still checking whether the old
`frozen_mutations` are received or need to be sent.
We no longer need this code, since we don't support version skips during
upgrade (and certainly not upgrades like 3.2->5.4).
Leave a sanity check in place, but otherwise delete the
`frozen_mutation` branches.
The `scylla_tables` function gives a different schema definition
for the `system_schema.scylla_tables` table, depending on whether
certain schema features are enabled or not.
The way it was implemented, we had to write a `θ(2^n)` amount
of code and comments to handle `n` features.
Refactor it so that the amount of code we have to write to handle `n`
features is `θ(n)`.
Handle the barrier failure by sleeping for a "ring delay" and
continuing. The purpose of the barrier is to wait for all reads to
the old replica set to complete and fence the remaining requests. If the
barrier fails, we give the fence some time to propagate and continue with
the topology change. If the fence did not propagate, we may have stale reads,
but this is no worse than what we have with gossiper.
Handle the barrier failure by sleeping for a "ring delay" and
continuing. The purpose of the barrier is to wait for unfinished writes
to the decommissioned node to complete. If the barrier fails, we give them
some time to complete and then proceed with node decommission. The worst
that may happen is that some write fails because the node is
shut down.
Go through the rollback_to_normal state when the node needs to move to
normal during the rollback, and update the fence in this state before moving
the node to normal. This guarantees that the fence update will not
be missed. Note that when a node moves to the left state, it already passes
through left_token_ring, which guarantees the same.
When a topology coordinator rolls back from an unsuccessful topology operation, it
advances the fence (which is now in the raft state) after moving to the normal
state. We do not want this to fail (only a majority of nodes is needed for
it not to), but currently it may fail in case the coordinator moves
to another node after changing the rollback node's state to normal, but
before updating the fence. To solve that, the rollback operation needs to
go through a new rollback_to_normal state that will do the fencing
before moving to normal. This patch introduces that state, but does not use
it yet.
In 0c86abab4d `merge_schema` obtained a new flag, `reload`.
Unfortunately, the flag was assigned a default value, which I think is
almost always a bad idea, and indeed it was in this case. When
`merge_schema` is called on a shard other than 0, it recursively calls
itself on shard 0. That recursive call forgot to pass the `reload` flag.
Fix this.
Add a new destination_not_alive_error, thrown from two-way RPCs in case
the RPC is not issued because the destination is not reported as
alive by the failure detector.
In snapshot transfer code, lower the verbosity of the message printed in
case it fails on the new error. This is done to prevent flakiness in the
CI - in case of slow runs, nodes might get spuriously marked as dead if
they are busy, and a message with the "error" verbosity can cause some
tests to fail.
The replace operation is defined to succeed only if the node being
replaced is dead. We should reject this operation when the failure
detector considers the node being replaced alive.
Apart from adding this change, this PR adds a test case -
`test_replacing_alive_node_fails` - that verifies it. A few testing
framework adjustments were necessary to implement this test and
to avoid flakiness in other tests that use the replace operation after
the change. From now, we need to ensure that all nodes see the
node being replaced as dead before starting the replace. Otherwise,
the check added in this PR could reject the replace.
Additionally, this PR changes the replace procedure in a way that
if the replacing node reuses the IP of the node being replaced, other
nodes can see it as alive only after the topology coordinator accepts
its join request. The replacing node may become alive before the
topology coordinator checks if the node being replaced is dead. If
that happens and the replacing node reuses the IP of the node being
replaced, the topology coordinator cannot know which of these two
nodes is alive and whether it should reject the join request.
Fixes #15863
Closes scylladb/scylladb#15926
* github.com:scylladb/scylladb:
test: add test_replacing_alive_node_fails
raft topology: reject replace if the node being replaced is not dead
raft topology: add the gossiper ref to topology_coordinator
test: test_cluster_features: stop gracefully before replace
test: decrease failure_detector_timeout_in_ms in replace tests
test: move test_replace to topology_custom
test: server_add: wait until the node being replaced is dead
test: server_add: add support for expected errors
raft topology: join: delay advertising replacing node if it reuses IP
raft topology: join: fix a condition in validate_joining_node
the operator<<() based formatter is only used in its test, so
let's move it to where it is used.
we can always bring it back later if it is required in other places,
but then we'd be better off implementing it as a fmt::formatter<>.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16142
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for
`cql_transport::messages::result_message::rows`
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16143
before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter.
in this change, we define a formatter for db::seed_provider_type.
please note, we are still formatting vector<db::seed_provider_type>
with the helper provided by seastar/core/sstring.hh, which uses
operator<<() to print the elements in the vector being printed.
so we have to keep the operator<< formatter before disabling
the generic formatter for vector<T>.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16138
generation_number's type is `generation_type`, which in turn is a
`utils::tagged_integer<struct generation_type_tag, int32_t>`,
which is formatted via fmtlib's ostream_formatter backed by
operator<<. but `ostream_formatter` does not provide
specifier support, so {:d} does not apply to this type; when compiling
with fmtlib v10, it rejects the format specifier (the error is attached
at the end of the commit message).
so in this change, we just drop the format specifier. since fmtlib prints
`int32_t` as a decimal integer, even if {:d} applied, it would not
change the behavior.
```
/home/kefu/dev/scylladb/gms/gossiper.cc:1798:35: error: call to consteval function 'fmt::basic_format_string<char, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int> &, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int> &>::basic_format_string<char[48], 0>' is not a constant expression
1798 | auto err = format("Remote generation {:d} != local generation {:d}", remote_gen, local_gen);
| ^
/usr/include/fmt/core.h:2322:31: note: non-constexpr function 'throw_format_error' cannot be used in a constant expression
2322 | if (!in(arg_type, set)) throw_format_error("invalid format specifier");
| ^
/usr/include/fmt/core.h:2395:14: note: in call to 'parse_presentation_type.operator()(1, 510)'
2395 | return parse_presentation_type(pres::dec, integral_set);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2706:9: note: in call to 'parse_format_specs<char>(&"Remote generation {:d} != local generation {:d}"[20], &"Remote generation {:d} != local generation {:d}"[47], formatter<mapped_type, char_type>().formatter::specs_, checker(s).context_, 13)'
2706 | detail::parse_format_specs(ctx.begin(), ctx.end(), specs_, ctx, type);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2561:10: note: in call to 'formatter<mapped_type, char_type>().parse<fmt::detail::compile_parse_context<char>>(checker(s).context_)'
2561 | return formatter<mapped_type, char_type>().parse(ctx);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2647:39: note: in call to 'parse_format_specs<utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>, fmt::detail::compile_parse_context<char>>(checker(s).context_)'
2647 | return id >= 0 && id < num_args ? parse_funcs_[id](context_) : begin;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2485:15: note: in call to 'handler.on_format_specs(0, &"Remote generation {:d} != local generation {:d}"[20], &"Remote generation {:d} != local generation {:d}"[47])'
2485 | begin = handler.on_format_specs(adapter.arg_id, begin + 1, end);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2541:13: note: in call to 'parse_replacement_field<char, fmt::detail::format_string_checker<char, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>> &>(&"Remote generation {:d} != local generation {:d}"[19], &"Remote generation {:d} != local generation {:d}"[47], checker(s))'
2541 | begin = parse_replacement_field(p, end, handler);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2769:7: note: in call to 'parse_format_string<true, char, fmt::detail::format_string_checker<char, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>>>({&"Remote generation {:d} != local generation {:d}"[0], 47}, checker(s))'
2769 | detail::parse_format_string<true>(str_, checker(s));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/kefu/dev/scylladb/gms/gossiper.cc:1798:35: note: in call to 'basic_format_string<char[48], 0>("Remote generation {:d} != local generation {:d}")'
1798 | auto err = format("Remote generation {:d} != local generation {:d}", remote_gen, local_gen);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16126
Clang-18 starts to complain when a constexpr value is cast to an
enum and the value is out of the range of the enum's values. in this
case, boost intentionally casts the out-of-range value to the target
type. so silence this warning for now.
since `lexical_cast.hpp` is included in multiple places in the
source tree, this warning is disabled globally.
the warning looks like:
```
In file included from /home/kefu/dev/scylladb/types/types.cc:9:
In file included from /usr/include/boost/lexical_cast.hpp:32:
In file included from /usr/include/boost/lexical_cast/try_lexical_convert.hpp:43:
In file included from /usr/include/boost/lexical_cast/detail/converter_numeric.hpp:36:
In file included from /usr/include/boost/numeric/conversion/cast.hpp:33:
In file included from /usr/include/boost/numeric/conversion/converter.hpp:13:
In file included from /usr/include/boost/numeric/conversion/conversion_traits.hpp:13:
In file included from /usr/include/boost/numeric/conversion/detail/conversion_traits.hpp:18:
In file included from /usr/include/boost/numeric/conversion/detail/int_float_mixture.hpp:19:
In file included from /usr/include/boost/mpl/integral_c.hpp:32:
/usr/include/boost/mpl/aux_/integral_wrapper.hpp:73:31: error: integer value -1 is outside the valid range of values [0, 3] for the enumeration type 'udt_builtin_mixture_enum' [-Wenum-constexpr-conversion]
73 | typedef AUX_WRAPPER_INST( BOOST_MPL_AUX_STATIC_CAST(AUX_WRAPPER_VALUE_TYPE, (value - 1)) ) prior;
| ^
/usr/include/boost/mpl/aux_/static_cast.hpp:24:47: note: expanded from macro 'BOOST_MPL_AUX_STATIC_CAST'
24 | # define BOOST_MPL_AUX_STATIC_CAST(T, expr) static_cast<T>(expr)
| ^
In file included from /home/kefu/dev/scylladb/types/types.cc:9:
In file included from /usr/include/boost/lexical_cast.hpp:32:
In file included from /usr/include/boost/lexical_cast/try_lexical_convert.hpp:43:
In file included from /usr/include/boost/lexical_cast/detail/converter_numeric.hpp:36:
In file included from /usr/include/boost/numeric/conversion/cast.hpp:33:
In file included from /usr/include/boost/numeric/conversion/converter.hpp:13:
In file included from /usr/include/boost/numeric/conversion/conversion_traits.hpp:13:
In file included from /usr/include/boost/numeric/conversion/detail/conversion_traits.hpp:18:
In file included from /usr/include/boost/numeric/conversion/detail/int_float_mixture.hpp:19:
In file included from /usr/include/boost/mpl/integral_c.hpp:32:
/usr/include/boost/mpl/aux_/integral_wrapper.hpp:73:31: error: integer value -1 is outside the valid range of values [0, 3] for the enumeration type 'int_float_mixture_enum' [-Wenum-constexpr-conversion]
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16082
this was a typo introduced by 781b7de5, which intended to add
-Wignored-qualifiers to the compile options, but ended up
adding -Wignore-qualifiers.
in this change, the typo is corrected.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16124
The write_mutations_to_database() decides if it needs to flush the
database by checking if the mutations came to system.topology table and
performing some more checks if they did. Overall this looks like
```
auto topo_schema = db.find_schema(system.topology)
if (target_schema != topo_schema)
    return false;
// extra checks go here
```
However, the system.topology table exists only if the feature named
CONSISTENT_TOPOLOGY_CHANGES is enabled via the command line. If it's not,
the call to db.find_schema(system.topology) throws, and the whole attempt
to write mutations throws too, stopping the raft state machine.
Since the intention is to check whether the target schema is the topology
table, the check for this table's presence should come first.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#16089
Currently, the NIC prompt in scylla_setup shows virtual devices such as
VLAN devices and bridge devices, but perftune.py does not support them.
To prevent errors while running scylla_setup, we should stop listing
these devices in the NIC prompt.
closes #6757
Closes scylladb/scylladb#15958
If the failure detector sees the destination as dead, there is no use in
sending the RPC, so drop it silently.
This only affects two-way RPCs and "request" one-way RPCs. The one-way
RPCs used as responses to other one-way RPCs are not affected.
Add a new variant of the reply to the direct_fd_ping which specifies
whether the local group0 is alive or not, and start actively using it.
There is no need to introduce a cluster feature. Due to how our
serialization framework works, nodes which do not recognize the new
variant will treat it as the existing std::monostate. The std::monostate
means "the node and group0 are alive"; nodes before the changes in this
commit would send a std::monostate anyway, so this is completely
transparent for the old nodes.
Add a method which reports whether a given raft server is running.
In the following commits, the information about whether the local raft
group 0 is running or not will be included in the response to the
failure detector ping, and the is_alive method will be used there.
add fmt formatter for `assignment_testable`.
this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `assignment_testable` without the help of `operator<<`.
since we are still printing the shared_ptr<assignment_testable> using
operator<<(.., const assignment_testable&), we cannot drop this operator
yet.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16127
There is a need for sending tablet info to the drivers so they can be tablet aware. For the best performance we want to get this info lazily, only when it is needed.
The info is sent when the driver asks about information that a specific tablet contains and the request is directed to the wrong node/shard, so the driver can use that information for every subsequent query. If we send the query to the wrong node/shard, we want to send the RESULT message with additional information about the tablet (replicas and token range) in custom_payload.
A mechanism for sending custom_payload was added.
Sending custom_payload was tested using a three-node cluster and cqlsh queries. I used RF=1 so choosing the wrong node was testable.
I also manually tested it with the python-driver and confirmed that the tablet info can be deserialized properly.
Automatic tests added.
Closes scylladb/scylladb#15410
* github.com:scylladb/scylladb:
docs: add documentation about sending tablet info to protocol extensions
Add tests for sending tablet info
cql3: send tablet if wrong node/shard is used during modification statement
cql3: send tablet if wrong node/shard is used during select statement
locator: add function to check locality
locator: add function to check if host is local
transport: add function to add tablet info to the result_message
transport: add support for setting custom payload
This reverts commit 11cafd2fc8, reversing
changes made to 2bae14f743.
Reverting because this series causes frequent CI failures, and the
proposed quickfix causes other failures of its own.
Fixes: #16113
instead of printing the result of the "validate" subcommand in
free-style plain text, let's print it using JSON. for two reasons:
1. it is simpler to consume the output with other tools and tests.
2. more consistent with other commands.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16105
Instead of unconditionally reloading the schema when enabling any schema
feature, only create a listener if the feature was disabled in the
first place, so that we don't trigger a schema reload for each
schema feature on node restart. In that case, the node starts with
all these features already enabled.
This prevents unnecessary work on restarts.
Fixes: #16112
Closes scylladb/scylladb#16118
instead of printing the result of the "validate-checksums" subcommand
with a logging message, let's print it using JSON. for four reasons:
1. it is simpler to consume the output with other tools and tests.
2. more consistent with other commands.
3. the logging system is used for auditing behavior and for debugging
purposes, not for building a user-facing command line interface.
4. the behavior should match the corresponding document:
in docs/operating-scylla/admin-tools/scylla-sstable.rst, we claim
that `validate-checksums` subcommand prints a dict of
```
$ROOT := { "$sstable_path": Bool, ... }
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16106
instead of relying on the operator<<() of an opaque type, use fmtlib
to print a time point, for better support of newer fmtlib, which dropped
the default-generated formatter for types with operator<<().
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16116
This PR contains two patches which get rid of unnecessary sleeps on cql_test_env teardown, greatly reducing the run time of tests.
Reduces run time of `build/dev/test/boost/schema_change_test` from 90s to 6s.
Closes scylladb/scylladb#16111
* github.com:scylladb/scylladb:
test: cql_test_env: Interrupt all components on cql_test_env teardown
tests: cql_test_env: Skip gossip shutdown sleep
This commit fixes the rollback procedure in
the 5.2-to-5.4 upgrade guide:
- The "Restore system tables" step is removed.
- The "Restore the configuration file" command
is fixed.
- The "Gracefully shutdown ScyllaDB" command
is fixed.
In addition, there are the following updates
to be in sync with the tests:
- The "Backup the configuration file" step is
extended to include a command to backup
the packages.
- The Rollback procedure is extended to restore
the backup packages.
- The Reinstallation section is fixed for RHEL.
Also, I've removed the optional step to enable
consistent schema management from the list of
steps - the appropriate section has already
been removed, but it remained in the procedure
description, which was misleading.
Refs https://github.com/scylladb/scylladb/issues/11907
This commit must be backported to branch-5.4
Closes scylladb/scylladb#16114
On non-interactive setup, the RHEL/CentOS7 old-kernel check causes
"Setup aborted"; this is not what we want.
We should keep the warning but proceed with setup, so the default value
of the kernel check should be True, since the default is applied
automatically in non-interactive mode.
Fixes #16045
Closes scylladb/scylladb#16100
Currently storage service starts too early and its initialization is split into several steps. This PR makes storage service start "late enough" and makes its initialization (minimally required before joining the cluster) happen in one place.
refs: #2795
refs: #2737
Closes scylladb/scylladb#16103
* github.com:scylladb/scylladb:
storage_service: Drop (un)init_messaging_service_part() pair
storage_service: Init/Deinit RPC handlers in constructor/stop
storage_service: Dont capture container() on RPC handler
storage_service: Use storage_service::_sys_dist_ks in some places
storage_service: Add explicit dependency on system dist. keyspace
storage_service: Turn query processor pointer into reference
storage_service: Add explicit query_processor dependency
main: Start storage service later
Since the commitlog size hard limit is a failsafe mechanism,
we don't expect to ever hit it. If we do hit the limit, it means
that we have an exceptional condition in the system. Hence, the
impact of enforcing the commitlog hard limit is irrelevant.
Here we enforce the limit by default.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Once we enable commitlog hard limit by default, we would like
to have some room in case flushing memtables takes some time
to catch up. This threshold is half the limit.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
This commit is only a cosmetic change. It is meant to
make the flush threshold assignment more readable and
comprehensible so future changes are easier to review.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
A custom payload can now be added to response_message.
If it is set, it will be sent to the client and the custom_payload
flag will be set.
A write_string_bytes_map method is added to the response class,
and a missing custom_payload flag is added to
cql_frame_flags.
The test case is
- start scylla with broken object storage endpoint config
- create and populate s3-backed keyspace
- try flushing it (API call would hang, so do it in the background)
- wait for a few seconds, then fix the config
- wait for the flush to finish and stop scylla
- start scylla again and check that the keyspace is properly populated
A nice side effect of this test is that once the flush fails (due to broken
config), it tries to remove the not-yet-sealed sstables and (!) fails
again, for the same reason. So during the restart there happen to be
several sstables in "creating" state with no stored objects, which
additionally tests one more g.c. corner case.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The Cluster wrapper used by the object_store test already has the ability to
access the cluster via CQL and via API. Add the sugar to make the cluster
re-read its scylla.yaml and other configs
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The pylib minio server does that already. A test case added by the next
patch would need to have both cases as path, not as string
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now it's a plain updateable_value, but without the ..._source object the
updateable_value is just a no-op value holder. In order for the
observers to operate there must be a value source; updating it
updates the attached updateable values _and_ notifies the observers.
In order for the config to be the u.v._source, config entries should be
comparable to each other, thus the <=> operator for it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When flushing an sstable there can be errors that are not fatal and
shouldn't cause the whole scylla to die. Currently only ENOSPC and
EDQUOT are considered as such, but there's one more possibility --
access denied errors.
Those can happen, for example, if datadir is chmod/chown-ed by mistake
or intentionally while scylla is running (doing it pre-start time won't
trigger the issue as distributed loader checks permissions of datadir on
boot). Another way to hit an "access denied" error is to flush a
memtable onto S3 storage with broken configuration.
Either way, seeing the access denied error is also a good reason not to
crash, but to print a warning in the logs and retry, in the hope that the
node administrator has fixed things.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When the TOC file is missing while garbage collecting, the S3 server
now resolves with storage_io_error(ENOENT).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When an http request resolves with an exception, it makes sense to
translate the network exception into a storage exception, to make upper
layers think it was some sort of IO error, not, suddenly, an http one.
The translation is, for now, pretty simple:
- 404 and 3xx -> ENOENT
- 403 (forbidden) and 401 (unauthorized) -> EACCES
- anything else -> EIO
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We add a test for the Raft-based topology's new feature - rejecting
the replace operation if the node being replaced is considered
alive by the failure detector.
This test is not so fast, and it does not test any critical paths
so we run it only in dev mode.
The replace operation is defined to succeed only if the node being
replaced is dead. We should reject this operation when the failure
detector considers the node being replaced alive.
In one of the previous commits, we have made
ManagerClient.server_add wait until all running nodes see the node
being replaced as dead. Unfortunately, the waiting time is around
20 s if we stop the node being replaced ungracefully. We change the
stop procedure to graceful to not slow down the test.
In one of the previous commits, we have made
ManagerClient.server_add wait until all running nodes see the node
being replaced as dead. Unfortunately, the waiting time can be
around 20 s if we stop the node being replaced ungracefully. 20 s
is the default value of the failure detector timeout.
We don't want to slow down the replace operations this much for no
good reason. We could use server_stop_gracefully instead of
server_stop everywhere, but we should have at least a few replace
tests with server_stop. For now, test_replace and
test_raft_ignore_nodes will be these tests. To keep them reasonably
fast, we decrease the failure_detector_timeout_in_ms value on all
initial servers.
We also skip test_replace in debug mode to avoid flakiness due to
low failure_detector_timeout_in_ms (test_raft_ignore_nodes is
already skipped).
In the following commit, we make all servers in test_replace use
failure-detector-timeout-in-ms = 2000. Therefore, we need
test_replace to be in a suite with initial_size equal to 0.
In the following commits, we make the topology coordinator reject
join requests if the node being replaced is considered alive by the
gossiper. Before making this change, we need to adapt the testing
framework so that we don't have flaky replace operations that fail
because the node being replaced hasn't been marked as dead yet. We
achieve this by waiting until all other running nodes see the node
being replaced as dead in all replace operations.
After this change, if we try to add a server and it fails with an
expected error, the add_server function will not throw. Also, the
server will be correctly installed and stopped.
Two issues are motivating this feature.
The first one is that if we want to add a server while expecting
an error, we have to do it in two steps:
- call server_add with the start parameter set to False,
- call server_start with the expected_error parameter.
It is quite inconvenient.
The second one is that we want to be able to test the replace
operation when it is considered incorrect, for example when we try
to replace an alive node. To do this, we would have to remove
some assertions from ScyllaCluster.add_server. However, we should
not remove them because they give us clear information when we
write an incorrect test. After adding the expected_error parameter,
we can ignore these assertions only when we expect an error. In
this way, we enable testing failing replace operations without
sacrificing the testing framework's protection.
After this change, other nodes can see the replacing node as alive
only after the topology coordinator accepts its join request.
In the following commits, we make the topology coordinator reject
join requests if the node being replaced is considered alive by the
gossiper. However, the replacing node may become alive before the
topology coordinator does the validation. If the replacing node
reuses the IP of the node being replaced, the topology coordinator
cannot know which of these two nodes is alive and whether it should
reject the join request.
The gossiper-based topology also delays the replacing node from
advertising itself if it reuses the IP. To achieve the same effect
in raft-based topology, we only need to move the definition of
replacing_a_node_with_same_ip. However, there is code that puts
bootstrap tokens of the node being replaced into the gossiper
state, and it depends on replacing_a_node_with_same_ip and
replacing_a_node_with_diff_ip being always false in the raft-based
topology mode. We prevent it from breaking by changing the
condition.
This should interrupt all sleeps in component teardown.
Before this patch, there was a 1s sleep on gossiper shutdown, whose
origin I don't know. After the patch there is no such
sleep.
Since CRC is already handled by disk blocks, we can remove some of the
entry CRC:ing, both simplifying code and making at least that part of
both write and read faster.
Breaks the file into individually tagged + crc:ed pages.
Each page (sized as disk write alignment) gets a trailing
12-byte metadata, including CRC of the first page-12 bytes,
and the ID of the segment being written.
When reading, each page read is CRC:ed and checked to be part
of the expected segment by comparing ID:s. If crc is broken,
we have broken data. If crc is ok, but ID does not match, we
have a prematurely terminated segment (truncated), which, depending
on whether we use batch mode or not, implies data loss.
Refs #11845
When replaying, differentiate between the two cases for failure we have:
- A broken actual entry - i.e. entry header/data does not hold up to
crc scrutiny
- Truncated file - i.e. a chunk header is broken or unreadable. This can
be due to either "corruption" (i.e. borked write, post-corruption, hw
whatever), or simply an unterminated segment.
The difference is that the former is recoverable, the latter is not.
We now signal and report the two separately. The end result for a user
is not much different, in either case they imply data loss and the
need for repair. But there is some value in differentiating which
of the two we encountered.
Modifies and adds test cases.
reconcilable_result_builder passes range tombstone changes to _rt_assembler
using table schema, not query schema.
This means that a tombstone with bounds (a; b), where a < b in query schema
but a > b in table schema, will not be emitted from mutation_query.
This is a very serious bug, because it means that such tombstones in reverse
queries are not reconciled with data from other replicas.
If *any* queried replica has a row, but not the range tombstone which deleted
the row, the reconciled result will contain the deleted row.
In particular, range deletes performed while a replica is down will not
later be visible to reverse queries which select this replica, regardless of the
consistency level.
As far as I can see, this doesn't result in any persistent data loss.
Only in that some data might appear resurrected to reverse queries,
until the relevant range tombstone is fully repaired.
This series fixes the bug and adds a minimal reproducer test.
Fixes #10598
Closes scylladb/scylladb#16003
* github.com:scylladb/scylladb:
mutation_query_test: test that range tombstones are sent in reverse queries
mutation_query: properly send range tombstones in reverse queries
vla (variable length array) is an extension in GCC and Clang, and
it is not part of the C++ standard.
so let's avoid using it where possible, for better standards compliance.
it's also more consistent with other places where we calculate the size
of an array of T in the same source file.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16084
Currently CREATE KEYSPACE ... WITH STORAGE = { 'type' = 'S3' ... } will create the keyspace even if the backend configuration is "invalid" in the sense that the requested endpoint is not known to scylla via the object_storage.yaml config file. The first time this misconfiguration reveals itself is when flushing a memtable (see #15635), but it's better to know that the endpoint is not configured earlier than that.
fixes: #15074
Closes scylladb/scylladb#16038
* github.com:scylladb/scylladb:
test: Add validation of misconfigured storage creation
sstables: Throw early if endpoint for keyspace is not configured
replica: Move storage options validation to sstables manager
test/cql-pytest/test_keyspaces: Move DESCRIBE case to object store
sstables: Add has_endpoint_client() helper to manager
in this change,
* all `Seastar_OptimizationLevel_*` are dropped.
* mode.Sanitize.cmake:
s/CMAKE_CXX_FLAGS_COVERAGE/CMAKE_CXX_FLAGS_SANITIZE/
* mode.Dev.cmake:
s/CMAKE_CXX_FLAGS_RELEASE/CMAKE_CXX_FLAGS_DEV/
Seastar_OptimizationLevel_* variables have nothing to do with
Seastar, and they introduce unnecessary indirection. the function
of `update_cxx_flags()` already requires an option name for this
parameter, so there is no need to have a name for it.
the cached entry of `Seastar_OptimizationLevel_DEBUG` is also
dropped. if we really need knobs that can be configured
by the user, we should define them in a more formal way; at this
moment, this is not necessary, so drop it along with this
variable.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16059
There are three of them: one is used by core, another by tests, and the third passes arguments between those two. Plus the ..._for_tests() helper in test utils. Of the three, this PR leaves only the one for tests.
Closes scylladb/scylladb#16068
* github.com:scylladb/scylladb:
tests: Shorten the write_memtable_to_sstable_for_test()
replica: Squash two write_memtable_to_sstable()
replica: Coroutinize one of write_memtable_to_sstable() overloads
An attempt to create a non-local keyspace with an unknown endpoint
should raise a configuration exception.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When a keyspace is created, it initializes the storage for it, and the
initialization of S3 storage is a good place to check whether the endpoint
for the storage is configured at all.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently the cql statement .validate() callback is responsible for
checking if the non-local storage options are allowed with the
respective feature. Next patch will need to extend this check to also
validate the details of the provided storage options, but doing it at
cql level doesn't seem correct -- it's "too far" from query processor
down to sstables manager.
Good news is that there's a lower-level validation of the new keyspace,
namely the database::validate_new_keyspace() call. Move the storage
options validation into the sstables manager; while at it, reimplement it
as a visitor to facilitate further extensions, and plug the new
validation into the aforementioned database::validate_new_keyspace().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
allow_mutation_read_page_without_live_row is a new option in the
partition_slice::option option set. In a mixed cluster, old nodes
possibly don't know this new option, so its usage must be protected by a
cluster feature. This patch does just that.
Fixes: #15795
Closes scylladb/scylladb#15890
We're going to ban creation of a keyspace with S3 type in case the
requested endpoint is not configured. The problem is that this test case
of cql-pytest needs such a keyspace to be created, and in order to provide
the object storage configuration we'd need to touch the generic scylla
cluster management, which is overkill for the generic cql-pytest case.
The simpler solution is to make the object_store test suite perform all the
S3-related checks, including the way DESCRIBE for an S3-backed ks works.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's the get_endpoint_client() peer that only checks the client's
presence. To be used by next patches.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The wrapper just calls the test-only core write_memtable_to_sstable()
overload, tests can do it on their own.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All the services that need to register RPC handlers do it in service
constructor or .start() method. Unregistration happens in .stop().
Storage service explicitly (de)initializes its RPC handlers in dedicated
calls, but there's no point in that. The handlers' accessibility is
determined by messaging service start_listen/shutdown; the handlers
themselves can be registered any time before it and unregistered any
time after it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The handlers are about to be initialized from inside the storage_service
constructor. At that time container() is not yet available and it's
invalid to capture it in a handler's lambda. Fortunately, there's only one
handler that does it; other handlers capture 'this' and call container()
explicitly. This patch fixes the remaining one to do the same.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The main goal here is to drop the sys.dist.ks argument from the
init_messaging_service call, to make future patching simpler. While doing
it, it turned out that the argument needed to be passed all the way
down to mark_existing_views_as_built(), so this patch also drops
this argument from this whole call trace.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This effectively reverts bc051387c5 (storage_service: Remove sys_dist_ks
from storage_service dependencies) since now storage service needs the
sys. dist. ks not only at cluster-join time. The next patch will make more use
of it as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's now set via a dedicated call that happens after query processor is
started. Now query processor is started before storage service and the
latter can get the q.p. local reference via constructor.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The storage service is top-level service which depends on many other
services. Recently (see d42685d0cb storage_service: Load tablet
metadata on boot and from group0 changes) it also got implicit
dependency on query processor, but it still starts too early for an
explicit reference to the q.p.
This patch moves storage service start to later times. This is possible
because storage service is not explicitly needed by any other component
start/init in between its old and new start places. Also, cql_test_env
starts storage service "that late" too.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This miniset addresses two potential conversions to `global_schema_ptr` of incomplete materialized view schemas.
One of them was completely unnecessary and is also a "chicken and egg" problem, where in the schema sync procedure itself a view schema was converted to `global_schema_ptr` solely for the purposes of logging. This can create a
"hiccup" in the materialized view updates if they are coming from a node with a different mv schema.
The reason why a synced schema can sometimes have no base info is the deactivation and reactivation of the schema inside the `schema_registry`, which doesn't restore the base information due to lack of context.
When a schema is synced the problem becomes easy, since we can just use the latest base information from the database.
Fixes#14011Closesscylladb/scylladb#14861
* github.com:scylladb/scylladb:
migration manager: fix incomplete mv schemas returned from get_schema_for_write
migration_manager: do not globalize potentially incomplete schema
Patch 967ebacaa4 (view_update_generator: Move abort kicking to
do_abort()) moved unplugging v.u.g from database from .stop() to
.do_abort(). The latter call happens very early on stop -- once scylla
receives SIGINT. However, database may still need v.u.g. plugged to
flush views.
This patch moves unplug to later, namely to .stop() method of v.u.g.
which happens after database is drained and should no longer continue
view updates.
fixes: #16001
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#16091
this is a cleanup in `scylla.spec`.
Closes scylladb/scylladb#16097
* github.com:scylladb/scylladb:
dist/redhat: group sub-package preambles together
dist/redhat: drop unused `defines` variable
dist/redhat: remove tags for subpackage which are same as main preamble
*) Problem:
We have seen in the field it takes longer than expected to repair system tables
like system_auth which has a tiny amount of data but is replicated to all nodes
in the cluster. The cluster has multiple DCs. Each DC has multiple nodes. The
main reason for the slowness is that even if the amount of data is small,
repair has to walk through all the token ranges, that is, num_tokens *
number_of_nodes_in_the_cluster. The overhead of the repair protocol for each
token range dominates due to the small amount of data per token range. Another
reason is that the high network latency between DCs makes the RPC calls used
by repair take more time.
*) Solution:
To solve this problem, a small table optimization for repair is introduced in
this patch. A new repair option is added to turn on this optimization.
- The user does not need to specify token ranges to repair. All token
ranges are repaired automatically.
- Users only need to send the repair REST API request to one of the nodes
in the cluster. It can be any of the nodes in the cluster.
- It does not require the RF to be configured to replicate to all nodes in the
cluster. This means it can work with any tables as long as the amount of data
is low, e.g., less than 100MiB per node.
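For illustration, invoking the repair REST API on any single node might look like the sketch below. The endpoint path and port follow ScyllaDB's storage_service REST API, but the option name `small_table_optimization` and its spelling are assumptions of this sketch:

```python
from urllib.parse import urlencode

# Build the repair REST API URL for one node. The option name
# "small_table_optimization" is assumed here for illustration.
def repair_url(host, keyspace, small_table_optimization=True):
    params = {"small_table_optimization": str(small_table_optimization).lower()}
    return (f"http://{host}:10000/storage_service/repair_async/"
            f"{keyspace}?{urlencode(params)}")

# Any node in the cluster can receive the request.
print(repair_url("127.0.0.1", "system_auth"))
```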
*) Performance:
1)
3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2}
Before:
```
repair - repair[744cd573-2621-45e4-9b27-00634963d0bd]: stats:
repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes,
role_members}, ranges_nr=1537, round_nr=4612,
round_nr_fast_path_already_synced=4611,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1,
rpc_call_nr=115289, tx_hashes_nr=0, rx_hashes_nr=5, duration=1.5648403 seconds,
tx_row_nr=2, rx_row_nr=0, tx_row_bytes=356, rx_row_bytes=0,
row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 0},
{127.0.14.4, 0}, {127.0.14.5, 178}, {127.0.14.6, 178}},
row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 0},
{127.0.14.4, 0}, {127.0.14.5, 1}, {127.0.14.6, 1}},
row_from_disk_bytes_per_sec={{127.0.14.1, 0.00010848}, {127.0.14.2,
0.00010848}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.00010848},
{127.0.14.6, 0.00010848}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1,
0.639043}, {127.0.14.2, 0.639043}, {127.0.14.3, 0}, {127.0.14.4, 0},
{127.0.14.5, 0.639043}, {127.0.14.6, 0.639043}} Rows/s,
tx_row_nr_peer={{127.0.14.3, 1}, {127.0.14.4, 1}}, rx_row_nr_peer={}
```
After:
```
repair - repair[d6e544ba-cb68-4465-ab91-6980bcbb46a9]: stats:
repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes,
role_members}, ranges_nr=1, round_nr=4, round_nr_fast_path_already_synced=4,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0,
rpc_call_nr=80, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.001459798 seconds,
tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0,
row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 178},
{127.0.14.4, 178}, {127.0.14.5, 178}, {127.0.14.6, 178}},
row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 1},
{127.0.14.4, 1}, {127.0.14.5, 1}, {127.0.14.6, 1}},
row_from_disk_bytes_per_sec={{127.0.14.1, 0.116286}, {127.0.14.2, 0.116286},
{127.0.14.3, 0.116286}, {127.0.14.4, 0.116286}, {127.0.14.5, 0.116286},
{127.0.14.6, 0.116286}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1,
685.026}, {127.0.14.2, 685.026}, {127.0.14.3, 685.026}, {127.0.14.4, 685.026},
{127.0.14.5, 685.026}, {127.0.14.6, 685.026}} Rows/s, tx_row_nr_peer={},
rx_row_nr_peer={}
```
The repair time ratio = 1.5648403 seconds / 0.001459798 seconds ≈ 1072X
2)
3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2}
Same test as above except 5ms delay is added to simulate multiple dc
network latency:
The time to repair is reduced from 333s to 0.2s.
333.26758 s / 0.22625381s = 1472.98
3)
3 DCs, each DC has 3 nodes, 9 nodes in the cluster. RF = {dc1: 3, dc2: 3, dc3: 3}
, 10 ms network latency
Before:
```
repair - repair[86124a4a-fd26-42ea-a078-437ca9e372df]: stats:
repair_reason=repair, keyspace=system_auth, tables={role_attributes,
role_members, roles}, ranges_nr=2305, round_nr=6916,
round_nr_fast_path_already_synced=6915,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1,
rpc_call_nr=276630, tx_hashes_nr=0, rx_hashes_nr=8, duration=986.34015
seconds, tx_row_nr=7, rx_row_nr=0, tx_row_bytes=1246, rx_row_bytes=0,
row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3,
0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0},
{127.0.57.8, 0}, {127.0.57.9, 0}}, row_from_disk_nr={{127.0.57.1, 1},
{127.0.57.2, 1}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0},
{127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}},
row_from_disk_bytes_per_sec={{127.0.57.1, 1.72105e-07}, {127.0.57.2,
1.72105e-07}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0},
{127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}}
MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 0.00101385},
{127.0.57.2, 0.00101385}, {127.0.57.3, 0}, {127.0.57.4, 0},
{127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0},
{127.0.57.9, 0}} Rows/s, tx_row_nr_peer={{127.0.57.3, 1},
{127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1},
{127.0.57.8, 1}, {127.0.57.9, 1}}, rx_row_nr_peer={}
```
After:
```
repair - repair[07ebd571-63cb-4ef6-9465-6e5f1e98f04f]: stats:
repair_reason=repair, keyspace=system_auth, tables={role_attributes,
role_members, roles}, ranges_nr=1, round_nr=4,
round_nr_fast_path_already_synced=4,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0,
rpc_call_nr=128, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.6052915
seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0,
row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3,
178}, {127.0.57.4, 178}, {127.0.57.5, 178}, {127.0.57.6, 178},
{127.0.57.7, 178}, {127.0.57.8, 178}, {127.0.57.9, 178}},
row_from_disk_nr={{127.0.57.1, 1}, {127.0.57.2, 1}, {127.0.57.3, 1},
{127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1},
{127.0.57.8, 1}, {127.0.57.9, 1}},
row_from_disk_bytes_per_sec={{127.0.57.1, 0.00037793}, {127.0.57.2,
0.00037793}, {127.0.57.3, 0.00037793}, {127.0.57.4, 0.00037793},
{127.0.57.5, 0.00037793}, {127.0.57.6, 0.00037793}, {127.0.57.7,
0.00037793}, {127.0.57.8, 0.00037793}, {127.0.57.9, 0.00037793}}
MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 2.22634},
{127.0.57.2, 2.22634}, {127.0.57.3, 2.22634}, {127.0.57.4,
2.22634}, {127.0.57.5, 2.22634}, {127.0.57.6, 2.22634},
{127.0.57.7, 2.22634}, {127.0.57.8, 2.22634}, {127.0.57.9,
2.22634}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
```
The time to repair is reduced from 986s (16 minutes) to 1.6s
*) Summary
So, a more-than-1000X speedup is observed for this common usage of the
system table repair procedure.
Fixes #16011
Refs #15159
this variable was introduced in 6d7d0231. back then, we were still
building the binaries in .spec, but we've switched to the relocatable
package now, so there is no need to keep these compilation-related
flags anymore.
in this change, the `defines` variable is dropped.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this is a cleanup.
if a subpackage is licensed under a different license from the one
specified in the main preamble, we need to use a distinct License
tag on a per-subpackage basis. but if it is licensed with the
identical license, it is not necessary. since all three
subpackages of "*-{server, conf, kernel-conf}" are licensed under
AGPLv3, there is no need to repeat the "License:" tag in their
own preamble section.
the same applies to the "URL" tag.
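a minimal sketch of the idea, with a hypothetical subpackage preamble (the tag values are illustrative):

```spec
# Main preamble: License and URL stated once.
License: AGPLv3
URL: http://scylladb.com/

# Subpackage preamble: no License/URL tags needed; rpmbuild lets
# subpackages inherit them from the main preamble when omitted.
%package server
Summary: ScyllaDB server binaries
```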
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
get_schema_for_write
Sometimes a view's entry can get deactivated inside the schema
registry; this happens due to deactivating and reactivating the registry
entry, which doesn't rebuild the base table information in the view.
This error is later caught when trying to convert the schema into a
`global_schema_ptr`, however, the real bug here is that not all schemas
returned from `get_schema_for_write` are suitable for write because the
mv schemas can be incomplete.
This commit changes the aforementioned function in order to fix the bug.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Update node_exporter to 1.7.0.
The previous version (1.6.1) was flagged by security scanners (such as
Trivy) with HIGH-severity CVE-2023-39325. 1.7.0 release fixed that
problem.
[Botond: regenerate frozen toolchain]
Fixes #16085
Closes scylladb/scylladb#16086
Closes scylladb/scylladb#16090
Fixes#15269
If the segment being replayed is corrupted/truncated, we can attempt to
skip completely bogus byte amounts, which can cause an assert (i.e. crash)
in file_data_source_impl. This is not a crash-level error, so ensure we
range check the distance in the reader.
v2: Add to corrupt_size if trying to skip more than available. The amount
added is "wrong", but at least it will ensure we log the fact that things
are broken.
Closes scylladb/scylladb#15270
Currently when the coordinator decides to move the fence it issues an
RPC to each node and each node locally advances fence version. This is
fine if there are no failures or failures are handled by retrying
fencing, but if we want to allow topology changes to progress even in
the presence of barrier failures it is easier to store the fence version
in the raft state. The nodes that missed the fence RPC can easily catch up
to the latest fence version by simply executing a raft barrier.
There was a case where the maybe-sync function of a materialized view could
fail to sync if the view version was old. This is because adding the
base information to the view is only relevant until the record is
synced. This triggers an internal error in the `global_schema_ptr`
constructor.
The conversion to global pointer in that case was solely for logging
purposes so instead, we pass the pieces of information needed for the
logging itself.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
this change is a cleanup to add `-Wignored-qualifiers` when building the tree.
marking a return value as `const` has no effect for types without value
semantics; these `const` specifiers are useless, so let's drop them.
and, if we compile the tree with `-Wignored-qualifiers`, the compiler
would warn like:
```
/home/kefu/dev/scylladb/schema/schema.hh:245:5: error: 'const' type qualifier on return type has no effect [-Werror,-Wignored-qualifiers]
245 | const index_metadata_kind kind() const;
| ^~~~~
```
so this change also silences the above warnings.
Closes scylladb/scylladb#16083
* github.com:scylladb/scylladb:
build: enable -Wignored-qualifiers
treewide: do not mark return value const if this has no effect
`-Wignored-qualifiers` is included in `-Wextra`, but we are not there yet.
with this change, we can keep changes that introduce `-Wignored-qualifiers`
warnings out of the repo, before applying `-Wextra`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change is a cleanup.
marking a return value as `const` has no effect for types without value
semantics; these `const` specifiers are useless, so let's drop them.
and, if we compile the tree with `-Wignored-qualifiers`, the compiler
would warn like:
```
/home/kefu/dev/scylladb/schema/schema.hh:245:5: error: 'const' type qualifier on return type has no effect [-Werror,-Wignored-qualifiers]
245 | const index_metadata_kind kind() const;
| ^~~~~
```
so this change also silences the above warnings.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This is a loose collection of fixes to rare row cache bugs flushed out by running test_concurrent_reads_and_eviction several million times. See individual commits for details.
Fixes #15483
Closes scylladb/scylladb#15945
* github.com:scylladb/scylladb:
partition_version: fix violation of "older versions are evicted first" during schema upgrades
cache_flat_mutation_reader: fix a broken iterator validity guarantee in ensure_population_lower_bound()
cache_flat_mutation_reader: fix a continuity loss in maybe_update_continuity()
cache_flat_mutation_reader: fix continuity losses during cache population races with reverse reads
partition_snapshot_row_cursor: fix a continuity loss in ensure_entry_in_latest() with reverse reads
cache_flat_mutation_reader: fix some cache mispopulations with reverse reads
cache_flat_mutation_reader: fix a logic bug in ensure_population_lower_bound() with reverse reads
cache_flat_mutation_reader: never make an unlinked last dummy continuous
A schema upgrade appends a MVCC version B after an existing version A.
The last dummy in B is added to the front of LRU,
so it will be evicted after the entries in A.
This alone doesn't quite violate the "older versions are evicted first" rule,
because the new last dummy carries no information. But apply_monotonically
generally assumes that entries on the same position have the obvious
eviction order, even if they carry no information. Thus, after the merge,
the rule can become broken.
The proposed fix is as follows:
- In the case where A is merged into B, the merged last dummy
inherits the link of A.
- The merging of B into anything is prevented until its merge with A is finished.
This is relatively hacky, because it still involves a state that
goes against some natural expectations granted by the "older versions..."
rule. A less hacky fix would be to ensure that the new dummy is inserted
into a proper place in the eviction order to begin with.
Or, better yet, we could eliminate the rule altogether.
Aside from being very hard to maintain, it also prevents the introduction
of any eviction algorithm other than LRU.
ensure_population_lower_bound() guarantees that _last_row is valid or null.
However, it fails to provide this guarantee in the special rare case when
`_population_range_starts_before_all_rows == true` and _last_row is non-null.
(This can happen in practice if there is a dummy at before_all_clustering_rows
and eviction makes the `(before_all_clustering_rows, ...)` interval
discontinuous. When the interval is read in this state, _last_row will point to
the dummy, while _population_range_starts_before_all_rows will still be true.)
In this special case, `ensure_population_lower_bound()` does not refresh
`_last_row`, so it can be non-null but invalid after the call.
If it is accessed in this state, undefined behaviour occurs.
This was observed to happen in a test,
in the `read_from_underlying() -- maybe_drop_last_entry()` codepath.
The proposed fix is to make the meaning of _population_range_starts_before_all_rows
closer to its real intention. Namely: it's supposed to handle the special case of a
left-open interval, not the case of an interval starting at -inf.
To reflect the final range tombstone change in the populated range,
maybe_update_continuity() might insert a dummy at `before_key(_next_row.table_position())`.
But the relevant logic breaks down in the special case when that position is
equal to `_last_row.position()`. The code treats the dummy as a part of
the (_last_row, _next_row) range, but this is wrong in the special case.
This can lead to inconsistent state. For example, `_last_row` can be wrongly made
continuous, or its range tombstone can be wrongly nulled.
The proposed fix is to only modify the dummy if it was actually inserted.
If it had been inserted beforehand (which is true in the special case, because
of the `ensure_population_lower_bound()` call earlier), then it's already in a
valid state and doesn't need changes.
Cache population routines insert new row entries.
In non-reverse reads, the new entries (except for the lower bound of the query
range) are filled with the correct continuity and range tombstones immediately
after insertion, because that information has already arrived from underlying
by the time the entries are inserted.
But in reverse reads, it's the interval *after* the newly-inserted entry
that's made continuous. The continuity information in the new entries isn't
filled. When two population routines race, the one which comes later can
punch holes in the continuity left by the first routine, which can break
the "older versions are evicted first" rule and revert the affected
interval to an older version.
To fix this, we must make sure that inserting new row entries doesn't
change the total continuity of the version.
The FIXME comment claims that setting continuity isn't very important in this
place, but in fact this is just wrong.
If two calls to read_from_underlying() get into a race, the one which finishes
later can call ensure_entry_in_latest() on a position which lies inside a
continuous interval in the newest version. If we don't take care to preserve
the total continuity of the version, this can punch a hole in the continuity of the
newest version, potentially reverting the affected interval to an older version.
Fix that.
`_last_row` is in table schema, but it is sometimes compared with positions in
query schema. This leads to unexpected behaviour when reverse reads
are used.
The previous patch fixed one such case, which was affecting correctness.
As far as I can tell, the three cases affected by this patch aren't
a correctness problem, but can cause some intervals to fail to be made continuous.
(And they won't be cached even if the same read is repeated many times).
`_last_row` is in table schema, while `cur.position()` is in query schema
(which is either equal to table schema, or its reverse).
Thus, the comparison affected by this patch doesn't work as intended.
In reverse reads, the check will pass even if `_last_row` has the same key,
but opposite bound weight to `cur`, which will lead to the dummy being inserted
at the wrong position, which can e.g. wrongly extend a range tombstone.
Fix that.
It is illegal for an unlinked last dummy to be continuous
(this is how last dummies respect the "older versions are evicted first" rule),
but it is technically possible for an unlinked last dummy to be
made continuous by read_from_underlying. This commit fixes that.
Found by row_cache_test.
The bug is very unlikely to happen in practice because the relevant
rows_entry is bumped in LRU before read_from_underlying starts.
For the bug to manifest, the entry has to fall down to the end of the
LRU list and be evicted before read_from_underlying() ends.
Usually it takes several minutes for an entry to fall out of LRU,
and read_from_underlying takes maybe a few hundred milliseconds.
And even if the above happened, there still needs to appear a new
version, which needs to have its continuous last dummy evicted
before it's merged.
This commit adds the .. only:: opensource directive
to the Raft page to exclude the link to the 5.2-to-5.4
upgrade guide from the Enterprise documentation.
The Raft page belongs to both OSS and Enterprise
documentation sets, while the upgrade guide
is OSS-only. This causes documentation build
issues in the Enterprise repository, for example,
https://github.com/scylladb/scylla-enterprise/pull/3242.
As a rule, all OSS-only links should be provided
by using the .. only:: opensource directive.
This commit must be backported to branch-5.4
to prevent errors in the documentation for
ScyllaDB Enterprise 2024.1
(backport)
Closes scylladb/scylladb#16064
before this change, in sstable_run_based_compaction_test, we checked
every 4th sstable, to verify that we close the sstables to be replaced
in a batch of 4.
since the integer-based generation identifier is monotonically
increasing, we can assume that the identifiers of sstables are like
0, 1, 2, 3, .... so if the compaction consumes sstables in a
batch of 4, the identifier of the first one in the batch should
always be a multiple of 4. unfortunately, this test does not work
if we use a uuid-based identifier.
but if we take a closer look at how we create the dataset, we can
observe the following facts:
1. the `compaction_descriptor` returned by
`sstable_run_based_compaction_strategy_for_tests` never
sets `owned_ranges` in the returned descriptor
2. in `compaction::setup_sstable_reader`, `mutation_reader::forward::no`
is used, if `_owned_ranges_checker` is empty
3. `mutation_reader_merger` respects the `fwd_mr` passed to its
ctor, so it closes the current sstable immediately when the underlying
mutation reader reaches the end of stream.
in other words, we close every sstable once it is fully consumed in
sstable_run_based_compaction_test. and the reason why the existing test
passes is that we just sample the sstables whose generation id is a
multiple of 4. what happens when we perform compaction in this test is:
1. replace 5 with 33, closing 5
2. replace 6 with 34, closing 6
3. replace 7 with 35, closing 7
4. replace 8 with 36, closing 8 << let's check here.. good, go on!
5. replace 13 with 37, closing 13
...
8. replace 16 with 40, closing 16 << let's check here.. also, good, go on!
so, in this change, we just check all old sstables, to verify that
we close each of them once it is fully consumed.
Fixes #16073
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
instead of linking against cryptopp, we should link against
cryptopp::cryptopp. the latter is the target exposed by
Findcryptopp.cmake, while the former is but a library name which
is not even exposed by any find_package() call.
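for illustration, the intended usage looks like this (the consumer target name is hypothetical):

```cmake
find_package(cryptopp REQUIRED)
# Link against the imported target exposed by Findcryptopp.cmake,
# not the bare library name "cryptopp".
target_link_libraries(some_consumer PRIVATE cryptopp::cryptopp)
```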
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16060
The jumbo sink is there to upload files that can potentially be larger
than 50GB (10000 * 5MB). For that, the sink uploads a set of so-called
"pieces" -- files up to 50GB each -- then uses the copy-upload API call
to squash the pieces together. After copying, each piece is removed. In
case of a crash while uploading, pieces remain in the bucket forever,
which is not great.
This patch tags pieces with a 'kind=piece' tag in order to tell pieces
apart from regular objects. This can be used, for example, by setting up a
tag-based lifecycle policy to eventually collect dangling pieces.
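Such a tag-based rule, in the shape of an S3 lifecycle configuration, might look like this (the rule ID and expiry period are illustrative; only the 'kind=piece' tag comes from this patch):

```python
import json

# A lifecycle rule that eventually expires dangling pieces tagged
# kind=piece. The ID and Days values are illustrative.
lifecycle = {
    "Rules": [
        {
            "ID": "collect-dangling-pieces",
            "Status": "Enabled",
            "Filter": {"Tag": {"Key": "kind", "Value": "piece"}},
            "Expiration": {"Days": 7},
        }
    ]
}
print(json.dumps(lifecycle, indent=2))
```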
fixes: #13670
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#16023
a developer might notice that when building 'check-headers',
the whole tree is built. so let's explain this behavior.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16062
As a general rule, tests in test/cql-pytest shouldn't just pass on Scylla - they also should not fail on Cassandra. A test that fails on Cassandra may indicate that the test is wrong, or that Scylla's behavior is wrong and the test just enshrines that wrong behavior. Each time we see a test fail on Cassandra, we need to check whether this is the case. We also have special markers, scylla_only and cassandra_bug, to put on tests that we know _should_ fail on Cassandra because it is missing some Scylla-only feature or there is a bug in Cassandra, respectively. Such tests will be xfailed/skipped when running on Cassandra, and will not report failures.
Unfortunately, over time several tests got into our suite that did not pass on Cassandra. In this series I went over all of them, and fixed each to pass - or be skipped - on Cassandra, in a way that each patch explains.
Fixes #16027
Closes scylladb/scylladb#16033
* github.com:scylladb/scylladb:
test/cql-pytest: fix test_describe.py to not fail on Cassandra
test/cql-pytest: fix select_single_column_relation_test.py to not fail on Cassandra
test/cql-pytest: fix compact_storage_test.py to not fail on Cassandra
test/cql-pytest: fix test_secondary_index.py to not fail on Cassandra
test/cql-pytest: fix test_materialized_view.py to not fail on Cassandra
test/cql-pytest: fix test_keyspace.py to not fail on Cassandra
test/cql-pytest: test_guardrail_replication_strategy.py is Scylla-only
test/cql-pytest: partial fix for test_compaction_strategy_validation.py on Cassandra
test/cql-pytest: fix test_filtering.py to not fail on Cassandra
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).
Some of the tests checked on Cassandra things that don't exist there
(namely local secondary indexes) and could skip that part. Other tests
need to be skipped completely ("scylla_only") because they rely on a
Scylla-only feature. We have a bit too many of those in this file, but
I don't want to fix this now.
Yet another test found a real bug in Cassandra 4.1.1 (CASSANDRA-17918)
but passes in Cassandra 4.1.2 and up, so there's nothing to fix except
a comment about the situation.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In commit 52bbc1065c, we started to allow "IN NULL" - it started to
match nothing instead of being an error as it is in Cassandra. The
commit *incorrectly* "fixed" the existing translated Cassandra unit test
to match the new behavior - but after this "fix" the test started to
fail on Cassandra.
The appropriate fix is just to comment out this part of the test and
not do it. It's a small point where we deliberately decided to deviate
from Cassandra's behavior, so the test it had for this behavior is
irrelevant.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Some error-message checks in this test file (which was translated in
the past from Cassandra) try operations which actually have two errors,
and expect to see one error message - but recent Cassandra prints
the other one. This caused several tests to fail when running on
Cassandra 4.1. Both messages are fine, so let's accept both.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Fixed two tests which failed when running on Cassandra:
One test waited for a secondary index to appear, but in Cassandra, the
index can be broken (cause a read failure) for a short while and we
need to wait through this failure as well and not fail the entire test.
Another test was for local secondary index, which is a Scylla-only
feature, but we forgot the "scylla_only" tag.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The test function test_mv_synchronous_updates checks the
synchronous_updates feature, which is a ScyllaDB extension and
doesn't exist in Cassandra. So it should be marked with "scylla_only"
so that it doesn't fail when running the tests on Cassandra.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).
When testing some invalid cases of ALTER TABLE, the test required
that you cannot choose SimpleStrategy without specifying a
replication_factor. As explained in Refs #16028, this isn't true
in Cassandra 4.1 and up - it now has a default value for
replication_factor and it's no longer required.
So in this patch we split that part of the test to a separate test
function and mark it scylla_only.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The tests in test/cql-pytest/test_guardrail_replication_strategy.py
are for a Scylla-only feature that doesn't exist in Cassandra, so
obviously they all fail on Cassandra. Let's mark them all as
scylla_only.
We use an autouse fixture to automatically mark all tests in this file
as scylla-only, instead of marking each one separately.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).
This patch is only a partial fix - it fixes trivial differences in error
messages, but some potentially-real differences remain so three of the
tests still fail:
1. Trying to set tombstone_threshold to 5.5 is an error in ScyllaDB
("must be between 0.0 and 1.0") but allowed in Cassandra.
2. Trying to set bucket_low to 0.0 is an error in ScyllaDB, giving the
wrong-looking error message "must be between 0.0 and 1.0" (so 0.0 should
have been fine?!) but allowed in Cassandra.
3. Trying to set timestamp_resolution to SECONDS is an error in ScyllaDB
("invalid timestamp resolution SECONDS") but allowed in Cassandra.
I don't think anybody wants to actually use "SECONDS", but it seems
legal in Cassandra, so do we need to support it?
The patch also simplifies the test to use cql-pytest's util.py, instead
of cassandra_tests/porting.py. The latter was meant to make porting
existing Cassandra tests easier - not for writing new ones - and made
using a regular expression for testing error messages harder so I
switched to using pytest.raises() whose "match=" accepts a regular
expression.
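For instance, since `match=` takes a regular expression, accepting either of two reasonable error messages becomes a one-line alternation (the error text below is illustrative, not the exact server message):

```python
import pytest

def set_invalid_option():
    # Stand-in for an invalid ALTER TABLE; the message is illustrative.
    raise ValueError("tombstone_threshold must be between 0.0 and 1.0")

# match= is a regular expression, so both a Scylla-style message and a
# hypothetical Cassandra-style one pass the same check.
with pytest.raises(ValueError, match="must be between|Invalid value"):
    set_invalid_option()
print("matched")
```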
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Yet another test file in cql-pytest which failed when run on Cassandra
(via test/cql-pytest/run-cassandra).
It turns out that when the token() function is used with incorrect
parameters (it needs to be passed all partition-key columns), the
error message is different in ScyllaDB and Cassandra. Both are
reasonable error messages, so if we insist on checking the error
message - we should allow both.
Also the same test called its second partition-key column "ck". This
is confusing, because we usually use the name "ck" to refer to a clustering
key. So just for clarity, we change this name to "pk2". This is not a
functional change in the test.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
TWCS tables require partition estimation adjustment as incoming streaming data can be segregated into the time windows.
Turns out we had two problems in this area that lead to suboptimal bloom filters.
1) With off-strategy enabled, data segregation is postponed, but partition estimation was adjusted as if segregation wasn't postponed. Solved by not adjusting estimation if segregation is postponed.
2) With off-strategy disabled, data segregation is not postponed, but streaming didn't feed any metadata into partition estimation procedure, meaning it had to assume the max windows input data can be segregated into (100). Solved by using schema's default TTL for a precise estimation of window count.
For the future, we want to dynamically size filters (see https://github.com/scylladb/scylladb/issues/2024), especially for TWCS, which might have SSTables that are left uncompacted until they're fully expired, meaning that the system won't heal itself in a timely manner through compaction of an SSTable whose partition estimation was really wrong.
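The window-count bound behind fix (2) can be sketched as follows (the "+1" for boundary straddling and the cap of 100 are assumptions of this sketch, not necessarily the exact formula used):

```python
import math

# With a default TTL, incoming streamed data can only land in roughly
# ceil(TTL / window_size) time windows (+1 for boundary straddling),
# usually far below the previous worst-case assumption of 100 windows.
def max_windows(default_ttl_s, window_size_s):
    return min(100, math.ceil(default_ttl_s / window_size_s) + 1)

# e.g. 7-day TTL with 1-day windows
print(max_windows(7 * 86400, 86400))
```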
Fixes https://github.com/scylladb/scylladb/issues/15704.
Closes scylladb/scylladb#15938
* github.com:scylladb/scylladb:
streaming: Improve partition estimation with TWCS
streaming: Don't adjust partition estimate if segregation is postponed
before this change, `load_sstables()` fills the output sstables vector
by indexing it with the sstable's path. but if there are duplicated
items in the given sstable_names, the returned vector would have uninitialized
shared_sstable instance(s) in it. if we feed such sstables to the
operation funcs, they would segfault when dereferencing the empty
lw_shared_ptr.
in this change, we error out if duplicated sstable names are specified
in the command line.
an alternative is to tolerate this usage by initializing the sstables
vector with a back_inserter, as we always return a dictionary with the
sstable's name as the key, but it might be desirable from the user's
perspective to preserve the order, like OrderedDict in Python. so
let's preserve the ordering of the sstables in the command line.
this should address the problem of the segfault if we pass duplicated
sstable paths to this tool.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16048
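A minimal Python sketch of the chosen behavior (the helper name is hypothetical, not the tool's actual API): reject duplicated sstable paths up front while preserving command-line order.

```python
def load_sstable_paths(sstable_names):
    """Hypothetical helper mirroring the fix described above.

    Instead of indexing the output by path (which leaves uninitialized
    holes when duplicates are given), error out early on duplicates and
    otherwise preserve the command-line ordering.
    """
    seen = set()
    ordered = []
    for name in sstable_names:
        if name in seen:
            raise ValueError(f"duplicate sstable path specified: {name}")
        seen.add(name)
        ordered.append(name)
    return ordered
```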
Compaction tasks which do not have a parent are abortable
through task manager. Their children are aborted recursively.
Compaction tasks of the lowest level are aborted using existing
compaction task executors stopping mechanism.
Closes scylladb/scylladb#16050
* github.com:scylladb/scylladb:
test: test abort of compaction task that isn't started yet
test: test running compaction task abort
tasks: fail if a task was aborted
compaction: abort task manager compaction tasks
without adding `WantedBy=scylla-server.service` in
var-lib-systemd-coredump, if we start `scylla-server.service`,
it does not necessarily start `var-lib-systemd-coredump`,
even if the latter is installed.
with `WantedBy=scylla-server.service` in var-lib-systemd-coredump,
if we start `scylla-server.service`, var-lib-systemd-coredump
will be started as well. and `Before=scylla-server.service` ensures
that var-lib-systemd-coredump is already ready before
`scylla-server.service` is started.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15984
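For illustration, a hedged sketch of the unit-file directives involved (the actual unit shipped in the package may differ in name and content):

```ini
# var-lib-systemd-coredump.mount (illustrative fragment)
[Unit]
# Make sure this unit is fully set up before scylla-server starts.
Before=scylla-server.service

[Install]
# Pull this unit in whenever scylla-server.service is started.
WantedBy=scylla-server.service
```

`WantedBy=` creates the start dependency; `Before=` only orders the two units once both are scheduled, which is why both directives are needed.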
Make `generic_server::gentle_iterator` a mutable iterator to allow
`for_each_gently` to make changes to the connections.
Fixes: #16035
Closes scylladb/scylladb#16036
Both existing test cases and the upcoming third one create the very
same ks.cf pair with the very same sequence of steps. Generalize them.
For the basic test case also tune up the way "expected" rows are
calculated -- now they are SELECT-ed right after insertion and the size
is checked to be non-zero. Not _exactly_ the same check, but it's good
enough for basic testing purposes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#15986
$ID_LIKE = "rhel" works only on RHEL-compatible OSes, not for RHEL
itself.
To detect RHEL correctly, we also need to check $ID = "rhel".
Fixes #16040
Closes scylladb/scylladb#16041
Boost.Test prints the LHS and RHS when the predicate statement passed
to the BOOST_REQUIRE_EQUAL() macro evaluates to false, so the error message
printed by Boost is more developer-friendly when the test fails.
in this test, we replace some BOOST_REQUIRE() with BOOST_REQUIRE_EQUAL()
when appropriate.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16047
This short series fixes test/cql-pytest/test_permissions.py to stop failing on Cassandra.
The second patch fixes these failures (and explains why). The first patch is a new test for UDFs, which helped me prove that one of the test_permissions.py failures in Cassandra is a Cassandra bug - some esoteric error path that prints the right message when no permissions are involved, becomes wrong when permissions are added.
Fixes #15969
Closes scylladb/scylladb#15979
* github.com:scylladb/scylladb:
test/cql-pytest: fix test_permissions.py to not fail on Cassandra
test/cql-pytest: add test for DROP FUNCTION
Token metadata barrier consists of two steps. First, old requests are
drained, and then requests that are not drained are fenced. But currently,
if draining fails then fencing is not done. This is fine if the
barrier's failure is handled by retrying, but we want to start handling
errors differently. In fact, during topology operation rollback we
already do not retry a failed barrier.
The patch fixes the metadata barrier to do fencing even if draining
failed.
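The fixed control flow can be sketched like this (a Python stand-in; `drain` and `fence` are hypothetical placeholders for the two real barrier steps):

```python
import asyncio

async def token_metadata_barrier(drain, fence):
    """Sketch of the fixed barrier: fencing must run even if draining fails.

    Before the fix, a drain failure skipped fencing entirely; putting the
    fence step in `finally` guarantees it always runs.
    """
    try:
        await drain()
    finally:
        await fence()
```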
This is a continuation of a34c8dc4 (Drop compaction_manager_for_testing).
There's one more wrapper over compaction_manager to access its private fields. All such access was recently moved to sstables::test_env's compaction manager, now it's time to drop the remaining legacy wrapper class.
Closes scylladb/scylladb#16017
* github.com:scylladb/scylladb:
test/utils: Drop compaction_manager_test
test/utils: Get compaction manager from test_env
test/sstables: Introduce test_env_compaction_manager::perform_compaction()
test/env: Add sstables::test_env& to compaction_manager_test::run()
test/utils: Add sstables::test_env& to compact_sstables()
test/utils: Simplify and unify compaction_manager_test::run()
test/utils: Squash two compact_sstables() helpers
test/compaction: Use shorter compact_sstables() helper
test/utils: Keep test task compaction gate on task itself
test/utils: Move compaction_manager_test::propagate_replacement()
We're observing nodes getting stuck during bootstrap inside
`storage_service::wait_for_ring_to_settle()`, which periodically checks
`migration_manager::have_schema_agreement()` until it becomes `true`:
scylladb/scylladb#15393.
There is no obvious reason why that happens -- according to the nodes'
logs, their latest in-memory schema version is the same.
So either the gossiped schema version is for some reason different
(perhaps there is a race in publishing `application_state::SCHEMA`) or
missing entirely.
Alternatively, `wait_for_ring_to_settle` is leaving the
`have_schema_agreement` loop and getting stuck in
`update_topology_change_info` trying to acquire a lock.
Modify logging inside `have_schema_agreement` so details about missing
schema or version mismatch are logged on INFO level, and an INFO level
message is printed before we return `true`. To prevent logs from getting
spammed, rate-limit the periodic messages to once every 5 seconds. This
will still show the reason in our tests which allow the node to hang for
many minutes before timing out. Also these schema agreement checks are
done on relatively rare occasions such as bootstrap, so the additional
logs should not be harmful.
Furthermore, when publishing schema version to gossip, log it on INFO
level. This is happening at most once per schema change so it's a rare
message. If there's a race in publishing schema versions, this should
allow us to observe it.
Ref: scylladb/scylladb#15393
Closes scylladb/scylladb#16021
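A small sketch of the rate-limiting idea described above, assuming a hypothetical helper rather than Scylla's actual logger API:

```python
import time

class RateLimitedLogger:
    """Emit a message at most once per interval (here 5 s by default),
    as in the rate-limited schema-agreement logs described above.
    The `clock` parameter is injectable for testing."""

    def __init__(self, interval_s=5.0, clock=time.monotonic):
        self._interval = interval_s
        self._clock = clock
        self._last = None  # time of last emitted message

    def log(self, emit, message):
        """Call emit(message) unless we emitted within the interval.
        Returns True if the message was emitted."""
        now = self._clock()
        if self._last is None or now - self._last >= self._interval:
            self._last = now
            emit(message)
            return True
        return False
```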
Having values of the duration type is not allowed for clustering
columns, because duration can't be ordered. This is correctly validated
when creating a table, but not when we alter the type.
Fixes #12913
Closes scylladb/scylladb#16022
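Illustrative CQL (table and column names are made up; exact acceptance of type alteration depends on the version's rules):

```cql
-- Rejected at CREATE time: duration has no defined ordering.
CREATE TABLE t (pk int, ck duration, PRIMARY KEY (pk, ck));

-- Per the fix, altering a clustering column's type to duration
-- must be rejected as well, instead of slipping through validation.
ALTER TABLE t2 ALTER ck TYPE duration;
```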
Propagate `exceptions::unavailable_exception` error message to the client such as cqlsh.
Fixes #2339
Closes scylladb/scylladb#15922
* github.com:scylladb/scylladb:
test: add the auth_cluster test suite
auth: fix error message when consistency level is not met
before this change, we define CMAKE_CXX_FLAGS_${CONFIG} directly.
some of the configurations are supposed to generate debugging info with
the "-g -gz" options, but they failed to include these options in the cxx
flags.
in this change:
* a macro named `update_cxx_flags` is introduced to set this option.
* this macro also sets the -O option
instead of using a function, this facility is implemented as a macro so
that we can update CMAKE_CXX_FLAGS_${CONFIG} without having to set
this variable with awkward syntax like
```cmake
set(${flags} "${${flags}}" PARENT_SCOPE)
```
this mirrors the behavior in configure.py in the sense that the latter
sets the option on a per-mode basis, and translates it into a
compiler option.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16043
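A hedged CMake sketch of what such a macro could look like (argument names and structure are assumptions, not the actual implementation):

```cmake
# Being a macro (not a function), this updates CMAKE_CXX_FLAGS_${CONFIG}
# in the caller's scope without the set(... PARENT_SCOPE) dance.
macro(update_cxx_flags flags)
  cmake_parse_arguments(args "WITH_DEBUG_INFO" "OPTIMIZATION_LEVEL" "" ${ARGN})
  if(args_WITH_DEBUG_INFO)
    # Emit compressed debug info, per-mode, like configure.py does.
    string(APPEND ${flags} " -g -gz")
  endif()
  if(args_OPTIMIZATION_LEVEL)
    string(APPEND ${flags} " -O${args_OPTIMIZATION_LEVEL}")
  endif()
endmacro()

update_cxx_flags(CMAKE_CXX_FLAGS_RELWITHDEBINFO
  WITH_DEBUG_INFO
  OPTIMIZATION_LEVEL "3")
```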
this macro definition was dropped in 2b961d8e3f by accident.
in this change, let's bring it back. this macro is always necessary,
as it is checked in the scylla source.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16044
They cause connection drops, which is a significant disruptive
event. We should log it so that we can tell it is the cause of
follow-on problems, like requests timing out. A connection drop
will cause coordinator-side requests to time out in the absence of
speculation.
Refs #14746
Closes scylladb/scylladb#16018
the "task" fixture is supposed to return a task for the test. if it
fails to do so, that is an issue not directly related to
the test itself. so let's fail early.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16042
Currently, when said feature is enabled, we recalculate the schema
digest. But this feature also influences how table versions are
calculated, so it has to trigger a recalculation of all table versions,
so that we can guarantee correct versions.
Before, this used to happen by happy accident. Another feature --
table_digest_insensitive_to_expiry -- used to take care of this, by
triggering a table version recalculation. However this feature only takes
effect if digest_insensitive_to_expiry is also enabled. This used to be
the case incidentally: by the time the reload triggered by
table_digest_insensitive_to_expiry ran, digest_insensitive_to_expiry was
already enabled. But this was not guaranteed whatsoever and as we've
recently seen, any change to the feature list, which changes the order
in which features are enabled, can cause this intricate balance to
break.
This patch makes digest_insensitive_to_expiry also kick off a schema
reload, to eliminate our dependence on (unguaranteed) feature order, and
to guarantee that table schemas have a correct version after all features
are enabled. In fact, all schema feature notification handlers now kick
off a full schema reload, to ensure bugs like this don't creep in, in
the future.
Fixes: #16004
Closes scylladb/scylladb#16013
The run() method of task_manager::task::impl does not have to throw when
a task is aborted with the task manager API. Thus, a user may see that
the task finished successfully, which is inconsistent.
Finish a task with a failure if it was aborted with the task manager API.
Set top-level compaction tasks as abortable.
Compaction tasks which have no children, i.e. compaction task
executors, have the abort method overridden to stop compacting data.
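The intended semantics can be sketched in Python (names are hypothetical; Scylla's task manager is C++ and differs in detail): an abort through the manager must surface as a failure, never as success.

```python
import asyncio

class AbortRequested(Exception):
    """Raised so an aborted task finishes as failed, not successful."""

async def run_managed_task(body, abort: asyncio.Event):
    """Sketch of the fixed behavior. `body` is the task's run() stand-in,
    `abort` models an abort request from the task-manager API."""
    body_task = asyncio.ensure_future(body())
    abort_task = asyncio.ensure_future(abort.wait())
    done, _ = await asyncio.wait(
        {body_task, abort_task}, return_when=asyncio.FIRST_COMPLETED)
    if abort_task in done and abort.is_set():
        body_task.cancel()
        # Surface the abort as a failure to the task-manager user.
        raise AbortRequested("task aborted via task manager API")
    abort_task.cancel()
    return await body_task
```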
When a node tries to join the cluster, it asks the topology
coordinator to add it and then waits for the response. The
response is not guaranteed to come back. If the topology
coordinator cannot contact the joining node, it moves the node to
the left state and moves on.
Currently, to handle the case when the response does not come back,
the joining node gives up waiting for it after 3 minutes. However,
it might take more time for the topology coordinator to start
handling the request to join, as it might be working on other tasks
like adding other nodes, performing tablet migrations, etc. In
general, any timeout duration would be unreliable.
Therefore, we get rid of the timeout. From now on, the operator
will be responsible for shutting down the node if the topology
coordinator fails to deliver the rejection.
Additionally, after removing the timeout, we adjust the topology
coordinator. We make it try sending the response (both acceptance
and rejection) only once since we do not care if it fails anymore. We
only need to ensure that the joining node is moved to the left state
if sending fails.
Fixes #15865
Closes scylladb/scylladb#15944
* github.com:scylladb/scylladb:
raft topology: fix indentation
raft topology: join: try sending the response only once
raft topology: join: do not time out waiting for the node to be joined
group 0: group0_handshaker: add the abort_source parameter to post_server_start
This commit adds the auth_cluster test suite to test a custom scenario
involving password authentication:
- create a cluster of 2 nodes with password authentication
- down one node
- the other node should refuse login stating that it couldn't reach
QUORUM
References ScyllaDB OSS #2339
Since the CentOS 7 default kernel is too old, has performance issues, and
also has some bugs, we have been recommending the kernel-ml kernel instead.
Let's check the kernel version in scylla_setup and print a warning if the
kernel is the CentOS 7 default one.
related #7365
Closes scylladb/scylladb#15705
Remove `fall_back_to_syn_msg` which is not necessary in newer Scylla versions.
Fix the calculation of `nodes_down` which could count a single node multiple times.
Make shadow round mandatory during bootstrap and replace -- these operations are unsafe to do without checking features first, which are obtained during the shadow round (outside raft-topology mode).
Finally, during node restart, allow the shadow round to be skipped when getting `timeout_error`s from contact points, not only when getting `closed_error`s (during restart it's best-effort anyway, and in general it's impossible to distinguish between a dead node and a partitioned node).
More details in commit messages.
Ref: https://github.com/scylladb/scylladb/issues/15675
Closes scylladb/scylladb#15941
* github.com:scylladb/scylladb:
gossiper: do_shadow_round: increment `nodes_down` in case of timeout
gossiper: do_shadow_round: fix `nodes_down` calculation
storage_service: make shadow round mandatory during bootstrap/replace
gossiper: do_shadow_round: remove default value for nodes param
gossiper: do_shadow_round: remove `fall_back_to_syn_msg`
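The fixed `nodes_down` counting can be sketched like so (the attempt list and outcome strings are hypothetical stand-ins for the gossiper's error paths):

```python
def count_nodes_down(attempts):
    """Sketch of the fixed `nodes_down` calculation: a contact point that
    fails several times (e.g. a timeout and then a closed connection)
    is counted once, not once per failure."""
    down = {node for node, outcome in attempts
            if outcome in ("timeout_error", "closed_error")}
    return len(down)
```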
Currently, it is started/stopped in the streaming/maintenance sg, which
is what the API itself runs in.
Starting the native transport in the streaming sg will lead to severely
degraded performance, as the streaming sg has significantly less
CPU/disk shares and reader concurrency semaphore resources.
Furthermore, it will lead to multi-paged reads possibly switching
between scheduling groups mid-way, triggering an internal error.
To fix, use `with_scheduling_group()` for both starting and stopping
native transport. Technically, it is only strictly necessary for
starting, but I added it for stop as well for consistency.
Also apply the same treatment to RPC (Thrift). Although no one uses it,
best to fix it, just to be on the safe side.
I think we need a more systematic approach for solving this once and for
all, like passing the scheduling group to the protocol server and have
it switch to it internally. This allows the server to always run on the
correct scheduling group, not depending on the caller to remember using
it. However, I think this is best done in a follow-up, to keep this
critical patch small and easily backportable.
Fixes: #15485
Closes scylladb/scylladb#16019
This commit updates the Repair-Based Node
Operations page. In particular:
- Information about RBNO enabled for all
node operations is added (before 5.4, RBNO
was enabled for the replace operation, while
it was experimental for others).
- The content is rewritten to remove redundant
information about previous versions.
The improvement is part of the 5.4 release.
This commit must be backported to branch-5.4
Closes scylladb/scylladb#16015
Recent seastar update included RPC metrics (scylladb/seastar#1753). The
reported metrics group sockets together based on their "metrics_domain"
configuration option. This patch makes use of this domain to make scylla
metrics sane.
The domain as this patch defines it includes two strings:
First, the datacenter the server lives in. This is because grouping
metrics for connections to different datacenters makes little sense for
several reasons. For example -- packet delays _will_ differ for local-DC
vs cross-DC traffic and mixing those latencies together is pointless.
Another example -- the amount of traffic may also differ for local- vs
cross-DC connections, e.g. because of different usage of encryption and/or
compression.
Second, each verb-idx gets its own domain. That's to be able to analyze
e.g. query-related traffic separately from gossiper traffic. For that the
existing isolation cookie is taken as is.
Note that the metrics are _not_ per server node. So e.g. two gossiper
connections to two different nodes (in one DC) will belong to the same
domain and thus their stats will be summed when reported.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#15785
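A sketch of the grouping, assuming hypothetical connection tuples; it shows why two same-DC gossiper connections land in one domain and have their stats summed:

```python
from collections import defaultdict

def metrics_domain(datacenter, verb_idx):
    """One bucket per (datacenter, verb index); note it is *not* per
    peer node, matching the description above."""
    return (datacenter, verb_idx)

def aggregate(connections):
    """Sum per-connection stats into their domains. `connections` is a
    hypothetical list of (dc, verb_idx, bytes_sent) tuples."""
    totals = defaultdict(int)
    for dc, verb, sent in connections:
        totals[metrics_domain(dc, verb)] += sent
    return dict(totals)
```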
This class only provides a .run() method which allocates a task and
calls sstables::test_env::perform_compaction(). This can be done in a
helper method, no need for the whole class for it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Take it from compaction_manager_test::run(), which is a simplified rewrite
of compaction_manager::perform_compaction().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method is a simplified rewrite of compaction_manager's
perform_compaction(), but it does task registration and
unregistration the hard way. Keep it shorter and simpler, resembling
the compaction_manager's prototype.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now the one sitting in utils is only called from its peer in compaction
test. Things get simpler if they get merged.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are several of them spread between the test and utils. One of the
test cases can use its local shorter overload for brevity. Also this
makes one of the next patches shorter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
They both have the same scope, but keeping it on the task frees the
caller from the need to mess with its private fields. For now it's not a
problem, but it will be critical in one of the next patches.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The purpose of this method is to make public the private
compaction_manager method of the same name. The caller of this method
has sstable_test_env at hand with its test_env_compaction_manager, so
the de-private-isation call can be moved.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
to have feature parity with `configure.py`. we won't need this
once we migrate to C++20 modules. but before that day comes, we
need to stick with C++ headers.
we generate a rule for each .hh file to create a corresponding
.cc and then compile it, in order to verify the self-containedness of
that header. so the number of rules is quite large. to avoid the
unnecessary overhead, the check-headers target is enabled only if
the `Scylla_CHECK_HEADERS` option is enabled.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15913
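A hedged CMake sketch of the idea (target names and structure are assumptions, not the actual build code):

```cmake
# One rule per header: compile a generated .cc that includes just that
# header, proving the header is self-contained. Guarded by the option
# to avoid the overhead of the large rule set by default.
option(Scylla_CHECK_HEADERS "Check that headers are self-contained" OFF)
if(Scylla_CHECK_HEADERS)
  file(GLOB_RECURSE headers RELATIVE "${CMAKE_SOURCE_DIR}" "*.hh")
  foreach(header ${headers})
    string(MAKE_C_IDENTIFIER "${header}" target_name)
    set(src "${CMAKE_BINARY_DIR}/check-${target_name}.cc")
    file(WRITE "${src}" "#include \"${header}\"\n")
    add_library("check-${target_name}" OBJECT "${src}")
  endforeach()
endif()
```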
We shouldn't have cql-pytest tests that report failure when run on
Cassandra (with test/cql-pytest/run-cassandra): A test that passes
on Scylla but fails on Cassandra indicates a *difference* between
Scylla's behavior and Cassandra's, and this difference should always
be investigated:
1. It can be a Scylla bug, which should be fixed immediately
or reported as a bug and the test changed to fail on Scylla ("xfail").
2. It can be a minor difference in Scylla's and Cassandra's
behavior where both can be accepted. In this case the test should
be modified to accept both behaviors, and a comment added to
explain why we decided to do that.
3. It can be a Cassandra bug which causes a correct test to fail.
This case should not be taken lightly, and a serious effort
is needed to be convinced that this is really a Cassandra bug
and not our misunderstanding of what Cassandra does. In
this case the test should be marked "cassandra_bug" and a
detailed comment should explain why.
4. Or it can be an outright bug in the test that caused it to fail
on Cassandra.
This test had most of these cases :-) There was a test bug in one place
(in a Cassandra-specific Java UDF), a minor and (arguably) acceptable
difference between the error codes returned by Scylla and Cassandra
in one case, and two minor Cassandra bugs (in the error path). All
of these are fixed here, and after this patch test/cql-pytest/run-cassandra
no longer fails on this file.
Fixes #15969
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
We already have in test/cql-pytest various tests for UDF in the bigger
context of UDA (test_uda.py), WASM (test_wasm.py) and permissions, but
somehow we never had a file for simple tests only for UDF, so we
add one here, test/cql-pytest/test_udf.py
We add a test for checking something which was already assumed in
test_permissions.py - that it is possible to create two different
UDFs with the same name and different parameters, and then you must
specify the parameters when you want to DROP one of them. The test
confirms that ScyllaDB's and Cassandra's behavior is identical in
this, as hoped.
To allow the test to run on both ScyllaDB and Cassandra, it needs to
support both Lua (for ScyllaDB) or Java (for Cassandra), and we introduce
a fixture to make it easier to support both. This fixture can later
be used in more tests added to this file.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
There are two tests, test_read_all and test_read_with_partition_row_limits,
which assert on every page, as well as at the end, that there are no misses
whatsoever. This is incorrect, because it is possible that on a given page
not all shards participate, and thus there won't be a saved reader on every
shard. On the subsequent page, a shard without a reader may produce a miss.
This is fine. Refine the asserts to check that we have only as many misses
as there are shards without readers on them.
Fixes: https://github.com/scylladb/scylladb/issues/14087
Closes scylladb/scylladb#15806
* github.com:scylladb/scylladb:
test/boost/multishard_mutation_query_test: fix querier cache misses expectations
test/lib/test_utils: add require_* variants for all comparators
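The refined expectation can be sketched as follows (hypothetical helper names, not the test's actual code):

```python
def allowed_misses(shard_count, shards_with_saved_reader):
    """On a page, only shards that have no saved querier may legitimately
    miss the cache, so that is the upper bound on misses."""
    return shard_count - len(shards_with_saved_reader)

def check_page(shard_count, shards_with_saved_reader, observed_misses):
    """The refined assertion: misses must not exceed the number of
    shards without a saved reader."""
    return observed_misses <= allowed_misses(shard_count, shards_with_saved_reader)
```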
The polling loop was intended to ignore
`condition_variable_timed_out` and check for progress
using a longer `max_idle_duration` timeout in the loop.
Fixes #15669
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#15671
When a node tries to join the cluster, it asks the topology
coordinator to add it and then waits for the response.
In the previous commit, we have made the operator responsible for
shutting down the joining node if the topology coordinator fails
to deliver a response by removing the timeout. In this commit, we
adjust the topology coordinator. We make it try sending the
response (both acceptance and rejection) only once since we do not
care if it fails anymore. We only need to ensure that the joining
node is moved to the left state if sending fails.
When a node tries to join the cluster, it asks the topology
coordinator to add it and then waits for the response. The
response is not guaranteed to come back. If the topology
coordinator cannot contact the joining node, it moves the node to
the left state and moves on.
Currently, to handle the case when the response does not come back,
the joining node gives up waiting for it after 3 minutes. However,
it might take more time for the topology coordinator to start
handling the request to join, as it might be working on other tasks
like adding other nodes, performing tablet migrations, etc. In
general, any timeout duration would be unreliable.
Therefore, we get rid of the timeout. From now on, the operator
will be responsible for shutting down the node if the topology
coordinator fails to deliver the rejection.
This change additionally fixes the TODO in
raft_group0::join_group0.
This commit updates cqlsh's Python compatibility
to Python 3.
In addition it:
- Replaces "Cassandra" with "ScyllaDB" in
the description of cqlsh.
The previous description was outdated, as
we can no longer talk about using cqlsh
released with Cassandra.
- Replaces occurrences of "Scylla" with "ScyllaDB".
- Adds additional locations of cqlsh (Docker Hub
and PyPI), as well as the link to the scylla-cqlsh
repository.
Closes scylladb/scylladb#16016
After 146e49d0dd (Rewrap keyspace population loop) the datadir layout is no longer needed by the sstables boot-time loader, and directories can finally be omitted for S3-backed keyspaces. Tables of such a keyspace don't touch/remove their datadirs either (snapshots still don't work for S3).
fixes: #13020
Closes scylladb/scylladb#16007
* github.com:scylladb/scylladb:
test/object_store: Check that keyspace directory doesn't appear
sstables/storage: Do storage init/destroy based on storage options
replica/{ks|cf}: Move storage init/destroy to sstables manager
database: Add get_sstables_manager(bool_class is_system) method
This reverts commit 7c7baf71d5.
If `stop_gracefully` times out during test teardown phase, it crashes
the test framework reporting multiple errors, for example:
```
12:35:52 /jenkins/workspace/scylla-master/next/scylla/test/pylib/artifact_registry.py:41: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited
12:35:52 self.exit_artifacts = {}
12:35:52 RuntimeWarning: Enable tracemalloc to get the object allocation traceback
12:35:52 Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s
12:35:52 Traceback (most recent call last):
12:35:52 File "/usr/lib64/python3.11/asyncio/tasks.py", line 500, in wait_for
12:35:52 return fut.result()
12:35:52 ^^^^^^^^^^^^
12:35:52 File "/usr/lib64/python3.11/asyncio/subprocess.py", line 137, in wait
12:35:52 return await self._transport._wait()
12:35:52 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12:35:52 File "/usr/lib64/python3.11/asyncio/base_subprocess.py", line 230, in _wait
12:35:52 return await waiter
12:35:52 ^^^^^^^^^^^^
12:35:52 asyncio.exceptions.CancelledError
12:35:52
12:35:52 The above exception was the direct cause of the following exception:
12:35:52
12:35:52 Traceback (most recent call last):
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 521, in stop_gracefully
12:35:52 await asyncio.wait_for(wait_task, timeout=STOP_TIMEOUT_SECONDS)
12:35:52 File "/usr/lib64/python3.11/asyncio/tasks.py", line 502, in wait_for
12:35:52 raise exceptions.TimeoutError() from exc
12:35:52 TimeoutError
12:35:52
12:35:52 During handling of the above exception, another exception occurred:
12:35:52
12:35:52 Traceback (most recent call last):
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1615, in workaround_python26789
12:35:52 code = await main()
12:35:52 ^^^^^^^^^^^^
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1582, in main
12:35:52 await run_all_tests(signaled, options)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1359, in run_all_tests
12:35:52 await reap(done, pending, signaled)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1342, in reap
12:35:52 result = coro.result()
12:35:52 ^^^^^^^^^^^^^
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 201, in run
12:35:52 await test.run(options)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 957, in run
12:35:52 async with get_cluster_manager(self.mode + '/' + self.uname, self.suite.clusters, test_path) as manager:
12:35:52 File "/usr/lib64/python3.11/contextlib.py", line 211, in __aexit__
12:35:52 await anext(self.gen)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1330, in get_cluster_manager
12:35:52 await manager.stop()
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1024, in stop
12:35:52 await self.clusters.put(self.cluster, is_dirty=True)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/pool.py", line 104, in put
12:35:52 await self.destroy(obj)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 368, in recycle_cluster
12:35:52 await cluster.stop_gracefully()
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 689, in stop_gracefully
12:35:52 await asyncio.gather(*(server.stop_gracefully() for server in self.running.values()))
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 527, in stop_gracefully
12:35:52 raise RuntimeError(
12:35:52 RuntimeError: Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s
12:35:58 sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.uninstall' was never awaited
12:35:58 sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited
```
The test for the rollback relies on the log message being there after the
operation fails, but if the node's state is changed before the message is
logged, the operation may fail before the log is printed.
Fixes scylladb/scylladb#15980
Message-ID: <ZUuwoq65SJcS+yTH@scylladb.com>
This PR is a follow-up to https://github.com/scylladb/scylladb/pull/15742#issuecomment-1766888218.
It adds CQL Reference for Materialized Views to the Materialized Views page.
In addition, it removes the irrelevant information about when the feature was added and replaces "Scylla" with "ScyllaDB".
(nobackport)
Closes scylladb/scylladb#15855
* github.com:scylladb/scylladb:
doc: remove versions from Materialized Views
doc: add CQL Reference for Materialized Views
This reverts commit 2860d43309, reversing
changes made to a3621dbd3e.
Reverting because rest_api.test_compaction_task started failing after
this was merged.
Fixes: #16005
The use statement execution code can throw if the keyspace
doesn't exist. This can be a problem for code that will use
execute in a fiber, since the exception will break the fiber even
if `then_wrapped` is used.
Fixes #14449
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Closes scylladb/scylladb#14394
When creating an S3-backed keyspace, its storage dir shouldn't be made.
Also it shouldn't be "resurrected" by boot-time loader of existing
keyspaces.
For extra confidence, check that the system keyspace's directory does
exist where the test expects keyspaces' directories to appear.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's only the local storage type that needs directories touched/removed;
S3 storage initialization is for now a no-op, maybe some day soon it will
appear.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's the manager that knows about storages, and it should init/destroy
them. Also the "upload" and "staging" paths are about to be hidden in
sstables/ code; this code move facilitates that too.
The indentation in storage.cc is deliberately broken to make next patch
look nicer (spoiler: it won't have to shift those lines right).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's one place that does this selection, and soon another will
appear, so it's worth having a convenience helper getter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
reconcilable_result_builder passes range tombstone changes to _rt_assembler
using table schema, not query schema.
This means that a tombstone with bounds (a; b), where a < b in query schema
but a > b in table schema, will not be emitted from mutation_query.
This is a very serious bug, because it means that such tombstones in reverse
queries are not reconciled with data from other replicas.
If *any* queried replica has a row, but not the range tombstone which deleted
the row, the reconciled result will contain the deleted row.
In particular, range deletes performed while a replica is down, will not
later be visible to reverse queries which select this replica, regardless of the
consistency level.
As far as I can see, this doesn't result in any persistent data loss.
Only in that some data might appear resurrected to reverse queries,
until the relevant range tombstone is fully repaired.
Add a space after each colon and comma (if they don't have one already) in values of table options which are JSON objects (`caching`, `tombstone_gc` and `cdc`).
This improves readability and matches client-side describe format.
Fixes: #14895
Closes scylladb/scylladb#15900
* github.com:scylladb/scylladb:
cql-pytest:test_describe: add test for whitespaces in json objects
schema: add whitespace to description of table options
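In Python, the same spacing falls out of `json.dumps` separators (illustrative only; Scylla's serializer is C++ and CQL output uses single quotes):

```python
import json

def describe_option(value):
    """Render a table-option map the way the fix describes: a space after
    each colon and comma, matching client-side DESCRIBE formatting."""
    return json.dumps(value, separators=(', ', ': '))
```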
This commit fixes the information about
Raft-based consistent cluster management
in the 5.2-to-5.4 upgrade guide.
This is a follow-up to https://github.com/scylladb/scylladb/pull/15880 and must be backported to branch-5.4.
In addition, it adds information about removing
DateTieredCompactionStrategy to the 5.2-to-5.4
upgrade guide, including the guideline to
migrate to TimeWindowCompactionStrategy.
Closes scylladb/scylladb#15988
When off-strategy is disabled, data segregation is not postponed,
meaning that getting the partition estimate right is important to
decrease the filter's false positives. With streaming, we don't
have min and max timestamps at the destination; we could have
extended the RPC verb to send them, but it turns out we can easily
deduce the number of windows using the default TTL. Given the
partitioner's random nature, it's not absurd to assume that a given
range being streamed may overlap with all windows, meaning that each
range will yield one sstable for each window when segregating incoming
data. Today, we assume the worst case of 100 windows (the maximum
number of sstables the input data can be segregated into)
due to the lack of metadata for estimating the window count.
But given that users are recommended to target a max of ~20
windows, the partition estimate is being downsized 5x more
than needed. Let's improve it by using the default TTL when
estimating the window count, so even in the absence of timestamp
metadata, the partition estimate won't be way off.
Fixes #15704.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
`system.raft` was using the "user memory pool", i.e. the
`dirty_memory_manager` for this table was set to
`database::_dirty_memory_manager` (instead of
`database::_system_dirty_memory_manager`).
This meant that if a write workload caused memory pressure on the user
memory pool, internal `system.raft` writes would have to wait for
memtables of user tables to get flushed before the write would proceed.
This was observed in SCT longevity tests which ran a heavy workload on
the cluster and concurrently, schema changes (which underneath use the
`system.raft` table). Raft would often get stuck waiting many seconds
for user memtables to get flushed. More details in issue #15622.
Experiments showed that moving Raft to system memory fixed this
particular issue, bringing the waits to reasonable levels.
Currently `system.raft` stores only one group, group 0, which is
internally used for cluster metadata operations (schema and topology
changes) -- so it makes sense to keep using system memory.
In the future we'd like to have other groups, for strongly consistent
tables. These groups should use the user memory pool. It means we won't
be able to use `system.raft` for them -- we'll just have to use a
separate table.
Fixes: scylladb/scylladb#15622
Closes scylladb/scylladb#15972
This PR implements the following new nodetool commands:
* snapshot
* drain
* flush
* disableautocompaction
* enableautocompaction
All commands come with tests and all tests pass with both the new and the current nodetool implementations.
Refs: https://github.com/scylladb/scylladb/issues/15588
Closes scylladb/scylladb#15939
* github.com:scylladb/scylladb:
test/nodetool: add README.md
tools/scylla-nodetool: implement enableautocompaction command
tools/scylla-nodetool: implement disableautocompaction command
tools/scylla-nodetool: implement the flush command
tools/scylla-nodetool: extract keyspace/table parsing
tools/scylla-nodetool: implement the drain command
tools/scylla-nodetool: implement the snapshot command
test/nodetool: add support for matching approximate query parameters
utils/http: make dns_connection_factory::initialize() static
This is a translation of Cassandra's CQL unit test source file
validation/operations/CreateTest.java into our cql-pytest framework.
The 15 tests did not reproduce any previously-unknown bug, but did provide
additional reproducers for several known issues:
Refs #6442: Always print all schema parameters (including default values)
Refs #8001: Documented unit "µs" not supported for assigning a "duration"
type.
Refs #8892: Add an option for default RF for new keyspaces.
Refs #8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression"
for compression settings by default
Unfortunately, I also had to comment out - and not translate - several
tests which weren't real "CQL tests" (tests that use only the CQL driver),
and instead relied on Cassandra's Java implementation details:
1. Tests for CREATE TRIGGER were commented out because testing them
in Cassandra requires adding a Java class for the test. We're also
not likely to ever add this feature to Scylla (Refs #2205).
2. Similarly, tests for CEP-11 (Pluggable memtable implementations)
used internal Java APIs instead of CQL, and it is also unlikely
we'll ever implement it in a way compatible with Cassandra because
of its Java reliance.
3. One test for data center names used internal Cassandra Java APIs, not
CQL to create mock data centers and snitches.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#15791
Compaction tasks which do not have a parent are abortable
through task manager. Their children are aborted recursively.
Compaction tasks of the lowest level are aborted using existing
compaction task executors stopping mechanism.
Closes scylladb/scylladb#15083
* github.com:scylladb/scylladb:
test: test abort of compaction task that isn't started yet
test: test running compaction task abort
tasks: fail if a task was aborted
compaction: abort task manager compaction tasks
Unlike yum, "apt-get install" may fail because the package cache is outdated.
Let's check package cache mtime and run "apt-get update" if it's too old.
Fixes #4059
Closes scylladb/scylladb#15960
When running on a particularly slow setup, for example on
an ARM machine in debug mode, the execution time of even
a small Lua UDF that we're using in tests may exceed our
default limits.
To avoid timeout errors, the limit in tests is now increased
to a value that won't be exceeded in any reasonable scenario
(for the current set of tested UDFs), while not making the
test take an excessive amount of time in case of an error in
the UDF execution.
Fixes #15977
Closes scylladb/scylladb#15983
When topology coordinator tries to fence the previous coordinator it
performs a group0 operation. The current topology coordinator might be
aborted in the meantime, which will result in a `raft::request_aborted`
exception being thrown. After the fix to scylladb/scylladb#15728 was
merged, the exception is caught, but then `sleep_abortable` is called
which immediately throws `abort_requested_exception` as it uses the same
abort source as the group0 operation. The `fence_previous_coordinator`
function which does all those things is not supposed to throw
exceptions; if it does, it causes `raft_state_monitor_fiber` to exit,
completely disabling the topology coordinator functionality on that
node.
Modify the code in the following way:
- Catch `abort_requested_exception` thrown from `sleep_abortable` and
exit the function if it happens. In addition to the described issue,
it will also handle the case when abort is requested while
`sleep_abortable` happens,
- Catch `raft::request_aborted` thrown from group0 operation, log the
exception with lower verbosity and exit the function explicitly.
Finally, wrap both `fence_previous_coordinator` and `run` functions in a
`try` block with `on_fatal_internal_error` in the catch handler in order
to implement the behavior that adding `noexcept` was originally supposed
to introduce.
Fixes: scylladb/scylladb#15747
Closes scylladb/scylladb#15948
* github.com:scylladb/scylladb:
raft topology: catch and abort on exceptions from topology_coordinator::run
Revert "storage_service: raft topology: mark topology_coordinator::run function as noexcept"
raft topology: don't print an error when fencing previous coordinator is aborted
raft topology: handle abort exceptions from sleeping in fence_previous_coordinator
this series tries to
1. render options with a role, so the options can be cross-referenced and defined.
2. move the formatting out of the content, so the representation can be defined in a more flexible way.
Closes scylladb/scylladb#15860
* github.com:scylladb/scylladb:
docs: add divider using CSS
docs: extract _clean_description as a filter
docs: render option with role
docs: parse source files right into rst
Having to extract 1 keyspace and N tables from the command-line is
proving to be a common pattern among commands. Extract this into a
method, so the boiler-plate can be shared. Add a forward-looking
overload as well, which will be used in the next patch.
Remove unused gossiper states that are no longer needed even for
compatibility.
* 'remove_unused_states' of github.com:scylladb/scylla-dev:
gossip: remove unused HIBERNATE gossiper status
gossip: remove unused STATUS_MOVING state
Match parameters within some delta of the expected value. Useful when
nodetool generates a timestamp whose exact value cannot be predicted.
Said method can outlive the factory instance. This was not a problem
because the method takes care to keep everything it needs from `this`
alive, by copying it to the coroutine stack. However, the fact that this
method can outlive the instance is not obvious, and an unsuspecting
developer (me) added a new member (_logger) which was not kept alive.
This can cause a use-after-free in the factory. Fix by making
initialize() static, forcing the instance to pass all parameters
explicitly, and adding a comment explaining that this method can outlive
the instance.
These APIs may return stale or simply incorrect data on shards
other than 0. Newer versions of Scylla are better at maintaining
cross-shard consistency, but we need a simple fix that can be easily
and safely backported to older versions; this is the fix.
Add a simple test to check that the `failure_detector/endpoints`
API returns nonzero generation.
Fixes: scylladb/scylladb#15816
Closes scylladb/scylladb#15970
* github.com:scylladb/scylladb:
test: rest_api: test that generation is nonzero in `failure_detector/endpoints`
api: failure_detector: fix indentation
api: failure_detector: invoke on shard 0
The sstable can currently move between the normal, staging and quarantine states at runtime. For S3-backed sstables, a state change means maintaining the state itself in the ownership table and updating it accordingly.
There's also the upload facility that's implemented as state change too, but this PR doesn't support this part.
Fixes: #13017
Closes scylladb/scylladb#15829
* github.com:scylladb/scylladb:
test: Make test_sstables_excluding_staging_correctness run over s3 too
sstables,s3: Support state change (without generation change)
system_keyspace: Add state field to system.sstables
sstable_directory: Tune up sstables entries processing comment
system_keyspace: Tune up status change trace message
sstables: Add state string to state enum class convert
This PR adds the 5.2-5.4 upgrade guide.
In addition, it removes the redundant upgrade guide from 5.2 to 5.3 (as 5.3 was skipped), as well as some mentions of version 5.3.
This PR must be backported to branch-5.4.
Closes scylladb/scylladb#15880
* github.com:scylladb/scylladb:
doc: add the upgrade guide from 5.2 to 5.4
doc: remove version "5.3" from the docs
doc: remove the 5.2-to-5.3 upgrade guide
Currently, "yum install scylla" causes conflict when ABRT is installed.
To avoid this behavior and keep using systemd-coredump for scylla
coredump, let's drop "Conflicts: abrt" from rpm and
add "Conflicts=abrt-ccpp.service" to systemd unit.
Fixes #892
Closes scylladb/scylladb#15691
in this series, instead of assuming that we always have only one single `CMAKE_BUILD_TYPE`, we configure all available configurations, to be better prepared for the multi-config support.
Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15933
* github.com:scylladb/scylladb:
build: cmake: set compile options with generator expression
build: cmake: configure all available config types
build: cmake: set per-mode stack usage threshold
build: cmake: drop build_mode
build: cmake: check for config type if multi-config is used
The helper makes sstable, writes mutations into it and loads one. Internally it uses the make_memtable() helper that prepares a memtable out of a vector of mutations. There are many test cases that don't use these facilities generating some code duplication.
The make_sstable() wrapper around make_sstable_easy() is removed along the way.
Closes scylladb/scylladb#15930
* github.com:scylladb/scylladb:
tests: Use make_sstable_easy() where appropriate
sstable_conforms_to_mutation_source_test: Open-code the make_sstable() helper
sstable_mutation_test: Use make_sstable_easy() instead of make_sstable()
tests: Make use of make_memtable() helper
tests: Drop as_mutation_source helper
test/sstable_utils: Hide assertion-related manipulations into branch
instead of using a single compile option for all modes, use per-mode
compile options. this change keeps us away from using `CMAKE_BUILD_TYPE`
directly, and prepares us for the multi-config generator support.
because we only apply these settings in the configurations where
sanitizers are used, there is no need to check if these option can be
accepted by the compiler. if this turns out to be a problem, we can
always add the check back on a per-mode basis.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
if `CMAKE_CONFIGURATION_TYPES` is set, it implies that the
multi-config generator is used, in this case, we include all
available build types instead of only the one specified by
`CMAKE_BUILD_TYPE`, which is typically used by non-multi-config
generators.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
instead of setting a single stack usage threshold, set per-mode
stack usage threshold. this prepares for the support of
multi-config generator.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
there is no benefit having this variable. and it introduces
another layer of indirection. so drop it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
we should not set_property() on a non-existent property. if a multi-config
generator is used, `CMAKE_BUILD_TYPE` is not added as a cached entry at all.
Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This is a test for #14277. We do want to match Cassandra's behavior,
which means that a user who is granted ALTER ALL is able to change
the password of a superuser.
Closes scylladb/scylladb#15961
systemd man page says:
systemd-fstab-generator(3) automatically adds dependencies of type Before= to
all mount units that refer to local mount points for this target unit.
So "Before=local-fs.target" is the correct dependency for local mount
points, but we currently specify "After=local-fs.target"; this should be
fixed.
Also replaced "WantedBy=multi-user.target" with "WantedBy=local-fs.target",
since .mount units are not related to multi-user but depend on local
filesystems.
Fixes #8761
Closes scylladb/scylladb#15647
before this change, the tempdir is always nuked no matter whether the
test succeeds. but sometimes it is important to check
scylla's sstables after the test finishes.
so, in this change, an option named `--keep-tmp` is added so
we can optionally preserve the temp directory. this option is off
by default.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15949
This commit adds OS support information
in version 5.4 (removing the non-released
version 5.3).
In particular, it adds support for Oracle Linux
and Amazon Linux.
Also, it removes support for outdated versions.
Closes scylladb/scylladb#15923
This commit updates the package installation
instructions in version 5.4.
- It updates the variables to include "5.4"
as the version name.
- It adds the information for the newly supported
Rocky/RHEL 9 - a new EPEL download link is required.
Closes scylladb/scylladb#15963
this series applies fixes to make the test more PEP8 compliant. the goal is to improve the readability and maintainability.
Closes scylladb/scylladb#15946
* github.com:scylladb/scylladb:
test/object_store: wrap line which is too long
test/object_store: use pattern matching to capture variable in loop
test/object_store: remove space after and before '{' and '}'
test/object_store: add an empty line before nested function definition
test/object_store: use two empty lines in-between global functions
in order to use compile-time format check, we would need to use
compile-time constexpr for the format string. despite that we
might be able to find a way to tell if an expression is compile-time
constexpr in C++20, it'd be much simpler to always use a
known-to-be-constexpr format string. this would help us to eventually
migrate to the compile-time format check in seastar's logging subsystem.
so, in this change, instead of feeding `seastar::logger::info()` and
friends with a non-constexpr format string, let's just use "{}" for
printing it, or mark the format string with `constexpr` instead of
`const`. as the former tells the compiler it is a variable that
can be evaluated at compile-time, while the latter just informs the
compiler that the variable is not mutable after it is initialized.
This change also helps to address the compile failure with the
not-yet-merged compile-time format check patch in Seastar:
```
/usr/bin/clang++ -DBOOST_NO_CXX98_FUNCTION_BASE -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/cmake/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/cmake/seastar/gen/include -Og -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-mismatched-tags -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -Wno-missing-field-initializers -Wno-deprecated-copy -Wno-ignored-qualifiers -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -U_FORTIFY_SOURCE -Werror=unused-result "-Wno-error=#warnings" -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT service/CMakeFiles/service.dir/storage_service.cc.o -MF service/CMakeFiles/service.dir/storage_service.cc.o.d -o service/CMakeFiles/service.dir/storage_service.cc.o -c /home/kefu/dev/scylladb/service/storage_service.cc
/home/kefu/dev/scylladb/service/storage_service.cc:2460:18: error: call to consteval function 'seastar::logger::format_info<>::format_info<const char *, 0>' is not a constant expression
slogger.info(str.c_str());
^
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15959
These APIs may return stale or simply incorrect data on shards
other than 0. Newer versions of Scylla are better at maintaining
cross-shard consistency, but we need a simple fix that can be easily
and safely backported to older versions; this is the fix.
Fixes: scylladb/scylladb#15816
The `topology_coordinator` function is supposed to handle all of the
exceptions internally. Assert, in runtime, that this is the case by
wrapping the `run` invocation with a try..catch; in case of an
exception, step down as a leader first and then abort.
This reverts commit dcaaa74cd4. The
`noexcept` specifier that it added is only relevant to the function and
not the coroutine returned from that function. This was not the
intention and it looks confusing now, so remove it.
An attempt to fence the previous coordinator may fail because the
current coordinator is aborted. It's not a critical error and it can
happen during normal operations, so lower the verbosity used to print a
message about this error to 'debug'.
Return from the function immediately in that case - the `sleep_abortable`
call that happens as the next step would fail on abort_requested_exception
anyway, so make it more explicit.
The fence_previous_coordinator function has a retry loop: if it fails to
perform a group0 operation, it will try again after a 1 second delay.
However, if the topology coordinator is aborted while it waits, an
exception will be thrown and will be propagated out of the function. The
function is supposed to handle all exceptions internally, so this is not
desired.
Fix this by catching the abort_requested_exception and returning from
the function if the exception is caught.
Previously we would only increment `nodes_down` when getting
`rpc::closed_error`. Distinguishing between that and timeout is
unreliable. Consider:
1. if a node is dead but we can reach the IP, we'd get `closed_error`
2. if we cannot reach the IP (there's a network partition), the RPC
would hang so we'd get `timeout_error`
3. if the node is both dead and the IP is unreachable, we'd get
`timeout_error`
And there are probably other more complex scenarios as well. In general,
it is impossible to distinguish a dead node from a partitioned node in
asynchronous networks, and whether we end up with `closed_error` or
`timeout_error` is an implementation detail of the underlying protocol
that we use.
The fact that `nodes_down` was not incremented for timeouts would
prevent a node from starting if it cannot reach isolated IPs (whether or
not there were dead or alive nodes behind those IPs). This was observed
in a Jepsen test: https://github.com/scylladb/scylladb/issues/15675.
Note that `nodes_down` is only used to skip shadow round outside
bootstrap/replace, i.e. during restarts, where the shadow round was
"best effort" anyway (not mandatory). During bootstrap/replace it is now
mandatory.
Also fix grammar in the error message.
During shadow round we would calculate the number of nodes from which we
got `rpc::closed_error` using `nodes_counter`, and if the counter
reached the size of all contact points passed to shadow round, we would
skip the shadow round (and after the previous commit, we do it only in
the case of restart, not during bootstrap/replace which is unsafe).
However, shadow round might have multiple loops, and `nodes_down` was
initialized to `0` before the loop, then reused. So the same node might
be counted multiple times in `nodes_down`, and we might incorrectly
enter the skipping branch. Or we might go over `nodes.size()` and never
finish the loop.
Fix this by initializing `nodes_down = 0` inside the loop.
It is unsafe to bootstrap or perform replace without performing the
shadow round, which is used to obtain features from the existing cluster
and verify that we support all enabled features.
Before this patch, I could easily produce the following scenario:
1. bootstrap first node in the cluster
2. shut it down
3. start bootstrapping second node, pointing to the first as seed
4. the second node skips shadow round because it gets
`rpc::closed_error` when trying to connect to first node.
5. the node then passes the feature check (!) and proceeds to the next
step, where it waits for nodes to show up in gossiper
6. we now restart the first node, and the second node finishes bootstrap
The shadow round must be mandatory during bootstrap/replace, which is
what this patch does.
On restart it can remain optional as it was until now. In fact it should
be completely unnecessary during restart, but since we did it until now
(as best-effort), we can keep doing it.
If during shadow round we learned that a contact node does not
understand the GET_ENDPOINT_STATES verb, we'd fall back to old shadow
round method (using gossiper SYN messages).
The verb was added a long time ago and it ended up in Scylla 4.3 and
2021.1. So in newer versions we can make it mandatory, as we don't
support skipping major versions during upgrades. Even if someone
attempted to, they would just get an error and could retry the bootstrap
after finishing the upgrade.
This series refactors the `dht/i_partitioner.hh` header file
and cleans up its usage to reduce the dependencies on it,
since it carries a lot of baggage that is rarely required in other header files.
Closes scylladb/scylladb#15954
* github.com:scylladb/scylladb:
everywhere: reduce dependencies on i_partitioner.hh
locator: resolve the dependency of token_metadata.hh on token_range_splitter.hh
cdc: cdc_partitioner: remove extraneous partition_key_view fwd declaration
dht: reduce dependency on i_partitioner.hh
dht: fold compatible_ring_position in ring_position.hh
dht: refactor i_partitioner.hh
dht: move token_comparator to token.{cc,hh}
dht/i_partitioner: include i_partitioner_fwd.hh
instead of setting for a single CMAKE_BUILD_TYPE, set the compilation
definitions for each build configuration.
this prepares for the multi-config generator.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15943
use a generator-expression instead, so that the value can be evaluated
when generating the build system. this prepares for the multi-config
support.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15942
The purpose of this mini series is to move all tests (that use the
infrastructure to create a Scylla cluster) to shut down gracefully.
One benefit is that the shutdown sequence for a cluster will be tested
better, but that is not the main purpose of this change. The main purpose
is to pave the way for coverage reporting on all tests, not only the ones
that have standalone executables.
Full test runs are only slightly impacted by this change (~2.4% increase in runtime):
Without graceful shutdown
```
time ./test.py --mode dev
Found 2966 tests.
================================================================================
[N/TOTAL] SUITE MODE RESULT TEST
------------------------------------------------------------------------------
[2966/2966] topology_experimental_raft dev [ PASS ] topology_experimental_raft.test_raft_cluster_features.1
------------------------------------------------------------------------------
CPU utilization: 13.1%
real 4m50.587s
user 13m58.358s
sys 6m55.975s
```
With graceful shutdown
```
time ./test.py --mode dev
Found 2966 tests.
================================================================================
[N/TOTAL] SUITE MODE RESULT TEST
------------------------------------------------------------------------------
[2966/2966] topology_experimental_raft dev [ PASS ] topology_experimental_raft.test_raft_cluster_features.1
------------------------------------------------------------------------------
CPU utilization: 12.6%
real 4m57.637s
user 13m56.864s
sys 6m46.657s
```
Closes scylladb/scylladb#15851
* github.com:scylladb/scylladb:
test.py: move to a graceful termination of nodes on teardown
test.py: Use stop lock also in the graceful version
define token_metadata_ptr in token_metadata_fwd.hh
So that the declaration of `make_splitter` can be moved
to token_range_splitter.hh, where it belongs,
and so token_metadata.hh won't have to include it.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Extract decorated_key.hh and ring_position.hh
out of i_partitioner.hh so they can be included
selectively, since i_partitioner.hh contains too much
baggage that is not always needed in full.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Move the `token_comparator` definition and
implementation to token.{hh,cc}, respectively
since they are independent of i_partitioner.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
instead of referencing the elements in tuple with their indexes, use
pattern matching to capture them. for better readability.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Values of `caching`, `tombstone_gc` and `cdc` are JSON objects but they
were printed without any whitespace. This commit adds spaces after
colons (:) and commas (,), so the values are more readable and match the
format of the old client-side describe.
it would be better to split the parser from the formatter. in future,
we can apply more filters on top of the existing one.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
so we can cross-reference them with the syntax like
:confval:`alternator_timeout_in_ms`.
or even render an option like:
.. confval:: alternator_timeout_in_ms
in order to make the headerlink of the option visible,
a new CSS rule is added.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
When off-strategy is enabled, data segregation is postponed to when
off-strategy runs. It turns out we're adjusting the partition estimate
even when segregation is postponed, meaning that sstables in the
maintenance set will have smaller filters than they should otherwise have.
This condition is transient, as the system eventually heals this
through compactions. But note that with TWCS, the problem of inefficient
filters may persist for a long time, as sstables written into older
windows may stay around for a significant amount of time.
In the future, we're planning to make this less fragile by dynamically
resizing filters on sstable write completion.
The aforementioned problem is solved by skipping the adjustment when
segregation is postponed (i.e. off-strategy is enabled).
Refs #15704.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
It is possible that the sender and receiver in a streaming session have
different views on whether a table has been dropped.
For example:
- n1, n2 and n3 in the cluster
- n4 started to join the cluster and stream data from n1, n2, n3
- a table was dropped
- n4 failed to write data from n2 to sstable because a table was dropped
- n4 ended the streaming
- n2 checked if the table was present and would ignore the error if the table was dropped
- however n2 found the table was still present and was not dropped
- n2 marked the streaming as failed
This will fail the streaming when a table is dropped. We want streaming to
ignore such dropped tables.
In this patch, a status code is sent back to the sender to notify it that
the table was dropped, so the sender can ignore the dropped table.
Fixes #15370
Closes scylladb/scylladb#15912
After starting the associated node, ScyllaServer waits until the node
starts serving CQL requests. It does that by periodically trying to
establish a python driver session to the node.
During session establishment, the driver tries to fetch some metadata
from the system tables, and uses a pretty short timeout to do so (by
default it's 2 seconds). When running tests in debug mode, this timeout
can prove to be too short and may prevent the testing framework from
noticing that the node came up.
Fix the problem by increasing the timeout. Currently, after the session
is established, a query is sent in order to further verify that the
session works and it uses a very generous timeout of 1000 seconds to do
so - use the same timeout for internal queries in the python driver.
Fixes: scylladb/scylladb#15898
Closes scylladb/scylladb#15929
There are two test cases out there that make an sstable, write it and then
load it, but make_sstable_easy() exists for exactly that, so use it there.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This test case is pretty special in the sense that it uses custom path
for tempdir to create, write and load sstable to/from. It's better to
open-code the make_sstable() helper into the test case rather than
encourage callers to use custom tempdirs. "Good" test cases can use
make_sstable_easy() for the same purposes (in fact they already do).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a helper in the utils that creates an lw_shared_ptr<memtable> and
applies a provided vector of mutations to it. Lots of other test cases
do literally the same by hand.
The make_memtable() helper assumes that the caller is sitting in the
seastar thread, and all the test cases that can benefit from it already are.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It does nothing but call the sstable method of the same name. Callers
can do it on their own, since the method is public.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The make_sstable_containing() can validate that the applied mutations are
produced by the resulting sstable if the caller asks for it. To do so,
the mutations are merged prior to checking, and this merging should only
happen if validation is requested; otherwise it just makes no sense.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The follow-up to #15594.
We retry every automatic `migration_manager::announce` if
`group0_concurrent_modification` occurs. Concurrent operations can
happen during concurrent bootstrap in Raft-based topology, so we need
this change to enable support for concurrent bootstrap.
This PR adds retry loops in 4 places:
- `service::create_keyspace_if_missing`,
- `system_distributed_keyspace::start`,
- `redis::create_keyspace_if_not_exists_impl`,
- `table_helper::setup_keyspace` (used for creating the `system_traces` keyspace).
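The shape of such a retry loop, as a minimal sketch (the exception class and helper names here are illustrative Python, not the actual C++ code):

```python
class group0_concurrent_modification(Exception):
    """Raised when another group 0 modification won the race."""

def announce_with_retry(announce):
    """Retry an automatic schema announcement until it is not
    interrupted by a concurrent group 0 modification."""
    while True:
        try:
            return announce()
        except group0_concurrent_modification:
            continue  # re-read the latest state and try again
```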
Fixes #15435
Closes scylladb/scylladb#15613
* github.com:scylladb/scylladb:
table_helper: fix indentation
table_helper: retry in setup_keyspace on concurrent operation
table_helper: add logger
redis/keyspace_utils: fix indentation
redis: retry creating default databases on concurrent operation
db/system_distributed_keyspace: fix indentation
db/system_distributed_keyspace: retry start on concurrent operation
auth/service: retry creating system_auth on concurrent operation
Topology on raft is still an experimental feature. The RPC verbs
introduced in that mode shouldn't be used when it's disabled, otherwise
we lose the right to make breaking changes to those verbs.
First, make sure that the aforementioned verbs are not sent outside the
mode. It turns out that `raft_pull_topology_snapshot` could be sent
outside topology-on-raft mode - after the PR, it no longer can.
Second, topology-on-raft mode verbs are now not registered at all on the
receiving side when the mode is disabled.
Additionally tested by running `topology/` tests with
`consistent_cluster_management: True` but with experimental features
disabled.
Fixes: scylladb/scylladb#15862
Closes scylladb/scylladb#15917
* github.com:scylladb/scylladb:
storage_service: fix indentation
raft: topology: only register verbs in topology-on-raft mode
raft: topology: only pull topology snapshot in topology-on-raft mode
move the code which updates the third-party library closer to where
the library is found, for better readability.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15915
this mirrors what we already have in `configure.py`, so that Seastar
can report [[nodiscard]] violations as errors.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15914
Currently, table_helper::setup_keyspace is used only for starting
the system_traces keyspace. We need to handle concurrent group 0
operations possible during concurrent bootstrap in the Raft-based
topology.
Some pytest-based tests start the scylla binary by hand instead of
relying on test.py's "clusters". In an automatic run (e.g. via test.py
itself) the correct scylla binary is the one pointed to by the SCYLLA
environment variable, but when run from the shell via pytest directly,
the framework tries to be smart and looks at build/*/scylla binaries,
picking the one with the greatest mtime.
That guess is not very reliable: if the developer switches between
build modes with configure.py and rebuilds, binaries from "older" or
"previous" builds get in the way and confuse the guessing code. It's
better to be explicit.
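The explicit-first selection logic amounts to something like this (a sketch; `find_scylla` is a hypothetical name, not the framework's actual function):

```python
import glob
import os

def find_scylla(env):
    """Prefer the binary named by the SCYLLA environment variable;
    only fall back to guessing by mtime among build/*/scylla."""
    explicit = env.get("SCYLLA")
    if explicit:
        return explicit
    candidates = glob.glob("build/*/scylla")
    if not candidates:
        raise FileNotFoundError("no scylla binary found; set SCYLLA")
    return max(candidates, key=os.path.getmtime)  # last-built wins
```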
refs: #15679
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#15684
This patch series adds error handling for streaming failure during
topology operations instead of an infinite retry. If streaming fails the
operation is rolled back: bootstrap/replace nodes move to left and
decommissioned/remove nodes move back to normal state.
* 'gleb/streaming-failure-rollback-v4' of github.com:scylladb/scylla-dev:
raft: make sure that all operation forwarded to a leader are completed before destroying raft server
storage_service: raft topology: remove code duplication from global_tablet_token_metadata_barrier
tests: add tests for streaming failure in bootstrap/replace/remove/decommission
test/pylib: do not stop node if decommission failed with an expected error
storage_service: raft topology: fix typo in "decommission" everywhere
storage_service: raft topology: add streaming error injection
storage_service: raft topology: do not increase topology version during CDC repair
storage_service: raft topology: rollback topology operation on streaming failure.
storage_service: raft topology: load request parameters in left_token_ring state as well
storage_service: raft topology: do not report term_changed_error during global_token_metadata_barrier as an error
storage_service: raft topology: change global_token_metadata_barrier error handling to try/catch
storage_service: raft topology: make global_token_metadata_barrier node independent
storage_service: raft topology: split get_excluded_nodes from exec_global_command
storage_service: raft topology: drop unused include_local and do_retake parameters from exec_global_command which are always true
storage_service: raft topology: simplify streaming RPC failure handling
There are some schema modifications performed automatically (during
bootstrap, upgrade etc.) by Scylla that are announced by multiple calls
to `migration_manager::announce` even though they are logically one
change. Precisely, they appear in:
- `system_distributed_keyspace::start`,
- `redis:create_keyspace_if_not_exists_impl`,
- `table_helper::setup_keyspace` (for the `system_traces` keyspace).
All these places contain a FIXME telling us to `announce` only once.
There are a few reasons for this:
- calling `migration_manager::announce` with Raft is quite expensive --
taking a `read_barrier` is necessary, and that requires contacting a
leader, which then must contact a quorum,
- we must implement a retrying mechanism for every automatic `announce`
if `group0_concurrent_modification` occurs to enable support for
concurrent bootstrap in Raft-based topology. Doing it before the FIXMEs
mentioned above would be harder, and fixing the FIXMEs later would also
be harder.
This PR fixes the first two FIXMEs and improves the situation with the
last one by reducing the number of the `announce` calls to two.
Unfortunately, reducing this number to one requires a big refactor. We
can do it as a follow-up to a new, more specific issue. Also, we leave a
new FIXME.
Fixing the first two FIXMEs required enabling the announcement of a
keyspace together with its tables. Until now, the code responsible for
preparing mutations for a new table could assume the existence of the
keyspace. This assumption wasn't necessary, but removing it required
some refactoring.
Fixes scylladb/scylladb#15437
Closes scylladb/scylladb#15897
* github.com:scylladb/scylladb:
table_helper: announce twice in setup_keyspace
table_helper: refactor setup_table
redis: create_keyspace_if_not_exists_impl: fix indentation
redis: announce once in create_keyspace_if_not_exists_impl
db: system_distributed_keyspace: fix indentation
db: system_distributed_keyspace: announce once in start
tablet_allocator: update on_before_create_column_family
migration_listener: add parameter to on_before_create_column_family
alternator: executor: use new prepare_new_column_family_announcement
alternator: executor: introduce create_keyspace_metadata
migration_manager: add new prepare_new_column_family_announcement
Verbs related to topology on raft should not be sent outside the
topology on raft mode - and, after the previous commit, they aren't.
Make sure not to register handlers for those verbs if topology on raft
mode is not enabled.
Currently, during group0 snapshot transfer, the node pulling
the snapshot will send the `raft_pull_topology_snapshot` verb even if
the cluster is not in topology-on-raft mode. The RPC handler returns an
empty snapshot in that case. However, using the verb outside topology on
raft causes problems:
- It can cause issues during rolling upgrade as the snapshot transfer
will keep failing on the upgraded nodes until the leader node is
upgraded,
- Topology changes on raft are still experimental, and using the RPC
outside experimental mode will prevent us from doing breaking changes
to it.
Solve the issue by passing the "topology changes on raft enabled" flag
to group0_state_machine and send the RPC only in topology on raft mode.
We can opt out of installing suggested packages, mainly those related to Java and friends that we do not seem to need.
Fixes: #15579
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Closes scylladb/scylladb#15580
There's such a wrapper class in test_services. After #15889 this class resembles test_env_compaction_manager and can be replaced with it. However, two users of the former wrapper class need it just to construct a table object, and the way they do it is a re-implementation of the table_for_tests class.
This PR patches the test cases to make use of table_for_tests and removes the compaction_manager_for_testing that becomes unused after it.
Closes scylladb/scylladb#15909
* github.com:scylladb/scylladb:
test_services: Ditch compaction_manager_for_testing
test/sstable_compaction_test: Make use of make_table_for_tests()
test/sstable_3_x_test: Make use of make_table_for_tests()
table_for_tests: Add const operator-> overload
sstable_test_env: Add test_env_compaction_manager() getter
sstable_test_env: Tune up maybe_start_compaction_manager() method
test/sstable_compaction_test: Remove unused tracker allocation
Now this wrapper is unused, all (both) test cases that needed it were
patched to use make_table_for_tests().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The max_ongoing_compaction_test test case constructs table object by
hand. For that it needs tracker, compaction manager and stats. Similarly
to previous patch, the test_env::make_table_for_tests() helper does
exactly that, so the test case can be simplified as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The compacted_sstable_reader() helper constructs table object and all
its "dependencies" by hand. The test_env::make_table_for_tests() helper
does the same, so the test code can be simplified.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Make it public and add `bool enable` flag so that test cases could start
the compaction manager (to call make_table_for_tests() later) but keep
it disabled for their testing purposes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The sstable_run_based_compaction_test case allocates the tracker but
doesn't use it. It was probably left over after the case was patched to
use the make_table_for_tests() helper.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
on debian derivatives librapidxml-dev installs rapidxml.h as
rapidxml/rapidxml.hpp, so let's use it as a fallback.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15814
before this change the argument passed to the --date-stamp option is
ignored, as we don't reference the date-stamp specified with this
option at all. instead, we always overwrite it with the output of
`date --utc +%Y%m%d` whenever we are going to reference this value.
so, in this change, instead of unconditionally overwriting it, we
keep its value intact if it is already set.
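In Python terms, the fixed behavior amounts to this (a sketch; `resolve_date_stamp` is a hypothetical name, and the script itself is shell):

```python
from datetime import datetime, timezone

def resolve_date_stamp(date_stamp=None):
    """Keep the caller-provided --date-stamp value if set; otherwise
    fall back to today's UTC date, mirroring `date --utc +%Y%m%d`."""
    if date_stamp:
        return date_stamp
    return datetime.now(timezone.utc).strftime("%Y%m%d")
```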
the change which introduced this regression was 839d8f40e6.
Fixes #15894
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15895
The object in question is used to facilitate creation of table objects for compaction tests. Currently table_for_tests carries a bunch of auxiliary objects that are needed for table creation, such as stats of all sorts and table state. However, there's also some "infrastructure" stuff on board, namely:
- reader concurrency semaphore
- cache tracker
- task manager
- compaction manager
And those four are excessive because all the tests in question run inside the sstables::test_env that has most of it.
This PR removes the mentioned objects from table_for_tests and re-uses those from test_env. While at it, it also removes the table::config object from table_for_tests so that it looks more like the way core code creates tables.
Closes scylladb/scylladb#15889
* github.com:scylladb/scylladb:
table_for_tests: Use test_env's compaction manager
sstables::test_env: Carry compaction manager on board
table_for_tests: Stop table on stop
table_for_tests: Get compaction manager from table
table_for_tests: Ditch on-board concurrency semaphore
table_for_tests: Require config argument to make table
table_for_tests: Create table config locally
table_for_tests: Get concurrency semaphore from table
table_for_tests: Get table directory from table itself
table_for_tests: Reuse cache tracker from sstables manager
table_for_tests: Remove unused constructor
tests: Split the compaction backlog test case
sstable_test_env: Coroutinize and move to .cc test_env::stop()
Replacing `restrict_replication_simplestrategy` config option with
2 config options: `replication_strategy_{warn,fail}_list`, which
allow us to impose soft limits (issue a warning) and hard limits (not
execute CQL) on replication strategy when creating/altering a keyspace.
The reason to rather replace than extend `restrict_replication_simplestrategy` config
option is that it was not used and we wanted to generalize it.
Only soft guardrail is enabled by default and it is set to SimpleStrategy,
which means that we'll generate a CQL warning whenever replication strategy
is set to SimpleStrategy. For new cloud deployments we'll move
SimpleStrategy from warn to the fail list.
Guardrails violations will be tracked by metrics.
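The warn/fail split can be sketched as follows (illustrative Python, not the actual C++ guardrails code; the function name is hypothetical):

```python
import warnings

def check_replication_strategy(strategy, warn_list, fail_list):
    """Hard guardrail: refuse to execute the CQL statement.
    Soft guardrail: let it through with a warning."""
    if strategy in fail_list:
        raise ValueError(f"replication strategy {strategy} is not allowed")
    if strategy in warn_list:
        warnings.warn(f"replication strategy {strategy} is discouraged")
```

With the defaults described above, only the warn list is populated (with SimpleStrategy).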
Resolves #5224
Refs #8892 (the replication strategy part, not the RF part)
Closes scylladb/scylladb#15399
The handler of the STREAM_MUTATION_FRAGMENTS verb creates and starts a
reader. The resulting future is then checked for being exceptional and
an error message is printed in the logs.
However, if the reader fails because the socket was closed by the peer,
the error looks excessive. In that case the exception is just regular
handling of the socket/stream closure and can be demoted down to debug
level.
fixes: #15891
Similar cherry-picking of log level exists in e.g. storage proxy, see
for example 56bd9b5d (service: storage_proxy: do not report abort
requests in handle_write )
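The level-picking logic amounts to something like this (hypothetical Python helper for illustration; the real code is C++ in the streaming path):

```python
import logging

def stream_error_log_level(exc):
    """Peer-closed sockets are regular teardown, so demote them to
    debug; any other reader failure keeps the error level."""
    if isinstance(exc, (ConnectionResetError, BrokenPipeError)):
        return logging.DEBUG
    return logging.ERROR
```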
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#15892
The status is not used since 2ec1f719de,
which is included in scylla-4.6.0. We cannot have a mixed cluster with
a version so old, so the new version should not carry the
compatibility burden.
The moving operation was removed by 4a0b561376
and since then the state is unused. Even back then it worked only for
the case of one token, so it is safe to say we never used it. Let's
remove the remains of the code instead of carrying it forever.
before this change, we feed `build_reloc.sh` with hardwired arguments
when building the python3 submodule. but this is not flexible, and
hurts maintainability.
in this change, we mirror the behavior of `configure.py`: we collect
the arguments from the output of `install-dependencies.sh` and feed
the collected arguments to `build_reloc.sh`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15885
We have observed do_repair_ranges() receiving tens of thousands of
ranges to repair on occasion. do_repair_ranges() repairs all ranges in
parallel, with parallel_for_each(). This is normally fine, as the
lambda inside parallel_for_each() takes a semaphore and this results in
limited concurrency.
However, in some instances, it is possible that most of these ranges
are skipped. In this case the lambda becomes synchronous, only logging
a message. This can cause stalls because there are no opportunities to
yield. Solve this by adding an explicit yield.
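An asyncio analogue of the fix (Seastar's parallel_for_each plus a semaphore, with `asyncio.sleep(0)` standing in for the explicit yield; all names are illustrative):

```python
import asyncio

async def repair_ranges(ranges, repair_one, limit=16):
    """Repair ranges with bounded concurrency. The sleep(0) is the
    explicit yield: even when every range is skipped synchronously,
    the event loop still gets a chance to run."""
    sem = asyncio.Semaphore(limit)

    async def one(rng):
        async with sem:
            await asyncio.sleep(0)  # explicit yield, mirrors the fix
            repair_one(rng)         # may be entirely synchronous (skip)

    await asyncio.gather(*(one(r) for r in ranges))
```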
Fixes: #14330
Closes scylladb/scylladb#15879
The purpose of `maybe_fix_legacy_secondary_index_mv_schema` was to deal
with legacy materialized view schemas used for secondary indexes,
schemas which were created before the notion of "computed columns" was
introduced. Back then, secondary index schemas would use a regular
"token" column. Later it became a computed column and old schemas would
be migrated during rolling upgrade.
The migration code was introduced in 2019
(db8d4a0cc6) and then fixed in 2020
(d473bc9b06).
The fix was present in Enterprise 2022.1 and in OSS 4.5. So, assuming
that users don't try crazy things like upgrading from 2021.X to 2023.X
(which we do not support), all clusters will have already executed the
migration code once they upgrade to 2023.X, meaning we can get rid of
it.
The main motivation of this PR is to get rid of the
`db::schema_tables::merge_schema` call in `parse_schema_tables`. In Raft
mode this was the only call to `merge_schema` outside "group 0 code" and
in fact it is unsafe -- it uses locally generated mutations with locally
generated timestamp (`api::new_timestamp()`), so if we actually did it,
we would permanently diverge the group 0 state machine across nodes
(the schema pulling code is disabled in Raft mode). Fortunately, this
should be dead code by now, as explained in the previous paragraph.
The migration code is now turned into a sanity check: if users try
something crazy, they will get an error instead of silent data
corruption.
Closes scylladb/scylladb#15695
* github.com:scylladb/scylladb:
view: remove unused `_backing_secondary_index`
schema_tables: turn view schema fixing code into a sanity check
schema_tables: make comment more precise
feature_service: make COMPUTED_COLUMNS feature unconditionally true
to be compatible with `configure.py` which allows us to optionally
specify the --date-stamp option for SCYLLA-VERSION-GEN. this option
is used by our CI workflow.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15896
This change moves existing suites which create clusters through the
testing infra to be stopped and uninstalled gracefully.
The motivation, besides the obvious advantage of testing our stop
sequence is that it will pave the way for applying code coverage support
to all tests (not only standalone unit and boost test executables).
testing:
Ran all tests 10 times in a row in dev mode.
Ran all tests once in release mode
Ran all tests once in debug mode
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
An already known race (see: https://github.com/scylladb/scylladb/issues/15755)
has been found once again as part of moving all tests to stop all nodes
gracefully on teardown.
The solution was to add the lock acquisition also to `stop_gracefully`.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
We refactor table_helper::setup_keyspace so that it calls
migration_manager::announce at most twice. We achieve it by
announcing all tables at once.
The number of announcements should further be reduced to one, but
it requires a big refactor. The CQL code used in
parse_new_cf_statement assumes the keyspace has already been
created. We cannot have such an assumption if we want to announce
a keyspace and its tables together. However, we shouldn't touch
the CQL code as it would impact user requests, too.
One solution is using schema_builder instead of the CQL statements
to create tables in table_helper.
Another approach is removing table_helper completely. It is used
only for the system_traces keyspace, which Scylla creates
automatically. We could refactor the way Scylla handles this
keyspace and make table_helper unneeded.
In the following commit, we reduce migration_manager::announce
calls in table_helper::setup_keyspace by announcing all tables
together. To do it, we cannot use table_helper::setup_table
anymore, which announces a single table itself. However, the new
code still has to translate CQL statements, so we extract it to the
new parse_new_cf_statement function to avoid duplication.
We refactor system_distributed_keyspace::start so that it takes at
most one group 0 guard and calls migration_manager::announce at
most once.
We remove a catch expression together with the FIXME from
get_updated_service_levels (add_new_columns_if_missing before the
patch) because we cannot treat the service_levels update
differently anymore.
After adding the keyspace_metadata parameter to
migration_listener::on_before_create_column_family,
tablet_allocator doesn't need to load it from the database.
This change is necessary before merging migration_manager::announce
calls in the following commit.
After adding the new prepare_new_column_family_announcement that
doesn't assume the existence of a keyspace, we also need to get
rid of the same assumption in all on_before_create_column_family
calls. After all, they may be initiated before creating the
keyspace. However, some listeners require keyspace_metadata, so we
pass it as a new parameter.
We can use the new prepare_new_column_family_announcement function
that doesn't assume the existence of the keyspace instead of the
previous work-around.
We need to store a new keyspace's keyspace_metadata as a local
variable in create_table_on_shard0. In the following commit, we
use it to call the new prepare_new_column_family_announcement
function.
In the following commits, we reduce the number of the
migration_manager::announce calls by merging some of them in a way
that logically makes sense. Some of these merges are similar --
we announce a new keyspace and its tables together. However,
we cannot use the current prepare_new_column_family_announcement
there because it assumes that the keyspace has already been created
(when it loads the keyspace from the database). Luckily, this
assumption is not necessary as this function only needs
keyspace_metadata. Instead of loading it from the database, we can
pass it as a parameter.
this change silences the following compile warning by using the
recommended API in place of the deprecated one:
```
/home/kefu/dev/scylladb/alternator/server.cc:569:27: warning: 'set_tls_credentials' is deprecated: use listen(socket_address addr, server_credentials_ptr credentials) [-Wdeprecated-declarations]
_https_server.set_tls_credentials(creds->build_reloadable_server_credentials([](const std::unordered_set<sstring>& files, std::exception_ptr ep) {
^
/home/kefu/dev/scylladb/seastar/include/seastar/http/httpd.hh:186:7: note: 'set_tls_credentials' has been explicitly marked deprecated here
[[deprecated("use listen(socket_address addr, server_credentials_ptr credentials)")]]
^
1 warning generated.
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15884
Currently the cache updaters aren't exception safe
yet they are intended to be.
Instead of allowing exceptions from
`external_updater::execute` to escape `row_cache::update`,
abort using `on_fatal_internal_error`.
Future changes should harden all `execute` implementations
to effectively make them `noexcept`, then the pure virtual
definition can be made `noexcept` to cement that.
Fixes scylladb/scylladb#15576
Closes scylladb/scylladb#15577
* github.com:scylladb/scylladb:
row_cache: abort on external_updater::execute errors
row_cache: do_update: simplify _prev_snapshot_pos setup
Now that sstables::test_env provides the compaction manager instance,
table_for_tests can start using it and can drop its own compaction
manager and the sidecar task_manager.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Most of the test cases that use sstables::test_env do not mess with
table objects, they only need sstables. However, compaction test cases
do need table objects and, respectively, a compaction manager instance.
Today those test cases create a compaction manager instance for each
table they create, but that's a bit heavyweight and doesn't match the
way core code works. This patch prepares sstables::test_env to provide
a compaction manager on demand by starting it as soon as it's asked to
create a table object.
For now this compaction manager is unused, but it will be in next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patches will stop using the compaction manager from
table_for_tests in favor of an external one (spoiler: the one from
sstables::test_env), thus the compaction manager will outlive the
table_for_tests object and the table object wrapped by it. So, in order
for table_for_tests to stop correctly, it also needs to stop the
wrapped table.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a table_for_tests::get_compaction_manager() helper that's
excessive, as the compaction manager reference can be provided by the
wrapped table object itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's not used any longer and can be removed. This makes the
table_for_tests stopping code a bit shorter as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is the continuation of the previous patch. Make the caller of
table_for_tests constructor provide the table::config. This makes the
table_for_tests constructor shorter and more self-contained.
Also, the caller now needs to provide the reference to the reader
concurrency semaphore, and that's good news, because the only caller
today is sstables::test_env, which does have it. This makes the
semaphore sitting on table_for_tests itself unused and it will be
removed eventually.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The table_for_tests keeps a copy of table::config on board. That's not
"idiomatic" as table config is a temporary object that should only be
needed while creating table object. Fortunately, the copy of config on
table_for_tests is no longer needed and it can be made temporary.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Making compaction permit needs a semaphore. Current code gets it from
the table_for_tests, but the very same semaphore reference sits on the
table. So get it from table, as the core code does. This will allow
removing the dedicated semaphore from table_for_tests in the future.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Making an sstable for a table requires passing the table directory as
an argument. The current table_for_tests helper gets the directory from
the table config, but the very same path sits on the table itself. This
makes the testing code that constructs sstables look closer to the core
code and is also a prerequisite for removing the table config from
table_for_tests in the future.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When making a table object, it needs a cache tracker reference. The
table_for_tests keeps one on board, but the very same object already
sits on the sstables manager which has public getter.
This makes the table_for_tests's cache tracker object not needed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
To improve parallelism of embedded test sub-cases.
By coincidence, an indentation fix is not required.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's going to get larger, so better to move it.
Also, once coroutinized it's going to be easier to extend.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
before this change, we only check for the existence of
compile_commands.json before creating a symlink to
build/*/compile_commands.json. but there are chances that multiple
ninja tasks call into `configure.py` to update `build.ninja`: this does
not break the process, as the last one wins: we just unconditionally
`mv build.ninja.new build.ninja` to update this file. but it could
break the creation of `compile_commands.json`: we create the symlink
with Python, and if that fails the Python script errors out.
in this change, we just ignore the `FileExistsError` when creating the
symlink to `compile_commands.json`: if the symlink already exists,
we've achieved the goal, and should not consider it a failure.
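The fix amounts to the following sketch (the helper name is hypothetical; the real code lives in `configure.py`):

```python
import os

def link_compile_commands(target, linkname):
    """Several concurrent configure runs may race to create the
    symlink; if it already exists the goal is achieved, so the
    error is ignored."""
    try:
        os.symlink(target, linkname)
    except FileExistsError:
        pass
```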
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15870
This PR implements the following new nodetool commands:
* cleanup
* clearsnapshots
* listsnapshots
All commands come with tests and all tests pass with both the new and the current nodetool implementations.
Refs: https://github.com/scylladb/scylladb/issues/15588
Closes scylladb/scylladb#15843
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the listsnapshots command
tools/scylla-nodetool: implement clearsnapshot command
tools/scylla-nodetool: implement the cleanup command
test/nodetool: rest_api_mock: add more options for multiple requests
tools/scylla-nodetool: log responses with trace level
* seastar 17183ed4e4...830ce86738 (6):
> coroutine: fix use-after-free in parallel_for_each
> build: do not provide zlib as an ingredient
> http: do not use req.content_length as both input parameter
> io_tester: disable -Wuninitialized when including boost.accumulators
> scheduling: revise the doxygen comment of create_scheduling_group()
> Merge 'Added ability to configure different credentials per HTTP listeners' from Michał Maślanka
Closes scylladb/scylladb#15871
While working on https://github.com/scylladb/scylladb/issues/15588, I noticed problems with the existing documentation, when comparing it with the actual code.
This PR contains fixes for nodetool compact, stop and scrub.
Closes scylladb/scylladb#15636
* github.com:scylladb/scylladb:
docs: nodetool compact: remove common arguments
docs: nodetool stop: fix compaction types and examples
docs: nodetool compact: remove unsupported partition option
There's a variant that doesn't need a tempdir path argument since it
gets one from the env's on-board tempdir anyway.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#15825
The goal is to make the available defaults safe for future use, as they
are often taken from existing config files or documentation verbatim.
Referenced issue: #14290
Closes scylladb/scylladb#15856
When task_manager is constructed without a config (i.e. in tests), its
task_ttl is left uninitialized (i.e. a random number gets in there).
This results in tasks staying registered for an infinite amount of
time, making a long-living task manager look hung.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#15859
we enable sanitizers only in Debug and Sanitize build modes. if we pass
`-fno-sanitize-address-use-after-scope` to the compiler when the
sanitizer is not enabled, Clang complains like:
```
clang-16: error: argument unused during compilation: '-fno-sanitize-address-use-after-scope' [-Werror,-Wunused-command-line-argument]
```
this breaks the build in the build modes where sanitizers are not
enabled.
so, in this change, we only disable the sanitize-address-use-after-scope
check when the sanitizers are enabled.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15868
While it may not be explicitly documented, DynamoDB sometimes enriches
error messages with additional fields. For instance, when
ConditionalCheckFailedException occurs while
ReturnValuesOnConditionCheckFailure is set, it will add an Item object;
similarly, for TransactionCanceledException it will add a
CancellationReasons object.
There may be more cases like this, so a generic json field is added to
our error class.
The change will be used by future commit implementing ReturnValuesOnConditionCheckFailure
feature.
Uses a single db::config + extensions, allowing both handling
of enterprise-only scylla.yaml keys, as well as loading sstables
utilizing extension in that universe.
This reverts commit 4b80130b0b, reversing
changes made to a5519c7c1f. It's suspected
of causing dtest failures due to a bug in coroutine::parallel_for_each.
Currently, when we calculate the number of deactivated segments
in test_commitlog_delete_when_over_disk_limit, we only count the
segments that were active during the first flush. However, during
the test, there may have been more than one flush, and a segment
could have been created between them. This segment would sometimes
get deactivated and even destroyed, and as a result, the count of
destroyed segments would appear larger than the count of deactivated
ones.
This patch fixes this behavior by accounting for all segments that
were active during any flush instead of just segments active during
the first flush.
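The counting change amounts to taking the union over all flushes rather than just the first one; a minimal sketch (names are illustrative):

```python
# Segments eligible to be counted as deactivated: those active during
# ANY flush, not only those active during the first flush.
def segments_active_during_any_flush(flushes: list[set[str]]) -> set[str]:
    return set().union(*flushes) if flushes else set()
```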
Fixes #10527 Closes scylladb/scylladb#14610
The copy assignment operator of _ck can throw
after _type and _bound_weight have already been changed.
This leaves position_in_partition in an inconsistent state,
potentially leading to various weird symptoms.
The problem was witnessed by test_exception_safety_of_reads.
Specifically: in cache_flat_mutation_reader::add_to_buffer,
which requires the assignment to _lower_bound to be exception-safe.
The easy fix is to perform the only potentially-throwing step first.
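The ordering idea translates to any language; a Python sketch (class and field names mirror the commit but are illustrative):

```python
# Exception-safe assignment: perform the only potentially-throwing step
# (copying the key) first, so a failure leaves the object untouched.
import copy

class PositionInPartition:
    def __init__(self, type_, bound_weight, ck):
        self._type = type_
        self._bound_weight = bound_weight
        self._ck = ck

    def assign(self, other):
        # Throwing step first: if this copy raises, self is unchanged.
        new_ck = copy.deepcopy(other._ck)
        # The remaining assignments cannot fail, so the object is never
        # left in a half-updated state.
        self._type = other._type
        self._bound_weight = other._bound_weight
        self._ck = new_ck
```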
Fixes #15822 Closes scylladb/scylladb#15864
Currently, it's possible for a test to pass even if the server crashes
during a graceful shutdown. Additionally, the server may crash in the
middle of a test, resulting in a test failure with an inaccurate
description. This commit updates the test framework to monitor the
server's return code and throw an exception in the event of an abnormal
server shutdown.
Fixes scylladb/scylla#15365 Closes scylladb/scylladb#15660
Before this change, when running object_store tests with `pytest`
directly, an instance of MinIoServer is started as a function-scoped
fixture, but the environment variables set by it stay with the
process even after the fixture is torn down. So, when the 2nd test
in the same process checks these environment variables, it would be
under the impression that there is already an S3 server running, and
think it is driven by `test.py`, hence try to reuse the S3 server.
But the MinIoServer instance has been torn down at that moment, when
the first test completed.
So the test is likely to fail when the Scylla instance tries
to read the missing conf file previously created by the MinIoServer.
After this change, the environment variables are reset, so they
won't be seen by the succeeding tests in the same pytest session.
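The save-and-restore idea can be sketched as a context manager (variable names are illustrative, not the exact ones used by test.py):

```python
# Set S3-related environment variables for the test, but restore the
# previous state on teardown, so later tests in the same process don't
# mistake a dead server for a live one.
import os
from contextlib import contextmanager

@contextmanager
def s3_server_env(address: str, port: int):
    keys = ('S3_SERVER_ADDRESS', 'S3_SERVER_PORT')
    saved = {k: os.environ.get(k) for k in keys}
    os.environ['S3_SERVER_ADDRESS'] = address
    os.environ['S3_SERVER_PORT'] = str(port)
    try:
        yield
    finally:
        for k, v in saved.items():
            if v is None:
                os.environ.pop(k, None)
            else:
                os.environ[k] = v
```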
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15779
This series is one of the steps to remove global statements in `configure.py`.
Not only is the script more structured this way, it also allows us to quickly identify the parts that should/can be reused when migrating to a CMake-based build system.
Refs #15379 Closes scylladb/scylladb#15818
* github.com:scylladb/scylladb:
build: move the code with side effects into a single function
build: create outdir when outdir is explicitly used
build: group the code with side effects together
build: do not rely on updating global with a dict
build: extract generate_version() out
build: extract get_release_cxxflags() out
build: extract get_extra_cxxflags() out
build: move thrift_libs to where it is used
build: move pkg closer to where it is used
build: remove unused variable
build: move variable closer to where it is used
It was a copy-paste error.
- s/CMAKE_CXX_FLAGS_RELEASE/CMAKE_CXX_FLAGS_DEV/
- s/Seastar_OptimizationLevel_RELEASE/Seastar_OptimizationLevel_DEV/
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15849
Currently, when the topology coordinator accepts a node, it moves it to bootstrap state and assigns tokens to it (either new ones during bootstrap, or the replaced node's tokens). Only then does it contact the joining node to tell it about the decision and let it perform a read barrier.
However, this means that the tokens are inserted too early. After inserting the tokens the cluster is free to route write requests to it, but it might not have learned about all of the schema yet.
Fix the issue by inserting the tokens later, after completing the join node response RPC which forces the receiving node to perform a read barrier.
Refs: scylladb/scylladb#15686 Fixes: scylladb/scylladb#15738 Closes scylladb/scylladb#15724
* github.com:scylladb/scylladb:
test: test_topology_ops: continuously write during the test
raft topology: assign tokens after join node response rpc
storage_service: fix indentation after previous commit
raft topology: loosen assumptions about transition nodes having tokens
When a base write triggers an MV write that needs to be sent to another
shard, it used the same SMP service group, and we could end up with a
deadlock.
This fix also affects alternator's secondary indexes.
Testing was done using a not-yet-committed framework for easy alternator
performance testing: https://github.com/scylladb/scylladb/pull/13121.
I've changed hardcoded max_nonlocal_requests config in scylla from 5000 to 500 and
then ran:
./build/release/scylla perf-alternator-workloads --workdir /tmp/scylla-workdir/ --smp 2 \
--developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write_gsi \
--duration 60 --ring-delay-ms 0 --skip-wait-for-gossip-to-settle 0 --continue-after-error true --concurrency 2000
Without the patch, when scylla is overloaded (i.e. the number of scheduled futures is close to max_nonlocal_requests), after a couple of seconds
scylla hangs, cpu usage drops to zero, and no progress is made. We can confirm we're hitting this issue by seeing under gdb:
p seastar::get_smp_service_groups_semaphore(2,0)._count
$1 = 0
With the patch I wasn't able to observe the problem, even with 2x
concurrency. I was able to make the process hang with 10x concurrency,
but I think it's hitting a different limit, as there wasn't any depleted
smp service group semaphore and it was happening also on non-MV loads.
Fixes https://github.com/scylladb/scylladb/issues/15844 Closes scylladb/scylladb#15845
We need to wait until the first node becomes normal in
`join_node_request_handler` to ensure that joining nodes are not
handled as the first node in the cluster.
If we placed a join request before the first node becomes normal,
the topology coordinator would incorrectly skip the join node
handshake in `handle_node_transition` (`case node_state::none`).
It would happen because the topology coordinator decides whether
a node is the first in the cluster by checking if there are no
normal nodes. Therefore, we must ensure at least one normal node
when the topology coordinator handles a join request for a
non-first node.
We change the previous check because it could return true even when
there are no normal nodes: `topology::is_empty` would also return false
if the first node was still new or in transition.
Additionally, calling `join_node_request_handler` before the first
node sets itself as normal is frequent during concurrent bootstrap,
so we remove "unlikely" from the comment.
Fixes: scylladb/scylladb#15807 Closes scylladb/scylladb#15775
The output is changed slightly, compared to the current nodetool:
* Number columns are aligned to the right
* Number columns don't have decimal places
* There are no trailing whitespaces
With this, both requests and responses to/from the remote are logged
when trace-level logging is enabled. This should greatly simplify
debugging any problems.
This commit removes irrelevant information
about versions from the Materialized Views
page (CQL Reference).
In addition, it replaces "Scylla" with
"ScyllaDB" on MV-related pages.
Use the capitalized "ALLOW FILTERING" in the error message. Because the
error message is a part of the user interface, it is better to keep it
aligned with our documentation, where "ALLOW FILTERING" is used.
So, in this change, the lower-cased "allow filtering" error message is
changed to "ALLOW FILTERING", and the tests are updated accordingly.
see also a0ffbf3291
Refs #14321
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15718
Actually, we've created outdir when using it as the parent directory
of `tempfile.tempdir`, but there are many places where we use
`tempfile.tempdir`, for instance, for testing the compiler flags,
and these tests will be removed once we migrate to CMake, so they
do not really matter when reviewing the change which migrates to
CMake.
The point of this change is to help the reviewer understand the major
changes performed by the migration.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
We use `globals().update(vars(args))` to update the global variables
with a dict from `args`. This is convenient, but it hurts readability.
Let's reference the parsed options explicitly.
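The before/after contrast can be sketched as (option names are illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--mode', default='dev')
parser.add_argument('--jobs', type=int, default=4)
args = parser.parse_args([])

# Before: globals().update(vars(args)) silently creates module-level
# names 'mode' and 'jobs' -- convenient, but the reader cannot tell
# where they came from.
#
# After: reference the parsed options explicitly, so every use site
# makes the origin of the value obvious.
print(args.mode, args.jobs)
```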
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Prepare for the change that reads the SCYLLA-*-FILE files in functions
instead of doing this at global scope.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
On top of per-mode cxxflags, we apply more of them based on settings
and the build environment. To reduce the statements at global scope,
let's extract the related code into a function.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
`optional_packages` was introduced in 8b0a26f06d, but we don't
offer the alternative versions of libsystemd anymore, and this
variable is not used in `configure.py`, so let's drop it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This is the continuation of 19fc01be23.
Registering API handlers for services needs to
* use only the required service (a sharded<> one if needed)
* get the service to handle requests via an argument, not from the http context (the http context, in turn, is not going to depend on anything)
There are several endpoints scattered over storage_service and snitch that use token metadata and topology. This PR makes those endpoints work the described way and drops the api::ctx -> token_metadata dependency.
Closes scylladb/scylladb#15831
* github.com:scylladb/scylladb:
api: Remove http::context -> token_metadata dependency
api: Pass shared_token_metadata instead of storage_service
api: Move snitch endpoints that use token metadata only
api: Move storage_service endpoints that use token metadata only
Said statement keeps a reference to erm indirectly, via a topology node pointer, but doesn't keep erm alive. This can result in use-after-free. Furthermore, it allows vnodes to be pulled from under the query's feet as it is running.
To prevent this, keep the erm alive for the duration of the query.
Also, use `host_id` instead of `node`; the node pointer is not really needed, as the statement only uses the host id from it.
Fixes: #15802 Closes scylladb/scylladb#15808
* github.com:scylladb/scylladb:
cql3: mutation_fragments_select_statement: use host_id instead of node
cql3: mutation_fragments_select_statement: pin erm reference
Hold a gate around all operations that are forwarded to a leader, to be
able to wait for them during server::abort(); otherwise the abort() may
complete while those operations are still running, which may cause
use-after-free.
The CDC repair operation does not change the topology, but it goes through
the same state as bootstrap, which does. Distinguish between the two cases
and increment the topology version only in the case of bootstrap.
Currently, if streaming fails during a topology operation, the streaming
is retried until it succeeds. If it can never succeed, it will be
retried forever. There is no way to stop the topology operation.
This patch introduces a rollback mechanism on streaming failure. If
streaming fails during bootstrap/replace, the bootstrapping/replacing node
is moved to the left_token_ring state (and then the left state),
and the operation has to be restarted after removing the data directory. If
streaming fails during decommission/remove, the node is moved back to
normal, and the operation needs to be restarted after the failure reason
is eliminated.
Currently we get a future and check if it has failed, but with
coroutines this complication is not needed. And since we want to filter
out some errors in the next patch with try/catch, it will be more
effective.
In order to detect issues where requests are routed incorrectly during
topology changes, modify the test_topology_ops test so that it runs a
background process that continuously writes while the test performs
topology changes in the cluster.
At the end of the test check whether:
- All writes were successful (we only require CL=LOCAL_ONE)
- Whether there are any errors from the replica side logic in the nodes'
logs (which happen e.g. when node receives writes before learning
about the schema)
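The background-writer part of the test can be sketched as follows (the `do_write` coroutine is hypothetical; the real test uses the cluster's CQL driver):

```python
# Continuously issue writes until told to stop, counting failures so the
# test can assert at the end that all writes succeeded (CL=LOCAL_ONE).
import asyncio

async def background_writes(do_write, stop: asyncio.Event) -> int:
    errors = 0
    while not stop.is_set():
        try:
            await do_write()
        except Exception:
            errors += 1
        await asyncio.sleep(0)  # yield so topology operations can progress
    return errors
```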
Currently, when the topology coordinator accepts a node, it moves it to
bootstrap state and assigns tokens to it (either new ones during
bootstrap, or the replaced node's tokens). Only then does it contact the
joining node to tell it about the decision and let it perform a read
barrier.
However, this means that the tokens are inserted too early. After
inserting the tokens the cluster is free to route write requests to it,
but it might not have learned about all of the schema yet.
Fix the issue by inserting the tokens later, after completing the join
node response RPC which forces the receiving node to perform a read
barrier.
In later commits, tokens for a joining/replacing node will not be
inserted when the node enters `bootstrapping`/`replacing` state but at
some later step of the procedure. Loosen some of the assumptions in
`storage_service::topology_state_load` and
`system_keyspace::load_topology_state` appropriately.
This commit fixes the layout of the Reference
page. Previously, the toctree level was "2",
which made the page hard to navigate.
This PR changes the level to "1".
In addition, the capitalization of page
titles is fixed.
This is a follow-up PR to the ones that
created and updated the Reference section.
It must be backported to branch-5.4.
Closes scylladb/scylladb#15830
When the upload sink is flushed, it may notice that the upload has not yet been started and fall back to a plain PUT in that case. This makes uploading small files much nicer, because a multipart upload would take 3 API calls (start, part, complete) in this case.
Fixes: #13014 Closes scylladb/scylladb#15824
* github.com:scylladb/scylladb:
test: Add s3_client test for upload PUT fallback
s3/client: Add PUT fallback to upload sink
Now that system.sstables has the state field, it can be changed
(UPDATEd). However, when changing the state AND generation, this still
won't work, because generation is the clustering key of the table in
question and cannot be just changed. This, nonetheless, is OK, as
generation changes with state only when moving an sstable from the upload
dir into normal/staging, and this is a separate issue for S3 (#13018). For
now, changing the state only is OK.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The state is one of <empty>(normal)/staging/quarantine. Currently, when an
sstable is moved to a non-normal state, the s3 backend state_change() call
throws, thus such sstables do not appear. The next patches are going to
change that, and the new field in system.sstables is needed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In fact, this FIXME had been fixed by 2c9ec6bc (sstable_directory:
Garbage collect S3 sstables on reboot) and is no longer valid. However,
it's still good to know if GC failed or misbehaved, so replace the
comment with a warning.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
A very similar one tracing the state change will appear, so it's
good to be able to tell them apart.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The backward converter already exists. Upcoming code will need to
convert the string representation of the state back to the internal type.
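A sketch of the two-way conversion idea (type and function names are illustrative, not the actual C++ identifiers):

```python
# Round-trip between the internal state type and its string
# representation: the "backward" direction already existed, the new
# code adds string -> internal type.
import enum

class SstableState(enum.Enum):
    NORMAL = 'normal'
    STAGING = 'staging'
    QUARANTINE = 'quarantine'

def state_to_string(s: SstableState) -> str:
    # Existing direction: internal type -> string representation.
    return s.value

def string_to_state(name: str) -> SstableState:
    # New direction: string representation -> internal type.
    # Raises ValueError for unknown names.
    return SstableState(name)
```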
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The snitch is now a service that speaks for the local node only. In order to
get dc/rack for peers in the cluster, one needs to use the topology which, in
turn, lives on token metadata. This patch moves the dc/rack getters to
api/token_metadata.cc next to the other t.m.-related endpoints.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are a few of them that don't need the storage service for anything
but getting token metadata. Move them to their own .cc/.hh units.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test case creates a non-jumbo upload sink and puts some bytes into it,
then flushes. In order to make sure the fallback did take place, the
multipart memory tracker semaphore is broken in advance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The purpose of `maybe_fix_legacy_secondary_index_mv_schema` was to deal
with legacy materialized view schemas used for secondary indexes,
schemas which were created before the notion of "computed columns" was
introduced. Back then, secondary index schemas would use a regular
"token" column. Later it became a computed column and old schemas would
be migrated during rolling upgrade.
The migration code was introduced in 2019
(db8d4a0cc6) and then fixed in 2020
(d473bc9b06).
The fix was present in Enterprise 2022.1 and in OSS 4.5. So, assuming
that users don't try crazy things like upgrading from 2021.X to 2023.X
(which we do not support), all clusters will have already executed the
migration code once they upgrade to 2023.X, meaning we can get rid of
it.
The main motivation of this patch is to get rid of the
`db::schema_tables::merge_schema` call in `parse_schema_tables`. In Raft
mode this was the only call to `merge_schema` outside "group 0 code" and
in fact it is unsafe -- it uses locally generated mutations with locally
generated timestamp (`api::new_timestamp()`), so if we actually did it,
we would permanently diverge the group 0 state machine across nodes
(the schema pulling code is disabled in Raft mode). Fortunately, this
should be dead code by now, as explained in the previous paragraph.
The migration code is now turned into a sanity check: if the users
try something crazy, they will get an error instead of silent data
corruption.
`maybe_fix_legacy_secondary_index_mv_schema` function has this piece of
code:
```
// If the first clustering key part of a view is a column with name not found in base schema,
// it implies it might be backing an index created before computed columns were introduced,
// and as such it must be recreated properly.
if (!base_schema->columns_by_name().contains(first_view_ck.name())) {
    schema_builder builder{schema_ptr(v)};
    builder.mark_column_computed(first_view_ck.name(), std::make_unique<legacy_token_column_computation>());
    if (preserve_version) {
        builder.with_version(v->version());
    }
    return view_ptr(builder.build());
}
```
The comment uses the phrase "it might be".
However, the code inside the `if` assumes that it "must be": once we
determined that the first column in this materialized view does not have
a corresponding name in the base table, we set it to be computed using
`legacy_token_column_computation`, so we assumed that the column was
indeed storing the token. Doing that for a column which is not the token
column would be a small disaster.
Assuming that the code is correct, we can make the comment more precise.
I checked the documentation and I don't see any other way we could
have such a column other than the token column, which is internally
created by Scylla when creating a secondary index (for example, it is
forbidden to use an alias in select statement when creating materialized
views, which I checked experimentally).
The feature is now assumed to be enabled; it was introduced in 2019.
It's still advertised in gossip, but it's assumed to always be present.
The `schema_feature` enum class still contains `COMPUTED_COLUMNS`,
and the `all_tables` function in schema_tables.cc still checks for the
schema feature when deciding if `computed_columns()` table should be
included. This is necessary because digest calculation tests contain
many digests calculated with the feature disabled; if we wanted to make
it unconditional in the schema_tables code we'd have to regenerate
almost all digests in the tests. It is simpler to leave the possibility
for the tests to disable the feature.
When the non-jumbo sink is flushed and notices that the real upload has
not started yet, it may just go ahead and PUT the buffers into the
object with a single request.
For the jumbo sink the fallback is not implemented, as it likely doesn't
make any sense -- jumbo sinks are unlikely to produce less than 5 MB of
data, so it would be dead code anyway.
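The flush-time decision can be sketched as follows (client method names and the class shape are illustrative, modeled loosely on the S3 API, not the actual s3/client code):

```python
# A buffering upload sink: starts a multipart upload only once enough
# data accumulates; on flush, falls back to a single PUT if the
# multipart upload was never started (1 API call instead of 3).
class UploadSink:
    MIN_PART_SIZE = 5 * 1024 * 1024  # S3 multipart parts must be >= 5 MiB

    def __init__(self, client, object_name):
        self._client = client
        self._object = object_name
        self._upload_id = None  # multipart upload not started yet
        self._buffer = bytearray()

    def write(self, data: bytes):
        self._buffer += data
        if len(self._buffer) >= self.MIN_PART_SIZE:
            if self._upload_id is None:
                self._upload_id = self._client.create_multipart_upload(self._object)
            self._client.upload_part(self._object, self._upload_id, bytes(self._buffer))
            self._buffer.clear()

    def flush(self):
        if self._upload_id is None:
            # Fallback: the upload never started, so a plain PUT suffices.
            self._client.put_object(self._object, bytes(self._buffer))
        else:
            if self._buffer:
                self._client.upload_part(self._object, self._upload_id, bytes(self._buffer))
            self._client.complete_multipart_upload(self._object, self._upload_id)
        self._buffer.clear()
```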
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This query bypasses the usual read-path in storage-proxy and therefore
also misses the erm pinning done by storage-proxy. To avoid a vnode
being pulled from under its feet, do the erm pinning in the statement
itself.
There are two tests, test_read_all and
test_read_with_partition_row_limits, which assert on every page, as well
as at the end, that there are no misses whatsoever. This is incorrect,
because it is possible that on a given page not all shards participate,
and thus there won't be a saved reader on every shard. On the subsequent
page, a shard without a reader may produce a miss. This is fine.
Refine the asserts to check that we have only as many misses as shards
without readers on them.
These are already documented in the nodetool index page. The list in the
nodetool index page is less informative, so copy the list from nodetool
compact over there.
Nodetool doesn't recognize RESHARD, even though ScyllaDB supports
stopping RESHARD compaction.
Remove VALIDATE from the list - ScyllaDB doesn't support it.
Add a note about the unimplemented --id option.
Fix the examples; they are broken.
Fix the entry in the nodetool command list, the command is called
`stop`, not `stop compaction`.
The run() method of task_manager::task::impl does not have to throw when
a task is aborted with the task manager api. Thus, a user will see that
the task finished successfully, which makes it inconsistent.
Finish a task with a failure if it was aborted with the task manager api.
Set top level compaction tasks as abortable.
Compaction tasks which have no children, i.e. compaction task
executors, have the abort method overridden to stop compacting data.
The procedure in main already does this.
Processing of tablet metadata on schema changes relies on
this. Without this, creating a tablet-based table will fail on missing
tablet map in token metadata because the listener in storage service
does not fire.
This is necessary for using tablets with cql_test_env in tools like
perf-simple-query.
Otherwise, the test will fail with:
Shard count not known for node c06a7e7f-ee6c-44e5-9257-09cdc5b2bb10
The existing tablets_test works because it creates its own topology
bypassing the one in storage_service.
Currently the cache updaters aren't exception-safe,
yet they are intended to be.
Instead of allowing exceptions from
`external_updater::execute` to escape `row_cache::update`,
abort using `on_fatal_internal_error`.
Future changes should harden all `execute` implementations
to effectively make them `noexcept`; then the pure virtual
definition can be made `noexcept` to cement that.
Fixes scylladb/scylladb#15576
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
ring_position::min() is noexcept since 6d7ae4ead1,
so there is no need to call it outside of the critical noexcept block.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We stopped using fallocate for allocating swap since it does not work on
xfs (#6650).
However, dd is much slower than fallocate, since it fills the file with
data. Let's use fallocate when the filesystem is ext4, where it actually
works and is faster.
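The selection logic can be sketched as follows (the real setup script is shell; this is a simplified Python sketch, and the function name is illustrative):

```python
# Choose the swapfile allocation method by filesystem type: fallocate
# on ext4 (fast, and swapon accepts it there), dd elsewhere.
import os
import subprocess

def create_swapfile(path: str, size_bytes: int, fstype: str):
    if fstype == 'ext4':
        # fallocate works on ext4 and is much faster than dd.
        fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
        try:
            os.posix_fallocate(fd, 0, size_bytes)
        finally:
            os.close(fd)
    else:
        # On xfs, swapon rejects fallocate-allocated files (#6650),
        # so fall back to dd, which actually writes the data.
        subprocess.run(['dd', 'if=/dev/zero', f'of={path}',
                        'bs=1M', f'count={size_bytes // (1024 * 1024)}'],
                       check=True)
```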
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
We should run the error check before running dd; otherwise it will leave a
swapfile on disk without completing the swap setup.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
seastar::metrics::description("number of operations via Alternator API"), {op(CamelCaseName)}).set_skip_when_empty(),
#define OPERATION_LATENCY(name, CamelCaseName) \
seastar::metrics::make_histogram("op_latency", \
seastar::metrics::description("Latency histogram of an operation via Alternator API"), {op(CamelCaseName)}, [this] { return to_metrics_histogram(api_operations.name.histogram()); }).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(), \
seastar::metrics::description("Latency summary of an operation via Alternator API"), [this] { return to_metrics_summary(api_operations.name.summary()); })(op(CamelCaseName)).set_skip_when_empty(),
"description":"Controls flushing of memtables before compaction (true by default). Set to \"false\" to skip automatic flushing of memtables before compaction, e.g. when the table is flushed explicitly before invoking the compaction api.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"split_output",
"description":"true if the output of the major compaction should be split in several sstables",
@@ -203,7 +211,7 @@
"operations":[
{
"method":"POST",
"summary":"Sets the minimum and maximum number of sstables in queue before compaction kicks off",
"description":"The name of table to fetch information about",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
@@ -436,6 +433,14 @@
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"cf",
"description":"Column family name",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
@@ -701,6 +706,30 @@
}
]
},
{
"path":"/storage_service/compact",
"operations":[
{
"method":"POST",
"summary":"Forces major compaction in all keyspaces",
"type":"void",
"nickname":"force_compaction",
"produces":[
"application/json"
],
"parameters":[
{
"name":"flush_memtables",
"description":"Controls flushing of memtables before compaction (true by default). Set to \"false\" to skip automatic flushing of memtables before compaction, e.g. when tables were flushed explicitly before invoking the compaction api.",
"description":"If the value is the string 'true' with any capitalization, perform small table optimization. When this option is enabled, user can send the repair request to any of the nodes in the cluster. There is no need to send repair requests to multiple nodes. All token ranges for the table will be repaired automatically.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
},
@@ -1455,6 +1530,15 @@
"type":"string",
"enum":["all","user","non_local_strategy"],
"paramType":"query"
},
{
"name":"replication",
"description":"Filter keyspaces for the replication used: vnodes or tablets (default: all)",
"required":false,
"allowMultiple":false,
"type":"string",
"enum":["all","vnodes","tablets"],
"paramType":"query"
}
]
}
@@ -1602,7 +1686,7 @@
},
{
"method":"POST",
"summary":"allows a user to re-enable thrift",
"type":"void",
"nickname":"start_rpc_server",
"produces":[
@@ -2410,6 +2494,238 @@
}
]
},
{
"path":"/storage_service/tablets/move",
"operations":[
{
"nickname":"move_tablet",
"method":"POST",
"summary":"Moves a tablet replica",
"type":"void",
"produces":[
"application/json"
],
"parameters":[
{
"name":"ks",
"description":"Keyspace name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"table",
"description":"Table name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"token",
"description":"Token owned by the tablet to move",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"src_host",
"description":"Source host id",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"dst_host",
"description":"Destination host id",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"src_shard",
"description":"Source shard number",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"dst_shard",
"description":"Destination shard number",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"force",
"description":"When set to true, replication strategy constraints can be broken (false by default)",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/tablets/add_replica",
"operations":[
{
"nickname":"add_tablet_replica",
"method":"POST",
"summary":"Adds replica to tablet",
"type":"void",
"produces":[
"application/json"
],
"parameters":[
{
"name":"ks",
"description":"Keyspace name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"table",
"description":"Table name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"token",
"description":"Token owned by the tablet to add replica to",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"dst_host",
"description":"Destination host id",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"dst_shard",
"description":"Destination shard number",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"force",
"description":"When set to true, replication strategy constraints can be broken (false by default)",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/tablets/del_replica",
"operations":[
{
"nickname":"del_tablet_replica",
"method":"POST",
"summary":"Deletes replica from tablet",
"type":"void",
"produces":[
"application/json"
],
"parameters":[
{
"name":"ks",
"description":"Keyspace name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"table",
"description":"Table name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"token",
"description":"Token owned by the tablet to delete replica from",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"host",
"description":"Host id to remove replica from",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"shard",
"description":"Shard number to remove replica from",
"required":true,
"allowMultiple":false,
"type":"integer",
"paramType":"query"
},
{
"name":"force",
"description":"When set to true, replication strategy constraints can be broken (false by default)",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/tablets/balancing",
"operations":[
{
"nickname":"tablet_balancing_enable",
"method":"POST",
"summary":"Controls tablet load-balancing",
"type":"void",
"produces":[
"application/json"
],
"parameters":[
{
"name":"enabled",
"description":"When set to false, tablet load balancing is disabled",
"required":true,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/metrics/total_hints",
"operations":[
@@ -2511,6 +2827,33 @@
]
}
]
},
{
"path":"/storage_service/raft_topology/upgrade",
"operations":[
{
"method":"POST",
"summary":"Trigger the upgrade to topology on raft.",
"type":"void",
"nickname":"upgrade_to_raft_topology",
"produces":[
"application/json"
],
"parameters":[
]
},
{
"method":"GET",
"summary":"Get information about the current upgrade status of topology on raft.",
"type":"string",
"nickname":"raft_topology_upgrade_status",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/storage_service/keyspace_scrub/{keyspace}",
"operations":[
{
"method":"GET",
"summary":"Scrub (deserialize + reserialize at the latest version, resolving corruptions if any) the given keyspace asynchronously; returns a uuid which can be used to check progress with the task manager. If the columnFamilies array is empty, all CFs are scrubbed. Scrubbed CFs will be snapshotted first, if disableSnapshot is false. Scrub has the following modes: Abort (default) - abort scrub if corruption is detected; Skip (same as `skip_corrupted=true`) - skip over corrupt data, omitting it from the output; Segregate - segregate data into multiple sstables if needed, such that each sstable contains data with valid order; Validate - read (no rewrite) and validate data, logging any problems found.",
"type":"string",
"nickname":"scrub_async",
"produces":[
"application/json"
],
"parameters":[
{
"name":"disable_snapshot",
"description":"When set to true, disable snapshot",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"skip_corrupted",
"description":"When set to true, skip corrupted",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"scrub_mode",
"description":"How to handle corrupt data (overrides 'skip_corrupted')",
"required":false,
"allowMultiple":false,
"type":"string",
"enum":[
"ABORT",
"SKIP",
"SEGREGATE",
"VALIDATE"
],
"paramType":"query"
},
{
"name":"quarantine_mode",
"description":"Controls whether to scrub quarantined sstables (default INCLUDE)",
"required":false,
"allowMultiple":false,
"type":"string",
"enum":[
"INCLUDE",
"EXCLUDE",
"ONLY"
],
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/keyspace_upgrade_sstables/{keyspace}",
"operations":[
{
"method":"GET",
"summary":"Rewrite all sstables to the latest version. Unlike scrub, it doesn't skip bad rows and does not snapshot sstables first. Runs asynchronously; returns a uuid which can be used to check progress with the task manager.",
"type":"string",
"nickname":"upgrade_sstables_async",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"exclude_current_version",
"description":"When set to true, exclude the current version",