scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-23 16:22:15 +00:00

Author	SHA1	Message	Date
Kefu Chai	c03141b4b2	api: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-13 09:32:51 +08:00
Botond Dénes	feea609e37	api/error_injection: add getter for error_injection Allow external code to obtain information about an error injection point, including whether it is enabled, and importantly, what its parameters are. Together with the `set_parameter()` added in the previous patch, this allows tests to read out the values of internal parameters, via a set_parameter() injection point.	2024-06-11 04:17:48 -04:00
Aleksandra Martyniuk	30f97ea133	tasks: test: modify test_task methods Wait until the task is done in test_task::finish_failed and test_task::finish to ensure that it is folded into its parent.	2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk	c1b2b8cb2c	api: task_manager: do not unregister task in /task_manager/wait_task/ If /task_manager/wait_task/ unregisters the task, then there is no way to examine children failures, since their statuses can be checked only through their parent.	2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk	e6c50ad2d0	tasks: fold finished tasks into their parents Currently, when a child task is unregistered, it is still kept by its parent. This leads to excessive memory usage, especially when the tasks are configured to be kept in task manager after they are finished (task_ttl_in_seconds). Introduce task_essentials struct which keeps only data necessary for task manager API. When a task which has a parent is finished, a foreign pointer to it in its parent is replaced with the respective task_essentials. Once a parent task is finished, it is also folded into its parent (if it has one). Children details of a folded task are lost, unless they (or some of their subtrees) failed. That is, when a task is finished, we keep: - a root task (until it is unregistered); - task_essentials of root's direct children; - a path (of task_essentials) from root to each failed task (so that the reason for a failure can be examined).	2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk	6add9edf8a	tasks: change _children type Keep task children in a map. It's a preparation for further changes.	2024-05-31 10:27:09 +02:00
Kefu Chai	e70b116333	api/api-doc/utils: fix a typo in description s/mintues/minutes/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18869	2024-05-27 14:15:23 +03:00
Pavel Emelyanov	d86a8252d4	api: Don't switch sched group to start/stop protocol servers All the protocol servers implementations now maintain scheduling group on their own, so the API handler can stop caring Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 18:00:01 +03:00
Pavel Emelyanov	b24fb8dc87	inet_address: Remove to_sstring() in favor of fmt::to_string The existing inet_address::to_string() calls fmt::format("{}", *this) anyway. However, the to_string() method is defined in the .cc file, while the formatter is in the header and is equipped with constexprs, so that converting an address to a string is done at compile time as much as possible. Also, though minor, fmt::to_string(foo) is believed to be even faster than fmt::format("{}", foo). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18712	2024-05-21 09:43:08 +03:00
Botond Dénes	f239339a29	Merge 'Improve modularity of some per-table API endpoints' from Pavel Emelyanov There's a set of API endpoints that toggle per-table auto-compaction and tombstone-gc booleans. They all live in two different .cc files under api/ directory and duplicate code of each other. This PR generalizes those handlers, places them next to each other, fixes leak on stop and, as a nice side effect, enlightens database.hh header. Closes scylladb/scylladb#18703 * github.com:scylladb/scylladb: api,database: Move auto-compaction toggle guard api: Move some table manipulation helpers from storage_service api: Move table-related calls from storage_service domain api: Reimplement some endpoints using existing helpers api: Lost unset of tombstone-gc endpoints	2024-05-20 18:01:54 +03:00
Avi Kivity	52fe351c31	Merge 'Balance tablets within nodes (intra-node migration)' from Tomasz Grabiec This is needed to avoid severe imbalance between shards which can happen when some table grows and is split. The inter-node balance can be equal, so inter-node migration cannot fix the imbalance. Also, if RF=N then there is not even a possibility of moving tablets around to fix the imbalance. The only way to bring the system to balance is to move tablets within the nodes. The system is not prepared for intra-node migration currently. Request coordination is host-based, while for intra-node migration it should be (also) shard-based. The solution employed here is to keep the coordination between nodes as-is, and for intra-node migration storage_proxy-level coordinator is not aware of the migration (no pending host). The replica-side request handler will be a second-level coordinator which routes requests to shards, similar to how the first-level coordinator routes them to hosts. Tablet sharder is adjusted to handle intra-migration where a tablet can have two replicas on the same host. For reads, sharder uses the read selector to resolve the conflict. For writes, the write selector is used. The old shard_of() API is kept to represent shard for reads, and new method is introduced to query the shards for writing: shard_for_writes(). All writers should be switched to that API, which is not done in this patch yet. The request handler on replica side acts as a second-level coordinator, using sharder to determine routing to shards. A given sharder has a scope of a single topology version, a single effective_replication_map_ptr, which should be kept alive during writes. 
perf-simple-query test results show no signs of regression:
Command: perf-simple-query -c1 -m1G --write --tablets --duration=10
Before:
> 83294.81 tps ( 59.5 allocs/op, 14.3 tasks/op, 53725 insns/op, 0 errors)
> 87756.72 tps ( 59.5 allocs/op, 14.3 tasks/op, 54049 insns/op, 0 errors)
> 86428.47 tps ( 59.6 allocs/op, 14.3 tasks/op, 54208 insns/op, 0 errors)
> 86211.38 tps ( 59.7 allocs/op, 14.3 tasks/op, 54219 insns/op, 0 errors)
> 86559.89 tps ( 59.6 allocs/op, 14.3 tasks/op, 54188 insns/op, 0 errors)
> 86609.39 tps ( 59.6 allocs/op, 14.3 tasks/op, 54117 insns/op, 0 errors)
> 87464.06 tps ( 59.5 allocs/op, 14.3 tasks/op, 54039 insns/op, 0 errors)
> 86185.43 tps ( 59.6 allocs/op, 14.3 tasks/op, 54169 insns/op, 0 errors)
> 86254.71 tps ( 59.6 allocs/op, 14.3 tasks/op, 54139 insns/op, 0 errors)
> 83395.35 tps ( 60.2 allocs/op, 14.4 tasks/op, 54693 insns/op, 0 errors)
>
> median 86428.47 tps ( 59.6 allocs/op, 14.3 tasks/op, 54208 insns/op, 0 errors)
> median absolute deviation: 243.04
> maximum: 87756.72
> minimum: 83294.81
>
After:
> 85523.06 tps ( 59.5 allocs/op, 14.3 tasks/op, 53872 insns/op, 0 errors)
> 89362.47 tps ( 59.6 allocs/op, 14.3 tasks/op, 54226 insns/op, 0 errors)
> 88167.55 tps ( 59.7 allocs/op, 14.3 tasks/op, 54400 insns/op, 0 errors)
> 87044.40 tps ( 59.7 allocs/op, 14.3 tasks/op, 54310 insns/op, 0 errors)
> 88344.50 tps ( 59.6 allocs/op, 14.3 tasks/op, 54289 insns/op, 0 errors)
> 88355.06 tps ( 59.6 allocs/op, 14.3 tasks/op, 54242 insns/op, 0 errors)
> 88725.46 tps ( 59.6 allocs/op, 14.3 tasks/op, 54230 insns/op, 0 errors)
> 88640.08 tps ( 59.6 allocs/op, 14.3 tasks/op, 54210 insns/op, 0 errors)
> 90306.31 tps ( 59.4 allocs/op, 14.3 tasks/op, 54043 insns/op, 0 errors)
> 87343.62 tps ( 59.8 allocs/op, 14.3 tasks/op, 54496 insns/op, 0 errors)
>
> median 88355.06 tps ( 59.6 allocs/op, 14.3 tasks/op, 54242 insns/op, 0 errors)
> median absolute deviation: 1007.41
> maximum: 90306.31
> minimum: 85523.06
Command (reads): perf-simple-query -c1 -m1G --tablets --duration=10
Before:
> 95860.18 tps ( 63.1 allocs/op, 14.1 tasks/op, 42476 insns/op, 0 errors)
> 97537.69 tps ( 63.1 allocs/op, 14.1 tasks/op, 42454 insns/op, 0 errors)
> 97549.23 tps ( 63.1 allocs/op, 14.1 tasks/op, 42470 insns/op, 0 errors)
> 97511.29 tps ( 63.1 allocs/op, 14.1 tasks/op, 42470 insns/op, 0 errors)
> 97227.32 tps ( 63.1 allocs/op, 14.1 tasks/op, 42471 insns/op, 0 errors)
> 94031.94 tps ( 63.1 allocs/op, 14.1 tasks/op, 42441 insns/op, 0 errors)
> 96978.04 tps ( 63.1 allocs/op, 14.1 tasks/op, 42462 insns/op, 0 errors)
> 96401.70 tps ( 63.1 allocs/op, 14.1 tasks/op, 42473 insns/op, 0 errors)
> 96573.77 tps ( 63.1 allocs/op, 14.1 tasks/op, 42440 insns/op, 0 errors)
> 96340.54 tps ( 63.1 allocs/op, 14.1 tasks/op, 42468 insns/op, 0 errors)
>
> median 96978.04 tps ( 63.1 allocs/op, 14.1 tasks/op, 42462 insns/op, 0 errors)
> median absolute deviation: 571.20
> maximum: 97549.23
> minimum: 94031.94
>
After:
> 99794.67 tps ( 63.1 allocs/op, 14.1 tasks/op, 42471 insns/op, 0 errors)
> 101244.99 tps ( 63.1 allocs/op, 14.1 tasks/op, 42472 insns/op, 0 errors)
> 101128.37 tps ( 63.1 allocs/op, 14.1 tasks/op, 42485 insns/op, 0 errors)
> 101065.27 tps ( 63.1 allocs/op, 14.1 tasks/op, 42465 insns/op, 0 errors)
> 101212.98 tps ( 63.1 allocs/op, 14.1 tasks/op, 42456 insns/op, 0 errors)
> 101413.31 tps ( 63.1 allocs/op, 14.1 tasks/op, 42463 insns/op, 0 errors)
> 101464.92 tps ( 63.1 allocs/op, 14.1 tasks/op, 42466 insns/op, 0 errors)
> 101086.74 tps ( 63.1 allocs/op, 14.1 tasks/op, 42488 insns/op, 0 errors)
> 101559.09 tps ( 63.1 allocs/op, 14.1 tasks/op, 42468 insns/op, 0 errors)
> 100742.58 tps ( 63.1 allocs/op, 14.1 tasks/op, 42491 insns/op, 0 errors)
>
> median 101212.98 tps ( 63.1 allocs/op, 14.1 tasks/op, 42456 insns/op, 0 errors)
> median absolute deviation: 200.33
> maximum: 101559.09
> minimum: 99794.67
>
Fixes #16594 Closes scylladb/scylladb#18026 * github.com:scylladb/scylladb: Implement fast streaming for intra-node migration test: tablets_test: Test 
sharding during intra-node migration test: tablets_test: Check sharding also on the pending host test: py: tablets: Test writes concurrent with migration test: py: tablets: Test crash during intra-node migration api, storage_service: Introduce API to wait for topology to quiesce dht, replica: Remove deprecated sharder APIs test: Avoid using deprecated sharded API db: do_apply_many() avoid deprecated sharded API replica: mutation_dump: Avoid deprecated sharder API repair: Avoid deprecated sharder API table: Remove optimization which returns empty reader when key is not owned by the shard dht: is_single_shard: Avoid deprecated sharder API dht: split_range_to_single_shard: Work with static_sharder only dht: ring_position_range_sharder: Avoid deprecated sharder APIs dht: token: Avoid use of deprecated sharder API by switching to static_sharder selective_token_sharder: Avoid use of deprecated sharder API docs: Document tablet sharding vs tablet replica placement readers/multishard.cc: use shard_for_reads() instead of shard_of() multishard_mutation_query.cc: use shard_for_reads() instead of shard_of() storage_proxy: Extract common code to apply mutations on many shards according to sharder storage_proxy: Prepare per-partition rate-limiting for intra-node migration storage_proxy: Avoid shard_of() use in mutate_counter_on_leader_and_replicate() storage_proxy: Prepare mutate_hint() for intra-node tablet migration commitlog_replayer: Avoid deprecated sharder::shard_of() lwt: Avoid deprecated sharder::shard_of() compaction: Avoid deprecated sharder::shard_of() dht: Extract dht::static_sharder replica: Deprecate table::shard_of() locator: Deprecate effective_replication_map::shard_of() dht: Deprecate old sharder API: shard_of/next_shard/token_for_next_shard tests: tablets: py: Add intra-node migration test tests: tablets: Test that drained nodes are not balanced internally tests: tablets: Add checks of replica set validity to test_load_balancing_with_random_load tests: 
tablets: Verify that disabling balancing results in no intra-node migrations tests: tablets: Check that nodes are internally balanced tests: tablets: Improve debuggability by showing which rows are missing tablets, storage_service: Support intra-node migration in move_tablet() API tablet_allocator: Generate intra-node migration plan tablet_allocator: Extract make_internode_plan() tablet_allocator: Maintain candidate list and shard tablet count for target nodes tablet_allocator: Lift apply_load/can_accept_load lambdas to member functions tablets, streaming: Implement tablet streaming for intra-node migration dht, auto_refreshing_sharder: Allow overriding write selector multishard_writer: Handle intra-node migration storage_proxy: Handle intra-node tablet migration for writes tablets: Get rid of tablet_map::get_shard() tablets: Avoid tablet_map::get_shard in cleanup tablets: test: Use sharder instead of tablet_map::get_shard() tablets: tablet_sharder: Allow working with non-local host sharding: Prepare for intra-node-migration docs: Document sharder use for tablets tablets: Introduce tablet transition kind for intra-node migration tests: tablets: Fix use-after-move of skiplist in rebalance_tablets() sstables, gdb: Track readers in a linked list raft topology: Fix global token metadata barrier to not fence ahead of what is drained	2024-05-20 16:13:01 +03:00
Pavel Emelyanov	31d05925cc	api,database: Move auto-compaction toggle guard Toggling per-table auto-compaction enabling bit is guarded with on-database boolean and raii guard. It's only used by a single api/column_family.cc file, so it can live there. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-16 14:42:51 +03:00
Pavel Emelyanov	a43b178f72	api: Move some table manipulation helpers from storage_service Continuation of the previous patch -- helpers toggling tombstone_gc and auto_compaction on tables should live in the same file that uses them. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-16 14:42:50 +03:00
Pavel Emelyanov	862fcd7bc7	api: Move table-related calls from storage_service domain The storage_service/(enable|disable)_(tombstone_gc|auto_compaction) endpoints are not handled by storage_service _service_ and should rather live in the column_family/ domain, which is handled by replica::database. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-16 14:42:50 +03:00
Pavel Emelyanov	ba53283d21	api: Reimplement some endpoints using existing helpers The (enable|disable)_(tombstone_gc|auto_compaction) endpoints living in column_family domain can benefit from the helpers that do the same in the storage_service domain. The "difference" is that c.f. endpoints do it per-table, while s.s. ones operate on a vector of tables, so the former is a corner case of the latter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-16 14:42:50 +03:00
Pavel Emelyanov	231ffa623c	api: Lost unset of tombstone-gc endpoints On stop, all endpoints must be unregistered, but the unset calls for these three were lost. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-16 14:42:50 +03:00
Tomasz Grabiec	7956a2991e	api, storage_service: Introduce API to wait for topology to quiesce	2024-05-16 00:28:47 +02:00
Botond Dénes	fd25bb6f9f	api/storage_service: add tablet support for /storage_service/tokens_endpoint Add a keyspace and cf parameter. When specified, the endpoint will return token -> primary replica mapping for the table's tablet tokens, not the vnodes.	2024-05-13 07:09:20 -04:00
Botond Dénes	0438febdc9	Merge 'alternator: fix REST API access to an Alternator LSI' from Nadav Har'El The name of the Scylla table backing an Alternator LSI looks like `basename:!lsiname`. Some REST API clients (including Scylla Manager) when they send a "!" character in the REST API request path may decide to "URL encode" it - convert it to `%21`. Because of a Seastar bug (https://github.com/scylladb/seastar/issues/725) Scylla's REST API server forgets to do the URL decoding on the path part of the request, which leads to the REST API request failing to address the LSI table. The first patch in this PR fixes the bug by using a new Seastar API introduced in https://github.com/scylladb/seastar/pull/2125 that does the URL decoding as appropriate. The second patch in the PR is a new test for this bug, which fails without the fix, and passes afterwards. Fixes #5883. Closes scylladb/scylladb#18286 * github.com:scylladb/scylladb: test/alternator: test addressing LSI using REST API REST API: stop using deprecated, buggy, path parameter	2024-05-09 08:26:43 +03:00
Kefu Chai	0b0e661a85	build: bring abseil submodule back because of https://bugzilla.redhat.com/show_bug.cgi?id=2278689, the rebuilt abseil package provided by fedora has different settings than the ones used when the tree is built with the sanitizer enabled. this inconsistency leads to a crash. to address this problem, we have to reinstate the abseil submodule, so we can build it with the same compiler options with which we build the tree. in this change * Revert "build: drop abseil submodule, replace with distribution abseil" * update CMake building system with abseil header include settings * bump up the abseil submodule to the latest LTS branch of abseil: lts_2024_01_16 * update scylla-gdb.py to adapt to the new structure of flat_hash_map This reverts commit `8635d24424`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18511	2024-05-05 23:31:09 +03:00
Nadav Har'El	1aacfdf460	REST API: stop using deprecated, buggy, path parameter The API req->param["name"] to access parameters in the path part of the URL was buggy - it forgot to do URL decoding and the result of our use of it in Scylla was bugs like #5883 - where special characters in certain REST API requests got botched up (encoded by the client, then not decoded by the server). The solution is to replace all uses of req->param["name"] by the new req->get_path_param("name"), which does the decoding correctly. Unfortunately we needed to change 104 (!) callers in this patch, but the transformation is mostly mechanical and there are no functional changes in this patch. Another set of changes was to pass req, not req->param, to a few functions that want to get the path param. This patch avoids the numerous deprecation warnings we had before, and more importantly, it fixes #5883. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-05-02 12:33:46 +03:00
Kefu Chai	0bbaded4ce	api/storage_service: convert runtime_error from repair to http error in `set_repair()`, although the repair is performed asynchronously, we check the options specified by the client immediately and throw `std::runtime_error` if any of them is not supported. before this change, these unhandled exceptions were translated to HTTP 500 errors by the underlying HTTP router. but this is misleading, as these errors are caused by the client, not the server. also, the error message was missing from the HTTP error response after the translation. in this change, we handle the `runtime_error` and translate it into `httpd::bad_param_exception`, so that the client gets HTTP 400 (Bad Request) instead of HTTP 500 (Internal Server Error), with an informative error message. for instance, if we request a repair with "small_table_optimization" enabled on a keyspace with tablets enabled, we should get an HTTP 400 error with "The small_table_optimization option is not supported for tablet repair" as the body of the error. this is much more helpful. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-04-26 14:25:15 +08:00
Kefu Chai	d737ba1ab2	api/storage_service: coroutinize set_repair() before this change, `set_repair()` uses a lambda for handling the client-side requests, and this works great. but the underlying `repair_start()` throws if any of the given options is not sane, and we don't handle any of these thrown exceptions in `set_repair()`. so from the client's point of view, it gets an HTTP 500 error code, which implies an "Internal Server Error". but actually, we should blame the client for the error, not the server. so, to prepare for the error handling, let's take the opportunity to coroutinize the lambda handling the request, so that we can handle the exception in a more elegant way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-04-26 14:24:03 +08:00
Pavel Emelyanov	ae4c1c44ec	snapshot: Get per-table snapshot size under snapshot lock Walking per-table snapshot directory without lock is racy. There's snapshot-ctl locking that's used to get db-wide snapshot details, it should be used to get per-table snapshot details too Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-25 10:05:51 +03:00
Pavel Emelyanov	186b36165e	snapshot: Move per-table snap API to other snapshot endpoints So that they are collected in one place and to facilitate next patch that's going to use snapshot-ctl for per-table API too Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-25 10:05:01 +03:00
Botond Dénes	572003c469	Merge 'Cleanup the way snapshot details are propagated via API' from Pavel Emelyanov There's a database::get_snapshot_details() method that returns collection of all snapshots for all ks.cf out there and there are several snapshot_details aux structures around it. This PR keeps only one "details" and cleans up the way it propagates from database up to the respective API calls. Closes scylladb/scylladb#18317 * github.com:scylladb/scylladb: snapshot_ctl: Brush up true_snapshots_size() internals snapshot_ctl: Remove unused details struct snapshot_ctl: No double recoding of details database,snapshots: Move database::snapshot_details into snapshot_ctl database,snapshots: Make database::get_snapshot_details() return map, not vector table,snapshots: Move table::snapshot_details into snapshot_ctl	2024-04-23 16:28:25 +03:00
Pavel Emelyanov	e8f10be12e	snapshot_ctl: No double recoding of details Currently database::get_snapshot_details() returns a collection of snapshots. The snapshot_ctl converts this collection into similarly looking one with slightly different structures inside. The resulting collection is converted one more time on the API layer into another similarly looking map. This patch removes the intermediate conversion. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-19 20:04:32 +03:00
Kefu Chai	a439ebcfce	treewide: include fmt/ranges.h and/or fmt/std.h before this change, we relied on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we include `fmt/ranges.h` and/or `fmt/std.h` for formatting container and standard library types, like vector, map, optional and variant, using {fmt} instead of the homebrew formatter based on operator<<. with this change, together with the changes adding fmt::formatter and the changes using the ostream formatter explicitly, we are able to drop the `FMT_DEPRECATED_OSTREAM` macro. Refs scylladb#13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-04-19 22:56:16 +08:00
Kamil Braun	eb9ba914a3	Merge 'Set dc and rack in gossiper when loaded from system.peers and load the ignored nodes state for replace' from Benny Halevy The problem this series solves is correctly ignoring the state of DOWN nodes when replacing a node. When a node is replaced and there are other nodes that are down, the replacing node is told to ignore those DOWN nodes using the `ignore_dead_nodes_for_replace` option. Since the replacing node is bootstrapping, it starts with an empty system.peers table, so it has no notion of any node state and learns about all other nodes via the gossip shadow round done in `storage_service::prepare_replacement_info`. Normally, since the DOWN nodes to ignore already joined the ring, the remaining nodes will already have their endpoint state in gossip, but if the whole cluster was restarted while those DOWN nodes did not start, the remaining nodes will only have a partial endpoint state for them, which is loaded from system.peers. Currently, the partial endpoint state contains only `HOST_ID` and `TOKENS`, and in particular it lacks `STATUS`, `DC`, and `RACK`. The first part of this series also loads `DC` and `RACK` from system.peers to make them available to the replacing node, as they are crucial for building a correct replication map with the network topology replication strategy. But still, without a `STATUS` those nodes are not yet considered normal token owners, and they do not go through handle_state_normal, which adds them to the topology and token_metadata. The second part of this series uses the endpoint state retrieved in the gossip shadow round to explicitly add the ignored nodes' state to topology (including dc and rack) and token_metadata (tokens) in `prepare_replacement_info`. If there are more DOWN nodes that are not explicitly ignored, replace will fail (as it should).
Fixes scylladb/scylladb#15787 Closes scylladb/scylladb#15788 * github.com:scylladb/scylladb: storage_service: join_token_ring: load ignored nodes state if replacing storage_service: replacement_info: return ignore_nodes state locator: host_id_or_endpoint: keep value as variant gms: endpoint_state: add getters for host_id, dc_rack, and tokens storage_service: topology_state_load: set local STATUS state using add_saved_endpoint gossiper: add_saved_endpoint: set dc and rack gossiper: add_saved_endpoint: fixup indentation gossiper: add_saved_endpoint: make host_id mandatory gossiper: add load_endpoint_state gossiper: start_gossiping: log local state	2024-04-16 10:27:36 +02:00
Pavel Emelyanov	05c4042511	api/lsa: Don't use database to perform invoke-on-all The sharded<database> is used only as an invoke_on_all() method provider; there's no real need for the database itself. A simple smp::invoke_on_all() would work just as well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18221	2024-04-16 07:12:40 +03:00
Pavel Emelyanov	f3edde7d2e	api: Qualify callback commitlog* argument with const There's a helper map-reducer that accepts a function to call on commitlog. All callers accumulate statistics with it, so the commitlog argument is const pointer. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18238	2024-04-16 07:02:31 +03:00
Pavel Emelyanov	8bad828208	api: Add method to delete replica from tablet Copied from the add_replica counterpart. TODO: Generalize common parts of move_tablet and add_|del_tablet_replica Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-15 16:31:07 +03:00
Benny Halevy	7c2bd8dc34	locator: host_id_or_endpoint: keep value as variant Rather than allowing both host_id and endpoint to be kept, keep only one of them and provide resolve functions that use the token_metadata to resolve the host_id into an inet_address or vice versa. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-04-14 15:25:50 +03:00
Tomasz Grabiec	0c74c2c12f	Merge 'Extend tablet_transition_kind::rebuild to rebuild tablet to new replica' from Pavel Emelyanov When altering rf for a keyspace, all tablets in this ks will get more replicas. Part of this process is rebuilding tablets onto new node(s). This PR extends the tablet transition code to support rebuilding a tablet onto a new replica. fixes: #18030 Closes scylladb/scylladb#18082 * github.com:scylladb/scylladb: test: Check data presence as well test: Test how tablets are copied between nodes test: Add sanity test for tablet migration api: Add method to add replica to a tablet tablet: Make leaving replica optional	2024-04-05 12:51:10 +02:00
Pavel Emelyanov	2a98e95cd0	api: Coroutinize API get_snapshot_details handler Now it's possible to understand what it does Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18190	2024-04-04 22:20:28 +03:00
Kefu Chai	64b8bb239f	api/storage_service: throw if table is not found when move tablets `database::find_column_family()` throws no_such_column_family if an unknown ks.cf is fed to it. and we call into this function without checking for the existence of ks.cf first. since "/storage_service/tablets/move" is a public interface, we should translate this error to a better http error. in this change, we check for the existence of the given ks.cf, and throw an exception so that it can be caught by seastar::httpd::routers, and converted to an HTTP error. Fixes #17198 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17217	2024-04-04 11:23:52 +03:00
Pavel Emelyanov	79ad760e95	api: Add method to add replica to a tablet The new API submits a rebuild transition with the new replica set being the old (current) replicas plus the provided one. It looks and acts like the move_tablet API call with several changes: - lacks the "source" replica argument - submits the "rebuild" transition kind - cross-rack checks are not performed The 'force' argument is inherited from move_tablet, but is currently unused and is left for the future. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-04-04 09:22:16 +03:00
Benny Halevy	1272d736c0	api: storage_service: upgrade_to_raft_topology: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-04-02 20:02:51 +03:00
Benny Halevy	31026ae27f	api: storage_service: upgrade_to_raft_topology: add logging Upgrading raft topology is an important api call that should be logged. When failed, it is also important to log the exception to get better visibility into why the call failed. Indentation will be fixed in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-04-02 20:02:49 +03:00
Pavel Emelyanov	67c2a06493	api: Rename (un)set_server_load_sstable -> (un)set_server_column_family The method sets up column family API, not load-sstables one Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18022	2024-03-26 12:16:08 +02:00
Kefu Chai	1b859e484f	treewide: use fmt::to_string() to transform a UUID to std::string without `FMT_DEPRECATED_OSTREAM` macro, `UUID::to_sstring()` is implemented using its `fmt::formatter`, which is not available at the end of this header file where `UUID` is defined. at this moment, we still use `FMT_DEPRECATED_OSTREAM` and {fmt} v9, so we can still use `UUID::to_sstring()`, but in {fmt} v10, we cannot. so, in this change, we change all callers of `UUID::to_sstring()` to `fmt::to_string()`, so that we don't depend on `FMT_DEPRECATED_OSTREAM` and {fmt} v9 anymore. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-03-26 13:38:37 +08:00
Petr Gusev	5db6b8b3c2	error_injection: move api registration to set_server_init The set_server_done function is called only when a node is fully initialized. To allow error injection to be used during initialization we move the handler registration to set_server_init, which is called as soon as the api http server is started.	2024-03-19 20:18:29 +04:00
Avi Kivity	72bbe75d5b	Merge 'Fix node replace with tablets for RF=N' from Tomasz Grabiec This PR fixes a problem with replacing a node with tablets when RF=N. Currently, this will fail because tablet replica allocation for rebuild will not be able to find a viable destination, as the replacing node is not considered to be a candidate. It cannot be a candidate because replace rolls back on failure and we cannot roll back after tablets were migrated. The solution taken here is to not drain tablet replicas from the replaced node during the topology request, but to leave it to happen later, after the replaced node is in the left state and the replacing node is in the normal state. The replacing node waits for this draining to be complete on boot before the node is considered booted. Fixes https://github.com/scylladb/scylladb/issues/17025 Nodes in the left state will be kept in tablet replica sets for a while after node replace is done, until the new replica is rebuilt. So we need to know those nodes' location (dc, rack) for two reasons: 1) algorithms which work with replica sets filter nodes based on their location. For example, materialized views code which pairs base replicas with view replicas filters by datacenter first. 2) the tablet scheduler needs to identify each node's location in order to make decisions about new replica placement. It's ok to not know the IP, and we don't keep it. Those nodes will not be present in the IP-based replica sets, e.g. those returned by get_natural_endpoints(), only in host_id-based replica sets. storage_proxy request coordination is not affected. Nodes in the left state are still not present in the token ring, and are not considered to be members of the ring (datacenter endpoints exclude them). In the future we could make the change even more transparent by only loading locator::node* for those nodes and keeping node* in tablet replica sets. Currently left nodes are never removed from topology, so they will accumulate in memory.
We could garbage-collect them from topology coordinator if a left node is absent in any replica set. That means we need a new state - left_for_real. Closes scylladb/scylladb#17388 * github.com:scylladb/scylladb: test: py: Add test for view replica pairing after replace raft, api: Add RESTful API to query current leader of a raft group test: test_tablets_removenode: Verify replacing when there is no spare node doc: topology-on-raft: Document replace behavior with tablets tablets, raft topology: Rebuild tablets after replacing node is normal tablets: load_balancer: Access node attributes via node struct tablets: load_balancer: Extract ensure_node() mv: Switch to using host_id-based replica set effective_replication_map: Introduce host_id-based get_replicas() raft topology: Keep nodes in the left state to topology tablets: Introduce read_required_hosts()	2024-03-18 16:16:08 +02:00
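The garbage-collection idea floated in the merge message above (drop a left node once no replica set references it) can be sketched as follows. This is a minimal illustration under a hypothetical, simplified model: host ids are plain strings, topology is a `std::map`, and tablet replica sets are vectors. The names `gc_left_nodes`, `node_state` and `replica_set` are invented for this sketch and are not Scylla's API. The message also notes that a real implementation would need a new `left_for_real` state; this sketch simply erases entries.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model; Scylla's real topology and tablet
// metadata types are far richer than this.
enum class node_state { normal, left };

using host_id = std::string;
using replica_set = std::vector<host_id>;

// Erase nodes in the `left` state that no tablet replica set still references.
// Returns the number of nodes erased.
std::size_t gc_left_nodes(std::map<host_id, node_state>& topology,
                          const std::vector<replica_set>& tablet_replicas) {
    // Collect every host that is still part of some replica set.
    std::set<host_id> referenced;
    for (const auto& rs : tablet_replicas) {
        referenced.insert(rs.begin(), rs.end());
    }
    std::size_t removed = 0;
    for (auto it = topology.begin(); it != topology.end();) {
        if (it->second == node_state::left && referenced.count(it->first) == 0) {
            it = topology.erase(it);  // map::erase returns the next iterator
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

A left node that still appears in some replica set (because its replica has not been rebuilt yet) is kept, which matches the requirement that its location stays known until rebuild completes.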
Tomasz Grabiec	6d50e93f10	raft, api: Add RESTful API to query current leader of a raft group Example: $ curl -X GET "http://127.0.0.1:10000/raft/leader_host" "f7f57588-62de-4cac-9e4b-c62bfc458d91" Accepts optional group_id param, defaults to group0.	2024-03-15 13:20:08 +01:00
Benny Halevy	530d270828	api: /storage_service/tablets/balancing: fix incorrect operation summary It was probably copy-pasted from /storage_service/tablets/move Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#17811	2024-03-14 22:52:57 +01:00
Pavel Emelyanov	def5fed619	api: Fix stats reported for row cache There are three endpoints in api/cache_service that report "metrics" for the row cache, and the values they return are: entries (number of partitions), size (number of partitions), capacity (used space). The size and capacity values seem very inaccurate. A comment says that in C* the size should be weighted, but Scylla doesn't support weighting cache entries. Also, capacity is configurable via the row_cache_size_in_mb config option or the set_row_cache_capacity_in_mb API call, but Scylla supports neither. This patch suggests changing the return values of the size and capacity endpoints. Although the row cache doesn't support weights, it's natural to return used_space in bytes as the size, which matches what "size" means better than the number of entries does. The capacity endpoint can return the total memory size, because this is what Scylla really does: row cache growth is only limited by other memory consumers, not by configured limits. fixes: #9418 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#17724	2024-03-12 13:44:59 +02:00
Patryk Wrobel	9eb91b5526	storage_service/ownership: discard get_ownership() requests when tablets enabled This change introduces logic that checks whether tablets are enabled for any keyspace when get_ownership() is invoked. Without it, the result would be calculated based solely on sorted_tokens(), which is invalid. Refs: scylladb#17342 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-03-11 09:52:25 +01:00
Patryk Wrobel	51da80da7d	storage_service/ownership/{keyspace}: handle requests when tablets are enabled Before this change, when a user called the 'storage_service/ownership/{keyspace}' API with a keyspace that uses tablets, an internal error was thrown. The code was calling a function intended for vnodes: get_vnode_effective_replication_map(). This commit handles this scenario gracefully and extends the API to accept a 'cf' parameter that denotes the table name. Now, when the keyspace uses tablets and the cf parameter is not passed, a descriptive error message is returned with BAD_REQUEST. Users cannot query ownership for a keyspace that uses tablets, but they can query ownership for a table in such a keyspace. Also, new tests have been added to test/rest_api/test_storage_service.py and to test/topology_experimental_raft/test_tablets.py in order to verify the behavior with and without tablets enabled. Refs: scylladb#17342 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-03-11 09:52:23 +01:00
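The request-validation rules described in the two ownership commits above can be sketched as a pair of predicates. This sketch models only what the messages state: token-based get_ownership() is rejected if any keyspace uses tablets, and a per-keyspace request for a tablets keyspace must name a table via 'cf'. The types `keyspace_info` and `status` and the function names are hypothetical. How the real handler treats 'cf' for vnode-based keyspaces isn't described in the messages, so the sketch does not model it.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real keyspace metadata and HTTP status.
struct keyspace_info {
    bool uses_tablets;
};

enum class status { ok, bad_request };

// get_ownership() derives ownership from sorted_tokens(), which is only
// meaningful when no keyspace uses tablets.
bool token_ownership_allowed(const std::vector<keyspace_info>& keyspaces) {
    return std::none_of(keyspaces.begin(), keyspaces.end(),
                        [](const keyspace_info& ks) { return ks.uses_tablets; });
}

// ownership/{keyspace}: for a tablets keyspace, ownership is per table,
// so a request without 'cf' is rejected with BAD_REQUEST.
status validate_ownership_request(const keyspace_info& ks,
                                  const std::optional<std::string>& cf) {
    if (ks.uses_tablets && !cf) {
        return status::bad_request;
    }
    return status::ok;
}
```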
Pavel Emelyanov	ceac65be1e	api: Reserve vectors in advance Some endpoints in api/column_family fill vectors with data obtained from the database and return them. Since the amount of data is known in advance, it's better to reserve the vector's capacity up front. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-02-20 19:13:05 +03:00
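The reserve-in-advance pattern from the commit above looks roughly like this. The `table_stats` record and `collect_sstable_counts` function are hypothetical stand-ins for the api/column_family code. The point is that when the output size equals the input size, one `reserve()` call replaces the repeated reallocations that `push_back` would otherwise trigger as the vector grows.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-table record; the real endpoints read from the database.
struct table_stats {
    std::string name;
    std::int64_t live_sstables;
};

// Extract one field per table. The result has exactly tables.size()
// elements, so a single reserve() allocates all the memory up front.
std::vector<std::int64_t> collect_sstable_counts(const std::vector<table_stats>& tables) {
    std::vector<std::int64_t> out;
    out.reserve(tables.size());
    assert(out.capacity() >= tables.size());  // guaranteed by reserve()
    for (const auto& t : tables) {
        out.push_back(t.live_sstables);  // no reallocation happens here
    }
    return out;
}
```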
Pavel Emelyanov	f3e58cb806	api: Use range-loop to iterate keyspaces The code uses a standard for (;;) loop, but the range-based version is nicer. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-02-20 19:12:12 +03:00
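For illustration, here is the kind of rewrite the commit above describes, shown on a hypothetical helper that sums keyspace-name lengths. Neither function is from the Scylla code base. Both loops compute the same result, and the range-based form drops the index variable and the bounds check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Index-based for (;;) loop, the style being replaced.
std::size_t total_name_length_indexed(const std::vector<std::string>& keyspaces) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < keyspaces.size(); ++i) {
        total += keyspaces[i].size();
    }
    return total;
}

// Equivalent range-based for loop: no index bookkeeping.
std::size_t total_name_length(const std::vector<std::string>& keyspaces) {
    std::size_t total = 0;
    for (const auto& ks : keyspaces) {
        total += ks.size();
    }
    return total;
}
```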


941 Commits