scylladb

Author	SHA1	Message	Date
Botond Dénes	ce0ed29ad6	Merge 'Add an API to trigger snapshot in Raft servers' from Kamil Braun This allows the user of `raft::server` to cause it to create a snapshot and truncate the Raft log (leaving no trailing entries; in the future we may extend the API to specify number of trailing entries left if needed). In a later commit we'll add a REST endpoint to Scylla to trigger group 0 snapshots. One use case for this API is to create group 0 snapshots in Scylla deployments which upgraded to Raft in version 5.2 and started with an empty Raft log with no snapshot at the beginning. This causes problems, e.g. when a new node bootstraps to the cluster, it will not receive a snapshot that would contain both schema and group 0 history, which would then lead to inconsistent schema state and trigger assertion failures as observed in scylladb/scylladb#16683. In 5.4 the logic of initial group 0 setup was changed to start the Raft log with a snapshot at index 1 (`ff386e7a44`) but a problem remains with these existing deployments coming from 5.2, we need a way to trigger a snapshot in them (other than performing 1000 arbitrary schema changes). Another potential use case in the future would be to trigger snapshots based on external memory pressure in tablet Raft groups (for strongly consistent tables). The PR adds the API to `raft::server` and a HTTP endpoint that uses it. In a follow-up PR, we plan to modify group 0 server startup logic to automatically call this API if it sees that no snapshot is present yet (to automatically fix the aforementioned 5.2 deployments once they upgrade.) Closes scylladb/scylladb#16816 * github.com:scylladb/scylladb: raft: remove `empty()` from `fsm_output` test: add test for manual triggering of Raft snapshots api: add HTTP endpoint to trigger Raft snapshots raft: server: add `trigger_snapshot` API raft: server: track last persisted snapshot descriptor index raft: server: framework for handling server requests raft: server: inline `poll_fsm_output` raft: server: fix indentation raft: server: move `io_fiber`'s processing of `batch` to a separate function raft: move `poll_output()` from `fsm` to `server` raft: move `_sm_events` from `fsm` to `server` raft: fsm: remove constructor used only in tests raft: fsm: move trace message from `poll_output` to `has_output` raft: fsm: extract `has_output()` raft: pass `max_trailing_entries` through `fsm_output` to `store_snapshot_descriptor` raft: server: pass `*_aborted` to `set_exception` call (cherry picked from commit `d202d32f81`) Backport notes: - `has_output()` has a smaller condition in the backported version (because the condition was smaller in `poll_output()`) - `process_fsm_output` has a smaller body (because `io_fiber` had a smaller body) in the backported version - the HTTP API is only started if `raft_group_registry` is started	2024-02-01 15:38:51 +01:00
Aleksandra Martyniuk	f85375ff99	api: ignore future in task_manager_json::wait_task Before returning task status, wait_task waits for it to finish with done() method and calls get() on a resulting future. If requested task fails, an exception will be thrown and user will get internal server error instead of failed task status. Result of done() method is ignored. Fixes: #14914. (cherry picked from commit `ae67f5d47e`) Closes #16438	2024-01-30 10:54:33 +02:00
Aleksandra Martyniuk	5def443cf0	tasks: keep task's children in list If std::vector is resized its iterators and references may get invalidated. While task_manager::task::impl::_children's iterators are avoided throughout the code, references to its elements are being used. Since children vector does not need random access to its elements, change its type to std::list<foreign_task_ptr>, which iterators and references aren't invalidated on element insertion. Fixes: #16380. Closes scylladb/scylladb#16381 (cherry picked from commit `9b9ea1193c`) Closes #16777	2024-01-15 15:38:00 +02:00
Tomasz Grabiec	bfd8401477	api, storage_service: Recalculate table digests on relocal_schema api call Currently, the API call recalculates only per-node schema version. To workaround issues like #4485 we want to recalculate per-table digests. One way to do that is to restart the node, but that's slow and has impact on availability. Use like this: curl -X POST http://127.0.0.1:10000/storage_service/relocal_schema Fixes #15380 Closes #15381 (cherry picked from commit `c27d212f4b`)	2023-11-21 01:29:28 +01:00
Botond Dénes	0f3e31975d	api/storage_service: start/stop native transport in the statement sg Currently, it is started/stopped in the streaming/maintenance sg, which is what the API itself runs in. Starting the native transport in the streaming sg, will lead to severely degraded performance, as the streaming sg has significantly less CPU/disk shares and reader concurrency semaphore resources. Furthermore, it will lead to multi-paged reads possibly switching between scheduling groups mid-way, triggering an internal error. To fix, use `with_scheduling_group()` for both starting and stopping native transport. Technically, it is only strictly necessary for starting, but I added it for stop as well for consistency. Also apply the same treatment to RPC (Thrift). Although no one uses it, best to fix it, just to be on the safe side. I think we need a more systematic approach for solving this once and for all, like passing the scheduling group to the protocol server and have it switch to it internally. This allows the server to always run on the correct scheduling group, not depending on the caller to remember using it. However, I think this is best done in a follow-up, to keep this critical patch small and easily backportable. Fixes: #15485 Closes scylladb/scylladb#16019 (cherry picked from commit `dfd7981fa7`)	2023-11-20 20:00:56 +02:00
Pavel Emelyanov	f76ba217e7	Merge 'api: failure_detector: invoke on shard 0' from Kamil Braun These APIs may return stale or simply incorrect data on shards other than 0. Newer versions of Scylla are better at maintaining cross-shard consistency, but we need a simple fix that can be easily and without risk be backported to older versions; this is the fix. Add a simple test to check that the `failure_detector/endpoints` API returns nonzero generation. Fixes: scylladb/scylladb#15816 Closes scylladb/scylladb#15970 * github.com:scylladb/scylladb: test: rest_api: test that generation is nonzero in `failure_detector/endpoints` api: failure_detector: fix indentation api: failure_detector: invoke on shard 0 (cherry picked from commit `9443253f3d`)	2023-11-07 15:12:12 +01:00
Botond Dénes	ca8723a6fd	Merge 'gossiper: add get_unreachable_members_synchronized and use over api' from Benny Halevy Modeled after get_live_members_synchronized, get_unreachable_members_synchronized calls replicate_live_endpoints_on_change to synchronize the state of unreachable_members on all shards. Fixes #12261 Fixes #15088 Also, add rest_api unit test for those apis Closes #15093 * github.com:scylladb/scylladb: test: rest_api: add test_gossiper gossiper: add get_unreachable_members_synchronized (cherry picked from commit `57deeb5d39`) Backport note: `gossiper::lock_endpoint_update_semaphore` helper function was missing, replaced with `get_units(g._endpoint_update_semaphore, 1)`	2023-09-27 15:09:32 +02:00
Kamil Braun	05f4640360	Merge 'api: gossiper: get alive nodes after reaching current shard 0 version' from Alecco Add an API call to wait for all shards to reach the current shard 0 gossiper version. Throws when timeout is reached. Closes #12540 * github.com:scylladb/scylladb: api: gossiper: fix alive nodes gms, service: lock live endpoint copy gms, service: live endpoint copy method (cherry picked from commit `b919373cce`)	2023-08-29 12:27:52 +02:00
Aleksandra Martyniuk	6b79c92cb7	api: get task statuses recursively Sometimes to debug some task manager module, we may want to inspect the whole tree of descendants of some task. To make it easier, an api call getting a list of statuses of the requested task and all its descendants in BFS order is added.	2023-01-11 12:34:06 +01:00
Aleksandra Martyniuk	dcb91457da	api: change retrieve_status signature Sometimes we may need task status to be nothrow move constructible. httpd::task_manager_json::task_status does not satisfy this requirement. retrieve_status returns future<full_task_status> instead of future<task_status> to provide an intermediate struct with better properties. An argument is passed by reference to prevent the necessity to copy foreign_ptr.	2023-01-05 13:28:51 +01:00
Aleksandra Martyniuk	ee13a5dde8	api: extend status in task manager api Status of tasks returned with get_task_status and wait_task is extended with the list of ids of child tasks.	2022-12-21 10:54:56 +01:00
Aleksandra Martyniuk	697af4ccf2	api: extend get_tasks in task manager api Each task stats in a list returned from tm::get_task api call is extended with info about: task type, keyspace, table, entity, and sequence number.	2022-12-21 10:54:50 +01:00
Raphael S. Carvalho	254c38c4d2	api: compaction_manager: Stop a compaction type for all groups Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:16:19 -03:00
Raphael S. Carvalho	4e836cb96c	api: Estimate pending tasks on all compaction groups Estimates # of compaction jobs to be performed on a table. Adaptation is done by adding estimation from all groups. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:16:17 -03:00
Raphael S. Carvalho	640436e72a	api: storage_service: Run maintenance compactions on all compaction groups Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:16:15 -03:00
Raphael S. Carvalho	ef8f542d75	replica: Adapt table::active_memtable() to compaction groups active_memtable() was fine to a single group, but with multiple groups, there will be one active memtable per group. Let's change the interface to reflect that. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-12-19 11:15:14 -03:00
Aleksandra Martyniuk	f0b2b00a15	api: delete unused type parameter from task_manager_test api	2022-12-15 10:50:30 +01:00
Aleksandra Martyniuk	5bc09daa7a	tasks: repair: api: remove type attribute from task_manager::task::status	2022-12-15 10:49:09 +01:00
Benny Halevy	68141d0aac	topology: get rid of pending state Now, with `a44ca06906`, is_normal_token_owner that replaced is_member does not rely anymore on the pending status of endpoints in topology. With that we can get rid of this state and just keep all endpoints we know about in the topology. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-13 14:17:18 +02:00
Pavel Emelyanov	4c6bfc078d	code: Use http::re(quest\|ply) instead of httpd:: ones Recent seastar update deprecated those from httpd namespace. fixes: #12142 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12161	2022-12-01 17:33:35 +02:00
Benny Halevy	ec5707a4a8	api: storage_service: fixup indentation	2022-11-20 09:14:45 +02:00
Benny Halevy	cc63719782	api: storage_service: add run_on_existing_tables Gracefully skip tables that were removed in the background. Fixes #12007 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-20 09:14:29 +02:00
Benny Halevy	9ef9b9d1d9	api: storage_service: add parse_table_infos The table UUIDs are the same on all shards so we might as well get them on shard 0 (as we already do) and reuse them on other shards. It is more efficient and accurate to lookup the table eventually on the shard using its uuid rather than its name. If the table was dropped and recreated using the same name in the background, the new table will have a new uuid and do the api function does not apply to it anymore. A following change will handle the no_such_column_family cases. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-20 09:14:21 +02:00
Benny Halevy	9b4a9b2772	api: storage_service: log errors from compaction related handlers Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-20 09:03:25 +02:00
Benny Halevy	a47f96bc05	api: storage_service: coroutinize compaction related handlers Before we improve parsing tables lists and handling of no_such_column_family errors. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-20 09:03:25 +02:00
Benny Halevy	1a183047c0	utils: config_src: add set_value_on_all_shards functions Currently when we set a single value we need to call broadcast_to_all_shards to let observers on all shards get notified of the new value. However, the latter broadcasts all value to all shards so it's terribly inefficient. Instead, add async set_value_on_all_shards functions to broadcast a value to all shards. Use those in system_keyspace for db_config_table virtual table and in task_manager_test to update the task_manager ttl. Refs #7316 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-09 11:55:14 +02:00
Benny Halevy	fc278be6c4	table: add perform_cleanup_compaction Move the integration with compaction_manager from the api layer to the tabel class so it can also make sure the memtable is cleaned up in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-06 19:41:33 +02:00
Benny Halevy	85523c45c0	api: storage_service: add logging for compaction operations et al Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-06 19:41:31 +02:00
Botond Dénes	139fbb466e	Merge 'Task manager extension' from Aleksandra Martyniuk The PR adds changes to task manager that allow more convenient integration with modules. Introduced changes: - adds internal flag in task::impl that allows user to filter too specific tasks - renames `parent_data` to more appropriate name `task_info` - creates `tasks/types.hh` which allows using some types connected with task manager without the necessity to include whole task manager - adds more flexible version of `make_task` method Closes #11821 * github.com:scylladb/scylladb: tasks: add alternative make_task method tasks: rename parent_data to task_info and move it tasks: move task_id to tasks/types.hh tasks: add internal flag for task_manager::task::impl	2022-10-31 09:57:10 +02:00
Benny Halevy	335a8cc362	api: doc: remove_node: improve summary The current summary of the operation is obscure. It refers to a token in the ring and the endpoint associated with it, while the operation uses a host_id to identify a whole node. Instead, clarify the summary to refer to a node in the cluster, consistent with the description for the host_id parameter. Also, describe the effect the call has on the data the removed node logically owned. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-28 07:52:37 +03:00
Benny Halevy	9ef2631ec2	api, service: storage_service: removenode: allow passing ignore_nodes as uuid:s Currently the api is inconsistent: requiring a uuid for the host_id of the node to be removed, while the ignored nodes list is given as comma-separated ip addresses. Instead, support identifying the ignored_nodes either by their host_id (uuid) or ip address. Also, require all ignore_nodes to be of the same kind: either UUIDs or ip addresses, as a mix of the 2 is likely indicating a user error. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-28 07:49:03 +03:00
Benny Halevy	340a5a0c94	api: storage_service: remove_node: validate host_id The node to be removed must be identified by its host_id. Validate that at the api layer and pass the parsed host_id down to storage_service::removenode. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-28 07:38:13 +03:00
Aleksandra Martyniuk	10d11a7baf	tasks: rename parent_data to task_info and move it parent_data struct contains info that is common for each task, not only in parent-child relationship context. To use it this way without confusion, its name is changed to task_info. In order to be able to widely and comfortably use task_info, it is moved from tasks/task_manager.hh to tasks/types.hh and slightly extended.	2022-10-26 14:01:05 +02:00
Aleksandra Martyniuk	e2e8a286cc	tasks: add internal flag for task_manager::task::impl It is convenient to create many different tasks implementations representing more and more specific parts of the operation in a module. Presenting all of them through the api makes it cumbersome for user to navigate and track, though. Flag internal is added to task_manager::task::impl so that the tasks could be filtered before they are sent to user.	2022-10-26 14:01:05 +02:00
Pavel Emelyanov	1674882220	snitch: Add sharded<snitch_ptr> arg to reset_snitch() The method replaces snitch instance on the existing sharded<snitch_ptr> and the "existing" is nowadays the global instance. This patch changes it to use local reference passed from API code Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-20 12:33:34 +03:00
Pavel Emelyanov	5fba0a7f65	api: Move update_snitch endpoint It's now living in storage_service.cc, but non-global snitch is available in endpoint_snitch.cc so move the endpoint handler there Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-20 12:33:20 +03:00
Pavel Emelyanov	0d49b0e24a	api: Use local snitch reference The snitch/name endpoint needs snitch instance to get the name from. Also the storage_service/reset_snitch endpoint will also need snitch instance to call reset on. This patch carries local snitch reference all thw way through API setup and patches the get_name() call. The reset_snitch() will come in the next patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-20 12:31:45 +03:00
Pavel Emelyanov	c175ea33e2	api: Unset snitch endpoints on stop Some time soon snitch API handlers will operate on local snitch reference capture, so those need to be unset before the target local variable variable goes away Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-20 12:31:12 +03:00
Pavel Emelyanov	583204972e	api: Don't report dc/rack for endpoints not in ring When an endpoint is not in ring the snitch/get_{rack\|datacenter} API still return back some value. The value is, in fact, the default one, because this is how snitch resolves it -- when it cannot find a node in gossiper and system keyspace it just returns defaults. When this happens the API should better return some error (bad param?) but there's a bug in nodetool -- when the 'status' command collects info about the ring it first collects the endpoints, then gets status for each. If between getting an endpoint and getting its status the endpoint disappears, the API would fail, but nodetool doesn't handle it. Next patches will make .get_rack/_dc calls use in-topology collections that don't fall-back to default values if the entry is not found in it, so prepare the API in advance to return back defaults. refs: #11706 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-11 05:12:47 +03:00
Avi Kivity	b0814bdd42	api: column_family: fix memtable off-heap memory reporting We report virtual memory used, but that's not a real accounting of the actual memory used. Use the correct real_memory_used() instead. Note that this isn't a recent regression and was probably broken forever. However nobody looks at this measure (and it's usually close to the correct value) so nobody noticed. Since it's so minor, I didn't bother filing an issue.	2022-10-04 13:56:29 +03:00
Avi Kivity	bc2fcf5187	dirty_memory_manager: unscramble terminology Before `95f31f37c1` ("Merge 'dirty_memory_manager: simplify region_group' from Avi Kivity"), we had two region_group objects, one _real_region_group and another _virtual_region_group, each with a set of "soft" and "hard" limits and related functions and members. In `95f31f37c1`, we merged _real_region_group into _virtual_region_group, but unfortunately the _real_region_group members received the "hard" prefix when they got merged. This overloads the meaning of "hard" - is it related to soft/hard limit or is it related to the real/virtual distinction? This patch applied some renaming to restore consistency. Anything that came from _virtual_region_group now has "virtual" in its name. Anything that came from _real_region_group now has "real" in its name. The terms are still pretty bad but at least they are consistent.	2022-10-04 13:56:28 +03:00
Botond Dénes	060dda8e00	Merge 'Reduce dependencies on large data handler header' from Benny Halevy Reduce the false dependencies on db/large_data_handler.hh by not including it from commonly used header files, and rather including it only in the source files that actually need it. The is in preparation for https://github.com/scylladb/scylladb/issues/11449 Closes #11654 * github.com:scylladb/scylladb: test: lib: do not include db/large_data_handler.hh in test_service.hh test: lib: move sstable test_env::impl ctor out of line sstables: do not include db/large_data_handler.hh in sstables.hh api/column_family: add include db/system_keyspace.hh	2022-09-30 13:27:38 +03:00
Benny Halevy	fb7e55b0a8	api/column_family: add include db/system_keyspace.hh For db::system_keyspace::load_view_build_progress that currently indirectly satisfied via sstables/sstables.hh -> db/large_data_handler.hh Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-29 12:42:54 +03:00
Benny Halevy	d32c497cd9	database: automatically take snapshot of base table views The logic to reject explicit snapshot of views/indexes was improved in `aa127a2dbb`. However, we never implemented auto-snapshot of view/indexes when taking a snapshot of the base table. This is implemented in this patch. The implementation is built on top of `ba42852b0e` so it would be hard to backport to 5.1 or earlier releases. Fixes #11612 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-26 11:02:54 +03:00
Benny Halevy	55b0b8fe2c	api: storage_service: reject snapshot of views in api layer Rather than pushing the check to `snapshot_ctl::take_column_family_snapshot`, just check that explcitly when taking a snapshot of a particular table by name over the api. Other paths that call snapshot_ctl::take_column_family_snapshot are internal and use it to snap views already. With that, we can get rid of the allow_view_snapshots flag that was introduced in `aab4cd850c`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-09-26 10:44:56 +03:00
Piotr Sarna	481240b8b4	Merge 'Alternator: Run more TTL tests by default (and add a test for metrics)' from Nadav Har'El We had quite a few tests for Alternator TTL in test/alternator, but most of them did not run as part of the usual Jenkins test suite, because they were considered "very slow" (and require a special "--runveryslow" flag to run). In this series we enable six tests which run quickly enough to run by default, without an additional flag. We also make them even quicker - the six tests now take around 2.5 seconds. I also noticed that we don't have a test for the Alternator TTL metrics - and added one. Fixes #11374. Refs https://github.com/scylladb/scylla-monitoring/issues/1783 Closes #11384 * github.com:scylladb/scylladb: test/alternator: insert test names into Scylla logs rest api: add a new /system/log operation alternator ttl: log warning if scan took too long. alternator,ttl: allow sub-second TTL scanning period, for tests test/alternator: skip fewer Alternator TTL tests test/alternator: test Alternator TTL metrics	2022-09-22 09:47:50 +02:00
Nadav Har'El	a81310e23d	rest api: add a new /system/log operation Add a new REST API operation, taking a log level and a message, and printing it into the Scylla log. This can be useful when a test wants to mark certain positions in the log (e.g., to see which other log messages we get between the two positions). An alternative way to achieve this could have been for the test to write directly into the log file - but an on-disk log file is only one of the logging options that Scylla support, and the approach in this patch allows to add log message regardless of how Scylla keeps the logs. In motivation of this feature is that in the following patch the test/alternator framework will add log messages when starting and ending tests, which can help debug test failures. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-09-12 10:32:56 +03:00
Aleksandra Martyniuk	ec86410094	task_manager: test api layer implementation The implementation of a test api that helps testing task manager api. It provides methods to simulate the operations that can happen on modules and theirs task. Through the api user can: register and unregister the test module and the tasks belonging to the module, and finish the tasks with success or custom error.	2022-09-09 14:29:28 +02:00
Aleksandra Martyniuk	42f36db55b	task_manager: test api layer The test api that helps testing task manager api. It can be used to simulate the operations that can happen on modules and theirs task. Through the api user can: register and unregister the test module and the tasks belonging to the module, and finish the tasks with success or custom error.	2022-09-09 14:29:28 +02:00
Aleksandra Martyniuk	c9637705a6	task_manager: api layer implementation The implementation of a task manager api layer. It provides methods to list the modules registered in task_manager, list tasks belonging to the given module, abort, wait for or retrieve a status of the given task.	2022-09-09 14:29:28 +02:00

1 2 3 4 5 ...

704 Commits