scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 19:35:12 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	6c1e5c248f	main,proxy: Drain proxy in its stop_remote Currently proxy initialization is pretty dispersed, in particular it's stopped in several steps -- first drain_on_shutdown() then stop_remote(). In between there's nothing that needs proxy in any particular state, so those two steps can be merged into one. refs: scylladb/scylladb#2737 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19344	2024-06-27 12:26:51 +02:00
Wojciech Mitros	f70f774e40	mv: gossip the same backlog if a different backlog was sent in a response Currently, there are 2 ways of sharing a backlog with other nodes: through a gossip mechanism, and with responses to replica writes. In gossip, we check each second if the backlog changed, and if it did we update other nodes with it. However, if the backlog for this node changed on another node with a write response, the gossiped backlog is currently not updated, so if after the response the backlog goes back to the value from the previous gossip round, it will not get sent and the other node will stay with an outdated backlog. This patch changes this by notifying the gossip that the backlog changed since the last gossip round, so a different backlog could have been sent through the response piggyback mechanism. With that information, gossip will send an unchanged backlog to other nodes in the following gossip round. Fixes: https://github.com/scylladb/scylladb/issues/18461	2024-06-06 10:45:15 +02:00
Wojciech Mitros	272e80fe0a	node_update_backlog: divide adding and fetching backlogs Currently, we only update the backlogs in node_update_backlog at the same time when we're fetching them. This is done using storage_proxy's method get_view_update_backlog, which is confusing because it's a getter with side-effects. Additionally, we don't always want to update the backlog when we're reading it (as in gossip which is only on shard 0) and we don't always want to read it when we're updating it (when we're not handling any writes but the backlog drops due to background work finishing). This patch divides the node_view_backlog::add_fetch as well as the storage_proxy::get_view_update_backlog both into two methods; one for updating and one for reading the backlog. This patch only replaces the places where we're currently using the view backlog getter, more situations where we should get/update the backlog should be considered in a following patch.	2024-06-06 10:45:13 +02:00
Piotr Dulikowski	68eca3778c	Merge 'mv: throttle view update generation for large queries' from Wojciech Mitros This series is a reupload of #13792 with a few modifications, namely a test is added and the conflicts with recent tablet related changes are fixed. See https://github.com/scylladb/scylladb/issues/12379 and https://github.com/scylladb/scylladb/pull/13583 for a detailed description of the problem and discussions. This PR aims to extend the existing throttling mechanism to work with requests that internally generate a large amount of view updates, as suggested by @nyh. The existing mechanism works in the following way: * Client sends a request, we generate the view updates corresponding to the request and spawn background tasks which will send these updates to remote nodes * Each background task consumes some units from the `view_update_concurrency_semaphore`, but doesn't wait for these units, it's just for tracking * We keep track of the percent of consumed units on each node, this is called `view update backlog`. * Before sending a response to the client we sleep for a short amount of time. The amount of time to sleep for is based on the fullness of this `view update backlog`. For a well behaved client with limited concurrency this will limit the amount of incoming requests to a manageable level. This mechanism doesn't handle large DELETE queries. Deleting a partition is fast for the base table, but it requires us to generate a view update for every single deleted row. The number of deleted rows per single client request can be in the millions. Delaying response to the request doesn't help when a single request can generate millions of updates. To deal with this we could treat the view update generator just like any other client and force it to wait a bit of time before sending the next batch of updates. The amount of time to wait for is calculated just like in the existing throttling code, it's based on the fullness of `view update backlogs`. 
The new algorithm of view update generation looks something like this: ```c++ for(;;) { auto updates = generate_updates_batch_with_max_100_rows(); co_await seastar::sleep(calculate_sleep_time_from_backlogs()); spawn_background_tasks_for_updates(updates); } ``` Fixes: https://github.com/scylladb/scylladb/issues/12379 Closes scylladb/scylladb#16819 * github.com:scylladb/scylladb: test: add test for bad_allocs during large mv queries mv: throttle view update generation for large queries exceptions: add read_write_timeout_exception, a subclass of request_timeout_exception db/view: extract view throttling delay calculation to a global function view_update_generator: add get_storage_proxy() storage_proxy: make view backlog getters public	2024-05-16 08:22:54 +02:00
Botond Dénes	155332ebf8	Merge 'Drain view_builder in generic drain (again)' from Pavel Emelyanov Some time ago #16558 was merged that moved view builder drain into generic drain. After this merge dtests started to fail from time to time, so the PR was reverted (see #18278). In #18295 the hang was found. View builder drain was moved from "before" stopping messaging service to "after" it, and view update write handlers in proxy hung for a hard-coded timeout of 5 minutes without being aborted. Tests don't wait for 5 minutes and kill scylla, then complain about it and fail. This PR brings back the original PR as well as the necessary fix that cancels view update write handlers on stop. Closes scylladb/scylladb#18408 * github.com:scylladb/scylladb: Reapply "Merge 'Drain view_builder in generic drain' from ScyllaDB" view: Abort pending view updates when draining	2024-05-09 08:26:44 +03:00
Piotr Dulikowski	64ba620dc2	Merge 'hinted handoff: Use host IDs instead of IPs in the module' from Dawid Mędrek This pull request introduces host ID in the Hinted Handoff module. Nodes are now identified by their host IDs instead of their IPs. The conversion occurs on the boundary between the module and `storage_proxy.hh`, but aside from that, IPs have been erased. The changes take into consideration that there might still be old hints, still identified by IPs, on disk – at start-up, we map them to host IDs if it's possible so that they're not lost. Refs scylladb/scylladb#6403 Fixes scylladb/scylladb#12278 Closes scylladb/scylladb#15567 * github.com:scylladb/scylladb: docs: Update Hinted Handoff documentation db/hints: Add endpoint_downtime_not_bigger_than() db/hints: Migrate hinted handoff when cluster feature is enabled db/hints: Handle arbitrary directories in resource manager db/hints: Start using hint_directory_manager db/hints: Enforce providing IP in get_ep_manager() db/hints: Introduce hint_directory_manager db/hints/resource_manager: Update function description db/hints: Coroutinize space_watchdog::scan_one_ep_dir() db/hints: Expose update lock of space watchdog db/hints: Add function for migrating hint directories to host ID db/hints: Take both IP and host ID when storing hints db/hints: Prepare initializing endpoint managers for migrating from IP to host ID db/hints: Migrate to locator::host_id db/hints: Remove noexcept in do_send_one_mutation() service: Add locator::host_id to on_leave_cluster service: Fix indentation db/hints: Fix indentation	2024-05-06 09:58:18 +02:00
Benny Halevy	890b890e36	storage_proxy: add mutate_locally(vector<frozen_mutation_and_schema>) method Generalizing the ad-hoc implementation out of group0_state_machine.write_mutations_to_database. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-02 19:42:58 +03:00
Jan Ciolek	4c5cfc7683	storage_proxy: make view backlog getters public Storage proxy maintains information about both local and remote view update backlogs. This information might also be useful outside of storage_proxy, so let's expose the functions that allow access to backlog information. There aren't any implementation quirks that would make it unsafe to make the functions public, the worst that can happen is that someone causes a lot of atomic operations by repeatedly calling get_view_update_backlog(). Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2024-05-02 10:59:55 +02:00
Pavel Emelyanov	d47053266b	view: Abort pending view updates when draining When view builder is drained (it now happens very early, but next patch moves this into regular drain) it waits for all on-going view build steps to complete. This includes waiting for any outstanding proxy view writes to complete as well. View writes in proxy have a very high timeout of 5 minutes but they are cancellable. However, cancelling of such writes happens in proxy's drain_on_shutdown() call which, in turn, happens pretty late on shutdown. Effectively, by the time it happens all view writes must have completed already, so stop-time cancelling doesn't really work nowadays. Next patch makes view builder drain happen a bit later during shutdown, namely -- _after_ shutting down messaging service. When it happens that late, non-working view writes cancellation becomes critical, as view builder drain hangs for the aforementioned 5 minutes. This patch explicitly cancels all view writes when view builder stops. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-02 08:16:12 +03:00
Pavel Emelyanov	5d992a4f01	proxy: Remove declaration of nonexisting view_update_write_response_handler class Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18417	2024-05-01 10:15:41 +03:00
Dawid Medrek	cfd03fe273	db/hints: Migrate to locator::host_id We change the type of node identifiers used within the module and fix compilation. Directories storing hints to specific nodes are now represented by host IDs instead of IPs.	2024-04-26 22:44:04 +02:00
Dawid Medrek	54ae9797b9	service: Add locator::host_id to on_leave_cluster We extend the function endpoint_lifecycle_subscriber::on_leave_cluster by another argument -- locator::host_id. It's more convenient to have a consistent pair of IP and host ID.	2024-04-26 22:44:03 +02:00
Dawid Medrek	a36387d942	service: Fix indentation	2024-04-26 22:44:03 +02:00
Kefu Chai	c323c93fa4	treewide: remove {dclocal_,}read_repair_chance options dclocal_read_repair_chance and read_repair_chance have been removed in Cassandra 3.11 and 4.x, see https://issues.apache.org/jira/browse/CASSANDRA-13910. if we expose the properties via DDL, Cassandra would fail to consume the CQL statement creating the table when performing migration from Scylla to Cassandra 4.x, as the latter does not understand these properties anymore. currently the default values of `dc_local_read_repair_chance` and `read_repair_chance` are both "0". so this is practically disabled, unless the user deliberately sets them to a value greater than 0. also, as a side effect, Cassandra 4.x has better support of Python3. the cqlsh shipped along with Cassandra 3.11.16 only supports python2.7, see https://github.com/apache/cassandra/blob/cassandra-3.11.16/bin/cqlsh.py it errors out if the system only provides python3 with the error of ``` No appropriate python interpreter found. ``` but modern linux systems do not provide python2 anymore. so, in this change, we deprecate these two options. Fixes #3502 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-04-25 17:15:27 +08:00
Avi Kivity	4ddf82e58b	treewide: don't #include "gms/feature_service.hh" from other headers feature_service.hh is a high-level header that integrates much of the system functionality, so including it in lower-level headers causes unnecessary rebuilds. Specifically, when retiring features. Fix by removing feature_service.hh from headers, and supply forward declarations and includes in .cc where needed. Closes scylladb/scylladb#18005	2024-03-26 15:31:18 +02:00
Pavel Emelyanov	7c5c89ba8d	Revert "Merge 'Use utils::directories instead of db::config to get dirs' from Patryk Wróbel" This reverts commit `370fbd346c`, reversing changes made to `0912d2a2c6`. This makes scylla-manager mis-interpret the data_file_directories somehow, issue #17078	2024-01-31 15:08:14 +03:00
Kefu Chai	b931d93668	treewide: fix misspellings in code comments these misspellings are identified by codespell. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17004	2024-01-31 09:16:10 +02:00
Patryk Wrobel	f08768e767	service/storage_proxy: use utils::directories to get paths of dirs This change replaces usage of db::config with usage of utils::directories to get paths of directories in service/storage_proxy. Refs: scylladb#5626 Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-29 13:11:33 +01:00
Gleb Natapov	5b246920ae	storage_proxy: allow to wait for all ongoing writes We want to be able to wait for all writes started through the storage proxy before a fence is advanced. Add phased_barrier that is entered on each local write operation before checking the fence to do so. A write will be either tracked by the phased_barrier or fenced. This will be needed to wait for all non fenced local writes to complete before starting a cleanup.	2024-01-14 14:44:07 +02:00
Patryk Jędrzejczak	f1dea4bc8a	storage_proxy: do not fence reads and writes to local tables Fencing is necessary only for reads and writes to non-local tables. Moreover, fencing a read or write to a local table can cause an error on the bootstrapping node. It is explained in the comment in storage_proxy::get_fence. A scenario described in the comment has been reported in scylladb/scylladb#16423. A write to the local RAFT table failed because of fencing, and it killed server_impl::io_fiber. Fixes scylladb/scylladb#16423 Closes scylladb/scylladb#16525	2023-12-28 19:34:27 +02:00
Benny Halevy	a529097d96	storage_proxy: use locator::topology rather than fb_utilities Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-12-05 10:44:13 +02:00
Yaniv Kaul	c658bdb150	Typos: fix typos in comments Fixes some typos as found by codespell run on the code. In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc. Follow-up commits will take care of them. Refs: https://github.com/scylladb/scylladb/issues/16255 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2023-12-02 22:37:22 +02:00
Marcin Maliszkiewicz	020a9c931b	db: view: run local materialized view mutations on a separate smp service group When a base write triggers an mv write that needs to be sent to another shard, it used the same service group and we could end up with a deadlock. This fix also affects alternator's secondary indexes. Testing was done using a not yet committed framework for easy alternator performance testing: https://github.com/scylladb/scylladb/pull/13121. I've changed hardcoded max_nonlocal_requests config in scylla from 5000 to 500 and then ran: ./build/release/scylla perf-alternator-workloads --workdir /tmp/scylla-workdir/ --smp 2 \ --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write_gsi \ --duration 60 --ring-delay-ms 0 --skip-wait-for-gossip-to-settle 0 --continue-after-error true --concurrency 2000 Without the patch, when scylla is overloaded (i.e. number of scheduled futures being close to max_nonlocal_requests), after a couple of seconds scylla hangs, cpu usage drops to zero, no progress is made. We can confirm we're hitting this issue by seeing under gdb: p seastar::get_smp_service_groups_semaphore(2,0)._count $1 = 0 With the patch I wasn't able to observe the problem, even with 2x concurrency. I was able to make the process hang with 10x concurrency but I think it's hitting a different limit as there wasn't any depleted smp service group semaphore and it was happening also on non mv loads. Fixes https://github.com/scylladb/scylladb/issues/15844 Closes scylladb/scylladb#15845	2023-10-29 18:30:32 +02:00
Pavel Emelyanov	53891dd9cc	api,hints: Move gossiper access to proxy API handlers should try to avoid using any service other than the "main" one. For hints API this service is going to be proxy, so no gossiper access in the handler itself. (indentation is left broken) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-10-05 16:14:26 +03:00
Benny Halevy	2c54d7a35a	view, storage_proxy: carry effective_replication_map along with endpoints When sending mutation to remote endpoint, the selected endpoints must be in sync with the current effective_replication_map. Currently, the endpoints are sent down the storage_proxy stack, and later on an effective_replication_map is retrieved again, and it might not match the target or pending endpoints, similar to the case seen in https://github.com/scylladb/scylladb/issues/15138 The correct way is to carry the same effective replication map used to select said endpoints and pass it down the stack. See also https://github.com/scylladb/scylladb/pull/15141 Fixes scylladb/scylladb#15144 Fixes scylladb/scylladb#14730 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #15142	2023-08-29 09:08:42 +03:00
Botond Dénes	47ce69e9bf	Merge 'paxos_response_handler: carry effective replication map' from Benny Halevy As `create_write_response_handler` on this path accepts an `inet_address_vector_replica_set` that corresponds to the effective_replication_map_ptr in the paxos_response_handler, but currently, the function retrieves a new effective_replication_map_ptr that may not hold all the said endpoints. Fixes scylladb/scylladb#15138 Closes #15141 * github.com:scylladb/scylladb: storage_proxy: create_write_response_handler: carry effective_replication_map_ptr from paxos_response_handler storage_proxy: send_to_live_endpoints: throw on_internal_error if node not found	2023-08-28 11:42:38 +03:00
Benny Halevy	4a2e367e92	storage_proxy: create_write_response_handler: carry effective_replication_map_ptr from paxos_response_handler As `create_write_response_handler` on this path accepts an `inet_address_vector_replica_set` that corresponds to the effective_replication_map_ptr in the paxos_response_handler, but currently, the function retrieves a new effective_replication_map_ptr that may not hold all the said endpoints. Fixes scylladb/scylladb#15138 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-08-24 11:45:13 +03:00
Benny Halevy	098dd5021a	storage_proxy: mutate_atomically_result: keep schema of batchlog mutation in context The batchlog mutation is for system.batchlog. Rather than looking the schema up in multiple places do that once and keep it in the context object. It will be used in the next patch to get a respective effective_replication_map_ptr. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-08-24 10:43:23 +03:00
Benny Halevy	3c122a87b5	storage_proxy: query_partition_key_range_concurrent: turn tail recursion to iteration Update the function state and loop for the next ranges instead of nesting it oneself. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-07-31 09:43:33 +03:00
Botond Dénes	53da97416a	Merge 'Remove qctx from system.paxos table access methods' from Pavel Emelyanov The "fix" is straightforward -- callers of system_keyspace::paxos methods need to get system keyspace from somewhere. This time the only caller is storage_proxy::remote that can have system keyspace via direct dependency reference. Closes #14758 * github.com:scylladb/scylladb: db/system_keyspace: Move and use qctx::execute_cql_with_timeout() db/system_keyspace: Make paxos methods non-static service/paxos: Add db::system_keyspace& argument to some methods test: Optionally initialize proxy remote for cql_test_env proxy/remote: Keep sharded<db::system_keyspace>& dependency	2023-07-20 16:53:25 +03:00
Pavel Emelyanov	b0b91bf5ec	proxy/remote: Keep sharded<db::system_keyspace>& dependency This dependency will be needed to call service::paxos_state:: calls and all of them are done in storage_proxy::remote() methods only Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-07-19 17:36:42 +03:00
Avi Kivity	460b28d067	Merge 'Introduce `SELECT MUTATION FRAGMENTS` statement' from Botond Dénes SELECT MUTATION FRAGMENTS is a new select statement sub-type, which allows dumping the underlying mutations making up the data of a given table. The output of this statement is mutation-fragments presented as CQL rows. Each row corresponds to a mutation-fragment. Subsequently, the output of this statement has a schema that is different than that of the underlying table. The output schema is derived from the table's schema, as follows: * The table's partition key is copied over as-is * The clustering key is formed from the following columns: - mutation_source (text): the kind of the mutation source, one of: memtable, row-cache or sstable; and the identifier of the individual mutation source. - partition_region (int): represents the enum with the same name. - the copy of the table's clustering columns - position_weight (int): -1, 0 or 1, has the same meaning as that in position_in_partition, used to disambiguate range tombstone changes with the same clustering key, from rows and from each other. * The following regular columns: - metadata (text): the JSON representation of the mutation-fragment's metadata. - value (text): the JSON representation of the mutation-fragment's value. Data is always read from the local replica, on which the query is executed. Migrating queries between coordinators is forbidden. More details in the documentation commit (last commit). 
Example: ```cql cqlsh> CREATE TABLE ks.tbl (pk int, ck int, v int, PRIMARY KEY (pk, ck)); cqlsh> DELETE FROM ks.tbl WHERE pk = 0; cqlsh> DELETE FROM ks.tbl WHERE pk = 0 AND ck > 0 AND ck < 2; cqlsh> INSERT INTO ks.tbl (pk, ck, v) VALUES (0, 0, 0); cqlsh> INSERT INTO ks.tbl (pk, ck, v) VALUES (0, 1, 0); cqlsh> INSERT INTO ks.tbl (pk, ck, v) VALUES (0, 2, 0); cqlsh> INSERT INTO ks.tbl (pk, ck, v) VALUES (1, 0, 0); cqlsh> SELECT * FROM ks.tbl; pk \| ck \| v ----+----+--- 1 \| 0 \| 0 0 \| 0 \| 0 0 \| 1 \| 0 0 \| 2 \| 0 (4 rows) cqlsh> SELECT * FROM MUTATION_FRAGMENTS(ks.tbl); pk \| mutation_source \| partition_region \| ck \| position_weight \| metadata \| mutation_fragment_kind \| value ----+-----------------+------------------+----+-----------------+--------------------------------------------------------------------------------------------------------------------------+------------------------+----------- 1 \| memtable:0 \| 0 \| \| \| {"tombstone":{}} \| partition start \| null 1 \| memtable:0 \| 2 \| 0 \| 0 \| {"marker":{"timestamp":1688122873341627},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122873341627}}} \| clustering row \| {"v":"0"} 1 \| memtable:0 \| 3 \| \| \| null \| partition end \| null 0 \| memtable:0 \| 0 \| \| \| {"tombstone":{"timestamp":1688122848686316,"deletion_time":"2023-06-30 11:00:48z"}} \| partition start \| null 0 \| memtable:0 \| 2 \| 0 \| 0 \| {"marker":{"timestamp":1688122860037077},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122860037077}}} \| clustering row \| {"v":"0"} 0 \| memtable:0 \| 2 \| 0 \| 1 \| {"tombstone":{"timestamp":1688122853571709,"deletion_time":"2023-06-30 11:00:53z"}} \| range tombstone change \| null 0 \| memtable:0 \| 2 \| 1 \| 0 \| {"marker":{"timestamp":1688122864641920},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122864641920}}} \| clustering row \| {"v":"0"} 0 \| memtable:0 \| 2 \| 2 \| -1 \| {"tombstone":{}} \| range tombstone change \| null 0 \| 
memtable:0 \| 2 \| 2 \| 0 \| {"marker":{"timestamp":1688122868706989},"columns":{"v":{"is_live":true,"type":"regular","timestamp":1688122868706989}}} \| clustering row \| {"v":"0"} 0 \| memtable:0 \| 3 \| \| \| null \| partition end \| null (10 rows) ``` Perf simple query: ``` /build/release/scylla perf-simple-query -c1 -m2G --duration=60 ``` Before: ``` median 141596.39 tps ( 62.1 allocs/op, 13.1 tasks/op, 43688 insns/op, 0 errors) median absolute deviation: 137.15 maximum: 142173.32 minimum: 140492.37 ``` After: ``` median 141889.95 tps ( 62.1 allocs/op, 13.1 tasks/op, 43692 insns/op, 0 errors) median absolute deviation: 167.04 maximum: 142380.26 minimum: 141025.51 ``` Fixes: https://github.com/scylladb/scylladb/issues/11130 Closes #14347 * github.com:scylladb/scylladb: docs/operating-scylla/admin-tools: add documentation for the SELECT * FROM MUTATION_FRAGMENTS() statement test/topology_custom: add test_select_from_mutation_fragments.py test/boost/database_test: add test for mutation_dump/generate_output_schema_from_underlying_schema test/cql-pytest: add test_select_mutation_fragments.py test/cql-pytest: move scylla_data_dir fixture to conftest.py cql3/statements: wire-in mutation_fragments_select_statement cql3/restrictions/statement_restrictions: fix indentation cql3/restrictions/statement_restrictions: add check_indexes flag cql3/statments/select_statement: add mutation_fragments_select_statement cql3: add SELECT MUTATION FRAGMENTS select statement sub-type service/pager: allow passing a query functor override service/storage_proxy: un-embed coordinator_query_options replica: add mutation_dump replica: extract query_state into own header replica/table: add make_nonpopulating_cache_reader() replica/table: add select_memtables_as_mutation_sources() tools,mutation: extract the low-level json utilities into mutation/json.hh tools/json_writer: fold SstableKey() overloads into callers tools/json_writer: allow writing metadata and value separately tools/json_writer: 
split mutation_fragment_json_writer in two classes tools/json_writer: allow passing custom std::ostream to json_writer	2023-07-19 11:54:11 +03:00
Botond Dénes	2174276bb7	service/storage_proxy: un-embed coordinator_query_options So it can be forward declared. Add an embedded alias to reduce churn. Requires similarly un-embedding clock_type.	2023-07-19 01:28:28 -04:00
Kefu Chai	bab16eb30e	treewide: remove #includes not use directly for faster build times and clear inter-module dependencies, we should not #include headers not directly used. instead, we should only #include the headers directly used by a certain compilation unit. in this change, the source files under "/compaction" directories are checked using clangd, which identifies the cases where we have an #include which is not directly used. all the #includes identified by clangd are removed. because some source files rely on the incorrectly included header file, those ones are updated to #include the header file they directly use. if a forward declaration suffices, the declaration is added instead. see also https://clangd.llvm.org/guides/include-cleaner#unused-include-warning Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-18 17:36:31 +08:00
Gleb Natapov	94fcba5662	storage_proxy: remove unused variable	2023-06-22 15:26:20 +03:00
Tomasz Grabiec	10e05eec66	storage_proxy: Obtain shard from erm in the read path dht::shard_of() does not use the correct sharder for tablet-based tables. Code which is supposed to work with all kinds of tables should use erm::get_sharder().	2023-06-21 00:58:24 +02:00
Petr Gusev	d34da12240	storage_proxy: add fencing_token and related infrastructure A new stale_topology_exception was introduced, it's raised in apply_fence when an RPC comes with a stale fencing_token. An overload of apply_fence with future will be used to wrap the storage_proxy methods which need to be fenced.	2023-06-15 15:48:00 +04:00
Kamil Braun	a740fbf58a	storage_proxy: rename `init_messaging_service` to `start_remote` The function now has more responsibilities than before, rename it and add a comment to better illustrate this.	2023-06-14 11:41:36 +02:00
Kamil Braun	f26e98c3be	storage_proxy: don't pass `gossiper&` and `messaging_service&` during initialization These services are now passed during `init_messaging_service`, and that's when the `remote` object is constructed. The `remote` object is then destroyed in `uninit_messaging_service`. Also, `migration_manager*` became `migration_manager&` in `init_messaging_service`.	2023-06-14 11:41:36 +02:00
Kamil Braun	10f11b89ea	storage_proxy: prepare for missing `remote` Prepare the users of `remote` for the possibility that it's gone. The `remote()` accessor throws an error if it's gone. Observe that `remote()` is only used in places where it's verified that we really want to send a message to a remote node, with a small exception: `truncate_blocking`, which truncates locally by sending an RPC to ourselves (and truncate always sends RPC to the whole cluster; we might want to change this behavior in the future, see #11087). Other places are easy to check (it's either implementations of `apply_remotely` which is only called for remote nodes, or there's an `if` that checks we don't apply the operation to ourselves). There is one direct access to `_remote` which checks first if `_remote` is available: `storage_proxy::is_alive`. If `_remote` is unavailable, we consider nodes other than us dead. Indeed, if `gossiper` is unavailable, we didn't have a chance to gossip with other nodes and mark them alive.	2023-06-14 11:41:36 +02:00
Kamil Braun	ddcbade919	storage_proxy: don't access `remote` when calculating target replicas for local queries We only want to access `remote` when it's necessary - when we're performing a query that involves remote nodes. We want to support local queries when `remote` (in particular, `gossiper&`) is unavailable. Add a helper, `storage_proxy::filter_replicas_for_read`, which will check if it's a local query and return early in that case without accessing `remote`.	2023-06-14 11:41:34 +02:00
Kamil Braun	ff8d88a228	storage_proxy: introduce const version of `remote()` One version is implemented using the other (with `const_cast`) because some additional safety checks will be added in later commit.	2023-06-13 12:44:03 +02:00
Kamil Braun	0ef35ceed4	service: storage_proxy: make hint write handlers cancellable Whether a write handler should be cancellable is now controlled by a parameter passed to `create_write_response_handler`. We plumb it down from `send_to_endpoint` which is called by hints manager. This will cause hint write handlers to immediately time out when we shut down or when a destination node is marked as dead. Fixes #8079	2023-05-29 11:03:18 +02:00
Kamil Braun	eddb7406b4	service: storage_proxy: rename `view_update_handlers_list` The list will be used for non-view-update write handlers as well, so generalize the name. Also generalize some variable names used in the implementation. This commit only renames things + some comments were added, there are no logical changes.	2023-05-29 10:59:50 +02:00
Kamil Braun	c7ef9a12ee	service: storage_proxy: make it possible to cancel all write handler types The `view_update_write_response_handler` class, which is a subclass of `abstract_write_response_handler`, was created for a single purpose: to make it possible to cancel a handler for a view update write, which means we stop waiting for a response to the write, timing out the handler immediately. This was done to solve an issue with node shutdown hanging because it was waiting for a view update to finish; view updates were configured with a 5 minute timeout. See #3966, #4028. Now we're having a similar problem with hint updates causing shutdown to hang in tests (#8079). `view_update_write_response_handler` implements cancelling by adding itself to an intrusive list which we then iterate over to time out each handler when we shut down or when gossiper notifies `storage_proxy` that a node is down. To make it possible to reuse this algorithm for other handlers, move the functionality into `abstract_write_response_handler`. We inherit from `bi::list_base_hook` so it introduces a small memory overhead to each write handler (2 pointers) which was only present for view update handlers before. But those handlers are already quite large, the overhead is small compared to their size. Not all handlers are added to the cancelling list, this is controlled by the `cancellable` parameter passed to the constructor. For now we're only cancelling view handlers as before. In following commits we'll also cancel hint handlers.	2023-05-29 10:42:57 +02:00
Petr Gusev	052b91fb1f	storage_proxy: rename get_live_sorted_endpoints->get_endpoints_for_reading We are going to use remapped_endpoints_for_reading, we need to make sure we use it in the right place. The get_live_sorted_endpoints function looks like what we need - it's used in all read code paths. From its name, however, this was not obvious. Also, we add the parameter ks_name as we'll need it to pass to remapped_endpoints_for_reading.	2023-05-09 18:42:03 +04:00
Pavel Emelyanov	739455c3aa	code: Remove global proxy No code needs global proxy anymore. Keep on-stack values in main and cql_test_env and keep the pointer on debug:: namespace. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-21 14:18:59 +03:00
Pavel Emelyanov	ab8fc0e166	proxy: Carry replication map with repair mutation(s) The create_write_response_handler() for read repair needs the e.r.m. from the caller, because it effectively accepts a list of endpoints from it. So this patch equips all read_repair_mutation-s with the e.r.m. pointer so that the handler creation can use it. It's the same for all mutations, so it's a waste of space, but it's not bad -- there are typically few mutations in this range and the entry passed there is temporary, so even lots of them won't occupy lots of memory for long. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-14 14:03:39 +03:00
Pavel Emelyanov	140f373e15	proxy: Wrap read repair entries into read_repair_mutation The schedule_repair() operates on a map of endpoint:mutations pairs. Next patch will need to extend this entry and it's going to be easier if the entry is wrapped in a helper structure in advance. This is where the forwardable reference cursor from the previous patch gets its user. The schedule_repair() produces a range of rvalue wrappers, but the create_write_response_handler accepting it is OK, it copies mutations anyway. The printing operator is added to facilitate mutations logging from mutate_internal() method. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-14 14:01:12 +03:00
Michał Jadwiszczak	360dbf98f1	storage_proxy: add `describe_ring()` method In order to execute `DESC CLUSTER`, there has to be a way to describe ring. `storage_service` is not available at query execution. This patch adds `describe_ring()` as a method of `storage_proxy()` (using helper function from `locator/util.hh`).	2022-12-10 12:51:05 +01:00
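
The throttling idea described in the 68eca3778c merge message above (generate view updates in batches and wait before each batch for a time derived from backlog fullness) can be illustrated with a small standalone sketch. Everything here is an assumption for illustration: the `backlog` struct, `throttle_delay`, `plan_batches`, and the cubic delay curve are hypothetical and are not taken from the ScyllaDB sources, and the sketch only computes the delays instead of sleeping on a Seastar reactor.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a view update backlog: consumed units vs. capacity.
struct backlog {
    std::size_t current;
    std::size_t max;

    // Fraction of the capacity in use, clamped to [0, 1].
    double fullness() const {
        assert(max > 0);
        return std::min(1.0, static_cast<double>(current) / static_cast<double>(max));
    }
};

// Assumed delay curve (not confirmed by the commit message): the delay grows
// with the cube of the fullness and never exceeds `budget`.
std::chrono::microseconds throttle_delay(const backlog& b, std::chrono::microseconds budget) {
    const double f = b.fullness();
    const double us = static_cast<double>(budget.count()) * f * f * f;
    return std::chrono::microseconds(static_cast<std::chrono::microseconds::rep>(us));
}

// Splits `total_rows` into batches of at most `batch_size` rows and returns
// the delay that would be waited before sending each batch. A real generator
// would re-read the backlog before every batch; this sketch keeps it fixed.
std::vector<std::chrono::microseconds> plan_batches(std::size_t total_rows,
                                                    std::size_t batch_size,
                                                    const backlog& b,
                                                    std::chrono::microseconds budget) {
    assert(batch_size > 0);
    std::vector<std::chrono::microseconds> delays;
    for (std::size_t done = 0; done < total_rows;
         done += std::min(batch_size, total_rows - done)) {
        delays.push_back(throttle_delay(b, budget));
    }
    return delays;
}
```

With a half-full backlog and a 1000 microsecond budget, `throttle_delay` in this sketch yields 125 microseconds, so a 250-row delete split into 100-row batches would wait three times instead of flooding the remote nodes at once.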

428 Commits