This patch splits the timed_rate_moving_average functionality in two: a
data class, rates_moving_average, and a wrapper class,
timed_rate_moving_average, that uses a timer to update the rates
periodically.
To make the transition as simple as possible, timed_rate_moving_average
keeps the original API.
A new helper class meter_timer was introduced to handle the timer update
functionality.
This change required only minimal adaptation in other parts of the
code.
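The shape of the split can be sketched as follows. This is a hypothetical illustration: the class names come from the patch, but the internals (the smoothing factor, the update formula, the manual-tick timer stand-in) are assumptions for the sake of a runnable example.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Data class: holds the counters and the rate-update logic, no timer.
class rates_moving_average {
    double _count = 0;
    double _rate = 0;
    static constexpr double alpha = 0.2; // smoothing factor (assumed)
public:
    void mark(double n = 1) { _count += n; }
    // Called periodically (by the timer wrapper) to fold the latest counts
    // into an exponentially weighted rate.
    void update(std::chrono::seconds interval) {
        double instant = _count / double(interval.count());
        _rate += alpha * (instant - _rate);
        _count = 0;
    }
    double rate() const { return _rate; }
};

// Stand-in for the meter_timer helper: stores the update callback so a
// test can drive "ticks" manually; a real implementation arms a timer.
class meter_timer {
    std::function<void()> _on_tick;
public:
    explicit meter_timer(std::function<void()> f) : _on_tick(std::move(f)) {}
    void tick() { _on_tick(); }
};

// Wrapper keeping the original API by composing the two pieces.
class timed_rate_moving_average {
    rates_moving_average _data;
    meter_timer _timer{[this] { _data.update(std::chrono::seconds(1)); }};
public:
    void mark(double n = 1) { _data.mark(n); }
    double rate() const { return _data.rate(); }
    meter_timer& timer() { return _timer; }
};
```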
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This PR gets rid of exception throws/rethrows on the replica side for writes and single-partition reads. This goal is achieved without using `boost::outcome` but rather by replacing the parts of the code which throw with appropriate seastar idioms and by introducing two helper functions:
1. `try_catch` allows inspecting the type and value behind an `std::exception_ptr`. When libstdc++ is used, this function does not need to throw the exception and avoids the very costly unwind process. This is based on the "How to catch an exception_ptr without even try-ing" proposal mentioned in https://github.com/scylladb/scylla/issues/10260.
This function makes it possible to replace the current `try..catch` chains which inspect the exception type and account for it in the metrics.
Example:
```c++
// Before
try {
    std::rethrow_exception(eptr);
} catch (std::runtime_error& ex) {
    // 1
} catch (...) {
    // 2
}

// After
if (auto* ex = try_catch<std::runtime_error>(eptr)) {
    // 1
} else {
    // 2
}
```
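A portable, semantics-only sketch of `try_catch` looks like this. Note that this fallback still rethrows; the optimized libstdc++ version inspects the exception's type_info through ABI internals (cf. abi/eh_ia64.hh) and never unwinds. The returned pointer's validity past the catch block is implementation-dependent in this sketch, whereas the real helper guarantees it for as long as the `exception_ptr` is alive.

```cpp
#include <exception>
#include <stdexcept>

// Returns a pointer to the exception object if eptr holds (a subclass of) T,
// otherwise nullptr. Semantics-only fallback: it rethrows internally.
template <typename T>
T* try_catch(std::exception_ptr& eptr) noexcept {
    try {
        std::rethrow_exception(eptr);
    } catch (T& ex) {
        return &ex; // do not dereference after eptr is gone
    } catch (...) {
        return nullptr;
    }
}
```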
2. `make_nested_exception_ptr` which is meant to be a replacement for `std::throw_with_nested`. Unlike the original function, it does not require an exception being currently thrown and does not throw itself - instead, it takes the nested exception as an `std::exception_ptr` and produces another `std::exception_ptr` itself.
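A semantics-only sketch of `make_nested_exception_ptr` can be built on `std::throw_with_nested`. This portable fallback still throws internally, which is exactly what the real helper is meant to avoid; it only illustrates the resulting value.

```cpp
#include <exception>
#include <stdexcept>

// Wrap `nested` inside `ex` and return the combination as an exception_ptr.
// Portable fallback: it throws internally; the real helper builds the same
// result without unwinding.
template <typename T>
std::exception_ptr make_nested_exception_ptr(T&& ex, std::exception_ptr nested) {
    try {
        std::rethrow_exception(nested);
    } catch (...) {
        // Inside a handler, throw_with_nested records current_exception()
        // as the nested pointer.
        try {
            std::throw_with_nested(std::forward<T>(ex));
        } catch (...) {
            return std::current_exception();
        }
    }
    return {}; // unreachable
}
```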
Apart from the above, seastar idioms such as `make_exception_future`, `co_await as_future`, `co_return coroutine::exception()` are used to propagate exceptions without throwing. This brings the number of exception throws to zero for single partition reads and writes (tested with scylla-bench, --mode=read and --mode=write).
Results from `perf_simple_query`:
```
Before (719724e4df):
Writes:
Normal:
127841.40 tps ( 56.2 allocs/op, 13.2 tasks/op, 50042 insns/op, 0 errors)
Timeouts:
94770.81 tps ( 53.1 allocs/op, 5.1 tasks/op, 78678 insns/op, 1000000 errors)
Reads:
Normal:
138902.31 tps ( 65.1 allocs/op, 12.1 tasks/op, 43106 insns/op, 0 errors)
Timeouts:
62447.01 tps ( 49.7 allocs/op, 12.1 tasks/op, 135984 insns/op, 936846 errors)
After (d8ac4c02bfb7786dc9ed30d2db3b99df09bf448f):
Writes:
Normal:
127359.12 tps ( 56.2 allocs/op, 13.2 tasks/op, 49782 insns/op, 0 errors)
Timeouts:
163068.38 tps ( 52.1 allocs/op, 5.1 tasks/op, 40615 insns/op, 1000000 errors)
Reads:
Normal:
151221.15 tps ( 65.1 allocs/op, 12.1 tasks/op, 43028 insns/op, 0 errors)
Timeouts:
192094.11 tps ( 41.2 allocs/op, 12.1 tasks/op, 33403 insns/op, 960604 errors)
```
Closes #10368
* github.com:scylladb/scylla:
database: avoid rethrows when handling exceptions from commitlog
database: convert throw_commitlog_add_error to use make_nested_exception_ptr
utils: add make_nested_exception_ptr
storage_proxy: don't rethrow when inspecting replica exceptions on write path
database: don't rethrow rate_limit_exception
storage_proxy: don't rethrow the exception in abstract_read_resolver::error
utils/exceptions.cc: don't rethrow in is_timeout_exception
utils/exceptions: add try_catch
utils: add abi/eh_ia64.hh
storage_proxy: don't rethrow exceptions from replicas when accounting read stats
message: get rid of throws in send_message{,_timeout,_abortable}
database/{query,query_mutations}: don't rethrow read semaphore exceptions
Convert most use sites from `co_return coroutine::make_exception`
to `co_await coroutine::return_exception{,_ptr}` where possible.
In cases where this is done in a catch clause, convert to
`co_return coroutine::exception`, generating an exception_ptr
if needed.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes #10972
Due to its sharded and token-based architecture, Scylla works best when the user workload is more or less uniformly balanced across all nodes and shards. However, a common case when this assumption is broken is the "hot partition" - suddenly, a single partition starts getting a lot more reads and writes in comparison to other partitions. Because the shards owning the partition have only a fraction of the total cluster capacity, this quickly causes latency problems for other partitions within the same shard and vnode.
This PR introduces a per-partition rate limiting feature. Now, users can choose to apply per-partition limits to their tables of choice using a schema extension:
```
ALTER TABLE ks.tbl
WITH per_partition_rate_limit = {
'max_writes_per_second': 100,
'max_reads_per_second': 200
};
```
Reads and writes which are detected to go over that quota are rejected, and the rejection is reported to the client with a new RATE_LIMIT_ERROR CQL error code - existing error codes didn't fit the rate limit error well, so a new one is added. This code is implemented as part of a CQL protocol extension and is returned only to clients that requested the extension - if not, the existing CONFIG_ERROR will be used instead.
Limits are tracked and enforced on the replica side. If a write fails with some replicas reporting the rate limit being reached, the rate limit error is propagated to the client. Additionally, the following optimization is implemented: if the coordinator shard/node is also a replica, we account the operation against the rate limit early and return an error on exceeding it before sending any messages to other replicas at all.
The PR covers regular, non-batch writes and single-partition reads. LWT and counters are not covered here.
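The contract of the replica-side limiter can be sketched with a simple per-second counting scheme. This is a hypothetical illustration only: Scylla's actual db::rate_limiter is more elaborate (fixed-size counter buckets, coordination across replicas), and the names below are invented for the sketch.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical sketch: count operations per partition within a one-second
// window and reject once the configured limit is hit.
class rate_limiter {
    struct counter {
        uint64_t second = 0; // window this counter belongs to
        uint64_t ops = 0;    // operations seen in that window
    };
    std::unordered_map<uint64_t, counter> _per_partition; // token -> counter
public:
    // Returns true if the operation is admitted, false if it must be
    // rejected with a rate-limit error.
    bool account(uint64_t partition_token, uint64_t limit_per_second,
                 uint64_t now_seconds) {
        auto& c = _per_partition[partition_token];
        if (c.second != now_seconds) { // new window: reset the count
            c.second = now_seconds;
            c.ops = 0;
        }
        return ++c.ops <= limit_per_second;
    }
};
```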
Results of `perf_simple_query --smp=1 --operations-per-shard=1000000`:
- Write mode:
```
8f690fdd47 (PR base):
129644.11 tps ( 56.2 allocs/op, 13.2 tasks/op, 49785 insns/op)
This PR:
125564.01 tps ( 56.2 allocs/op, 13.2 tasks/op, 49825 insns/op)
```
- Read mode:
```
8f690fdd47 (PR base):
150026.63 tps ( 63.1 allocs/op, 12.1 tasks/op, 42806 insns/op)
This PR:
151043.00 tps ( 63.1 allocs/op, 12.1 tasks/op, 43075 insns/op)
```
Manual upgrade test:
- Start 3 nodes, 4 shards each, Scylla version 8f690fdd47
- Create a keyspace with scylla-bench, RF=3
- Start reading and writing with scylla-bench with CL=QUORUM
- Manually upgrade nodes one by one to the version from this PR
- Upgrade succeeded; apart from a small number of operations that failed while each node was being taken down, all reads/writes succeeded
- Successfully altered the scylla-bench table to have a read and write limit and those limits were enforced as expected
Fixes: #4703
Closes #9810
* github.com:scylladb/scylla:
storage_proxy: metrics for per-partition rate limiting of reads
storage_proxy: metrics for per-partition rate limiting of writes
database: add stats for per partition rate limiting
tests: add per_partition_rate_limit_test
config: add add_per_partition_rate_limit_extension function for testing
cf_prop_defs: guard per-partition rate limit with a feature
query-request: add allow_limit flag
storage_proxy: add allow rate limit flag to get_read_executor
storage_proxy: resultize return type of get_read_executor
storage_proxy: add per partition rate limit info to read RPC
storage_proxy: add per partition rate limit info to query_result_local(_digest)
storage_proxy: add allow rate limit flag to mutate/mutate_result
storage_proxy: add allow rate limit flag to mutate_internal
storage_proxy: add allow rate limit flag to mutate_begin
storage_proxy: choose the right per partition rate limit info in write handler
storage_proxy: resultize return types of write handler creation path
storage_proxy: add per partition rate limit to mutation_holders
storage_proxy: add per partition rate limit info to write RPC
storage_proxy: add per partition rate limit info to mutate_locally
database: apply per-partition rate limiting for reads/writes
database: move and rename: classify_query -> classify_request
schema: add per_partition_rate_limit schema extension
db: add rate_limiter
storage_proxy: propagate rate_limit_exception through read RPC
gms: add TYPED_ERRORS_IN_READ_RPC cluster feature
storage_proxy: pass rate_limit_exception through write RPC
replica: add rate_limit_exception and a simple serialization framework
docs: design doc for per-partition rate limiting
transport: add rate_limit_error
Adds a metric "read_rate_limited" which indicates how many times a read
operation was rejected due to per-partition rate limiting. The metric
differentiates between reads rejected by the coordinator and reads
rejected by replicas.
Adds a metric "write_rate_limited" which indicates how many times a
write operation was rejected due to per-partition rate limiting. The
metric differentiates between writes rejected by the coordinator and
writes rejected by replicas.
Adds a flag to get_read_executor which decides whether the read should
be rate limited or not. The read executors were modified to choose the
appropriate per partition rate limit info parameter and send it to the
replicas.
Now, get_read_executor is able to return coordinator exceptions without
throwing them. In an upcoming commit, it will start returning rate limit
exceptions in some cases, and it is preferable to return them without
throwing.
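The "resultized" calling convention can be sketched with a minimal value-or-exception type. This is a hypothetical stand-in: Scylla's actual result type differs, and the function below is invented to show the shape of the error path.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <variant>
#include <utility>

// Minimal stand-in for a "resultized" return type: either a value or an
// exception_ptr, so the error path needs no throw.
template <typename T>
class result {
    std::variant<T, std::exception_ptr> _v;
public:
    result(T value) : _v(std::move(value)) {}
    result(std::exception_ptr e) : _v(std::move(e)) {}
    bool has_value() const { return _v.index() == 0; }
    T& value() { return std::get<0>(_v); }
    std::exception_ptr error() const { return std::get<1>(_v); }
};

// A get_read_executor-like function (name and logic hypothetical) can now
// report a rejection without unwinding the stack:
result<std::string> get_executor(bool over_limit) {
    if (over_limit) {
        return std::make_exception_ptr(std::runtime_error("rate limited"));
    }
    return std::string("executor");
}
```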
The query_result_local and query_result_local_digest methods were
updated to accept db::per_partition_rate_limit::info structure and pass
it on to database::accept.
Now, mutate/mutate_result accept a flag which decides whether the write
should be rate limited or not.
The new parameter is mandatory and all call sites were updated.
The mutate_prepare and create_write_response_handler(_helper) functions
are modified to be able to return exceptions without throwing them. In
an upcoming commit, create_write_response_handler will sometimes return
a rate limit exception, and it is preferable to return it without
throwing.
This commit modifies the read RPC and the storage_proxy logic so that
the coordinator knows whether a read operation failed due to rate limit
being exceeded, and returns `exceptions::rate_limit_exception` if that
happens.
This commit modifies the storage_proxy logic so that the coordinator
knows whether a write operation failed due to rate limit being exceeded,
and returns `exceptions::rate_limit_exception` when that happens.
The reference is already at hand. get_ep_stats() calls another
helper that also maps endpoint to datacenter, but it can receive the
obtained dc sstring via an argument.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The latter will need it to get dc info from. All the callers are either
storage proxy or have storage proxy pointer/reference to get topology
from.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Proxy has shared token metadata from which it can get the topology.
This change obsoletes static get_local_dc() helper.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
coroutine::parallel_for_each avoids an allocation and is therefore preferred. The lifetime
of the function object is less ambiguous, so it is also safer. Replace all eligible
occurrences (i.e. where the caller is a coroutine).
One case (storage_service::node_ops_cmd_heartbeat_updater()) needed a little extra
attention since there was a handle_exception() continuation attached. It is converted
to a try/catch.
Closes #10699
After fcb8d040 ("treewide: use Software Package Data Exchange
(SPDX) license identifiers"), many dual-licensed files were
left with empty comments on top. Remove them to avoid visual
noise.
Closes #10562
The latency counters at those places are explicitly start()ed beforehand. The
is_start() check is only necessary when using the latency_counter with a
histogram that may or may not start the counter (as is the case
in several methods of the table class).
tests: unit(dev)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This series futurizes two synchronous functions used for data reconciliation:
`data_read_resolver::resolve` and `to_data_query_result` and does so
by introducing lower-level asynchronous infrastructure:
`mutation_partition_view::accept_gently`,
`frozen_mutation::unfreeze_gently` and `frozen_mutation::consume_gently`,
and `mutation::consume_gently`.
This trades some cycles on this cold path to prevent known reactor stalls.
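The gist of the "_gently" pattern can be shown without Seastar: do the same work as the synchronous version, but give the scheduler a chance to run other tasks between partitions. In the sketch below, the injected `yield` callback stands in for Seastar's preemption check (`co_await coroutine::maybe_yield()`); everything else is a hypothetical simplification.

```cpp
#include <functional>
#include <vector>

// Process partitions one by one, offering to yield after each. In Seastar
// the yield is a cheap preemption check, not an unconditional reschedule.
void consume_gently(const std::vector<int>& partitions,
                    const std::function<void(int)>& consume,
                    const std::function<void()>& yield) {
    for (int p : partitions) {
        consume(p);
        yield(); // bounds the work done between scheduling points
    }
}
```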
Fixes #2361
Fixes #10038
Closes #10482
* github.com:scylladb/scylla:
mutation: add consume_gently
frozen_mutation: add consume_gently
query: coroutinize to_data_query_result
frozen_mutation: add unfreeze_gently
mutation_partition_view: add accept_gently methods
storage_proxy: futurize data_read_resolver::resolve
Reduce stalls by maybe yielding in-between partitions,
and by awaiting unfreeze_gently where possible.
Refs #10038
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Allow yielding in data_read_resolver::resolve to
prevent reactor stalls.
TODO: unfreeze_gently, to prevent stalls due
to large partitions.
Refs #2361
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Each feature has a private variable and a public accessor. Since the
accessor effectively makes the variable public, avoid the intermediary
and make the variable public directly.
To ease mechanical translation, the variable name is chosen as
the function name (without the cluster_supports_ prefix).
References throughout the codebase are adjusted.
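The mechanical translation looks roughly like this (the feature name below is hypothetical; only the before/after shape is taken from the commit):

```cpp
// Before: private variable plus a trivial public accessor.
class feature_service_before {
    bool _digest_multipartition_read = false;
public:
    bool cluster_supports_digest_multipartition_read() const {
        return _digest_multipartition_read;
    }
};

// After: the variable is public, named after the accessor without the
// cluster_supports_ prefix; call sites drop the function-call parentheses.
struct feature_service_after {
    bool digest_multipartition_read = false;
};
```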
The table::get_hit_rate needs gossiper to get hitrates state from.
There's no way to carry gossiper reference on the table itself, so it's
up to the callers of that method to provide it. Fortunately, there's
only one caller -- the proxy -- but the call chain to carry the
reference is not very short ... oh, well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We don't need the database to determine the shard of the mutation,
only its schema. So move the implementation to the respective
definitions of mutation and frozen_mutation.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes #10430
Seastar is an external library from Scylla's point of view so
we should use the angle bracket #include style. Most of the source
follows this, this patch fixes a few stragglers.
Also fix cases of #include which reached into seastar's directory
tree directly, via #include "seastar/include/seastar/...", changing them
to refer to just <seastar/...>.
Closes #10433
The do_with() means we have an unconditional allocation, so we can
justify the coroutine's allocation (replacing it). Meanwhile,
coroutine::parallel_for_each() reduces an allocation if mutate_locally()
blocks.
Closes #10387
Currently, rpc handlers are all lambdas inside
storage_proxy::init_messaging_service(). This means any stack trace
refers to storage_proxy::init_messaging_service::lambda#n instead of
a meaningful function name, and it makes init_messaging_service()
very intimidating.
Fix that by moving all such lambdas to regular member functions.
This is easy now that they don't capture anything except `this`,
which we provide during registration via std::bind_front().
A few #includes and forward declarations had to be added to
storage_proxy.hh. This is unfortunate, but can only be solved
by splitting storage_proxy into a client part and a server part.
We'd like to make the server callbacks member functions, rather
than lambdas, so we need to eliminate their captures. This patch
eliminates 'ms' by referring to the already existing member '_messaging'
instead.
We'd like to make the server callbacks member functions, rather
than lambdas, so we need to eliminate their captures. This patch
eliminates 'mm' by making it a member variable and capturing 'this'
instead. In one case 'mm' was used by a handle_write() intermediate
lambda so we have to make that non-static and capture it too.
uninit_messaging_service() clears the member variable to preserve
the same lifetime 'mm' had before, in case that's important.
Filtering remote rpc errors based on exception type did not work because
the remote errors were reported as std::runtime_error and all rpc
exceptions inherit from it. New rpc propagates remote errors using
special type rpc::remote_verb_error now, so we can filter on that
instead.
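The fix can be illustrated with a filtering predicate. Here `remote_verb_error` is modeled as a plain `std::runtime_error` subclass for the sketch; the real type lives in the rpc library, and a dedicated type lets the caller match remote errors precisely where catching `std::runtime_error` would also match unrelated local errors.

```cpp
#include <exception>
#include <stdexcept>

// Stand-in for rpc::remote_verb_error (modeled as a runtime_error subclass).
struct remote_verb_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// True only for errors that genuinely came from the remote side; a plain
// runtime_error (which all rpc exceptions inherit from) no longer matches.
bool is_remote_error(std::exception_ptr eptr) {
    try {
        std::rethrow_exception(eptr);
    } catch (const remote_verb_error&) {
        return true;
    } catch (...) {
        return false;
    }
}
```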
Fixes #10339
Message-Id: <YlQYV5G6GksDytGp@scylladb.com>
A node that runs a DDL query while its cluster does not have a quorum
cannot be shut down since the query is not abortable. The series makes it
abortable and also fixes the order in which components are shut down to
avoid the deadlock.
* gleb/raft_shutdown_v4 of git@github.com:scylladb/scylla-dev.git:
migration_manager: drain migration manager before stopping protocol servers on shutdown
migration_manager: pass abort source to raft primitives
storage_proxy: relax some read error reporting
Silence the request_aborted read error since it is expected to happen during
shutdown, and report remote rpc errors as warnings instead of errors: if
they are indeed severe they should be handled by the rpc client, but
OTOH some non-critical errors are expected to happen during shutdown.
The host id is cached on the db::config object, which is available in
all the places that need it. This allows removing the method in
question from system_keyspace, so that code needing the host_id no
longer has to depend on a system_keyspace instance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>