scylladb

Author	SHA1	Message	Date
Petr Gusev	819d59eeba	storage_proxy_stats: add fenced_out_requests metric We have to drop const qualifiers because now check_fence needs to mutate this metric.	2025-09-15 11:24:53 +02:00
Kefu Chai	b3e2561ed8	service: do not include unused headers these unused includes were identified by clang-include-cleaner. after auditing these source files, all of the reports have been confirmed. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2025-03-20 11:18:16 +08:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Gleb Natapov	12937aeb7f	storage_proxy: move to addressing nodes by host ids instead of ips In this rather large patch we move to addressing nodes in storage proxy by host ids instead of ips. Some subsystems that storage proxy calls into are not yet converted to host ids, so we translate back and forth when we interact with them.	2024-12-02 10:31:11 +02:00
Dawid Medrek	23bea50de0	service/storage_proxy: Add metrics for received hints In this commit, we add two new metrics to storage proxy: * `received_hints_total`, * `received_hints_bytes_total`. Before these changes, we had to rely solely on other metrics indicating how many hints nodes have written, rejected, sent, etc. Because hints are subject to many more or less controllable factors, e.g. a target node still being a replica for a mutation, it was very difficult to approximate how many hints a given node might have received or what part of its load they were. The newly introduced metrics are supposed to help reason about those.	2024-06-12 14:44:47 +02:00
Botond Dénes	d82a31f15f	service/storage_proxy: add useful version of base write throttle metrics There are two metrics to help observe base-write throttling: * current_throttled_base_writes * last_mv_flow_control_delay Both show a snapshot of what is happening right at the time of querying these metrics. This doesn't work well when one wants to investigate the role throttling is playing in occasional write timeouts. Prometheus scrapes metrics in multi-second intervals, and the probability of that instant catching the throttling at play is very small (almost zero). Add two new metrics: * throttled_base_writes_total * mv_flow_control_delay_total These accumulate all values, allowing Grafana to derive the values and extract information about throttle events that happened in the past (but not necessarily at the instant of the scrape). Note that dividing the two values will yield the average delay for a throttle, which is also useful. Closes scylladb/scylladb#18435	2024-05-13 18:02:06 +03:00
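The division mentioned in this commit message can be sketched as follows. This is a minimal illustration, assuming the two cumulative counters have been read at two consecutive scrapes and are available as plain numbers; the function name `average_throttle_delay` and the tuple layout are hypothetical and not part of ScyllaDB.

```python
def average_throttle_delay(prev, curr):
    """Average delay per throttled base write between two scrapes.

    prev and curr are (throttled_base_writes_total, mv_flow_control_delay_total)
    pairs taken at two consecutive scrapes (hypothetical sample layout).
    """
    writes = curr[0] - prev[0]   # throttled writes seen in the interval
    delay = curr[1] - prev[1]    # accumulated delay in the same interval
    if writes <= 0:
        # Nothing was throttled in the interval, or a counter was reset.
        return None
    return delay / writes
```

Working on the difference between scrapes rather than on the raw totals is what lets a dashboard see throttling that happened between two scrape instants.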
Calle Wilund	f18e967939	storage_proxy: Make split_stats resilient to being called from different scheduling group Fixes #11017 When doing writes, storage proxy creates types deriving from abstract_write_response_handler. These are created in the various scheduling groups executing the write inducing code. They pick up a group-local reference to the various metrics used by SP. Normally all code using (and esp. modifying) these metrics are executed in the same scheduling group. However, if gossip sees a node go down, it will notify listeners, which eventually calls get_ep_stat and register_metrics. This code (before this patch) uses _active_ scheduling group to eventually add metrics, using a local dict as guard against double regs. If, as described above, we're called in a different sched group than the original one however, this can cause double registrations. Fixed here by keeping a reference to creating scheduling group and using this, not active one, when/if creating new metrics. Closes #14294	2023-06-21 10:08:27 +03:00
Piotr Smaroń	5f6491987d	Deregister table's metrics when disposing a table to work around #8627 The metrics that are being deregistered (in this PR) cause Scylla to crash when a table is dropped, but the corresponding table object in memory is not yet deallocated, and a new table with the same name is created. This caused a double-metrics-registration exception to be thrown. In order to avoid it, we deregister the table's metrics as soon as the table is marked to be disposed from the database. The table's representation in memory can still live, but shouldn't prevent another table with the same name from being created. Fixes #13548 Closes #13971	2023-05-23 18:41:51 +03:00
Amnon Heiman	5ac20ac861	Reduce the number of per-scheduling group metrics This patch reduces the number of metrics ScyllaDB generates. Motivation: The combination of per-shard with per-scheduling group generates a lot of metrics. When combined with histograms, which require many metrics, the problem becomes even bigger. The two tools we are going to use: 1. Replace per-shard histograms with summaries 2. Do not report unused metrics. The storage_proxy stats hold information for the API and the metrics layer. We replaced timed_rate_moving_average_and_histogram and time_estimated_histogram with the unified timed_rate_moving_average_summary_and_histogram, which gives us an option to report per-shard summaries instead of histograms. All the counters, histograms, and summaries were marked as skip_when_empty. The API was modified to use timed_rate_moving_average_summary_and_histogram. Closes #11173	2022-08-11 13:31:19 +03:00
Avi Kivity	dab56b82fa	Merge 'Per-partition rate limiting' from Piotr Dulikowski Due to its sharded and token-based architecture, Scylla works best when the user workload is more or less uniformly balanced across all nodes and shards. However, a common case when this assumption is broken is the "hot partition" - suddenly, a single partition starts getting a lot more reads and writes in comparison to other partitions. Because the shards owning the partition have only a fraction of the total cluster capacity, this quickly causes latency problems for other partitions within the same shard and vnode. This PR introduces per-partition rate limiting feature. Now, users can choose to apply per-partition limits to their tables of choice using a schema extension: ``` ALTER TABLE ks.tbl WITH per_partition_rate_limit = { 'max_writes_per_second': 100, 'max_reads_per_second': 200 }; ``` Reads and writes which are detected to go over that quota are rejected to the client using a new RATE_LIMIT_ERROR CQL error code - existing error codes didn't really fit well with the rate limit error, so a new error code is added. This code is implemented as a part of a CQL protocol extension and returned to clients only if they requested the extension - if not, the existing CONFIG_ERROR will be used instead. Limits are tracked and enforced on the replica side. If a write fails with some replicas reporting rate limit being reached, the rate limit error is propagated to the client. Additionally, the following optimization is implemented: if the coordinator shard/node is also a replica, we account the operation into the rate limit early and return an error in case of exceeding the rate limit before sending any messages to other replicas at all. The PR covers regular, non-batch writes and single-partition reads. LWT and counters are not covered here. 
Results of `perf_simple_query --smp=1 --operations-per-shard=1000000`: - Write mode: ``` `8f690fdd47` (PR base): 129644.11 tps ( 56.2 allocs/op, 13.2 tasks/op, 49785 insns/op) This PR: 125564.01 tps ( 56.2 allocs/op, 13.2 tasks/op, 49825 insns/op) ``` - Read mode: ``` `8f690fdd47` (PR base): 150026.63 tps ( 63.1 allocs/op, 12.1 tasks/op, 42806 insns/op) This PR: 151043.00 tps ( 63.1 allocs/op, 12.1 tasks/op, 43075 insns/op) ``` Manual upgrade test: - Start 3 nodes, 4 shards each, Scylla version `8f690fdd47` - Create a keyspace with scylla-bench, RF=3 - Start reading and writing with scylla-bench with CL=QUORUM - Manually upgrade nodes one by one to the version from this PR - Upgrade succeeded, apart from a small number of operations which failed when each node was being put down all reads/writes succeeded - Successfully altered the scylla-bench table to have a read and write limit and those limits were enforced as expected Fixes: #4703 Closes #9810 * github.com:scylladb/scylla: storage_proxy: metrics for per-partition rate limiting of reads storage_proxy: metrics for per-partition rate limiting of writes database: add stats for per partition rate limiting tests: add per_partition_rate_limit_test config: add add_per_partition_rate_limit_extension function for testing cf_prop_defs: guard per-partition rate limit with a feature query-request: add allow_limit flag storage_proxy: add allow rate limit flag to get_read_executor storage_proxy: resultize return type of get_read_executor storage_proxy: add per partition rate limit info to read RPC storage_proxy: add per partition rate limit info to query_result_local(_digest) storage_proxy: add allow rate limit flag to mutate/mutate_result storage_proxy: add allow rate limit flag to mutate_internal storage_proxy: add allow rate limit flag to mutate_begin storage_proxy: choose the right per partition rate limit info in write handler storage_proxy: resultize return types of write handler creation path storage_proxy: add per 
partition rate limit to mutation_holders storage_proxy: add per partition rate limit info to write RPC storage_proxy: add per partition rate limit info to mutate_locally database: apply per-partition rate limiting for reads/writes database: move and rename: classify_query -> classify_request schema: add per_partition_rate_limit schema extension db: add rate_limiter storage_proxy: propagate rate_limit_exception through read RPC gms: add TYPED_ERRORS_IN_READ_RPC cluster feature storage_proxy: pass rate_limit_exception through write RPC replica: add rate_limit_exception and a simple serialization framework docs: design doc for per-partition rate limiting transport: add rate_limit_error	2022-06-24 01:32:13 +03:00
Piotr Dulikowski	442901f14a	storage_proxy: metrics for per-partition rate limiting of reads Adds a metric "read_rate_limited" which indicates how many times a read operation was rejected due to per-partition rate limiting. The metric differentiates between reads rejected by the coordinator and reads rejected by replicas.	2022-06-22 20:16:49 +02:00
Piotr Dulikowski	6e5d486970	storage_proxy: metrics for per-partition rate limiting of writes Adds a metric "write_rate_limited" which indicates how many times a write operation was rejected due to per-partition rate limiting. The metric differentiates between writes rejected by the coordinator and writes rejected by replicas.	2022-06-22 20:16:49 +02:00
Pavel Emelyanov	f0cafc35fd	proxy stats: Get rack/datacenter from topology The reference is already at hand. The get_ep_stats() calls another helper that also maps endpoint to datacenter, but it can get the obtained dc sstring via argument. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-06-22 11:47:27 +03:00
Pavel Emelyanov	8ffe249430	proxy stats: Push topology arg to get_ep_stats The latter will need it to get dc info from. All the callers are either storage proxy or have storage proxy pointer/reference to get topology from. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-06-22 11:47:27 +03:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes we applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Piotr Sarna	2e544a0c89	storage_proxy: add metrics for too many in-flight hints failures When there are too many in-flight hints, writes start returning overloaded exceptions. We're missing metrics for that, and these could be useful when judging if the system is in overloaded state.	2020-11-10 16:26:18 +01:00
Amnon Heiman	6e1f042b93	storage_proxy: use time_estimated_histogram for latencies This patch change storage_proxy to use time_estimated_histogram. Besides the type, it changes how values are inserted and how the histogram is used by the API. An example how a metric looks like after the change: scylla_storage_proxy_coordinator_write_latency_bucket{le="640.000000",scheduling_group_name="statement",shard="0",type="histogram"} 0 scylla_storage_proxy_coordinator_write_latency_bucket{le="768.000000",scheduling_group_name="statement",shard="0",type="histogram"} 0 scylla_storage_proxy_coordinator_write_latency_bucket{le="896.000000",scheduling_group_name="statement",shard="0",type="histogram"} 0 scylla_storage_proxy_coordinator_write_latency_bucket{le="1024.000000",scheduling_group_name="statement",shard="0",type="histogram"} 0 scylla_storage_proxy_coordinator_write_latency_bucket{le="1280.000000",scheduling_group_name="statement",shard="0",type="histogram"} 0 scylla_storage_proxy_coordinator_write_latency_bucket{le="1536.000000",scheduling_group_name="statement",shard="0",type="histogram"} 0 scylla_storage_proxy_coordinator_write_latency_bucket{le="1792.000000",scheduling_group_name="statement",shard="0",type="histogram"} 2 scylla_storage_proxy_coordinator_write_latency_bucket{le="2048.000000",scheduling_group_name="statement",shard="0",type="histogram"} 2 scylla_storage_proxy_coordinator_write_latency_bucket{le="2560.000000",scheduling_group_name="statement",shard="0",type="histogram"} 3 scylla_storage_proxy_coordinator_write_latency_bucket{le="3072.000000",scheduling_group_name="statement",shard="0",type="histogram"} 5 scylla_storage_proxy_coordinator_write_latency_bucket{le="3584.000000",scheduling_group_name="statement",shard="0",type="histogram"} 5 scylla_storage_proxy_coordinator_write_latency_bucket{le="4096.000000",scheduling_group_name="statement",shard="0",type="histogram"} 7 
scylla_storage_proxy_coordinator_write_latency_bucket{le="5120.000000",scheduling_group_name="statement",shard="0",type="histogram"} 8 scylla_storage_proxy_coordinator_write_latency_bucket{le="6144.000000",scheduling_group_name="statement",shard="0",type="histogram"} 9 scylla_storage_proxy_coordinator_write_latency_bucket{le="7168.000000",scheduling_group_name="statement",shard="0",type="histogram"} 11 scylla_storage_proxy_coordinator_write_latency_bucket{le="8192.000000",scheduling_group_name="statement",shard="0",type="histogram"} 11 scylla_storage_proxy_coordinator_write_latency_bucket{le="10240.000000",scheduling_group_name="statement",shard="0",type="histogram"} 19 scylla_storage_proxy_coordinator_write_latency_bucket{le="12288.000000",scheduling_group_name="statement",shard="0",type="histogram"} 49 scylla_storage_proxy_coordinator_write_latency_bucket{le="14336.000000",scheduling_group_name="statement",shard="0",type="histogram"} 132 scylla_storage_proxy_coordinator_write_latency_bucket{le="16384.000000",scheduling_group_name="statement",shard="0",type="histogram"} 294 scylla_storage_proxy_coordinator_write_latency_bucket{le="20480.000000",scheduling_group_name="statement",shard="0",type="histogram"} 1035 scylla_storage_proxy_coordinator_write_latency_bucket{le="24576.000000",scheduling_group_name="statement",shard="0",type="histogram"} 2790 scylla_storage_proxy_coordinator_write_latency_bucket{le="28672.000000",scheduling_group_name="statement",shard="0",type="histogram"} 5788 scylla_storage_proxy_coordinator_write_latency_bucket{le="32768.000000",scheduling_group_name="statement",shard="0",type="histogram"} 9815 scylla_storage_proxy_coordinator_write_latency_bucket{le="40960.000000",scheduling_group_name="statement",shard="0",type="histogram"} 19821 scylla_storage_proxy_coordinator_write_latency_bucket{le="49152.000000",scheduling_group_name="statement",shard="0",type="histogram"} 30063 
scylla_storage_proxy_coordinator_write_latency_bucket{le="57344.000000",scheduling_group_name="statement",shard="0",type="histogram"} 38642 scylla_storage_proxy_coordinator_write_latency_bucket{le="65536.000000",scheduling_group_name="statement",shard="0",type="histogram"} 44987 scylla_storage_proxy_coordinator_write_latency_bucket{le="81920.000000",scheduling_group_name="statement",shard="0",type="histogram"} 51821 scylla_storage_proxy_coordinator_write_latency_bucket{le="98304.000000",scheduling_group_name="statement",shard="0",type="histogram"} 54197 scylla_storage_proxy_coordinator_write_latency_bucket{le="114688.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55054 scylla_storage_proxy_coordinator_write_latency_bucket{le="131072.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55363 scylla_storage_proxy_coordinator_write_latency_bucket{le="163840.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55520 scylla_storage_proxy_coordinator_write_latency_bucket{le="196608.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55545 scylla_storage_proxy_coordinator_write_latency_bucket{le="229376.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="262144.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="327680.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="393216.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="458752.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="524288.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 
scylla_storage_proxy_coordinator_write_latency_bucket{le="655360.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="786432.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="917504.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="1048576.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="1310720.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="1572864.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="1835008.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="2097152.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="2621440.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="3145728.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="3670016.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="4194304.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="5242880.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="6291456.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 
scylla_storage_proxy_coordinator_write_latency_bucket{le="7340032.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="8388608.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="10485760.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="12582912.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="14680064.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="16777216.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="20971520.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="25165824.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="29360128.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="33554432.000000",scheduling_group_name="statement",shard="0",type="histogram"} 55549 scylla_storage_proxy_coordinator_write_latency_bucket{le="+Inf",scheduling_group_name="statement",shard="0",type="histogram"} 55549 Fixes #4746 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-06-15 08:23:02 +03:00
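Cumulative `le` buckets like the ones listed in this commit message can be turned into a quantile estimate. The sketch below uses linear interpolation inside the bucket that contains the requested rank, which is similar in spirit to what Prometheus does for `histogram_quantile`; the function name `estimate_quantile` and the input layout are assumptions made for illustration, not ScyllaDB code.

```python
def estimate_quantile(buckets, q):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound, with float("inf") as the last bound (hypothetical layout).
    """
    total = buckets[-1][1]
    if total == 0:
        return None
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                # No finite upper bound to interpolate towards.
                return prev_bound
            if count == prev_count:
                return bound
            # Linear interpolation inside the bucket that holds the rank.
            frac = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * frac
        prev_bound, prev_count = bound, count
    return prev_bound


# A few of the buckets from the listing above.
sample = [
    (32768.0, 9815),
    (40960.0, 19821),
    (49152.0, 30063),
    (57344.0, 38642),
    (float("inf"), 55549),
]
median = estimate_quantile(sample, 0.5)
```

With this sample the median rank falls in the bucket bounded by 40960 and 49152, so the estimate lies between those two bounds.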
Gleb Natapov	d555fb60d7	lwt: add counters for background and foreground paxos operations Paxos may leave an operation in the background after returning a result to the caller. Let's add a counter for background/foreground paxos handlers so that it will be easier to detect memory related issues. Message-Id: <20200510092942.GA24506@scylladb.com>	2020-05-11 14:37:00 +02:00
Pavel Emelyanov	513ce1e6a5	storage_proxy_stats: Make get_ep_stat() noexcept The .get_ep_stat(ep) call can throw when registering metrics (we have an issue for it, #5697). This is not expected by its callers; in particular, abstract_write_response_handler::timeout_cb breaks in the middle and doesn't call on_timeout() and _proxy->remove_response_handler(), which results in a response handler that is neither removed nor released. In turn, the unreleased response handler doesn't set the _ready future on which response_wait() waits -> stuck. Although the issue with .get_ep_stat() should be fixed, an exception in it mustn't lead to deadlocks, so the fix is to make get_ep_stat() noexcept by catching the exception and returning a dummy stat object instead to let the caller(s) finish. Fixes #5985 Tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200430163639.5242-1-xemul@scylladb.com>	2020-04-30 19:40:08 +03:00
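The pattern described in this commit message, catching the registration failure and handing back a dummy object so the caller can finish, might look roughly like this. It is a sketch of the idea only: the classes `EndpointStats` and `StatsRegistry` and the `register` callback are hypothetical stand-ins, not the ScyllaDB implementation.

```python
class EndpointStats:
    """Per-endpoint counters (hypothetical, reduced to one field)."""

    def __init__(self):
        self.writes = 0


class StatsRegistry:
    def __init__(self, register):
        self._register = register      # callable that may raise on registration
        self._stats = {}
        self._dummy = EndpointStats()  # shared fallback, never registered

    def get_ep_stat(self, endpoint):
        """Never raises: falls back to the dummy object on any failure."""
        try:
            if endpoint not in self._stats:
                self._register(endpoint)
                self._stats[endpoint] = EndpointStats()
            return self._stats[endpoint]
        except Exception:
            # Updates made through the dummy are simply not reported.
            return self._dummy
```

The trade-off is that some updates are lost when registration fails, in exchange for callers such as a timeout callback always running to completion.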
Gleb Natapov	8a408ac5a8	lwt: remove entries from system.paxos table after successful learn stage The learning stage of the PAXOS protocol leaves behind an entry in the system.paxos table with the last learned value (which can be large). In case not all participants learned it successfully, the next round on the same key may complete the learning using this info. But if all nodes learned the value, the entry no longer serves a useful purpose. The patch adds another round, "prune", which is executed in the background (limited to 1000 simultaneous instances) and removes the entry in case all nodes replied successfully to the "learn" round. It uses the ballot's timestamp to do the deletion, so as not to interfere with the next round. Since the deletion happens very close to the previous writes, it will likely happen in the memtable and will never reach an sstable, which reduces memtable flush and compaction overhead. Fixes #5779 Message-Id: <20200330154853.GA31074@scylladb.com>	2020-03-30 21:02:14 +03:00
Konstantin Osipov	94ee511f6a	lwt: implement cas_failed_read_round_optimization metric Presently lightweight transactions piggy back the old row value on prepare round response. If one of the participants did not provide the old value or the values from peers don't match, we perform a full read round which will repair the Paxos table and the base table, if necessary, at all participants. Capture the fact that read optimization has failed in a metric. Message-Id: <20200304192955.84208-2-kostja@scylladb.com>	2020-03-05 12:20:45 +01:00
Piotr Sarna	70c9889ef7	storage_proxy: remove dead metrics code This patch removes an implementation of register_split_metrics_for, which is not used anywhere in the codebase. Message-Id: <e83f3e9d109113fe0553919032f005d4ab3a3023.1581851904.git.sarna@scylladb.com>	2020-02-16 17:00:45 +02:00
Gleb Natapov	ed3e423922	lwt: add counter for a case where timeout is sent prematurely There is a case in the current PAXOS implementation where a timeout is returned because the code cannot guarantee whether the value is accepted or not in case of contention. The counter will help to correlate this condition with failed requests. Message-Id: <20200211160653.30317-2-gleb@scylladb.com>	2020-02-16 11:22:30 +02:00
Eliran Sinvani	971711a546	storage proxy: migrate to per scheduling group statistics This commit builds on top of the introduced per scheduling group statistics template and employs it for achieving a per scheduling group statistics in storage_proxy. Some of the statistics also had meaning as a global - per shard one. Those are the ones for determining if to throttle the write request. This was handled by creating a global stats struct that will hold those stats and by changing the stat update to also include the global one. One point that complicated it is an already existing aggregation over the per shard stats that now became a per scheduling group per shard stats, converting the aggregation to a two-dimensional aggregation. One thing this commit doesn't handle is validating that an individual statistic didn't "cross a scheduling group boundary", such validation is possible but it can easily be added in the future. There is a subtlety to doing so since if the operation did cross to other scheduling group two connected statistics can lose balance for example written bytes and completed write transactions. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2020-01-30 15:01:44 +01:00
Eliran Sinvani	8cfc2aad57	internalize storage proxy statistics metric registration The storage proxy statistics structure did not contain a method for registering the statistics for metric groups; instead, each user had to register some of the metrics by itself. There is no real reason for separating the metrics registration from the statistics data. There is even less justification for doing this only for part of the stats, as is the case for those statistics. This commit internalizes the metrics registration in the storage_proxy stats structures. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2020-01-30 15:01:40 +01:00
Vladimir Davydov	c27ab87410	storage_proxy: add cas request accounting This patch implements accounting of Cassandra's metrics related to lightweight transactions, namely: cas_read_latency: transactional read latency (histogram); cas_write_latency: transactional write latency (histogram); cas_read_timeouts: number of transactional read timeouts; cas_write_timeouts: number of transactional write timeouts; cas_read_unavailable: number of transactional read unavailable errors; cas_write_unavailable: number of transactional write unavailable errors; cas_read_unfinished_commit: number of transaction commit attempts that occurred on read; cas_write_unfinished_commit: number of transaction commit attempts that occurred on write; cas_write_condition_not_met: number of transaction preconditions that did not match current values; cas_read_contention: how many contended reads were encountered (histogram); cas_write_contention: how many contended writes were encountered (histogram).	2019-10-29 19:25:47 +03:00
Konstantin Osipov	56f3bda4c7	metrics: introduce a metric for non-local reads A read which arrived at a non-replica and had to be forwarded to a replica by the coordinator is accounted in its own metric, reads_coordinator_outside_replica_set. Most often such a read is produced by a driver which is unaware of token distribution on the ring. If a read was forwarded to another replica due to heat weighted load balancing or a query preference set by the user, it's not accounted in the metric. In case of a multi-partition read (a query using an IN statement, e.g. x in (1, 2, 3)), if any of the keys is read from a non-local node the read is accounted as non-local. The rationale behind it is that if the user tries to be careful and sends IN queries only to the same vnode, they are rewarded with the counter staying at zero, while if they send multi-partition IN queries without any precautions, they will see the metric go up, which gives them a starting point for investigating performance problems. Closes #4338	2019-07-08 19:23:38 +03:00
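The classification rule described in this commit message can be sketched as below. This is an illustration only, assuming the replica set of every partition key in the query is already known; the function name `is_non_local_read` and the node names are hypothetical.

```python
def is_non_local_read(coordinator, replica_sets):
    """Return True if the read should count as non-local.

    replica_sets holds one set of replica nodes per partition key in the
    query (hypothetical input). The read counts as non-local as soon as the
    coordinator is outside the replica set of any one key.
    """
    return any(coordinator not in replicas for replicas in replica_sets)


# A single-partition read sent to a replica is local.
local = is_non_local_read("n1", [{"n1", "n2", "n3"}])
# An IN query where one key lives elsewhere counts as non-local.
mixed = is_non_local_read("n1", [{"n1", "n2", "n3"}, {"n4", "n5", "n6"}])
```

Because the check only looks at replica-set membership, a read that the coordinator forwards for load-balancing reasons while being a replica itself is not counted.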
Konstantin Osipov	da1d1b74da	metrics: account writes forwarded by a coordinator in their own metric. Add a metric to account writes which arrived at a non-replica and had to be forwarded by a coordinator to a replica. The name of the added metric is 'writes_coordinator_outside_replica_set'. Do not account forwarded read repair writes, since they are already accounted by the reads_coordinator_outside_replica_set metric, added in a subsequent patch. In scope of #4338.	2019-07-08 18:17:48 +03:00
Nadav Har'El	ccf731a820	Materialized views: add metric for current flow-control delay The materialized views flow control mechanism works by adding a certain delay to each client request, designed to slow down the client to the rate at which we can complete the background view work. Until now we could observe this mechanism only indirectly, in whether or not it succeeded in keeping the view backlog bounded; but we had no way to directly observe the delay that we decided to add. In fact, we had a bug where this delay was constantly zero, and we didn't even notice :-) So in this patch we add a new metric, scylla_storage_proxy_coordinator_last_mv_flow_control_delay. The metric is a floating point number, in units of seconds. This metric is somewhat peculiar in that it always contains the last delay used for some request - unlike other metrics it doesn't measure the "current" value of something. Moreover, it can jump wildly because there is no guarantee that each request's delay will be identical (in particular, different requests may involve different base replicas which have different view backlogs, and so decide on different delays). In the future we may want to supplement this metric with some sort of delay histogram. But even this simple metric is already useful to debug certain scenarios and understand if the materialized-views flow control is working or not. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190227133630.26328-1-nyh@scylladb.com>	2019-03-20 09:14:59 -03:00
Duarte Nunes	819b6f3406	service/storage_proxy: Add counters for delayed base writes Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Gleb Natapov	207b57a892	storage_proxy: count number of timed out write attempts after CL is reached It is useful to have this counter to investigate the reason for read repairs. Non zero value means that writes were lost after CL is reached and RR is expected. Message-Id: <20181009120900.GF22665@scylladb.com>	2018-10-09 15:17:07 +03:00
Tomasz Grabiec	82270c8699	storage_proxy: Fix misqualification of reads as foreground or background in some cases The foreground reads metric is derived from the number of live read executors minus the number of background reads. Background reads are counted down when their resolver times out. However, a read executor may still be around for a while, resulting in such reads being accounted as foreground. Usually, the gap in which this happens is short, because executor reference holders timeout quickly as well. It's not always the case though. For instance, local read executor doesn't time out quickly when the target shard has an overloaded CPU, and it takes a while before the request goes through all the queues, even if IO is not involved. Observed in #3628. Fixes #3734. Another problem is that all reads which received CL responses are accounted as background, until all replicas respond, but if such read needs reconciliation, it's still practically a foreground read and should be accounted as such. Found during code review. Fixes #3745. This patch fixes both issues by rearranging accounting to track foreground reads instead of background reads, and considering all reads as foreground until the resulting promise is resolved. Message-Id: <1535999620-25784-1-git-send-email-tgrabiec@scylladb.com>	2018-09-05 20:42:51 +03:00
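The rearranged accounting described in this commit message, where a read stays foreground until its result promise is resolved, can be sketched as follows. The class `ReadAccounting` and its method names are hypothetical, and deriving the background count as the remainder of live executors is an assumption made for the illustration rather than something the commit states.

```python
class ReadAccounting:
    """Tracks foreground reads directly (hypothetical sketch)."""

    def __init__(self):
        self.live_executors = 0
        self.foreground = 0

    def start_read(self):
        # Every read starts as a foreground read.
        self.live_executors += 1
        self.foreground += 1

    def resolve_promise(self):
        # The client got its answer: the read stops being foreground,
        # even if the executor stays alive for a while.
        self.foreground -= 1

    def destroy_executor(self):
        self.live_executors -= 1

    @property
    def background(self):
        # Assumption: whatever is still alive but already resolved.
        return self.live_executors - self.foreground
```

Tying the foreground count to promise resolution means a slow executor or a read that still needs reconciliation is not misreported as background work.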
Avi Kivity	bea1f715dc	storage_proxy: count cross-shard operations Count operations which were started on one shard and were performed on another, due to non-shard-aware driver and/or RPC. Message-Id: <20180723155118.8545-1-avi@scylladb.com>	2018-07-25 16:21:04 +01:00
Piotr Sarna	1d590b3ca4	storage_proxy: decouple write_stats from stats This commit extracts metrics related to writes from stats structure, so it can be easily replaced later, e.g. for materialized view metrics. References #3385 References #3416	2018-05-22 16:52:58 +02:00

35 Commits