There is a set of per-table metrics that should only be registered for
user tables.
Over time, more internal (non-user) keyspaces have been added, and there
is now a single function that covers all those cases.
This patch replaces the implementation to use is_internal_keyspace.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
to_metrics_summary is a helper function that creates a summary-type
metric from a timed_rate_moving_average_with_summary object.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Currently, there are two metrics reporting mechanisms: the metrics layer
and the API. In most cases, they use the same data sources. The main
difference is around histograms and rates.
The API calculates an exponentially weighted moving average using a
timer that decays the average on each time tick. It calculates a
poor man's histogram by holding the last few entries (typically the last
256 entries). The caller to the API uses those last entries to build a
histogram.
We want to add summaries to Scylla. Similar to the API rate and
histogram, summaries are calculated per time interval.
This patch creates a unified mechanism by introducing an object that
would hold both the old-style histogram and the new one
(estimated_histogram). On each time tick, a summary would be calculated.
In the future, we'll replace the API to report summaries instead of the
old-style histogram and deprecate the old style completely.
summary_calculator uses two estimated_histogram objects to calculate a summary.
timed_rate_moving_average_summary_and_histogram is a unified class for
ihistogram, rates, summary, and estimated_histogram, and will replace
timed_rate_moving_average_and_histogram.
Follow-up patches will move code from
timed_rate_moving_average_and_histogram to
timed_rate_moving_average_summary_and_histogram. Keeping the API
unchanged makes the transition easy.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This patch splits the timed_rate_moving_average functionality in two: a
data class, rates_moving_average, and a wrapper class,
timed_rate_moving_average, that uses a timer to update the rates
periodically.
To make the transition as simple as possible, timed_rate_moving_average
keeps the original API.
A new helper class meter_timer was introduced to handle the timer update
functionality.
This change required minimal code adaptation in some other parts of the
code.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This patch fixes a bug in should_sample that uses its bitmask
incorrectly.
basic_ihistogram has a feature that allows it to sample values instead
of reading the clock each time.
To decide if it should sample or not, it uses a bitmask. The bitmask
is of the form 2^n-1, which means 1 out of 2^n will be sampled.
For example, if the mask is 0x1 (2^1-1), 1 out of 2 will be sampled.
If the mask is 0x7 (2^3-1) 1 out of 8 will be sampled.
There was a bug in the should_sample() method.
The correct form is (value & mask) == mask.
Ref #2747
It does not solve all of #2747, just the bug part of it.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Prevent stalls in this path as seen in performance testing.
Also, add a respective rest_api test.
Fixes #11114
Closes #11115
* github.com:scylladb/scylla:
storage_service: reserve space in get_range_to_address_map and friends
storage_service: coroutinize get_range_to_address_map and friends
storage_service: pass replication map to get_range_to_address_map and friends
storage_service: get_range_to_address_map: move selection of arbitrary ks to api layer
test: rest_api: test range_to_endpoint_map and describe_ring
Merging empty results was already allowed, but in one way only:
empty.merge(nonempty, r); // was permitted
nonempty.merge(empty, r); // not permitted
With this commit, both methods are permitted.
In order to remove copying, the other result is now taken
by rvalue reference, with all call sites being updated
accordingly.
Fixes #10446
Fixes #10174
Closes #11064
* round up reported time to microseconds
* add backtrace if stall detected
* add call site name (hierarchical when timers are nested)
* put timers in more places
* reduce possible logspam in nested timers by making sure to report on things only once and to not report on durations smaller than those already reported on
Closes #10576
* github.com:scylladb/scylla:
utils: logalloc: fix indentation
utils: logalloc: split the reclaim_timer in compact_and_evict_locked()
utils: logalloc: report segment stats if reclaim_segments() times out
utils: logalloc: reclaim_timer: add optional extra log callback
utils: logalloc: reclaim_timer: report non-decreasing durations
utils: logalloc: have reclaim_timer print reserve limits
utils: logalloc: move reclaim timer destructor for more readability
utils: logalloc: define a proper bundle type for reclaim_timer stats
utils: logalloc: add arithmetic operations to segment_pool::stats
utils: logalloc: have reclaim timers detect being nested
utils: logalloc: add more reclaim_timers
utils: logalloc: move reclaim_timer to compact_and_evict_locked
utils: logalloc: pull reclaim_timer definition forward
utils: logalloc: reclaim_timer make tracker optional
utils: logalloc: reclaim_timer: print backtrace if stall detected
utils: logalloc: reclaim_timer: get call site name
utils: logalloc: reclaim_timer: rename set_result
utils: logalloc: reclaim_timer: rename _reserve_segments member
utils: logalloc: reclaim_timer round up microseconds
And add calls to maybe_yield to prevent stalls in this path
as seen in performance testing.
Also, add a respective rest_api test.
Fixes #11114
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
A series of refactors to the `raft_group0` service.
Read the commits in topological order for best experience.
This PR is more or less equivalent to the second-to-last commit of PR https://github.com/scylladb/scylla/pull/10835, I split it so we could have an easier time reviewing and pushing it through.
Closes #11024
* github.com:scylladb/scylla:
service: storage_service: additional assertions and comments
service/raft: raft_group0: additional logging, assertions, comments
service/raft: raft_group0: pass seed list and `as_voter` flag to `join_group0`
service/raft: raft_group0: rewrite `remove_from_group0`
service/raft: raft_group0: rewrite `leave_group0`
service/raft: raft_group0: split `leave_group0` from `remove_from_group0`
service/raft: raft_group0: introduce `setup_group0`
service/raft: raft_group0: introduce `load_my_addr`
service/raft: raft_group0: make some calls abortable
service/raft: raft_group0: remove some temporary variables
service/raft: raft_group0: refactor `do_discover_group0`.
service/raft: raft_group0: rename `create_server_for_group` to `create_server_for_group0`
service/raft: raft_group0: extract `start_server_for_group0` function
service/raft: raft_group0: create a private section
service/raft: discovery: `seeds` may contain `self`
Before they are made asynchronous in the next patch,
so they work on a coherent snapshot of the token_metadata and
replication map as their caller.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We could yield between updating the list of servers in raft/fsm
and updating the raft_address_map, e.g. in case of a set_configuration.
If tick_leader happens before the raft_address_map is updated,
is_alive will be called with server_id that is not in the map yet.
Fix: scylladb/scylla-dtest#2753
Closes #11111
It is only needed for the "storage_service/describe_ring" api
and service/storage_service shouldn't bother with it.
It's an api sugar coating.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, the WHERE clause grammar is constrained to a conjunction of
relations: `WHERE a = ? AND b = ? AND c > ?`. The restriction happens in three
places:
1. the grammar will refuse to parse anything else
2. our filtering code isn't prepared for generic expressions
3. the interface between the grammar and the rest of the cql3 layer is via a vector of terms rather than an expression
While most of the work will be in extending the filtering code, this series tackles the
interface; it changes the `whereClause` production to return an expression rather than
a vector. Since much of the cql3 layer is interested in terms, a new boolean_factors() function
is introduced to convert an expression to its boolean terms.
Closes #11105
* github.com:scylladb/scylla:
cql3: grammar: make where clause return an expression
cql3: util: deinline where clause utilities
cql3: util: change where clause utilities to accept a single expression rather than a vector of terms
cql3: statement_restrictions: accept a single expression rather than a vector
cql3: statement_restrictions: merge `if` and `for`
cql3: select_statement: remove wrong but harmless std::move() in prepare_restrictions
cql3: expr: add boolean_factors() function to factorize an expression
cql3: expression: define operator==() for expressions
cql3: values: add operator==() for raw_value
"scylla task_histogram" and "scylla fiber" will now show coroutine "promises".
Refs #10894
Closes #11071
* github.com:scylladb/scylla:
test: gdb: test that "task_histogram -a" finds some coroutines
scylla-gdb.py: recognize coroutine-related symbols as task types
scylla-gdb.py: whitelist the .text section for task "vtables"
scylla-gdb.py: fix an error message
The cql-pytest cassandra_tests/validation/operations/select_test.py::
testSelectWithAlias uses a TTL but not because it wants to test the TTL
feature - it just wants to check the SELECT aliasing feature. The test
writes a TTL of 100 and then reads it back using an alias. We would
normally expect to read back 100 or 99, but to guard against a very slow
test machine, the test verified that we read back something between 70
and 100. I thought that allowing a ridiculous 30 second delay between
the write and the read requests was more than enough.
But in one run of the aarch64 debug build, this ridiculous 30 seconds
wasn't ridiculous enough - the delay ended up 35 seconds, and the
test failed!
So in this patch, I just make it even more ridiculous - we write 1000
and expect to read something over 100 - allowing a 900 second delay
in the test.
Note that neither the earlier 30-second nor the current 900-second delay
slows down the test in any way - this test will normally complete in
milliseconds.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #11085
In preparation of the relaxation of the grammar to return any expression,
change the whereClause production to return an expression rather than
terms. Note that the expression is still constrained to be a conjunction
of relations, and our filtering code isn't prepared for more.
Before the patch, if the WHERE clause was optional, the grammar would
pass an empty vector of expressions (which is exactly correct). After
the patch, it would pass a default-constructed expression. Now that
happens to be an empty conjunction, which is exactly what's needed, but
it is too accidental, so the patch changes optional WHERE clauses to
explicitly generate an empty conjunction if the WHERE clause wasn't
specified.
Move closer to the goal of accepting a generic expression for WHERE
clause by accepting a generic expression in statement_restrictions. The
various callers will synthesize it from a vector of terms.
std::move(_where_clause) is wrong, because _where_clause is used later
(when analyzing GROUP BY), but also harmless (because the
statement_restrictions constructor accepts it by const reference).
To avoid confusion in the next patch where we'll pass _where_clause
to a different function, remove the bad std::move() in advance here.
When analyzing a WHERE clause, we want to separate individual
factors (usually relations), and later partition them into
partition key, clustering key, and regular column relations. The
first step is separation, for which this helper is added.
Currently, it is not required since the grammar supplies the
expression in separated form, but this will not work once it is
relaxed to allow any expression in the WHERE clause.
A unit test is added.
This is useful for implementing operator==() for expressions, which in
turn require comparing constants, which contain raw_values.
Note that this is not CQL comparison (that would be implemented
in cql3::expr::evaluate() and would return a CQL boolean, not a C++
boolean), but a traditional C++ value comparison.
Fixes https://github.com/scylladb/scylla-docs/issues/4041
I've added the upgrade guides from 2022.x.y to 2022.x.z. They are based on the previous upgrade guides for patch releases.
Closes #11104
* github.com:scylladb/scylla:
doc: add the new upgrade guide to the toctree
doc: add the upgrade guides from 2022.x.y to 2022.x.z
The criterion is too permissive because coroutine symbols (those
without the "[clone .resume]" part at the end, anyway) look like
normal function names; hopefully this won't give too many false
positives to become a problem.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Actual vtables do not reside there, but coroutine object vptrs point
at the actual coroutine code, which does reside in .text.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Expiring entries are added when a message is received from an unknown
host. If the host is later added to the raft configuration, they become
non-expiring. After that they can only be removed when the host is
dropped from the configuration, but they should never become expiring
again.
Refs #10826
This patch avoids unnecessary CACHE_HITRATES updates through gossip.
After this patch:
Publish CACHE_HITRATES when:
- We haven't published it at all
- The diff is bigger than 1% and we haven't published in the last 5 seconds
- The diff is really big (bigger than 10%)
Note: A peer node can learn the cache hitrate through the read_data,
read_mutation_data, and read_digest RPC verbs, which include
cache_temperature in the response. So there is no need to update
CACHE_HITRATES through gossip at high frequency.
We do the recalculation faster if the diff is bigger than 0.01. It is useful to
do the calculation even if we do not publish the CACHE_HITRATES through gossip,
since the recalculation will call table->set_global_cache_hit_rate to set
the hitrate.
Fixes #5971
Closes #11079
In issue #10966, a user noticed that Alternator writes may be reordered
(a later write to an item is ignored with the earlier write to the same
item "winning") if Scylla nodes do not have synchronized time and if
always_use_lwt write isolation mode is not used.
In this patch I add to docs/alternator/compatibility.md a section about
this issue, what causes it, and how to solve or at least mitigate it.
Fixes#10966
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #11094
Move some rare logs from TRACE to INFO level.
Add some assertions.
Write some more comments, including FIXMEs and TODOs.
Remove unnecessary `_shutdown_gate.hold()` (this is not a background
task).
Group 0 discovery would internally fetch the seed list from gossiper.
Gossiper would return the seed list from conf/scylla.yaml. This seed
list is proper for the bootstrapping scenario - we specify the initial
contact points for a node that joins a cluster.
We'll have to use a different list of seeds for group 0 discovery for
the upgrade scenario. Prepare for that by taking the seed list as a
parameter.
In the bootstrap scenario we'll pass the seed list down from
`storage_service::join_cluster`.
Additionally, `join_group0` now takes an `as_voter` flag, which is
`false` in the bootstrap scenario (we initially join as a non-voter) but
will be `true` in the upgrade scenario.
See previous commit. `remove_from_group0` had a problem similar to
`leave_group0`: it would handle the case where `raft_group0::_group0`
variant was not `raft::group_id` (i.e. we haven't joined group 0), but
RAFT local feature was enabled - i.e. the yet-unimplemented upgrade case
- by running discovery and calling `send_group0_modify_config`.
Instead, if we see that we've joined group 0 before, assume that we're
still a member and simply use the Raft `modify_config` API to remove
another server. If we're not a member it means we either decommissioned
or were removed by someone else; then we have no business trying to
remove others. There's also the unimplemented upgrade case but that will
come in another pull request.
Finally, add some logic for handling an edge case: suppose we joined
group 0 recently and we still didn't fully update our RPC address map
(it's being updated asynchronously by Raft's io_fiber). Thus we may fail
to find a member of group 0 in the address map. To handle this, ensure
we're up-to-date by performing a Raft read barrier.
State some assumptions in a comment.
Add a TODO for handling failures.
Remove unnecessary `_shutdown_gate.hold()` (this is not a background
task).
One of the following cases is true:
1. RAFT local feature is disabled. Then we don't do anything related to
group 0.
2. RAFT local feature is enabled and when we bootstrapped, we joined
group 0. Then `raft_group0::_group0` variant holds the
`raft::group_id` alternative.
3. RAFT local feature is enabled and when we bootstrapped we didn't join
group 0. This means the RAFT local feature was disabled when we
bootstrapped and we're in the (unimplemented yet) upgrade scenario.
`raft_group0::_group0` variant holds the `std::monostate` alternative.
The problem with the previous implementation was that it checked for the
conditions of the third case above - that RAFT local feature is enabled
but `_group0` does not hold `raft::group_id` - and if those conditions
were true, it executed some logic that didn't really make sense: it ran
the discovery algorithm and called `send_group0_modify_config` RPC.
In this rewrite I state some assumptions that `leave_group0` makes:
- we've finished the startup procedure.
- we're being run during decommission - after the node entered LEFT
status.
In the new implementation, if `_group0` does not hold `raft::group_id`
(checked by the internal `joined_group0()` helper), we simply return.
This is the yet-unimplemented upgrade case left for a follow-up PR.
Otherwise we fetch our Raft server ID (at this point it must be present
- otherwise it's a fatal error) and simply call `modify_config` from the
`raft::server` API.
Remove unnecessary call to `_shutdown_gate.hold()` (this is not a
background task).
`leave_group0` was responsible for both removing a different node from
group 0 and removing ourselves (leaving) group 0. The two scenarios are
a bit different and the handling will be rewritten in following commits.
Split `leave_group0` into two functions. Remove the incorrect comment
about idempotency: saying that the procedure is idempotent is an
oversimplification; one could argue it's incorrect, since the second call
simply hangs, at least in the case of leaving group 0. Following commits
will state what's happening more precisely.
Add some additional logging and assertions where the two functions are
called in `storage_service`.
Contains all logic for deciding to join (or not join) group 0.
Prepare for the case where we don't want to join group 0 immediately on
startup - the upgrade scenario (will be implemented in a follow-up).
Move the group 0 setup step earlier in `storage_service::join_cluster`.
`join_group0()` is now a private member of `raft_group0`. Some more
comments were written.
Compared to `load_or_create_my_addr`, this function assumes that
the address is already present on disk; if not, it's a fatal error.
Use it in places where it would indeed be a fatal error
if the address was missing.
There are some calls to `modify_config` which should react to aborts
(e.g. when we shutdown Scylla).
There are also calls to `send_group0_modify_config` which should
probably also react to aborts, but the functions don't take
an abort_source parameter. This is fixable but I left TODOs for now.