scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-08 07:53:20 +00:00

Author	SHA1	Message	Date
Piotr Grabowski	fbc042ff02	build: add abseil-cpp dependency to Nix devenv After `8635d2442` commit, the abseil submodule was removed in favor of using pre-built abseil distribution. Installation of abseil-cpp was added to install-dependencies.sh and dbuild image, but no change was made to the Nix development environment, which resulted in error while executing ./configure.py (while in Nix devenv): Package absl_raw_hash_set was not found in the pkg-config search path. Perhaps you should add the directory containing `absl_raw_hash_set.pc' to the PKG_CONFIG_PATH environment variable No package 'absl_raw_hash_set' found Fix the issue by adding "abseil-cpp" to buildInputs in default.nix.	2023-01-19 15:03:55 +01:00
Avi Kivity	aab5954cfb	Merge 'reader_concurrency_semaphore: add more layers of defense against OOM' from Botond Dénes The reader concurrency semaphore has no mechanism to limit the memory consumption of already admitted reads. Once the collective memory consumption of all the admitted reads is above the limit, all it can do is to not admit any more. Sometimes this is not enough and the memory consumption of the already admitted reads balloons to the point of OOMing the node. This pull-request offers a solution to this: it introduces two more layers of defense above this: a soft and a hard limit. Both are multipliers applied on the semaphore's normal memory limit. When the soft limit threshold is surpassed, all readers but one are blocked via a new blocking `request_memory()` call which is used by the `tracking_file_impl`. The reader to be allowed to proceed is chosen at random: it is the first reader which happens to request memory after the limit is surpassed. This is both very simple and should avoid situations where the algorithm choosing the reader to be allowed to proceed chooses a reader which will then always time out. When the hard limit threshold is surpassed, `reader_concurrency_semaphore::consume()` starts throwing `std::bad_alloc`. This again will result in eliminating whichever reader was unlucky enough to request memory at the right moment. With this, the semaphore is now effectively enforcing an upper bound for memory consumption, defined by the hard limit. 
Refs: https://github.com/scylladb/scylladb/issues/11927 Closes #11955 * github.com:scylladb/scylladb: test: reader_concurrency_semaphore_test: add tests for semaphore memory limits reader_permit: expose operator<<(reader_permit::state) reader_permit: add id() accessor reader_concurrency_semaphore: add foreach_permit() reader_concurrency_semaphore: document the new memory limits reader_concurrency_semaphore: add OOM killer reader_concurrency_semaphore: make consume() and signal() private test: stop using reader_concurrency_semaphore::{consume,signal}() directly reader_concurrency_semaphore: move consume() out-of-line reader_permit: consume(): make it exception-safe reader_permit: resource_units::reset(): only call consume() if needed reader_concurrency_semaphore: tracked_file_impl: use request_memory() reader_concurrency_semaphore: add request_memory() reader_concurrency_semaphore: wrap wait list reader_concurrency_semaphore: add {serialize,kill}_limit_multiplier parameters test/boost/reader_concurrency_semaphore_test: dummy_file_impl: don't use hardoced buffer size reader_permit: add make_new_tracked_temporary_buffer() reader_permit: add get_state() accessor reader_permit: resource_units: add constructor for already consumed res reader_permit: resource_units: remove noexcept qualifier from constructor db/config: introduce reader_concurrency_semaphore_{serialize,kill}_limit_multiplier scylla-gdb.py: scylla-memory: extract semaphore stats formatting code scylla-gdb.py: fix spelling of "graphviz"	2023-01-18 17:02:55 +02:00
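The two thresholds described in the merge message above can be modelled roughly as follows. This is a simplified Python sketch, not the C++ implementation: the class name `MemoryLimiter`, the `Decision` enum, and the choice to compare the would-be total against each threshold are assumptions made for illustration. Only the ideas of a serialize (soft) multiplier, a kill (hard) multiplier, and `request_memory()` come from the commit text.

```python
import enum


class Decision(enum.Enum):
    ADMIT = "admit"  # request granted
    BLOCK = "block"  # soft limit passed and another reader holds the pass
    KILL = "kill"    # hard limit passed; the commit throws std::bad_alloc here


class MemoryLimiter:
    """Hypothetical model of the soft and hard memory limits."""

    def __init__(self, memory_limit, serialize_multiplier, kill_multiplier):
        # Both thresholds are multipliers applied to the normal limit.
        self.serialize_threshold = memory_limit * serialize_multiplier
        self.kill_threshold = memory_limit * kill_multiplier
        self.consumed = 0
        self.blessed = None  # the one reader allowed past the soft limit

    def request_memory(self, reader_id, amount):
        new_total = self.consumed + amount
        if new_total > self.kill_threshold:
            return Decision.KILL
        if new_total > self.serialize_threshold:
            # The first reader to ask after the threshold is crossed proceeds.
            if self.blessed is None:
                self.blessed = reader_id
            if self.blessed != reader_id:
                return Decision.BLOCK
        self.consumed = new_total
        return Decision.ADMIT
```

With `MemoryLimiter(100, 2, 4)`, requests are admitted freely up to 200 units, only one reader may proceed between 200 and 400, and any request that would exceed 400 is rejected.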
Avi Kivity	9a54cb5deb	Merge 'cql3/expr: make it possible to prepare binary_operator' from Jan Ciołek `prepare_expression` takes an unprepared CQL expression straight from the parser output and prepares it. Preparation consists of various type checks that are needed to ensure that the expression is correct and to reason about it. While `prepare_expression` supports a number of different types of expressions, until now it was impossible to prepare a `binary_operator`. Eventually we would like to be able to prepare all kinds of expressions, so this PR adds the missing support for `binary_operator`. Closes #12550 * github.com:scylladb/scylladb: expr_test: test preparing binary_operator with NULL RHS expr_test: test preparing IS NOT NULL binary_operator expr_test: test preparing binary_operator with LIKE expr_test: test preparing binary_operator with CONTAINS KEY expr_test: test preparing binary_operator with CONTAINS expr_test: test preparing binary_operator with IN expr_test: test preparing binary_operator with =, !=, <, <=, >, >= expr_test: use make_*_untyped function in existing tests expr_test_utils: add utilities to create untyped_constant expr_test_utils: add make_float_* and make_double_* cql3: expr: make it possible to prepare binary_operator using prepare_expression cql3/expr: check that RHS of IS NOT NULL is a null value when preparing binary operators cql3: expr: pass non-empty keyspace name in prepare_binary_operator cql3: expr: take reference to schema in prepare_binary_operator	2023-01-18 16:55:18 +02:00
Jenkins Promoter	75a3dd2fc8	release: prepare for 5.3.0-dev	2023-01-18 16:22:41 +02:00
Avi Kivity	71bbd7475c	Update seastar submodule * seastar 8889cbc198...d41af8b592 (14): > Merge 'Perf stall detector related improvements' from Travis Downs Ref #8828, #7882, #11582 (may help make progress) > build: pass HEAPPROF definition to src/core/reactor.cc too > Limit memory address space per core to 64GB when hwloc is not available > build: revert use pkg_search_module(.. IMPORTED_TARGET ..) changes > Fix missing newlines in seastar-addr2line > Use an integral type for uniform_int_distribution > Merge 'tls_test: use a dedicated https server for testing' from Kefu Chai > build: use ${CMAKE_BINARY_DIR} when running 'cmake --build ..' > build: do not set c-ares_FOUND with PARENT_SCOPE > reactor: drop unused member function declaration > sstring: refactor to_sstring() using fmt::format_to() > http: delay input stream close until responses sent > build: enable non-library targets using default option value > Merge 'sstring: specialize uninitialize_string() and use resize_and_overwrite if available' from Kefu Chai Closes #12509	2023-01-18 15:50:57 +02:00
Jan Ciolek	ae0e955b90	expr_test: test preparing binary_operator with NULL RHS Make sure that preparing binary_operator works properly when the RHS is NULL. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:46 +01:00
Jan Ciolek	65b8a09409	expr_test: test preparing IS NOT NULL binary_operator Add a unit test which checks that preparing binary_operators which represent IS NOT NULL works as expected. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:46 +01:00
Jan Ciolek	5b3e6769f1	expr_test: test preparing binary_operator with LIKE Add a unit test which checks that preparing binary_operators with the LIKE operation works as expected. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:45 +01:00
Jan Ciolek	e876496f7f	expr_test: test preparing binary_operator with CONTAINS KEY Add a unit test which checks that preparing binary_operators with the CONTAINS KEY operation works as expected. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:45 +01:00
Jan Ciolek	c6d2e1a03e	expr_test: test preparing binary_operator with CONTAINS Add a unit test which checks that preparing binary_operators with the CONTAINS operation works as expected. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:45 +01:00
Jan Ciolek	6b147ecaea	expr_test: test preparing binary_operator with IN Add a unit test which checks that preparing binary_operators with the IN operation works as expected. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:45 +01:00
Jan Ciolek	669d791250	expr_test: test preparing binary_operator with =, !=, <, <=, >, >= Add a unit test which checks that preparing binary_operators with basic comparison operations works as expected. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:44 +01:00
Jan Ciolek	60803d12a9	expr_test: use make_*_untyped function in existing tests Use the newly introduced convenience methods that create untyped_constant in existing tests. This will make the code more readable by removing visual clutter that came with the previous overly verbose code. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:44 +01:00
Jan Ciolek	819390f9fe	expr_test_utils: add utilities to create untyped_constant expression tests often need to create instances of untyped_constant. Creating them by hand is tedious because the required code is overly verbose. Having convenience functions for it speeds up test writing. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:44 +01:00
Jan Ciolek	362bf7f534	expr_test_utils: add make_float_* and make_double_* Add utilities to create float and double values in tests. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:44 +01:00
Jan Ciolek	da3c07955a	cql3: expr: make it possible to prepare binary_operator using prepare_expression prepare_expression didn't allow preparing binary_operators, so it's now implemented. If prepare_binary_operator is unable to infer the types it will fail with an exception instead of returning std::nullopt, but we can live with that for now. Preparing binary_operators inside the WHERE clause is currently more complicated than just calling prepare_binary_operator. Preparation of the WHERE clause is done inside the statement_restrictions constructor. It's done by iterating over all binary_operators, validating them and then preparing. The validation contains additional checks with custom error messages. Preparation has to be done after validation, because otherwise the error messages will change and some tests will start failing. Because of that we can't just call prepare_expression on the WHERE clause yet. It's still useful to have the ability to prepare binary_operators using prepare_expression. In cases where we know that the WHERE clause is valid, we can just call prepare_expression and be done with it. Once the grammar is fully relaxed the artificial constraints checked by the validation code will be removed and it will be possible to prepare the whole WHERE clause using just prepare_expression. prepare_expression does a bit more than prepare_binary_operator. In the case where both sides of the binary_operator are known it will evaluate the whole binary_operator to a constant value. Query analysis code is NOT ready to encounter constant boolean values inside the WHERE clause, so for the WHERE we still use prepare_binary_operator which doesn't evaluate the binary_operator to a constant value. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:43 +01:00
Jan Ciolek	5f8b1a1a60	cql3/expr: check that RHS of IS NOT NULL is a null value when preparing binary operators When preparing a binary operator we first prepare the LHS, which gives us information about its type and allows us to infer the desired type of the RHS. Then the RHS is prepared with the expectation that it is compatible with the inferred type. This is enough for all types of operations apart from IS NOT NULL. For IS NOT we should also check that the RHS value is actually null. It's not enough to check that the RHS is of the right type. Before this change preparing `int_col IS NOT 123` would end in success, which is wrong. The missing check doesn't cause any real problems, as it's impossible for the user to produce such input because the parser will reject it. Still it's better to have the check because in the future the grammar might get more relaxed and the parser could become more generic, making it possible to write such things. It would be better to introduce unary_operators, but that's a bigger change. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:43 +01:00
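The check described in the commit above can be illustrated with a small model. This is a hedged Python sketch, not the cql3 code: `BinaryOperator`, `prepare_binary_operator` as written here, and the `column_types` mapping are hypothetical stand-ins chosen for the example. It shows only the ordering from the message: infer the expected type from the LHS, then for IS NOT additionally require that the RHS is null.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class BinaryOperator:
    lhs: str   # column name (hypothetical simplification)
    op: str    # e.g. "=", "<", "IS NOT"
    rhs: Any   # Python value; None stands in for CQL NULL


def prepare_binary_operator(expr, column_types):
    """Type-check expr against column_types, a dict of column name to type."""
    if expr.lhs not in column_types:
        raise ValueError(f"unknown column {expr.lhs}")
    expected = column_types[expr.lhs]  # type inferred from the LHS

    if expr.op == "IS NOT":
        # A type-compatible RHS is not enough: it must actually be null.
        if expr.rhs is not None:
            raise ValueError("IS NOT requires a NULL right-hand side")
        return expr

    if expr.rhs is not None and not isinstance(expr.rhs, expected):
        raise ValueError(f"RHS is not of type {expected.__name__}")
    return expr
```

In this model `BinaryOperator("int_col", "IS NOT", 123)` is rejected even though `123` matches the column type, which mirrors the `int_col IS NOT 123` case from the message.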
Jan Ciolek	703e9f21ff	cql3: expr: pass non-empty keyspace name in prepare_binary_operator For some reason we passed an empty keyspace name to prepare_expression when preparing the LHS of a binary operator. This doesn't look correct. We have keyspace name available from the schema_ptr so let's use that. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:43 +01:00
Jan Ciolek	9a0c5789a2	cql3: expr: take reference to schema in prepare_binary_operator prepare_binary_operator takes a schema_ptr, but it would be useful to take a reference to schema instead. Every schema_ptr can be easily converted to a reference so there is no loss of functionality. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-01-18 12:04:40 +01:00
Nadav Har'El	48e2d6a541	Merge 'utils: throw error on malformed input in base64 decode' from Marcin Maliszkiewicz Several cases were fixed in these patches, all related to processing of malformed base64 data. The main purpose was to bring the alternator implementation closer to what DynamoDB does. We now: - Throw error when padding is missing during base64 decoding - Throw error when base64 data is malformed - In alternator when invalid base64 data is fetched from DB (as opposed to being part of user's request) we now exclude such row during filtering Additionally some small code quality improvements: - avoid unnecessary type conversions in calls to rjson:from_strings functions - avoid some copy constructions in calls to rjson:from_strings functions Fixes https://github.com/scylladb/scylladb/issues/6487 Closes #11944 * github.com:scylladb/scylladb: alternator: evaluate expressions as false for stored malformed binary data rjson: avoid copy constructors in from_string calls when possible alternator: remove unused parameters from describe_items func utils: throw error on malformed input in base64 decode utils: throw error on missing padding in base64 decode	2023-01-18 12:40:57 +02:00
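The behaviour listed in the merge message above (reject missing padding, reject malformed input, and skip stored rows whose data cannot be decoded) can be sketched with Python's standard library. This is an illustration under assumptions, not the ScyllaDB code: `strict_b64decode` and `filter_rows` are hypothetical names, and `base64.b64decode(..., validate=True)` is used as a stand-in for the strict decoder.

```python
import base64
import binascii


def strict_b64decode(text):
    """Decode base64, raising ValueError on malformed data or bad padding."""
    try:
        # validate=True rejects characters outside the base64 alphabet;
        # incorrect padding raises binascii.Error as well.
        return base64.b64decode(text, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"malformed base64 input: {text!r}") from exc


def filter_rows(stored_values, predicate):
    """Keep decoded values that satisfy predicate; skip undecodable rows."""
    kept = []
    for value in stored_values:
        try:
            decoded = strict_b64decode(value)
        except ValueError:
            # Stored malformed data: the row is excluded, not an error.
            continue
        if predicate(decoded):
            kept.append(decoded)
    return kept
```

The split between the two functions reflects the distinction in the message: data in a user's request fails loudly, while malformed data already stored in the database only causes the row to be filtered out.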
Tomasz Grabiec	563998b69a	Merge 'raft: improve group 0 reconfiguration failure handling' from Kamil Braun Make it so that failures in `removenode`/`decommission` don't lead to reduced availability, and any leftovers in group 0 can be removed by `removenode`: - In `removenode`, make the node a non-voter before removing it from the token ring. This removes the possibility of having a group 0 voting member which doesn't correspond to a token ring member. We can still be left with a non-voter, but that doesn't reduce the availability of group 0. - As above but for `decommission`. - Make it possible to remove group 0 members that don't correspond to token ring members from group 0 using `removenode`. - Add an API to query the current group 0 configuration. Fixes #11723. Closes #12502 * github.com:scylladb/scylladb: test: test_topology: test for removing garbage group 0 members test/pylib: move some utility functions to util.py db: system_keyspace: add a virtual table with raft configuration db: system_keyspace: improve system.raft_snapshot_config schema service: storage_service: better error handling in `decommission` service: storage_service: fix indentation in removenode service: storage_service: make `removenode` work for group 0 members which are not token ring members service/raft: raft_group0: perform read_barrier in wait_for_raft service: storage_service: make leaving node a non-voter before removing it from group 0 in decommission/removenode test: test_raft_upgrade: remove test_raft_upgrade_with_node_remove service/raft: raft_group0: link to Raft docs where appropriate service/raft: raft_group0: more logging service/raft: raft_group0: separate function for checking and waiting for Raft	2023-01-17 21:23:15 +01:00
Kamil Braun	d134c458e5	test/pylib: increase timeout when waiting for cluster before test Increase the timeout from default 5 minutes to 10 minutes. Sent as a workaround for #12546 to unblock next promotions. Closes #12547	2023-01-17 21:03:09 +02:00
Kamil Braun	4f1c317bdc	test: test_raft_upgrade: stop servers gracefully in test_recovery_after_majority_loss This test is frequently failing due to a timeout when we try to restart one of the nodes. The shutdown procedure apparently hangs when we try to stop the `hints_manager` service, e.g.: ``` INFO 2023-01-13 03:18:02,946 [shard 0] hints_manager - Asked to stop INFO 2023-01-13 03:18:02,946 [shard 0] hints_manager - Stopped INFO 2023-01-13 03:18:02,946 [shard 0] hints_manager - Asked to stop INFO 2023-01-13 03:18:02,946 [shard 1] hints_manager - Asked to stop INFO 2023-01-13 03:18:02,946 [shard 1] hints_manager - Stopped INFO 2023-01-13 03:18:02,946 [shard 1] hints_manager - Asked to stop INFO 2023-01-13 03:18:02,946 [shard 1] hints_manager - Stopped INFO 2023-01-13 03:22:56,997 [shard 0] hints_manager - Stopped ``` observe the 5 minute delay at the end. There is a known issue about `hints_manager` stop hanging: #8079. Now, for some reason, this is the only test case that is hitting this issue. We don't completely understand why. There is one significant difference between this test case and others: this is the only test case which kills 2 (out of 3) servers in the cluster and then tries to gracefully shutdown the last server. There's a hypothesis that the last server gets stuck trying to send hints to the killed servers. We weren't able to prove/falsify it yet. But if it's true, then this patch will: - unblock next promotions, - give us some important information when we see that the issue stops appearing. In the patch we shutdown all servers gracefully instead of killing them, like we do in the other test cases. Closes #12548	2023-01-17 20:51:09 +02:00
Pavel Emelyanov	4f415413d2	raft: Fix non-existing state_machine::apply_entry in docs The docs mention that method, but it doesn't exist. Instead, the state_machine interface defines a plain .apply() method. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12541	2023-01-17 12:53:05 +01:00
Kamil Braun	5545547d07	test: test_topology: test for removing garbage group 0 members Verify that `removenode` can remove group 0 members which are not token ring members.	2023-01-17 12:28:00 +01:00
Kamil Braun	c959ec455a	test/pylib: move some utility functions to util.py They were used in test_raft_upgrade, but we want to use them in other test files too.	2023-01-17 12:28:00 +01:00
Kamil Braun	a483915c62	db: system_keyspace: add a virtual table with raft configuration Add a new virtual table `system.raft_state` that shows the currently operating Raft configuration for each present group. The schema is the same as `system.raft_snapshot_config` (the latter shows the config from the last snapshot). In the future we plan to add more columns to this table, showing more information (like the current leader and term), hence the generic name. Adding the table requires some plumbing of `sharded<raft_group_registry>&` through function parameters to make it accessible from `register_virtual_tables`, but it's mostly straightforward. Also added some APIs to `raft_group_registry` to list all groups and find a given group (returning `nullptr` if one isn't found, not throwing an exception).	2023-01-17 12:28:00 +01:00
Kamil Braun	2bfe85ce9b	db: system_keyspace: improve system.raft_snapshot_config schema Remove the `ip_addr` column which was not used. IP addresses are not part of Raft configuration now and they can change dynamically. Swap the `server_id` and `disposition` columns in the clustering key, so when querying the configuration, we first obtain all servers with the current disposition and then all servers with the previous disposition (note that a server may appear both in current and previous).	2023-01-17 12:28:00 +01:00
Kamil Braun	c3ed82e5fb	service: storage_service: better error handling in `decommission` Improve the error handling in `decommission` in case `leave_group0` fails, informing the user what they should do (i.e. call `removenode` to get rid of the group 0 member), and allowing decommission to finish; it does not make sense to let the node continue to run after it leaves the token ring. (And I'm guessing it's also not safe. Or maybe impossible.)	2023-01-17 12:28:00 +01:00
Kamil Braun	beb0eee007	service: storage_service: fix indentation in removenode	2023-01-17 12:28:00 +01:00
Kamil Braun	aba33dd352	service: storage_service: make `removenode` work for group 0 members which are not token ring members Due to failures we might end up in a situation where we have a group 0 member which is not a token ring member: a decommission/removenode which failed after leaving/removing a node from the token ring but before leaving / removing a node from group 0. There was no way to get rid of such a group 0 member. A node that left the token ring must not be allowed to run further (or it can cause data loss, data resurrection and maybe other fun stuff), so we can't run decommission a second time (even if we tried, it would just say that "we're not a member of the token ring" and abort). And `removenode` would also not work, because it proceeds only if the node requested to be removed is a member of the token ring. We modify `removenode` so it can run in this situation and remove the group 0 member. The parts of `removenode` related to token ring modification are now conditioned on whether the node was a member of the token ring. The final `remove_from_group0` step is in its own branch. Some minor refactors were necessary. Some log messages were also modified so it's easier to understand which messages correspond to the "token movement" part of the procedure. The `make_nonvoter` step happens only if token ring removal happens, otherwise we can skip directly to `remove_from_group0`. We also move `remove_from_group0` outside the "try...catch", fixing #11723. The "node ops" part of the procedure is related strictly to token ring movement, so it makes sense for `remove_from_group0` to happen outside. Indentation is broken in this commit for easier reviewability, fixed in the following commit. Fixes: #11723	2023-01-17 12:28:00 +01:00
Kamil Braun	ec2cd29e42	service/raft: raft_group0: perform read_barrier in wait_for_raft Right now wait_for_raft is called before performing group 0 configuration changes. We want to also call it before checking for membership, for that it's desirable to have the most recent information, hence call read_barrier. In the existing use cases it's not strictly necessary, but it doesn't hurt.	2023-01-17 12:28:00 +01:00
Kamil Braun	db734cd74f	service: storage_service: make leaving node a non-voter before removing it from group 0 in decommission/removenode removenode currently works roughly like this: 1. stream/repair data so it ends up on new replica sets (calculated without the node we want to remove) 2. remove the node from the token ring 3. remove the node from group 0 configuration. If the procedure fails after step 2 but before step 3 finishes, we're in trouble: the cluster is left with an additional voting group 0 member, which reduces group 0's availability, and there is no way to remove this member because `removenode` no longer considers it to be part of the cluster (it consults the token ring to decide). Improve this failure scenario by including a new step at the beginning: make the node a non-voter in group 0 configuration. Then, even if we fail after removing the node from the token ring but before removing it from group 0, we'll only be left with a non-voter which doesn't reduce availability. We make a similar change for `decommission`: between `unbootstrap()` (which streams data) and `leave_ring()` (which removes our tokens from the ring), become a non-voter. The difference here is that we don't become a non-voter at the beginning, but only after streaming/repair. In `removenode` it's desirable to make the node a non-voter as soon as possible because it's already dead. In decommission it may be desirable for us to remain a voter if we fail during streaming because we're still alive and functional in that case. In a later commit we'll also make it possible to retry `removenode` to remove a node that is only a group 0 member and not a token ring member.	2023-01-17 12:28:00 +01:00
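The availability argument in the commit above can be made concrete with majority arithmetic. The sketch below assumes the usual Raft rule that a configuration needs a strict majority of its voters to make progress; the commit message itself only states that a leftover voter reduces availability. The function name `tolerated_failures` is hypothetical.

```python
def tolerated_failures(voters_alive, voters_dead):
    """How many more live voters may fail before the majority is lost.

    A negative result means the group has already lost its majority.
    Non-voters are not counted at all.
    """
    total_voters = voters_alive + voters_dead
    majority = total_voters // 2 + 1
    return voters_alive - majority


# A 4-node cluster where removenode failed after step 2:
# the dead node is still a voter, so 3 of 4 voters must stay up.
leftover_voter = tolerated_failures(voters_alive=3, voters_dead=1)

# The same failure after the node was first made a non-voter:
# only the 3 live nodes vote, so 2 of 3 are enough.
leftover_nonvoter = tolerated_failures(voters_alive=3, voters_dead=0)
```

Under these assumptions the first case tolerates no further failures while the second tolerates one, which is the difference the new first step is meant to secure.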
Kamil Braun	1eee349a17	test: test_raft_upgrade: remove test_raft_upgrade_with_node_remove The test would create a scenario where one node was down while the others started the Raft upgrade procedure. The procedure would get stuck, but it was possible to `removenode` the downed node using one of the alive nodes, which would unblock the Raft upgrade procedure. This worked because: 1. the upgrade procedure starts by ensuring that all peers can be contacted, 2. `removenode` starts by removing the node from the token ring. After removing the node from the token ring, the upgrade procedure becomes able to contact all peers (the peers set no longer contains the down node). At the end, after removing the node from the token ring, `removenode` would actually get stuck for a while, waiting for the upgrade procedure to finish before removing the peer from group 0. After the upgrade procedure finished, `removenode` would also finish. (so: first the upgrade procedure waited for removenode, then removenode waited for the upgrade procedure). We want to modify the `removenode` procedure and include a new step before removing the node from the token ring: making the node a non-voter. The purpose is to improve the possible failure scenarios. Previously, if the `removenode` procedure failed after removing the node from the token ring but before removing it from group 0, the cluster would contain a 'garbage' group 0 member which is a voter - reducing group 0's availability. If the node is made a non-voter first, then this failure will not be as big of a problem, because the leftover group 0 member will be a non-voter. However, to correctly perform group 0 operations including making someone a nonvoter, we must first wait for the Raft upgrade procedure to finish (or at least wait until everyone joins group 0). 
Therefore by including this 'make the node a non-voter' step at the beginning of `removenode`, we make it impossible to remove a token ring member in the middle of the upgrade procedure, on which the test case relied. The test case would get stuck waiting for the `removenode` operation to finish, which would never finish because it would wait for the upgrade procedure to finish, which would not finish because of the dead peer. We remove the test case; it was "lucky" to pass in the first place. We have a dedicated mechanism for handling dead peers during Raft upgrade procedure: the manual Raft group 0 RECOVERY procedure. There are other test cases in this file which are using that procedure.	2023-01-17 12:28:00 +01:00
Kamil Braun	4f0801406e	service/raft: raft_group0: link to Raft docs where appropriate Resolve some TODOs.	2023-01-17 12:28:00 +01:00
Kamil Braun	2befbaa341	service/raft: raft_group0: more logging Make the logs in leave_group0 consistent with logs in remove_from_group0.	2023-01-17 12:28:00 +01:00
Kamil Braun	77dc1c4c70	service/raft: raft_group0: separate function for checking and waiting for Raft leave_group0 and remove_from_group0 functions both start with the following steps: - if Raft is disabled or in RECOVERY mode, print a simple log message and abort - if Raft cluster feature flag is not yet enabled, print a complex log message and abort - wait for Raft upgrade procedure to finish - then perform the actual group 0 reconfiguration. Refactor these preparation steps to a separate function, `wait_for_raft`. This reduces code duplication; the function will also be used in more operations later (becoming a nonvoter or turning another server into a nonvoter). We also change the API so that the preparation function is called from outside by the caller before they call the reconfiguration function. This is because in later commits, some of the call sites (mainly `removenode`) will want to check explicitly whether Raft is enabled and wait for Raft's availability, then perform a sequence of steps related to group 0 configuration depending on the result. Also add a private function `raft_upgrade_complete()` which we use to assert that Raft is ready to be used.	2023-01-17 12:27:58 +01:00
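The preparation steps factored out in the commit above follow a guard-then-wait shape, sketched below in Python. Everything except the name `wait_for_raft` and the order of the checks is an assumption for illustration: the boolean parameters, the `wait_for_upgrade` callback, the log texts, and the boolean return value are hypothetical and do not reflect the real C++ signature.

```python
import logging

log = logging.getLogger("raft_group0")


def wait_for_raft(raft_enabled, recovery_mode, feature_enabled, wait_for_upgrade):
    """Return True once group 0 may be reconfigured, False if Raft is unusable."""
    if not raft_enabled or recovery_mode:
        log.info("Raft is disabled or in RECOVERY mode; skipping group 0 steps")
        return False
    if not feature_enabled:
        log.warning("Raft cluster feature is not enabled yet; skipping group 0 steps")
        return False
    # Block until the upgrade procedure has finished.
    wait_for_upgrade()
    return True


def remove_from_group0_if_possible(node, raft_ready, removed):
    """Caller-side use: reconfigure only when wait_for_raft() succeeded."""
    if raft_ready:
        removed.append(node)
```

Having the caller invoke the check first, as the message describes, lets a call site such as `removenode` branch on the result before running a sequence of group 0 steps.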
Wojciech Mitros	5f45b32bfa	forward_service: prevent heap use-after-free of forward_aggregates Currently, we create `forward_aggregates` inside a function that returns the result of a future lambda that captures these aggregates by reference. As a result, the aggregates may be destructed before the lambda finishes, resulting in a heap use-after-free. To prolong the lifetime of these aggregates, we cannot use a move capture, because the lambda is wrapped in a with_thread_if_needed() call on these aggregates. Instead, we fix this by wrapping the entire return statement in a do_with(). Fixes #12528 Closes #12533	2023-01-17 13:25:57 +02:00
Botond Dénes	8ea128cc27	test: reader_concurrency_semaphore_test: add tests for semaphore memory limits	2023-01-17 05:27:04 -05:00
Botond Dénes	ec1c615029	reader_permit: expose operator<<(reader_permit::state)	2023-01-17 05:27:04 -05:00
Botond Dénes	78583b84f1	reader_permit: add id() accessor Effectively returns the address of the underlying permit impl as a `uintptr_t`. This can be used to determine the identity of the permit.	2023-01-17 05:27:04 -05:00
Botond Dénes	7f8469db27	reader_concurrency_semaphore: add foreach_permit() Allows iterating over all permits.	2023-01-17 05:27:04 -05:00
Botond Dénes	4c70b58993	reader_concurrency_semaphore: document the new memory limits	2023-01-17 05:27:04 -05:00
Botond Dénes	edb32cb171	reader_concurrency_semaphore: add OOM killer When the collective memory consumption of all readers goes above $kill_limit_multiplier * $memory_limit, consume() will throw std::bad_alloc(), instantly unwinding the read that is unlucky enough to have requested the last bytes of memory. This should help in situations where there are some problematic partitions, either because of large cells or because they are scattered in too many sstables. Currently nothing prevents such reads from bringing down the entire node via OOM.	2023-01-17 05:27:04 -05:00
Botond Dénes	81e2a2be7d	reader_concurrency_semaphore: make consume() and signal() private Using this API is quite dangerous as any mistakes can lead to leaking resources from the semaphore. Also, soon we will tie this API closer to permits, so they won't be as generic. Make them private so we don't have to worry about correct usage. All external users are patched away already.	2023-01-17 05:27:04 -05:00
Botond Dénes	ab18e7b178	test: stop using reader_concurrency_semaphore::{consume,signal}() directly These methods will soon be retired (made private) so migrate away from them. Consume memory through a permit instead. It is also safer this way: all memory consumed through the permit is guaranteed to be released when the permit is destroyed at the latest.	2023-01-17 05:27:04 -05:00
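The safety property mentioned in the commit above, that memory consumed through a permit is released at the latest when the permit is destroyed, can be illustrated with a scoped resource. This Python sketch is an analogy only: `Semaphore` and `Permit` are hypothetical classes, and a context manager stands in for the C++ destructor.

```python
class Semaphore:
    """Hypothetical stand-in tracking available memory units."""

    def __init__(self, memory):
        self.available = memory


class Permit:
    """Tracks what it consumed and gives it all back on exit."""

    def __init__(self, semaphore):
        self._semaphore = semaphore
        self._consumed = 0

    def consume(self, amount):
        self._semaphore.available -= amount
        self._consumed += amount

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and on exceptions, like a destructor would.
        self._semaphore.available += self._consumed
        self._consumed = 0
        return False  # do not swallow exceptions
```

Calling `consume()` directly on the semaphore has no such owner, so a missed release leaks units, which is the risk the commit cites for retiring direct use.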
Botond Dénes	8f9e8aafdf	reader_concurrency_semaphore: move consume() out-of-line It's about to get a little bit more complex.	2023-01-17 05:27:04 -05:00
Botond Dénes	e4ef28284b	reader_permit: consume(): make it exception-safe reader_concurrency_semaphore::consume() will soon throw.	2023-01-17 05:27:04 -05:00
Botond Dénes	029269af42	reader_permit: resource_units::reset(): only call consume() if needed reset() is called from the destructor, with null resources. Calling consume() can be avoided in this case and in fact it is required as consume() is soon going to throw in some cases.	2023-01-17 05:27:04 -05:00
Botond Dénes	dd9a0a16e6	reader_concurrency_semaphore: tracked_file_impl: use request_memory() Use the recently added `request_memory()` to acquire the memory units for the I/O. This allows blocking all but one reader when memory consumption grows too high.	2023-01-17 05:27:04 -05:00

34681 Commits