scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-12 19:02:12 +00:00

Author	SHA1	Message	Date
Piotr Dulikowski	e5922e650e	storage_proxy: resultify (do_)query Adjusts do_query so that it propagates and returns failed results. The query_result method is added which is result-aware, and the old query method was changed to call query_result.	2022-02-22 16:08:52 +01:00
Piotr Dulikowski	e39c5b6eba	storage_proxy: resultify query_singular Now, query_singular propagates and returns failed results without rethrowing them.	2022-02-22 16:08:52 +01:00
Piotr Dulikowski	2f5f746ae2	storage_proxy: propagate failed results through query_partition_key_range Now, query_partition_key_range propagates the failed result from query_partition_key_range_concurrent.	2022-02-22 16:08:52 +01:00
Piotr Dulikowski	608032b2b5	storage_proxy: resultify query_partition_key_range_concurrent Now, query_partition_key_range_concurrent propagates and returns exceptions as values, if possible.	2022-02-22 16:08:52 +01:00
Piotr Dulikowski	10923d9d58	storage_proxy: modify handle_read_error to also handle exception containers Now, storage_proxy::handle_read_error can work with both exception containers and exception_ptrs.	2022-02-22 16:08:52 +01:00
Piotr Dulikowski	89fe804a1a	abstract_read_executor: return result from execute()	2022-02-22 16:08:52 +01:00
Piotr Dulikowski	15fa5e30f5	abstract_read_executor: return and handle result from has_cl() The has_cl() method is changed to return a future with a result. The result returned from has_cl() is handled without throwing.	2022-02-22 16:08:52 +01:00
Piotr Dulikowski	68b5b84fbe	storage_proxy: resultify handling errors from read-repair Now, failed results returned from read-repair are handled without throwing.	2022-02-22 16:08:52 +01:00
Piotr Dulikowski	dd860c70ce	abstract_read_executor::reconcile: resultise handling of data_resolver->done() Now, the logic of handling exceptions returned in reconcile() from data_resolver->done() was changed so that the failed result does not need to be converted to an exceptional future.	2022-02-22 16:08:52 +01:00
Piotr Dulikowski	5accfd8dae	abstract_read_executor::execute: resultify handling of data_resolver->done() Now, the logic of handling exceptions returned in execute() from data_resolver->done() was changed so that the failed result does not need to be converted to an exceptional future.	2022-02-22 16:08:52 +01:00
Piotr Dulikowski	ee2c4725c3	abstract_read_executor: resultify _result_promise Adjusts the type of _result_promise so that it holds a result.	2022-02-22 16:08:52 +01:00
Piotr Dulikowski	e7f960d041	abstract_read_executor: return result from done()	2022-02-22 16:08:52 +01:00
Piotr Dulikowski	28d562ddf6	abstract_read_resolver: fail promises by passing exception as value Now, on read timeouts and failures, _cl_promise and _done_promise are set to a failed result instead of an exceptional promise.	2022-02-22 16:08:52 +01:00
Piotr Dulikowski	5438973c9d	abstract_read_resolver: resultify promises Changes the types of _done_promise and _cl_promise so that they hold a result.	2022-02-22 16:08:52 +01:00
Piotr Dulikowski	adfd9d2f7a	abstract_read_resolver::fail_request: make non-virtual This method is not overridden by any of the derived classes, so it does not need to be virtual. (cherry picked from commit b7fb93dc46531bca8db535301a069df52991f9d9)	2022-02-17 12:34:37 +02:00
Avi Kivity	7cc43f8aa8	Merge 'utils: add result_try and result_futurize_try' from Piotr Dulikowski Adds `utils::result_try` and `utils::result_futurize_try` - functions that allow converting existing try..catch blocks into a version which handles C++ exceptions, failed results with exception containers and, depending on the function variant, exceptional futures using the same exception handling logic. For example, you can convert the following try..catch block: try { return a_function_that_may_throw(); } catch (const my_exception& ex) { return 123; } catch (...) { throw; } ...to this: return utils::result_try([&] { return a_function_that_may_throw_or_return_a_failed_result(); }, utils::result_catch<my_exception>([&] (const my_exception&) { return 123; }), utils::result_catch_dots([&] (auto&& handle) { return handle.into_result(); })); Similarly, `utils::result_futurize_try` can be used to migrate `then_wrapped` or `f.handle_exception()` constructs. As an example of the usability of the new constructs, two places in the current code which need to simultaneously handle exceptions and failed results are converted to use `result_try` and `result_futurize_try`. Results of `perf_simple_query --smp 1 --operations-per-shard 1000000 --write`: ``` 127041.61 tps ( 67.2 allocs/op, 14.2 tasks/op, 52422 insns/op) 126958.60 tps ( 67.2 allocs/op, 14.2 tasks/op, 52409 insns/op) 127088.37 tps ( 67.2 allocs/op, 14.2 tasks/op, 52411 insns/op) 127560.84 tps ( 67.2 allocs/op, 14.2 tasks/op, 52424 insns/op) 127826.61 tps ( 67.2 allocs/op, 14.2 tasks/op, 52406 insns/op) 126801.02 tps ( 67.2 allocs/op, 14.2 tasks/op, 52420 insns/op) 125371.51 tps ( 67.2 allocs/op, 14.2 tasks/op, 52425 insns/op) 126498.51 tps ( 67.2 allocs/op, 14.2 tasks/op, 52427 insns/op) 126359.41 tps ( 67.2 allocs/op, 14.2 tasks/op, 52423 insns/op) 126298.27 tps ( 67.2 allocs/op, 14.2 tasks/op, 52410 insns/op) ``` The number of tasks and allocations is unchanged. The number of instructions per operation seems similar; it may have increased slightly (by 10-20), but it's hard to tell for sure because of the noisiness of the results. Tests: unit(dev) Closes #10045 * github.com:scylladb/scylla: transport: use result_try in process_request_one storage_proxy: use result_futurize_try in mutate_end storage_proxy: temporarily throw exception from result in mutate_end utils: add result_try and result_futurize_try	2022-02-13 19:38:13 +02:00
Piotr Dulikowski	6abeec6299	utils/result: split into `combinators` and `loop` file Segregates result utilities into: - result.hh - basic definitions related to results with exception containers, - result_combinators.hh - combinators for working with results in conjunction with futures, - result_loop.hh - loop-like combinators, currently has only result_parallel_for_each. The motivation for the split is: 1. In headers, usually only result.hh will be needed, so no need to force most .cc files to compile definitions from other files, 2. Fewer files need to be recompiled when a combinator is added to result_combinators or result_loop. As a bonus, `result_with_exception` was moved from `utils::internal` to just `utils`.	2022-02-10 18:19:05 +01:00
Piotr Dulikowski	98bde8d6d2	storage_proxy: use result_futurize_try in mutate_end Adapts the mutate_end exception handling logic so that it uses the new utils::result_futurize_try function to handle both exceptional futures and failed results in a unified way.	2022-02-10 17:35:32 +01:00
Piotr Dulikowski	d5d24a5140	storage_proxy: temporarily throw exception from result in mutate_end Temporarily removes the logic which handles failed results in a non-throwing way. Exceptions from failed results are thrown and handled in try..catch. The reason for this change is that it makes the following commit, which migrates the whole try..catch block to utils::result_futurize_try much nicer. The next commit will also bring back the non-throwing handling of the failed result.	2022-02-10 17:35:32 +01:00
Piotr Dulikowski	4c1eae7600	storage_proxy: change mutate_with_triggers to return future<result<>> Changes the interface of `mutate_with_triggers` so that it returns `future<result<>>` instead of `future<>`. No intermediate `mutate_with_triggers_result` method is introduced because all call sites will be changed in this PR so that they properly handle failed `result<>`s with exceptions-as-values.	2022-02-08 11:08:42 +01:00
Piotr Dulikowski	7ed668a177	storage_proxy: add mutate_atomically_result Similarly to `mutate_result` introduced in the previous commit, `mutate_atomically_result` is introduced which returns some exceptions inside `result<>`. The pre-existing `mutate_atomically` keeps the same interface but uses `mutate_atomically_result` internally, converting failed `result<>` to exceptional future if needed.	2022-02-08 11:08:42 +01:00
Piotr Dulikowski	f9ff5e7692	storage_proxy: return result<> from mutate_result In order to be able to propagate exceptions-as-values from storage_proxy but without having to modify all call sites of `mutate`, an in-between method `mutate_result` is introduced which returns some exceptions inside `result<>`. Now, `mutate` just calls the latter and converts those exceptions to exceptional future if needed.	2022-02-08 11:08:42 +01:00
Piotr Dulikowski	f02b8614af	storage_proxy: return result<> from mutate_internal Changes the interface of `mutate_internal` so that it returns a `future<result<>>` instead of `future<>`.	2022-02-08 11:08:42 +01:00
Piotr Dulikowski	f8bbf67e64	storage_proxy: properly propagate future from mutate_begin to mutate_end Modifies all call sites of `mutate_begin` and `mutate_end` so that the failed result<> created in the former is properly propagated to the latter.	2022-02-08 11:08:42 +01:00
Piotr Dulikowski	e2893368a7	storage_proxy: handle exceptions as values in mutate_end Instead of stupidly rethrowing the exception in failed result<>, the `storage_proxy::mutate_end` function now inspects it with a visitor, which does not involve any rethrows. Moreover, mutate_end now also returns a `future<result<>>` instead of just `future<>`.	2022-02-08 11:08:42 +01:00
Piotr Dulikowski	5c00b27662	storage_proxy: let mutate_end take a future<result<>> Changes the `storage_proxy::mutate_end` method to accept a `future<result<>>` instead of `future<>`. For the time being, all call sites of that method pass a future which is either exceptional or contains a result<> with a value. Moreover, in case of a failed result<>, mutate_end just rethrows the exception. Both of these will change in the upcoming commits of this PR.	2022-02-08 11:08:42 +01:00
Piotr Dulikowski	59efe085af	storage_proxy: resultify mutate_begin Changes the `storage_proxy::mutate_begin` method to return a future<result<>>.	2022-02-08 11:08:42 +01:00
Piotr Dulikowski	3a92513ef6	storage_proxy: use result in the _ready future of write handlers Changes the type of the _ready promise in abstract_write_response_handler - a promise used by the coordinator logic to wait until the write operation is complete - to keep a `result<>` instead of `void`. Now, a timeout is signalled by setting the promise to a value containing a `result<>` with a mutation write timeout exception - previously it was signalled by setting the promise to an exceptional value. This is just a first step on a long road of throwless propagation of the error to the cql_server - for now, a failed result is immediately converted to an exceptional future in `storage_proxy::response_wait`.	2022-02-08 11:08:42 +01:00
Piotr Dulikowski	6ac98f26e0	storage_proxy: introduce helpers for dealing with results Adds a number of typedefs in order to make working with coordinator exceptions-as-values easier.	2022-02-08 11:08:42 +01:00
Michał Sala	0fe59082ec	storage_proxy: extract query_ranges_to_vnodes_generator to a separate file Such separation allows using query_ranges_to_vnodes_generator by other services without needing a storage_proxy dependency.	2022-02-01 21:14:41 +01:00
Benny Halevy	4272dd0b28	storage_proxy: mutate_counter_on_leader_and_replicate: use container to get to shard proxy Rather than using the global helper, get_local_storage_proxy. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20220131151516.3461049-2-bhalevy@scylladb.com>	2022-01-31 18:14:31 +02:00
Benny Halevy	8acdc6ebdc	storage_proxy: paxos: don't use global storage_proxy Rather than calling get_local_storage_proxy(), use paxos_response_handler::_proxy. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20220131151516.3461049-1-bhalevy@scylladb.com>	2022-01-31 18:14:31 +02:00
Nadav Har'El	1ce73c2ab3	Merge 'utils::is_timeout_exception: Ensure we handle nested exception types' from Calle Wilund Fixes #9922 storage proxy uses is_timeout_exception to traverse different code paths. `a6202ae079` broke this (because bit rot and intermixing), by wrapping exception for information purposes. This adds check of nested types in exception handling, as well as a test for the routine itself. Closes #9932 * github.com:scylladb/scylla: database/storage_proxy: Use "is_timeout_exception" instead of catch match utils::is_timeout_exception: Ensure we handle nested exception types	2022-01-18 23:49:41 +02:00
Calle Wilund	868b572ec8	database/storage_proxy: Use "is_timeout_exception" instead of catch match Might miss cases otherwise. v2: Fix broken control flow v3: Avoid throw - use make_exception_future instead.	2022-01-18 15:40:41 +00:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Gleb Natapov	1db151bd75	storage_proxy: move all verbs to the IDL Define all verbs in the IDL instead of manually coding them.	2022-01-10 14:58:28 +02:00
Gleb Natapov	ff6a0fffaf	storage_proxy: convert more address vectors to inet_address_vector_replica_set	2022-01-10 13:48:20 +02:00
Avi Kivity	bbad8f4677	replica: move ::database, ::keyspace, and ::table to replica namespace Move replica-oriented classes to the replica namespace. The main classes moved are ::database, ::keyspace, and ::table, but a few ancillary classes are also moved. There are certainly classes that should be moved but aren't (like distributed_loader) but we have to start somewhere. References are adjusted treewide. In many cases, it is obvious that a call site should not access the replica (but the data_dictionary instead), but that is left for separate work. scylla-gdb.py is adjusted to look for both the new and old names.	2022-01-07 12:04:38 +02:00
Avi Kivity	ae3a360725	database: Move database, keyspace, table classes to replica/ directory The database, keyspace, and table classes represent the replica-only part of the objects after which they are named. Reading from a table doesn't give you the full data, just the replica's view, and it is not consistent since reconciliation is applied on the coordinator. As a first step in acknowledging this, move the related files to a replica/ subdirectory.	2022-01-06 17:07:30 +02:00
Asias He	a8ad385ecd	repair: Get rid of the gc_grace_seconds The gc_grace_seconds is a very fragile and broken design inherited from Cassandra. Deleted data can be resurrected if cluster wide repair is not performed within gc_grace_seconds. This design pushes the job of keeping the database consistent onto the user. In practice, it is very hard to guarantee repair is performed within gc_grace_seconds all the time. For example, repair workload has the lowest priority in the system and can be slowed down by higher priority workloads, so there is no guarantee when a repair can finish. A gc_grace_seconds value that used to work might not work after data volume grows in a cluster. Users might want to avoid running repair during a specific period where latency is the top priority for their business. To solve this problem, an automatic mechanism to protect against data resurrection is proposed and implemented. The main idea is to remove the tombstone only after the range that covers the tombstone is repaired. In this patch, a new table option tombstone_gc is added. The option is used to configure tombstone gc mode. For example: 1) GC a tombstone after gc_grace_seconds cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'timeout'} ; This is the default mode. If no tombstone_gc option is specified by the user, the old gc_grace_seconds based gc will be used. 2) Never GC a tombstone cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'disabled'}; 3) GC a tombstone immediately cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'immediate'}; 4) GC a tombstone after repair cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'}; In addition to the 'mode' option, another option 'propagation_delay_in_seconds' is added. It defines the max time a write could possibly delay before it eventually arrives at a node. A new gossip feature TOMBSTONE_GC_OPTIONS is added. The new tombstone_gc option can only be used after the whole cluster supports the new feature. A mixed cluster works with no problem. Tests: compaction_test.py, ninja test Fixes #3560 [avi: resolve conflicts vs data_dictionary]	2022-01-04 19:48:14 +02:00
Avi Kivity	4a323772c1	Merge 'Use the same page size limit in reverse queries as in forward reads' from Piotr Jastrzębski The default for get_unlimited_query_max_result_size() is 100MB (adjustable through config), whereas query::result_memory_limiter::maximum_result_size is 1MB (hard coded, should be enough for everybody) This limit is then used by the replica to decide when to break pages and, in case of reversed clustering order reads, when to fail the read when accumulated data crosses the threshold. The latter behavior stems from the fact that reversed reads had to accumulate all the data (read in forward order) before they can reverse it and return the result. Reverse reads thus need a higher limit so that they have a higher chance of succeeding. Most readers are now supporting reading in reverse natively, and only reversing wrappers (make_reversing_reader()) inserted on top of ka/la sstable readers need to accumulate all the data. In other cases, we could break pages sooner. This should lead to better stability (less memory usage) and performance (lower page build latency, higher read concurrency due to less memory footprint). Tests: unit(dev) Closes #9815 * github.com:scylladb/scylla: storage_proxy: Send page_size in the read_command gms: add SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT feature result_memory_accounter: use new max_result_size::get_page_size in check_local_limit max_result_size: Add page_size field	2021-12-29 15:04:01 +02:00
Avi Kivity	966bb3c8f0	service: storage_proxy: fix lowres_clock::duration assumption calculate_delay() implicitly converts a lowres_clock::duration to std::chrono::microseconds. This fails if lowres_clock::duration has higher resolution than microseconds. Fix by using an explicit conversion, which always works.	2021-12-28 21:17:14 +02:00
Piotr Jastrzebski	7fa3fa6e65	storage_proxy: Send page_size in the read_command When the whole cluster is already supporting separate_page_size_and_safety_limit, start sending page_size in read_command. This new value will be used for determining the page size instead of hard_limit. Fixes #9487 Fixes #7586 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2021-12-28 16:38:02 +01:00
Avi Kivity	7bdc999bba	service: paxos_state: wean off get_local_storage_proxy() Instead of calling get_local_storage_proxy in paxos_state, get it from the caller (who is, in fact, storage_proxy or one of its components). Some of the callers, although they are storage_proxy components, don't have a storage_proxy reference handy and so they ignominiously call get_local_storage_proxy() themselves. This will be adjusted later. The other callers who are, in fact, storage_proxy, have to take special care not to cross a shard boundary. When they do, smp::submit_to() is converted to sharded::invoke_on() in order to get the correct local instance. Test: unit (dev) Closes #9824	2021-12-20 00:31:13 +02:00
Avi Kivity	c2da20484d	storage_proxy: provide access to data_dictionary Probably storage_proxy is not the correct place to supply data_dictionary, but it is available to practically all of the coordinator code, so it is convenient.	2021-12-15 13:54:08 +02:00
Michał Sala	27ff3e7de7	storage_proxy: check partition ranges contiguity storage_proxy::query_partition_key_range_concurrent() iterates through vnodes produced by its argument query_ranges_to_vnodes_generator&& ranges_to_vnodes and tries to merge them. This commit introduces checking if subsequent vnodes are contiguous with each other, before merging them. Fixes #9167 Closes #9175	2021-11-23 15:48:55 +02:00
Benny Halevy	744275df73	batchlog_manager: get_batch_log_mutation_for: move to storage_proxy And rename to get_batchlog_mutation_for while at it, as it's about the batchlog, not batch_log. This resolves a circular dependency between the batchlog_manager and the storage_proxy that required it in the case. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-11-23 08:27:30 +02:00
Benny Halevy	55967a8597	batchlog_manager: endpoint_filter: move to gossiper There's nothing in this function that actually requires the batchlog manager instance. It uses a random number engine that's moved along with it to class gossiper. This resolves a circular dependency between the batchlog_manager and storage_proxy. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-11-23 08:27:30 +02:00
Avi Kivity	f3d5b2b2b0	Merge "Add effective_replication_map factory" from Benny " Add a sharded locator::effective_replication_map_factory that holds shared effective_replication_maps. To search for e_r_m in the factory, we use a compound `factory_key`: <replication_strategy type, replication_strategy options, token_metadata ring version>. Start the sharded factory in main (plus cql_test_env and tools/schema_loader) and pass a reference to it to storage_proxy and storage_server. For each keyspace, use the registry to create the effective_replication_map. When registered, effective_replication_map objects erase themselves from the factory when destroyed. effective_replication_map then schedules a background task to clear_gently its contents, protected by the e_r_m_f::stop() function. Note that for non-shard 0 instances, if the map is not found in the registry, we construct it by cloning the precalculated replication_map from shard 0 to save the cpu cycles of re-calculating it time and again on every shard. Test: unit(dev), schema_loader_test(debug) DTest: bootstrap_test.py:TestBootstrap.decommissioned_wiped_node_can_join_test update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_add_new_node_while_schema_changes_with_repair_test (dev) " * tag 'effective_replication_map_factory-v7' of https://github.com/bhalevy/scylla: effective_replication_map: clear_gently when destroyed database: shutdown keyspaces test: cql_test_env: stop view_update_generator before database shuts down effective_replication_map_factory: try cloning replication map from shard 0 tools: schema_loader: start a sharded erm_factory storage_service: use erm_factory to create effective_replication_map keyspace: use erm_factory to create effective_replication_map effective_replication_map: erase from factory when destroyed effective_replication_map_factory: add create_effective_replication_map effective_replication_map: enable_lw_shared_from_this effective_replication_map: define factory_key keyspace: get a reference to the erm_factory main: pass erm_factory to storage_service main: pass erm_factory to storage_proxy locator: add effective_replication_map_factory	2021-11-19 18:19:38 +02:00
Benny Halevy	242043368e	main: pass erm_factory to storage_proxy To be used for creating the effective_replication_map per keyspace. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-11-19 10:46:51 +02:00
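
Several storage_proxy commits in this log (for example e2893368a7 and f9ff5e7692) describe passing exceptions as values inside a `result<>`, inspecting them with a visitor instead of rethrowing, and keeping a thin throwing wrapper for call sites that have not been converted yet. The sketch below illustrates that general idea in plain standard C++17. It is not Scylla's `utils::result` or its exception container: every name in it (`write_timeout`, `overloaded`, `error_container`, `result`, `do_write`, `describe`, `do_write_or_throw`) is hypothetical and chosen only for illustration, and futures are left out entirely.

```cpp
#include <string>
#include <utility>
#include <variant>

// Hypothetical error types standing in for coordinator exceptions.
struct write_timeout { int received; int required; };
struct overloaded { std::string reason; };

// Hypothetical container: one of a closed set of errors, held by value.
using error_container = std::variant<write_timeout, overloaded>;

// Hypothetical result type: either a T or an error_container.
template <typename T>
class result {
    std::variant<T, error_container> _v;
public:
    result(T value) : _v(std::in_place_index<0>, std::move(value)) {}
    result(error_container err) : _v(std::in_place_index<1>, std::move(err)) {}
    bool has_value() const { return _v.index() == 0; }
    const T& value() const { return std::get<0>(_v); }
    const error_container& error() const { return std::get<1>(_v); }
};

// Helper for building a visitor out of several lambdas.
template <class... Fs> struct overloaded_fn : Fs... { using Fs::operator()...; };
template <class... Fs> overloaded_fn(Fs...) -> overloaded_fn<Fs...>;

// Simulated write: reports a timeout as a value when too few replicas acknowledged.
result<int> do_write(int acks, int required) {
    if (acks < required) {
        return error_container{write_timeout{acks, required}};
    }
    return acks;
}

// Inspects a failed result with a visitor; nothing is thrown or rethrown here.
std::string describe(const result<int>& r) {
    if (r.has_value()) {
        return "ok: " + std::to_string(r.value());
    }
    return std::visit(overloaded_fn{
        [] (const write_timeout& e) {
            return "timeout: " + std::to_string(e.received) + "/" + std::to_string(e.required);
        },
        [] (const overloaded& e) { return "overloaded: " + e.reason; },
    }, r.error());
}

// Throwing wrapper for unconverted call sites: turns a failed result back into an exception.
int do_write_or_throw(int acks, int required) {
    auto r = do_write(acks, required);
    if (!r.has_value()) {
        std::visit([] (const auto& e) { throw e; }, r.error());
    }
    return r.value();
}
```

In this sketch the error path of `describe` costs a variant visit rather than a throw and catch, which is the kind of saving the commits aim for on timeout-heavy paths. `do_write_or_throw` plays the role that the log attributes to `mutate` relative to `mutate_result`: the interface stays the same for existing callers while the result-returning function is used internally.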

836 Commits