Commit Graph

836 Commits

Author SHA1 Message Date
Piotr Dulikowski
e5922e650e storage_proxy: resultify (do_)query
Adjusts do_query so that it propagates and returns failed results. The
query_result method is added which is result-aware, and the old query
method was changed to call query_result.
2022-02-22 16:08:52 +01:00
Piotr Dulikowski
e39c5b6eba storage_proxy: resultify query_singular
Now, query_singular propagates and returns failed results without
rethrowing them.
2022-02-22 16:08:52 +01:00
Piotr Dulikowski
2f5f746ae2 storage_proxy: propagate failed results through query_partition_key_range
Now, query_partition_key_range propagates the failed result from
query_partition_key_range_concurrent.
2022-02-22 16:08:52 +01:00
Piotr Dulikowski
608032b2b5 storage_proxy: resultify query_partition_key_range_concurrent
Now, query_partition_key_range_concurrent propagates and returns
exceptions as values, if possible.
2022-02-22 16:08:52 +01:00
Piotr Dulikowski
10923d9d58 storage_proxy: modify handle_read_error to also handle exception containers
Now, storage_proxy::handle_read_error can work with both exception
containers and exception_ptrs.
2022-02-22 16:08:52 +01:00
Piotr Dulikowski
89fe804a1a abstract_read_executor: return result from execute() 2022-02-22 16:08:52 +01:00
Piotr Dulikowski
15fa5e30f5 abstract_read_executor: return and handle result from has_cl()
The has_cl() method is changed to return a future with a result. The
result returned from has_cl() is handled without throwing.
2022-02-22 16:08:52 +01:00
Piotr Dulikowski
68b5b84fbe storage_proxy: resultify handling errors from read-repair
Now, failed results returned from read-repair are handled without
throwing.
2022-02-22 16:08:52 +01:00
Piotr Dulikowski
dd860c70ce abstract_read_executor::reconcile: resultise handling of data_resolver->done()
Now, the logic of handling exceptions returned in reconcile() from
data_resolver->done() was changed so that the failed result does not
need to be converted to an exceptional future.
2022-02-22 16:08:52 +01:00
Piotr Dulikowski
5accfd8dae abstract_read_executor::execute: resultify handling of data_resolver->done()
Now, the logic of handling exceptions returned in execute() from
data_resolver->done() was changed so that the failed result does not
need to be converted to an exceptional future.
2022-02-22 16:08:52 +01:00
Piotr Dulikowski
ee2c4725c3 abstract_read_executor: resultify _result_promise
Adjusts the type of _result_promise so that it holds a result.
2022-02-22 16:08:52 +01:00
Piotr Dulikowski
e7f960d041 abstract_read_executor: return result from done() 2022-02-22 16:08:52 +01:00
Piotr Dulikowski
28d562ddf6 abstract_read_resolver: fail promises by passing exception as value
Now, on read timeouts and failures, _cl_promise and _done_promise is set
to a failed result instead of an exceptional promise.
2022-02-22 16:08:52 +01:00
Piotr Dulikowski
5438973c9d abstract_read_resolver: resultify promises
Changes the types of _done_promise and _cl_promise so that they hold a
result.
2022-02-22 16:08:52 +01:00
Piotr Dulikowski
adfd9d2f7a abstract_read_resolver::fail_request: make non-virtual
This method is not overrided by any of the derived classes, so it does
not need to be virtual.

(cherry picked from commit b7fb93dc46531bca8db535301a069df52991f9d9)
2022-02-17 12:34:37 +02:00
Avi Kivity
7cc43f8aa8 Merge 'utils: add result_try and result_futurize_try' from Piotr Dulikowski
Adds `utils::result_try` and `utils::result_futurize_try` - functions which allow to convert existing try..catch blocks into a version which handles C++ exceptions, failed results with exception containers and, depending on the function variant, exceptional futures using the same exception handling logic.

For example, you can convert the following try..catch block:

    try {
        return a_function_that_may_throw();
    } catch (const my_exception& ex) {
        return 123;
    } catch (...) {
        throw;
    }

...to this:

    return utils::result_try([&] {
        return a_function_that_may_throw_or_return_a_failed_result();
    },  utils::result_catch<my_exception>([&] (const Ex&) {
        return 123;
    }), utils::result_catch_dots([&] (auto&& handle) {
        return handle.into_result();
    });

Similarly, `utils::result_futurize_try` can be used to migrate `then_wrapped` or `f.handle_exception()` constructs.

As an example of the usability of the new constructs, two places in the current code which need to simultaneously handle exceptions and failed results are converted to use `result_try` and `result_futurize_try`.

Results of `perf_simple_query --smp 1 --operations-per-shard 1000000 --write`:

```
127041.61 tps ( 67.2 allocs/op,  14.2 tasks/op,   52422 insns/op)
126958.60 tps ( 67.2 allocs/op,  14.2 tasks/op,   52409 insns/op)
127088.37 tps ( 67.2 allocs/op,  14.2 tasks/op,   52411 insns/op)
127560.84 tps ( 67.2 allocs/op,  14.2 tasks/op,   52424 insns/op)
127826.61 tps ( 67.2 allocs/op,  14.2 tasks/op,   52406 insns/op)

126801.02 tps ( 67.2 allocs/op,  14.2 tasks/op,   52420 insns/op)
125371.51 tps ( 67.2 allocs/op,  14.2 tasks/op,   52425 insns/op)
126498.51 tps ( 67.2 allocs/op,  14.2 tasks/op,   52427 insns/op)
126359.41 tps ( 67.2 allocs/op,  14.2 tasks/op,   52423 insns/op)
126298.27 tps ( 67.2 allocs/op,  14.2 tasks/op,   52410 insns/op)
```

The number of tasks and allocations is unchanged. The number of instructions per operations seems similar, it may have increased slightly (by 10-20) but it's hard to tell for sure because of the noisiness of the results.

Tests: unit(dev)

Closes #10045

* github.com:scylladb/scylla:
  transport: use result_try in process_request_one
  storage_proxy: use result_futurize_try in mutate_end
  storage_proxy: temporarily throw exception from result in mutate_end
  utils: add result_try and result_futurize_try
2022-02-13 19:38:13 +02:00
Piotr Dulikowski
6abeec6299 utils/result: split into combinators and loop file
Segregates result utilities into:

- result.hh - basic definitions related to results with exception
  containers,
- result_combinators.hh - combinators for working with results in
  conjunction with futures,
- result_loop.hh - loop-like combinators, currently has only
  result_parallel_for_each.

The motivation for the split is:

1. In headers, usually only result.hh will be needed, so no need to
   force most .cc files to compile definitions from other files,
2. Less files need to be recompiled when a combinator is added to
   result_combinators or result_loop.

As a bonus, `result_with_exception` was moved from `utils::internal` to
just `utils`.
2022-02-10 18:19:05 +01:00
Piotr Dulikowski
98bde8d6d2 storage_proxy: use result_futurize_try in mutate_end
Adapts the mutate_end exception handling logic so that it uses the new
utils::result_futurize_try function to handle both exceptional futures
and failed results in an unified way.
2022-02-10 17:35:32 +01:00
Piotr Dulikowski
d5d24a5140 storage_proxy: temporarily throw exception from result in mutate_end
Temporarily removes the logic which handles failed results in a
non-throwing way. Exceptions from failed results are thrown and handled
in try..catch.

The reason for this change is that it makes the following commit, which
migrates the whole try..catch block to utils::result_futurize_try much
nicer. The next commit will also bring back the non-throwing handling of
the failed result.
2022-02-10 17:35:32 +01:00
Piotr Dulikowski
4c1eae7600 storage_proxy: change mutate_with_triggers to return future<result<>>
Changes the interface of `mutate_with_triggers` so that it returns
`future<result<>>` instead of `future<>`. No intermediate
`mutate_with_triggers_result` method is introduced because all call
sites will be changed in this PR so that they properly handle failed
`result<>`s with exceptions-as-values.
2022-02-08 11:08:42 +01:00
Piotr Dulikowski
7ed668a177 storage_proxy: add mutate_atomically_result
Similarly to `mutate_result` introduced in the previous commit,
`mutate_atomically_result` is introduced which returns some exceptions
inside `result<>`. The pre-existing `mutate_atomically` keeps the same
interface but uses `mutate_atomically_result` internally, converting
failed `result<>` to exceptional future if needed.
2022-02-08 11:08:42 +01:00
Piotr Dulikowski
f9ff5e7692 storage_proxy: return result<> from mutate_result
In order to be able to propagate exceptions-as-values from storage_proxy
but without having to modify all call sites of `mutate`, an in-between
method `mutate_result` is introduced which returns some exceptions
inside `result<>`. Now, `mutate` just calls the latter and converts
those exceptions to exceptional future if needed.
2022-02-08 11:08:42 +01:00
Piotr Dulikowski
f02b8614af storage_proxy: return result<> from mutate_internal
Changes the interface of `mutate_internal` so that it returns a
`future<result<>>` instead of `future<>`.
2022-02-08 11:08:42 +01:00
Piotr Dulikowski
f8bbf67e64 storage_proxy: properly propagate future from mutate_begin to mutate_end
Modifies all call sites of `mutate_begin` and `mutate_end` so that the
failed result<> created in the former is properly propagated to the
latter.
2022-02-08 11:08:42 +01:00
Piotr Dulikowski
e2893368a7 storage_proxy: handle exceptions as values in mutate_end
Instead of stupidly rethrowing the exception in failed result<>, the
`storage_proxy::mutate_end` function now inspects it with a visitor,
which does not involve any rethrows. Moreover, mutate_end now also
returns a `future<result<>>` instead of just `future<>`.
2022-02-08 11:08:42 +01:00
Piotr Dulikowski
5c00b27662 storage_proxy: let mutate_end take a future<result<>>
Changes the `storage_proxy::mutate_end` method to accept a
`future<result<>>` instead of `future<>`.

For the time being, all call call sites of that method pass a future
which is either exceptional or contains a result<> with a value.
Moreover, in case of a failed result<>, mutate_end just rethrows the
exception. Both of these will change in the upcoming commits of this PR.
2022-02-08 11:08:42 +01:00
Piotr Dulikowski
59efe085af storage_proxy: resultify mutate_begin
Changes the `storage_proxy::mutate_begin` method to return a
future<result<>>.
2022-02-08 11:08:42 +01:00
Piotr Dulikowski
3a92513ef6 storage_proxy: use result in the _ready future of write handlers
Changes the type of the _ready promise in
abstract_write_response_handler - a promise used by the coordinator
logic to wait until the write operation is complete - to keep a
`result<>` instead of `void`. Now, a timeout is signalled by setting the
promise to a value containing a `result<>` with a mutation write timeout
exception - previously it was signalled by setting the promise to an
exceptional value.

This is just a first step on a long road of throwless propagation of the
error to the cql_server - for now, a failed result is immediately
converted to an exceptional future in `storage_proxy::response_wait`.
2022-02-08 11:08:42 +01:00
Piotr Dulikowski
6ac98f26e0 storage_proxy: introduce helpers for dealing with results
Adds a number of typedefs in order to make working with coordinator
exceptions-as-values easier.
2022-02-08 11:08:42 +01:00
Michał Sala
0fe59082ec storage_proxy: extract query_ranges_to_vnodes_generator to a separate file
Such separation allows using query_ranges_to_vnodes_generator by other
services without needing a storage_proxy dependency.
2022-02-01 21:14:41 +01:00
Benny Halevy
4272dd0b28 storage_proxy: mutate_counter_on_leader_and_replicate: use container to get to shard proxy
Rather than using the global helper, get_local_storage_proxy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220131151516.3461049-2-bhalevy@scylladb.com>
2022-01-31 18:14:31 +02:00
Benny Halevy
8acdc6ebdc storage_proxy: paxos: don't use global storage_proxy
Rather than calling get_local_storage_proxy(),
use paxos_response_handler::_proxy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220131151516.3461049-1-bhalevy@scylladb.com>
2022-01-31 18:14:31 +02:00
Nadav Har'El
1ce73c2ab3 Merge 'utils::is_timeout_exception: Ensure we handle nested exception types' from Calle Wilund
Fixes #9922

storage proxy uses is_timeout_exception to traverse different code paths.
a6202ae079 broke this (because bit rot and
intermixing), by wrapping exception for information purposes.

This adds check of nested types in exception handling, as well as a test
for the routine itself.

Closes #9932

* github.com:scylladb/scylla:
  database/storage_proxy: Use "is_timeout_exception" instead of catch match
  utils::is_timeout_exception: Ensure we handle nested exception types
2022-01-18 23:49:41 +02:00
Calle Wilund
868b572ec8 database/storage_proxy: Use "is_timeout_exception" instead of catch match
Might miss cases otherwise.

v2: Fix broken control flow
v3: Avoid throw - use make_exception_future instead.
2022-01-18 15:40:41 +00:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Gleb Natapov
1db151bd75 storage_proxy: move all verbs to the IDL
Define all verbs in the IDL instead of manually codding them.
2022-01-10 14:58:28 +02:00
Gleb Natapov
ff6a0fffaf storage_proxy: convert more address vectors to inet_address_vector_replica_set 2022-01-10 13:48:20 +02:00
Avi Kivity
bbad8f4677 replica: move ::database, ::keyspace, and ::table to replica namespace
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.

References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.

scylla-gdb.py is adjusted to look for both the new and old names.
2022-01-07 12:04:38 +02:00
Avi Kivity
ae3a360725 database: Move database, keyspace, table classes to replica/ directory
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.

As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
2022-01-06 17:07:30 +02:00
Asias He
a8ad385ecd repair: Get rid of the gc_grace_seconds
The gc_grace_seconds is a very fragile and broken design inherited from
Cassandra. Deleted data can be resurrected if cluster wide repair is not
performed within gc_grace_seconds. This design pushes the job of making
the database consistency to the user. In practice, it is very hard to
guarantee repair is performed within gc_grace_seconds all the time. For
example, repair workload has the lowest priority in the system which can
be slowed down by the higher priority workload, so that there is no
guarantee when a repair can finish. A gc_grace_seconds value that is
used to work might not work after data volume grows in a cluster. Users
might want to avoid running repair during a specific period where
latency is the top priority for their business.

To solve this problem, an automatic mechanism to protect data
resurrection is proposed and implemented. The main idea is to remove the
tombstone only after the range that covers the tombstone is repaired.

In this patch, a new table option tombstone_gc is added. The option is
used to configure tombstone gc mode. For example:

1) GC a tombstone after gc_grace_seconds

cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'timeout'} ;

This is the default mode. If no tombstone_gc option is specified by the
user. The old gc_grace_seconds based gc will be used.

2) Never GC a tombstone

cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'disabled'};

3) GC a tombstone immediately

cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'immediate'};

4) GC a tombstone after repair

cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'};

In addition to the 'mode' option, another option 'propagation_delay_in_seconds'
is added. It defines the max time a write could possibly delay before it
eventually arrives at a node.

A new gossip feature TOMBSTONE_GC_OPTIONS is added. The new tombstone_gc
option can only be used after the whole cluster supports the new
feature. A mixed cluster works with no problem.

Tests: compaction_test.py, ninja test

Fixes #3560

[avi: resolve conflicts vs data_dictionary]
2022-01-04 19:48:14 +02:00
Avi Kivity
4a323772c1 Merge 'Use the same page size limit in reverse queries as in forward reads' from Piotr Jastrzębski
The default for get_unlimited_query_max_result_size() is 100MB (adjustable through config), whereas query::result_memory_limiter::maximum_result_size is 1MB (hard coded, should be enough for everybody)

This limit is then used by the replica to decide when to break pages and, in case of reversed clustering order reads, when to fail the read when accumulated data crosses the threshold. The latter behavior stems from the fact that reversed reads had to accumulate all the data (read in forward order) before they can reverse it and return the result. Reverse reads thus need a higher limit so that they have a higher chance of succeeding.

Most readers are now supporting reading in reverse natively, and only reversing wrappers (make_reversing_reader()) inserted on top of ka/la sstable readers need to accumulate all the data. In other cases, we could break pages sooner. This should lead to better stability (less memory usage) and performance (lower page build latency, higher read concurrency due to less memory footprint).

Tests: unit(dev)

Closes #9815

* github.com:scylladb/scylla:
  storage_proxy: Send page_size in the read_command
  gms: add SEPARATE_PAGE_SIZE_AND_SAFETY_LIMIT feature
  result_memory_accounter: use new max_result_size::get_page_size in check_local_limit
  max_result_size: Add page_size field
2021-12-29 15:04:01 +02:00
Avi Kivity
966bb3c8f0 service: storage_proxy: fix lowres_clock::duration assumption
calculate_delay() implicitly converts a lowres_clock::duration to
std::chrono::microseconds. This fails if lowres_clock::duration has
higher resolution than microseconds.

Fix by using an explicit conversion, which always works.
2021-12-28 21:17:14 +02:00
Piotr Jastrzebski
7fa3fa6e65 storage_proxy: Send page_size in the read_command
When the whole cluster is already supporting
separate_page_size_and_safety_limit,
start sending page_size in read_command. This new value will be used
for determining the page size instead of hard_limit.

Fixes #9487
Fixes #7586

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2021-12-28 16:38:02 +01:00
Avi Kivity
7bdc999bba service: paxos_state: wean off get_local_storage_proxy()
Instead of calling get_local_storage_proxy in paxos_state, get it from the
caller (who is, in fact, storage_proxy or one of its components).

Some of the callers, although they are storage_proxy components, don't
have a storage_proxy reference handy and so they ignomiously call
get_local_storage_proxy() themselves. This will be adjusted later.

The other callers who are, in fact, storage_proxy, have to take special
care not to cross a shard boundary. When they do, smp::submit_to()
is converted to sharded::invoke_on() in order to get the correct local instance.

Test: unit (dev)

Closes #9824
2021-12-20 00:31:13 +02:00
Avi Kivity
c2da20484d storage_proxy: provide access to data_dictionary
Probably storage_proxy is not the correct place to supply
data_dictionary, but it is available to practically all of
the coordinator code, so it is convenient.
2021-12-15 13:54:08 +02:00
Michał Sala
27ff3e7de7 storage_proxy: check partition ranges contiguity
storage_proxy::query_partition_key_range_concurrent() iterates through
vnodes produced by its argument query_ranges_to_vnodes_generator&&
ranges_to_vnodes and tries to merge them. This commit introduces
checking if subsequent vnodes are contiguous with each other, before
merging them.

Fixes #9167

Closes #9175
2021-11-23 15:48:55 +02:00
Benny Halevy
744275df73 batchlog_manager: get_batch_log_mutation_for: move to storage_proxy
And rename to get_batchlog_mutation_for while at it,
as it's about the batchlog, not batch_log.

This resolves a circular dependency between the
batchlog_manager and the storage_proxy that required
it in the case.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-23 08:27:30 +02:00
Benny Halevy
55967a8597 batchlog_manager: endpoint_filter: move to gossiper
There's nothing in this function that actually requries
the batchlog manager instance.

It uses a random number engine that's moved along with it
to class gossiper.

This resolves a circular dependency between the
batchlog_manager and storage_proxy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-23 08:27:30 +02:00
Avi Kivity
f3d5b2b2b0 Merge "Add effective_replication_map factory" from Benny
"
Add a sharded locator::effective_replication_map_factory that holds
shared effective_replication_maps.

To search for e_r_m in the factory, we use a compound `factory_key`:
<replication_strategy type, replication_strategy options, token_metadata ring version>.

Start the sharded factory in main (plus cql_test_env and tools/schema_loader)
and pass a reference to it to storage_proxy and storage_server.

For each keyspace, use the registry to create the effective_replication_map.

When registered, effective_replication_map objects erase themselves
from the factory when destroyed. effective_replication_map then schedules
a background task to clear_gently its contents, protected by the e_r_m_f::stop()
function.

Note that for non-shard 0 instances, if the map
is not found in the registry, we construct it
by cloning the precalculated replication_map
from shard 0 to save the cpu cycles of re-calculating
it time and again on every shard.

Test: unit(dev), schema_loader_test(debug)
DTest: bootstrap_test.py:TestBootstrap.decommissioned_wiped_node_can_join_test update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_add_new_node_while_schema_changes_with_repair_test (dev)
"

* tag 'effective_replication_map_factory-v7' of https://github.com/bhalevy/scylla:
  effective_replication_map: clear_gently when destroyed
  database: shutdown keyspaces
  test: cql_test_env: stop view_update_generator before database shuts down
  effective_replication_map_factory: try cloning replication map from shard 0
  tools: schema_loader: start a sharded erm_factory
  storage_service: use erm_factory to create effective_replication_map
  keyspace: use erm_factory to create effective_replication_map
  effective_replication_map: erase from factory when destroyed
  effective_replication_map_factory: add create_effective_replication_map
  effective_replication_map: enable_lw_shared_from_this
  effective_replication_map: define factory_key
  keyspace: get a reference to the erm_factory
  main: pass erm_factory to storage_service
  main: pass erm_factory to storage_proxy
  locator: add effective_replication_map_factory
2021-11-19 18:19:38 +02:00
Benny Halevy
242043368e main: pass erm_factory to storage_proxy
To be used for creating the effective_replication_map per keyspace.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-11-19 10:46:51 +02:00