Commit Graph

970 Commits

Pavel Emelyanov
64c9359443 storage_proxy: Don't use default-initialized endpoint in get_read_executor()
After calling filter_for_query() the extra_replica to speculate to may
be left default-initialized, which is the :0 ipv6 address. This
address is later used as-is to check whether it belongs to the same DC,
which is wrong, as :0 is not the address of any existing endpoint.

The recent move of dc/rack data onto topology exposed this place by
emitting an internal error, since :0 is not present in the topology's
collection of endpoints. Prior to this move the dc filter would count
:0 as belonging to the "default_dc" datacenter, which may or may not
match the dc of the local node.

The fix is to explicitly distinguish a set extra_replica from an unset one.
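One way to make the set/unset distinction explicit is std::optional; a minimal sketch with hypothetical types, not Scylla's actual code:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for an endpoint: default construction yields the
// all-zeros address, which is exactly the :0 sentinel problem above.
struct endpoint {
    std::string addr = "::";
};

// Representing "no extra replica" as std::nullopt instead of a
// default-constructed endpoint makes the unset state unmistakable.
std::optional<endpoint> pick_extra_replica(bool speculate, endpoint candidate) {
    if (!speculate) {
        return std::nullopt;   // explicitly unset, never the :0 address
    }
    return candidate;
}
```

Callers then check has_value() instead of comparing against a default-constructed address.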

fixes: #11825

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #11833
2022-10-25 09:16:50 +03:00
Tomasz Grabiec
87b7e7ff9c Merge 'storage_proxy: prepare for fencing, complex ops' from Avi Kivity
Following up on 69aea59d97, which added fencing support
for simple reads and writes, this series does the same for the
complex ops:
 - partition scan
 - counter mutation
 - paxos

With this done, the coordinator knows about all in-flight requests and
can delay topology changes until they are retired.

Closes #11296

* github.com:scylladb/scylladb:
  storage_proxy: hold effective_replication_map for the duration of a paxos transaction
  storage_proxy: move paxos_response_handler class to .cc file
  storage_proxy: deinline paxos_response_handler constructor/destructor
  storage_proxy: use consistent effective_replication_map for counter coordinator
  storage_proxy: improve consistency in query_partition_key_range{,_concurrent}
  storage_proxy: query_partition_key_range_concurrent: reduce smart pointer use
  storage_proxy: query_partition_key_range_concurrent: improve token_metadata consistency
  storage_proxy: query_singular: use fewer smart pointers
  storage_proxy: query_singular: simplify lambda captures
  locator: effective_replication_map: provide non-smart-pointer accessor to token_metadata
  storage_proxy: use consistent token_metadata with rest of singular read
2022-10-14 15:44:35 +02:00
Avi Kivity
1feaa2dfb4 storage_proxy: handle_write: use coroutine::all() instead of when_all()
coroutine::all() saves an allocation. Since coroutine::all() is safe for
lambda coroutines, remove a coroutine::lambda wrapper.

Closes #11749
2022-10-14 06:56:16 +03:00
Avi Kivity
a2da08f9f9 storage_proxy: hold effective_replication_map for the duration of a paxos transaction
Luckily, all topology calculations are done in get_paxos_participants(),
so all we have to do is hold the effective_replication_map for the
duration of the transaction and pass it to get_paxos_participants().
This ensures that the coordinator knows about all in-flight requests
and can fence them from topology changes.
2022-10-13 14:27:26 +03:00
Avi Kivity
69aaa5e131 storage_proxy: move paxos_response_handler class to .cc file
It's not used elsewhere.
2022-10-13 14:27:26 +03:00
Avi Kivity
b2f3934e95 storage_proxy: deinline paxos_response_handler constructor/destructor
They have no business being inline as it's a heavyweight object.
2022-10-13 14:27:26 +03:00
Avi Kivity
94e4ff11be storage_proxy: use consistent effective_replication_map for counter coordinator
Hold the effective_replication_map while talking to the counter leader,
to allow for fencing in the future. The code is somewhat awkward because
the API allows for multiple keyspaces to be in use.

The error code generation, already broken as it doesn't use the correct
table, continues to be broken in that it doesn't use the correct
effective_replication_map, for the same reason.
2022-10-13 14:27:23 +03:00
Avi Kivity
406a046974 storage_proxy: improve consistency in query_partition_key_range{,_concurrent}
query_partition_key_range captures a token_metadata_ptr and uses
it consistently in sequential calls to query_partition_key_range_concurrent
(via tail recursion), but each invocation of
query_partition_key_range_concurrent captures its own
effective_replication_map_ptr. Since these are captured at different times,
they can be inconsistent after the first iteration.

Fix by capturing it once in the caller and propagating it everywhere.
2022-10-13 13:56:52 +03:00
Avi Kivity
5d320e95d5 storage_proxy: query_partition_key_range_concurrent: reduce smart pointer use
Capture token_metadata by reference rather than by smart pointer, since
our effective_replication_map_ptr protects it.
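The pattern, taking a plain reference when a longer-lived smart pointer already pins the object, can be sketched with hypothetical types (not the real classes):

```cpp
#include <cassert>
#include <memory>

struct token_metadata { int version = 7; };

// Stand-in for effective_replication_map: it owns a reference to the
// token_metadata it was built from.
struct replication_map {
    std::shared_ptr<token_metadata> tm = std::make_shared<token_metadata>();
};

int read_version(const std::shared_ptr<replication_map>& erm) {
    // The erm pointer keeps *erm->tm alive for the whole call, so a plain
    // reference suffices; no extra refcount increments and decrements.
    token_metadata& tm = *erm->tm;
    return tm.version;
}
```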
2022-10-13 13:56:52 +03:00
Avi Kivity
f75efa965f storage_proxy: query_partition_key_range_concurrent: improve token_metadata consistency
Derive the token_metadata from the effective_replication_map rather than
getting it independently. Not a real bug since these were in the same
continuation, but safer this way.
2022-10-13 13:56:52 +03:00
Avi Kivity
161ce4b34f storage_proxy: query_singular: use fewer smart pointers
Capture token_metadata by reference since we're protecting it with
the mighty effective_replication_map_ptr. This saves a few instructions
to manage smart pointers.
2022-10-13 13:56:33 +03:00
Avi Kivity
efd89c1890 storage_proxy: query_singular: simplify lambda captures
The lambdas in query_singular do not outlive the enclosing coroutine,
so they can capture everything by reference. This simplifies life
for a future update of the lambda, since there's one less thing to
worry about.
2022-10-13 13:52:54 +03:00
Avi Kivity
86a48cf12f storage_proxy: use consistent token_metadata with rest of singular read
query_singular() uses get_token_metadata_ptr() and later, in
get_read_executor(), captures the effective_replication_map(). This
isn't a bug, since the two are captured in the same continuation and
are therefore consistent, but a way to ensure it stays so is to capture
the effective_replication_map earlier and derive the token_metadata from
it.
2022-10-13 13:46:04 +03:00
Botond Dénes
992afc5b8c Merge 'storage_proxy: coroutinize some functions with do_with' from Avi Kivity
do_with() is a sure indicator for coroutinization, since it adds
an allocation (like the coroutine does with its frame). Therefore
translating a function with do_with is at least a break-even, and
usually a win since other continuations no longer allocate.

This series converts most of storage_proxy's functions that have
do_with to coroutines. Two remain, since they are not simple
to convert (the do_with() is kept running in the background and
its future is discarded).

Individual patches favor minimal changes over final readability,
and there is a final patch that restores indentation.

The patches leave some moves from coroutine reference parameters
to the coroutine frame; this will be cleaned up in a follow-up. I wanted
this series not to touch headers to reduce rebuild times.

Closes #11683

* github.com:scylladb/scylladb:
  storage_proxy: reindent after coroutinization
  storage_proxy: convert handle_read_digest() to a coroutine
  storage_proxy: convert handle_read_mutation_data() to a coroutine
  storage_proxy: convert handle_read_data() to a coroutine
  storage_proxy: convert handle_write() to a coroutine
  storage_proxy: convert handle_counter_mutation() to a coroutine
  storage_proxy: convert query_nonsingular_mutations_locally() to a coroutine
2022-10-07 07:37:37 +03:00
Michał Chojnowski
a0204c17c5 treewide: remove mentions of seastar::thread::should_yield()
thread_scheduling_group was retired many years ago.
Remove the leftovers; they are confusing.

Closes #11714
2022-10-05 12:26:37 +03:00
Piotr Dulikowski
51f813d89b storage_proxy: update rate limited reads metric when coordinator rejects
The decision to reject a read operation can be made either by replicas
or by the coordinator. In the latter case, the

  scylla_storage_proxy_coordinator_read_rate_limited

metric was not incremented, but it should be. This commit fixes the issue.

Fixes: #11651

Closes #11694
2022-10-04 10:33:58 +03:00
Avi Kivity
7626fd573a storage_proxy: reindent after coroutinization 2022-10-03 19:33:39 +03:00
Avi Kivity
019b18b232 storage_proxy: convert handle_read_digest() to a coroutine
The do_with() makes it at least a break-even, but there are some allocating
continuations that make it a win.

A variable named cmd had two different definitions (a value and a
lw_shared_ptr) that lived in different scopes. I renamed one to cmd1
to disambiguate. We should probably move that to the caller, but that
is not done here.
2022-10-03 19:33:39 +03:00
Avi Kivity
aa5f4bf1f3 storage_proxy: convert handle_read_mutation_data() to a coroutine
The do_with() makes it at least a break-even, but there are some allocating
continuations that make it a win.

A variable named cmd had two different definitions (a value and a
lw_shared_ptr) that lived in different scopes. I renamed one to cmd1
to disambiguate. We should probably move that to the caller, but that
is not done here.
2022-10-03 19:33:39 +03:00
Avi Kivity
bcd134e9b8 storage_proxy: convert handle_read_data() to a coroutine
The do_with() makes it at least a break-even, but there are some allocating
continuations that make it a win.

A variable named cmd had two different definitions (a value and a
lw_shared_ptr) that lived in different scopes. I renamed one to cmd1
to disambiguate. We should probably move that to the caller, but that
is not done here.
2022-10-03 19:33:39 +03:00
Avi Kivity
167c8b1b5e storage_proxy: convert handle_write() to a coroutine
A do_with() makes this at least a break-even.

Some internal lambdas were not converted since they commonly
do not allocate or block.

A finally() continuation is converted to seastar::defer().
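seastar::defer() is a scope guard: cleanup runs when the coroutine frame unwinds, which is what the .finally() continuation did for the continuation chain. A simplified synchronous stand-in (sketch only, not Seastar's implementation):

```cpp
#include <cassert>
#include <utility>

// Minimal scope guard in the spirit of seastar::defer().
template <typename Func>
struct deferred {
    Func f;
    ~deferred() { f(); }   // runs on scope exit, even on early return/throw
};
template <typename Func>
deferred<Func> defer(Func f) { return {std::move(f)}; }

int g_cleanups = 0;

int do_write() {
    {
        auto cleanup = defer([] { ++g_cleanups; });
        // ... write-path work that may return early or throw ...
    }   // cleanup fires here, like .finally() would
    return g_cleanups;
}
```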
2022-10-03 19:33:39 +03:00
Avi Kivity
741d6609a5 storage_proxy: convert handle_counter_mutation() to a coroutine
The do_with means the coroutine conversion is free, and converting
parallel_for_each to coroutine::parallel_for_each saves a possible
allocation (though it usually would not have been allocated).

An inner continuation is not converted since it usually doesn't
block, and therefore doesn't allocate.
2022-10-03 19:33:39 +03:00
Avi Kivity
ac5fae4b93 storage_proxy: convert query_nonsingular_mutations_locally() to a coroutine
It's simpler, and the do_with() allocation + task cancels out the
coroutine allocation + task.
2022-10-03 19:33:29 +03:00
Pavel Emelyanov
2b8636a2a9 storage_proxy.hh: Remove unused headers
Add needed forward declarations and fix indirect inclusions in some .cc files

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #11679
2022-10-02 20:48:50 +03:00
Benny Halevy
64140ccf05 cql3, storage_proxy: add support for TRUNCATE USING TIMEOUT
Extend the cql3 truncate statement to accept attributes,
similar to modification statements.

To achieve that we define cql3::statements::raw::truncate_statement
derived from raw::cf_statement, and implement its pure virtual
prepare() method to make a prepared truncate_statement.

The latter, statements::truncate_statement, is no longer derived
from raw::cf_statement, and just stores a schema_ptr to get to the
keyspace and column_family names.

`test_truncate_using_timeout` cql-pytest was added to test
the new USING TIMEOUT feature.

Fixes #11408

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-26 18:30:39 +03:00
Geoffrey Beausire
f435276d2e Merge tokens for everywhere_topology
With EverywhereStrategy, we know that all tokens will be on the same node
and the data is typically sparse, as with LocalStrategy.

Result of testing the feature:
Cluster: 2 DC, 2 nodes in each DC, 256 tokens per nodes, 14 shards per node

Before: 154 scanning operations
After: 14 scanning operations (~10x improvement)

On bigger clusters, it will probably be even more efficient.
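The optimization can be sketched with plain types (illustrative, not Scylla's token-range classes): adjacent ranges whose replica sets match are coalesced into one scan, and with the everywhere strategy every range has the same replica set.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct scan_range {
    int start, end;
    std::vector<std::string> replicas;
};

// Coalesce adjacent ranges served by identical replica sets.
std::vector<scan_range> merge_ranges(const std::vector<scan_range>& in) {
    std::vector<scan_range> out;
    for (const auto& r : in) {
        if (!out.empty() && out.back().end == r.start &&
                out.back().replicas == r.replicas) {
            out.back().end = r.end;   // extend the previous range
        } else {
            out.push_back(r);
        }
    }
    return out;
}
```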

Closes #11403
2022-09-08 15:33:23 +03:00
Pavel Emelyanov
b6fdea9a79 code: Call sort_endpoints_by_proximity() via topology
The method is about to be moved from snitch to topology, this patch
prepares the rest of the code to use the latter to call it. The
topology's method just calls snitch, but it's going to change in the
next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-05 15:14:01 +03:00
Pavel Emelyanov
642e50f3e3 snitch: Move is_worth_merging_for_range_query to proxy
Proxy is the only place that calls this method. Also the method name
suggests it's not something "generic", but rather internal logic of
the proxy's query processing.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-05 15:10:46 +03:00
Avi Kivity
8070cdbbf9 storage_proxy: mutate_counters_on_leader: coroutinize
Simplify ahead of refactoring for consistent effective_replication_map.
2022-08-14 17:36:58 +03:00
Avi Kivity
6e330d98d2 storage_proxy: mutate_counters: coroutinize
Simplify ahead of refactoring for consistent effective_replication_map.

This is probably a pessimization of the error case, but the error case
will be terrible in any case unless we resultify it.
2022-08-14 17:28:46 +03:00
Avi Kivity
105b066ff7 storage_proxy: mutate_counters: reorganize error handling
Move the error handling function where it's used so the code
is more straightforward.

Due to some std::move()s later, we must still capture the schema early.
2022-08-14 17:13:22 +03:00
Benny Halevy
d295d8e280 everywhere: define locator::host_id as a strong tagged_uuid type
So it can be distinguished from other uuid-based
identifiers in the system.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #11276
2022-08-12 06:01:44 +03:00
Botond Dénes
69aea59d97 Merge 'storage_proxy: use consistent topology, prepare for fencing' from Avi Kivity
Replication is a mix of several inputs: tokens and token->node mappings (topology),
the replication strategy, and replication strategy parameters. These are all captured
in effective_replication_map.

However, if we use effective_replication_map:s captured at different times in a single
query, then different uses may see different inputs to effective_replication_map.

This series protects against that by capturing an effective_replication_map just
once in a query, and then using it. Furthermore, the captured effective_replication_map
is held until the query completes, so topology code can know when a topology is no
longer in use (although this isn't exploited in this series).

Only the simple read and write paths are covered. Counters and paxos are left for
later.

I don't think the series fixes any bugs - as far as I could tell everything was happening
in the same continuation. But this series ensures it.
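The capture-once pattern at the heart of this series can be sketched in isolation (illustrative types, not Scylla's interfaces): the query pins one shared snapshot up front, and the shared_ptr's refcount is what lets topology code tell when a map is no longer in use.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Stand-in for an effective replication map snapshot.
struct replication_map {
    std::vector<std::string> replicas;
    std::vector<std::string> replicas_for_token(int) const { return replicas; }
};
using erm_ptr = std::shared_ptr<const replication_map>;

struct query {
    erm_ptr erm;   // captured once when the query starts
    std::vector<std::string> lookup(int token) const {
        // Every lookup during this query sees the same snapshot, even if
        // a newer map has been published in the meantime.
        return erm->replicas_for_token(token);
    }
};
```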

Closes #11259

* github.com:scylladb/scylladb:
  storage_proxy: use consistent topology
  storage_proxy: use consistent replication map on read path
  storage_proxy: use consistent replication map on write path
  storage_proxy: convert get_live{,_sorted}_endpoints() to accept an effective_replication_map
  consistency_level: accept effective_replication_map as parameter, rather than keyspace
  consistency_level: be more const when using replication_strategy
2022-08-12 06:00:30 +03:00
Avi Kivity
a2c4f5aa1a storage_proxy: use consistent topology
Derive the topology from captured and stable effective_replication_map
instead of getting a fresh topology from storage_proxy, since the
fresh topology may be inconsistent with the running query.

digest_read_resolver did not capture an effective_replication_map, so
that is added.
2022-08-11 17:58:42 +03:00
Avi Kivity
883518697b storage_proxy: use consistent replication map on read path
Capture a replication map just once in
abstract_read_executor::_effective_replication_map_ptr. Although it isn't
used yet, it serves to keep a reference count on topology (for fencing),
and some accesses to topology within reads still remain, which can be
converted to use the member in a later patch.
2022-08-11 17:58:42 +03:00
Avi Kivity
01a614fb4d storage_proxy: use consistent replication map on write path
Capture a replication map just once in
abstract_write_handler::_effective_replication_map_ptr and use it
in all write handlers. A few accesses to get the topology still remain,
they will be fixed up in a later patch.
2022-08-11 17:58:42 +03:00
Avi Kivity
f1b0e3d58e storage_proxy: convert get_live{,_sorted}_endpoints() to accept an effective_replication_map
Allow callers to use consistent effective_replication_map:s across calls
by letting the caller select the object to use.
2022-08-11 17:58:42 +03:00
Avi Kivity
46bd0b1e62 consistency_level: accept effective_replication_map as parameter, rather than keyspace
A keyspace is a mutable object that can change from time to time. An
effective_replication_map captures the state of a keyspace at a point in
time and can therefore be consistent (with care from the caller).

Change consistency_level's functions to accept an effective_replication_map.
This allows the caller to ensure that separate calls use the same
information and are consistent with each other.

Current callers are likely correct since they are called from one
continuation, but it's better to be sure.
2022-08-11 17:58:42 +03:00
Amnon Heiman
5ac20ac861 Reduce the number of per-scheduling group metrics
This patch reduces the number of metrics ScyllaDB generates.

Motivation: The combination of per-shard with per-scheduling group
generates a lot of metrics. When combined with histograms, which require
many metrics, the problem becomes even bigger.

The two tools we are going to use:
1. Replace per-shard histograms with summaries
2. Do not report unused metrics.

The storage_proxy stats hold information for the API and the metrics
layer. We replaced timed_rate_moving_average_and_histogram and
time_estimated_histogram with the unified
timed_rate_moving_average_summary_and_histogram, which gives us the
option to report per-shard summaries instead of histograms.

All the counters, histograms, and summaries were marked as
skip_when_empty.

The API was modified to use
timed_rate_moving_average_summary_and_histogram.

Closes #11173
2022-08-11 13:31:19 +03:00
Botond Dénes
6a7dedfe34 service/storage_proxy: set smallest continue pos as query's continue pos
We expect each replica to stop at exactly the same position when the
digests match. Soon, however, if replicas have a lot of tombstones, some
may stop earlier than the others. As long as all digests match, this is
fine, but we need to make sure we continue from the smallest such
position on the next page.
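The resume rule above reduces to taking the minimum over the replicas' stop positions; a sketch with plain ints standing in for clustering positions:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// If replicas stopped at different positions, the next page must resume
// from the smallest one, or some replica would be asked to skip rows it
// has not produced yet.
int smallest_continue_pos(const std::vector<int>& replica_stop_positions) {
    return *std::min_element(replica_stop_positions.begin(),
                             replica_stop_positions.end());
}
```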
2022-08-10 06:03:38 +03:00
Botond Dénes
2656968db2 service/storage_proxy: propagate last position on digest reads
We want to transmit the last position as determined by the replica on
both result and digest reads. Result reads already do that via the
query::result, but digest reads don't yet as they don't return the full
query::result structure, just the digest field from it. Add the last
position to the digest read's return value and collect these in the
digest resolver, along with the returned digests.
2022-08-10 06:03:37 +03:00
Botond Dénes
d1d53f1b84 query: add tombstone-limit to read-command
Propagate the tombstone-limit from coordinator to replicas, to make sure
all are using the same limit.
2022-08-10 06:01:47 +03:00
Botond Dénes
1b669cefed service/storage_proxy: add get_tombstone_limit()
To be used by coordinator side code to determine the correct tombstone
limit to pass to read-command (tombstone limit field added in the next
commit). When this limit is non-zero, the replica will start cutting
pages after the tombstone limit is surpassed.
This getter works similarly to `get_max_result_size()`: if the cluster
feature for empty replica pages is set, it will return the value
configured via db::config::query_tombstone_limit. System queries always
use a limit of 0 (unlimited tombstones).
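A sketch of the described getter semantics (names and parameters are illustrative, not the real API):

```cpp
#include <cassert>
#include <cstdint>

// 0 means "unlimited tombstones". The configured limit only applies when
// the relevant cluster feature is enabled, and system queries are always
// unlimited.
uint64_t get_tombstone_limit(bool feature_enabled, bool is_system_query,
                             uint64_t configured_limit) {
    if (is_system_query || !feature_enabled) {
        return 0;
    }
    return configured_limit;
}
```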
2022-08-09 10:00:40 +03:00
Benny Halevy
2b017ce285 schema, everywhere: define and use table_schema_version as a strong type
Define table_schema_version as a distinct tagged_uuid class,
so it can be differentiated from other uuid-based types,
in particular table_id.

Added reversed(table_schema_version) for convenience
and uniformity, since the same logic is currently open-coded
in several places.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:45 +03:00
Benny Halevy
1fda686f96 idl: make idl headers self-sufficient
Add include statements to satisfy dependencies.

Delete now-unneeded include directives from the upper-level
source files.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:02:27 +03:00
Benny Halevy
37b7a9cce2 utils: get rid of joinpoint
Now that it is no longer used.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
56f336d1aa database: get rid of timestamp_func
Pass an optional truncated_at time_point to
truncate_table_on_all_shards instead of the over-complicated
timestamp_func that returns the same time_point on all shards
anyhow, and was only used for coordination across shards.

Since now we synchronize the internal execution phase in
truncate_table_on_all_shards, there is no longer a need
for this timestamp_func.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Benny Halevy
46e2a7c83b database: add truncate_table_on_all_shards
As a first step to decouple truncate from flush
and snpashot.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-07 12:53:05 +03:00
Botond Dénes
fbbe2529c1 Merge "Remove global snitch usage from consistency_level.cc" from Pavel Emelyanov
"
There are several helpers in this .cc file that need to get the datacenter
for endpoints. For that they use the global snitch, because there is no
other place to get that data from.

The whole dc/rack info is now moving to topology, so this patch set
changes consistency_level.cc to get the topology. This is done in two ways.
First, the helpers that have a keyspace at hand may get the topology via
the keyspace's effective_replication_map.

Two difficult cases are db::is_local() and db.count_local_endpoints(),
because both have just an inet_address at hand. Those are made methods
of topology itself; all their callers already deal with token metadata
and can get the topology from it.
"

* 'br-consistency-level-over-topology' of https://github.com/xemul/scylla:
  consistency_level: Remove is_local() and count_local_endpoints()
  storage_proxy: Use topology::local_endpoints_count()
  storage_proxy: Use proxy's topology for DC checks
  storage_proxy: Keep shared_ptr<proxy> on digest_read_resolver
  storage_proxy: Use topology local_dc_filter in its methods
  storage_proxy: Mark some digest_read_resolver methods private
  forwarding_service: Use topology local_dc_filter
  storage_service: Use topology local_dc_filter
  consistency_level: Use topology local_dc_filter
  consistency_level: Call count_local_endpoints from topology
  consistency_level: Get datacenter from topology
  replication_strategy: Remove hold snitch reference
  effective_replication_map: Get datacenter from topology
  topology: Add local-dc detection sugar
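The local_dc_filter mentioned in the patch titles can be sketched as a reusable predicate handed out by topology (illustrative types; the real filter works on endpoint addresses and topology state):

```cpp
#include <cassert>
#include <functional>
#include <string>

// Topology owns the dc/rack data, so it can hand out a predicate and
// callers no longer need to consult the snitch directly.
struct topology {
    std::string local_dc;
    std::function<bool(const std::string&)> local_dc_filter() const {
        return [dc = local_dc](const std::string& endpoint_dc) {
            return endpoint_dc == dc;
        };
    }
};
```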
2022-08-05 13:31:55 +03:00
Pavel Emelyanov
9c662ee0e5 storage_proxy: Use topology::local_endpoints_count()
A continuation of the previous patches -- now all the code that needs
this helper has a proxy pointer at hand.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-05 12:19:48 +03:00