Commit Graph

2165 Commits

Author SHA1 Message Date
Piotr Sarna
cbedefb0f9 qos: add a way of merging service level options
In order to combine multiple service level options coming from
multiple roles, a helper function is provided to merge two
of them. The semantics depend on each parameter, but for timeouts,
which are the only parameters at the time of writing this message,
the minimum value of the two is taken. That in particular means
that when service level A has timeout = 50ms and service level B
has timeout = 1s, the resulting service level options
would set the timeout to 50ms.
2021-05-10 12:39:41 +02:00
Piotr Sarna
4ba1ac57a1 cql3: add preserving default values for per-sl timeouts
In order for per-service-level timeouts to work as expected,
a special value is reserved for internally marking the timeouts
as deleted.
2021-05-10 11:48:14 +02:00
Piotr Sarna
fb4e8951f5 qos: make getting service level public 2021-05-10 11:48:14 +02:00
Piotr Sarna
06d0e1853d qos: make finding service level public 2021-05-10 11:48:14 +02:00
Piotr Sarna
e257ec11c0 treewide: remove service level controller from query state
... since it's accessible through its member, client state.
2021-05-10 11:48:14 +02:00
Piotr Sarna
d1f2e8b469 treewide: propagate service level to client state
... since it's going to be used to set up per-service-level
timeouts.
2021-05-10 11:48:14 +02:00
Piotr Sarna
aa37974192 cql3: add timeout to service level params
Timeout value can now be properly parsed from CQL.
2021-05-10 10:43:21 +02:00
Piotr Sarna
3339ea1d0d qos: add timeout to service level info
Service level information now consists of the timeout config,
which stores the timeout value for all operations.
2021-05-10 10:22:11 +02:00
Piotr Sarna
7e6beabf27 migration_manager: allow table updates with timestamp
In order to avoid needless schema disagreements, a way of announcing
a schema change with fixed timestamp is added.
That way, when nodes update schemas of their internal tables (e.g.
during updates), it's possible for all nodes to use an identical
timestamp for this operation, which in turn makes their digests
identical.
2021-05-10 10:10:38 +02:00
Tomasz Grabiec
abe3d7d7d3 Merge 'storage_proxy: use small_vector for vectors of inet_address' from Avi Kivity
storage_proxy uses std::vector<inet_address> for small lists of nodes - for replication (often 2-3 replicas per operation) and for pending operations (usually 0-1). These vectors require an allocation, sometimes more than one if reserve() is not used correctly.

This series switches storage_proxy to use utils::small_vector instead, removing the allocations in the common case.

Test results (perf_simple_query --smp 1 --task-quota-ms 10):

```
before: median 184810.98 tps ( 91.1 allocs/op,  20.1 tasks/op,   54564 insns/op)
after:  median 192125.99 tps ( 87.1 allocs/op,  20.1 tasks/op,   53673 insns/op)
```

4 allocations and ~900 instructions are removed (the tps figure is also improved, but it is less reliable due to cpu frequency changes).

The type change is unfortunately not contained in storage_proxy - the abstraction leaks to providers of replica sets and topology change vectors. This is sad but IMO the benefits make it worthwhile.

I expect more such changes can be applied in storage_proxy, specifically std::unordered_set<gms::inet_address> and vectors of response handles.

Closes #8592

* github.com:scylladb/scylla:
  storage_proxy, treewide: use utils::small_vector inet_address_vector:s
  storage_proxy, treewide: introduce names for vectors of inet_address
  utils: small_vector: add print operator for std::ostream
  hints: messages.hh: add missing #include
2021-05-06 18:00:54 +02:00
Tomasz Grabiec
6aec8cc447 Merge "raft: fixes and improvements for snapshot transfer" from Gleb
* scylla-dev/raft-snapshot-fixes-v4:
  raft: document that add entry my throw commit_status_unknown
  raft: test: add test of a leadership change during ongoing snapshot transfer
  raft: test: retry submitting an entry if it was dropped
  raft: test: wait for the log to be fully replicated on new leader only
  raft: drop waiters with outdated terms
  raft: make snapshot transfer abortable
  raft: accept snapshots transfer from multiple nodes simultaneously
  raft: do not send probes while transferring snapshot
  raft: handle messages sending errors
  raft: test: return error from rpc module if nodes are disconnected
  raft: fix a typo in a variable name
2021-05-06 17:44:22 +02:00
Avi Kivity
d6d6758857 Merge 'Switch to use NODE_OPS_CMD for decommission and bootstrap operation' from Asias He
In commit 323f72e48a (repair: Switch to
use NODE_OPS_CMD for replace operation), we switched replace operation
to use the new NODE_OPS_CMD infrastructure.

In this patch set, we continue the work to switch decommission and bootstrap
operation to use NODE_OPS_CMD.

Fixes #8472
Fixes #8471

Closes #8481

* github.com:scylladb/scylla:
  repair: Switch to use NODE_OPS_CMD for bootstrap operation
  repair: Switch to use NODE_OPS_CMD for decommission operation
2021-05-06 17:28:19 +03:00
Gleb Natapov
6abe2772dc raft: make snapshot transfer abortable
A snapshot transfer may take a lot of time and meanwhile a leader doing
it may lose the leadership. If that happens the ongoing snapshot transfer
becomes obsolete since the snapshot will be rejected by the receiving
node as coming from an old leader. Make snapshot transfer abortable and
abort them when leader changes.
2021-05-06 11:34:31 +03:00
Avi Kivity
cea5493cb7 storage_proxy, treewide: introduce names for vectors of inet_address
storage_proxy works with vectors of inet_addresses for replica sets
and for topology changes (pending endpoints, dead nodes). This patch
introduces new names for these (without changing the underlying
type - it's still std::vector<gms::inet_address>). This is so that
the following patch, that changes those types to utils::small_vector,
will be less noisy and highlight the real changes that take place.
2021-05-05 18:36:48 +03:00
Avi Kivity
ddb1f0e6ca Merge "Choose the user max-result-size for service levels" from Botond
"
Choosing the max-result-size for unlimited queries is broken for unknown
scheduling groups. In this case the system limit (unlimited) will be
chosen. A prime example of this break-down is when service levels are
used.

This series fixes this in the same spirit as the similar semaphore
selection issue (#8508) was fixed: use the user limit as the fall-back
in case of unknown scheduling groups.
To ensure future fixes automatically apply to both query-classification
related configurations, selecting the max result size for unlimited
queries is now delegated to the database, sharing the query
classification logic with the semaphore selection.

Fixes: #8591

Tests: unit(dev)
"

* 'query-max-size-service-level-fix/v2' of https://github.com/denesb/scylla:
  service/storage_proxy: get_max_result_size() defer to db for unlimited queries
  database: add get_unlimited_query_max_result_size()
  query_class_config: add operator== for max_result_size
  database: get_reader_concurrency_semaphore(): extract query classification logic
2021-05-05 18:11:10 +03:00
Botond Dénes
9d5e958331 service/storage_proxy: get_max_result_size() defer to db for unlimited queries
Defer picking the appropriate max result size for unlimited queries to
the database, which is already the place where we make query classifying
decisions. This move means that all these decisions are now centralized
in the database, not scattered in different places and fixing one fixes
all users.
2021-05-05 13:30:50 +03:00
Nadav Har'El
58e275e362 cross-tree: reduce dependency on db/config.hh and database.hh
Every time db/config.hh is modified (e.g., to add a new configuration
option), 110 source files need to be recompiled. Many of those 110 didn't
really care about configuration options, and just got the dependency
accidentally by including some other header file.

In this patch, I remove the include of "db/config.hh" from all header
files. It is only needed in source files - and header files only
need forward declarations. In some cases, source files were missing
certain includes which they got incidentally from db/config.hh, so I
had to add these includes explicitly.

After this patch, the number of source files that get recompiled after a
change to db/config.hh goes down from 110 to 45.
It also means that 65 source files now compile faster because they don't
include db/config.hh and whatever it included.

Additionally, this patch also eliminates a few unnecessary inclusions
of database.hh in other header files, which can use a forward declaration
or database_fwd.hh. Some of the source files including one of those
header files relied on one of the many header files brought in by
database.hh, so we need to include those explicitly.
In view_update_generator.hh something interesting happened - it *needs*
database.hh because of code in the header file, but only included
database_fwd.hh, and the only reason this worked was that the files
including view_update_generator.hh already happened to unnecessarily
include database.hh. So we fix that too.

Refs #1

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210505102111.955470-1-nyh@scylladb.com>
2021-05-05 13:23:00 +03:00
Avi Kivity
6ffd813b7b Merge 'hints: delay repair until hints are replayed' from Piotr Dulikowski
Both hinted handoff and repair are meant to improve the consistency of the cluster's data. HH does this by storing records of failed replica writes and replaying them later, while repair goes through all data on all participaring replicas and makes sure the same data is stored on all nodes. The former is generally cheaper and sometimes (but not always) can bring back full consistency on its own; repair, while being more costly, is a sure way to bring back current data to full consistency.

When hinted handoff and repair are running at the same time, some of the work can be unnecessarily duplicated. For example, if a row is repaired first, then hints towards it become unnecessary. However, repair needs to do less work if data already has good consistency, so if hints finish first, then the repair will be shorter.

This PR introduces a possibility to wait for hints to be replayed before continuing with user-issued repair. The coordinator of the repair operation asks all nodes participating in the repair operation (including itself) to mark a point at the end of all hint queues pointing towards other nodes participating in repair. Then, it waits until hint replay in all those queues reaches marked point, or configured timeout is reached.

This operation is currently opt-in and can be turned on by setting the `wait_for_hint_replay_before_repair_in_ms` config option to a positive value.

Fixes #8102

Tests:
- unit(dev)
- some manual tests:
    - shutting down repair coordinator during hints replay,
    - shutting down node participating in repair during hints replay,

Closes #8452

* github.com:scylladb/scylla:
  repair: introduce abort_source for repair abort
  repair: introduce abort_source for shutdown
  storage_proxy: add abort_source to wait_for_hints_to_be_replayed
  storage_proxy: stop waiting for hints replay when node goes down
  hints: dismiss segment waiters when hint queue can't send
  repair: plug in waiting for hints to be sent before repair
  repair: add get_hosts_participating_in_repair
  storage_proxy: coordinate waiting for hints to be sent
  config: add wait_for_hint_replay_before_repair option
  storage_proxy: implement verbs for hint sync points
  messaging_service: add verbs for hint sync points
  storage_proxy: add functions for syncing with hints queue
  db/hints: make it possible to wait until current hints are sent
  db/hints: add a metric for counting processed files
  db/hints: allow to forcefully update segment list on flush
2021-05-03 18:47:27 +03:00
Eliran Sinvani
fc93133cbe Service level controller: fix wrong default service level removal log
An out of block log print resulted in repeated prints about removal of
the default service level. The period of this print is every time the
configuration is scanned for changes. It happens when the default
service level is one of the last on the map (sorted as in the map).

Fixes #8567

Closes #8576
2021-05-03 09:08:41 +03:00
Pavel Solodovnikov
4c351ff260 raft: switch group_id type from uint64_t to utils::UUID
Introduce a tagged id struct for `group_id`.

Raft code would want to generate quite a lot of unique
raft groups in the future (e.g. tablets). UUID is designed
exactly for that (e.g. larger capacity than `uint64_t`, obviously,
and also has built-in procedures to generate random ids).

Also, this is a preparation to make "raft group 0" use a random
ID instead of a literal fixed `0` as a group id.

The purpose is that every scylla cluster must have a unique ID
for "raft group 0" since we don't want the nodes from some other
cluster to disrupt the current cluster. This can happen if,
for some reason, a foreign node happens to contact a node in
our cluster.

Tests: unit(dev)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20210429170630.533596-3-pa.solodovnikov@scylladb.com>
2021-05-02 16:39:54 +03:00
Asias He
84a78f4558 repair: Switch to use NODE_OPS_CMD for bootstrap operation
In commit 323f72e48a (repair: Switch to
use NODE_OPS_CMD for replace operation), we switched replace operation
to use the new NODE_OPS_CMD infrastructure.

In this patch, we continue the work to switch bootstrap operation to use
NODE_OPS_CMD.

The benefits:

- It is more reliable to detect pending node operations, to avoid
  multiple topology changes at the same time, than using gossip status.

- The cluster reverts to a state before the bootstrap operation
  automatically in case of error much faster than gossip.

- Allows users to pass a list of dead nodes to ignore during bootstrap
  explicitly.

- The BOOTSTRAP gossip status is not needed any more. This is one step
  closer to achieve gossip-less topology change.

Fixes #8472
2021-04-28 09:53:04 +08:00
Piotr Dulikowski
958a13577c storage_proxy: add abort_source to wait_for_hints_to_be_replayed
Now, the function wait_for_hints_to_be_replayed will take an abort
source and will stop when abort is requested.
2021-04-27 16:16:54 +02:00
Piotr Dulikowski
22e06ace2c storage_proxy: stop waiting for hints replay when node goes down
Now, when repair coordinator is waiting for hints to be replayed on some
remote node and that node goes down, it stops waiting for it.
2021-04-27 16:11:47 +02:00
Piotr Dulikowski
46075af7c4 storage_proxy: coordinate waiting for hints to be sent
Adds a `wait_for_hints_to_be_replayed` function which waits until all
hints between specified endpoints are replayed.

For each node, a hint sync point is created. Then, repair coordinator
waits until the hint sync point is reached on every node, or timeout
occurs. This is done by querying each host participating in repair every
second in order if the sync point is still there.
2021-04-27 15:31:42 +02:00
Piotr Dulikowski
485036ac33 storage_proxy: implement verbs for hint sync points
Implements HINT_QUEUE_MARK and HINT_QUEUE_SYNC verb handlers in
`storage_proxy`.
2021-04-27 15:06:39 +02:00
Piotr Dulikowski
244738b0d5 storage_proxy: add functions for syncing with hints queue
Adds two methods to `storage_proxy`:

- `create_hint_queue_sync_point` - creates a "hint sync point" which
  is kept present in storage_proxy until all hint queues on the local
  node reach their curent end. It will also disappear if given deadline
  is reached first.
- `check_hint_queue_sync_point` - checks if given hint sync point still
  exists.

The created sync point waits for hint queues in all hint managers, on
all shards.
2021-04-27 15:06:39 +02:00
Eliran Sinvani
02d37cb133 workload prioritization: Fix configuration change detection
The configuration detection is based on a loop that
advances two iterators and compares the two collection
for deducing the configuration change. In order to
correctly deduce the changes the iteration have to be
according to the key (service level name) order for both
of the collections. If it doesn't happen the results are
undefined and in some cases can lead to a crash of the
system. The bug is that the _service_level_db field was
implemented using an unordered_map which obviously don't
guarantie the configuration change detection assumption.
The fix was simply to change the field type to a map
instead of unordered_map.

Another problem is that when a static service level (i.e
default) is at the end of the keys list, it is repeatedly
being deleted - which doesn't really do anything since deleting
a static service level is just retaining it's defult values
but it is stil wrong.
2021-04-27 12:29:31 +02:00
Eliran Sinvani
946fc6af08 workload prioritization: add exception protection in configuration polling
Exceptions around the loop polling were not handled properly.
This is an issue due to the fact that if an unhandled exception
slips out to the configuration polling loop itself it will break
it. When the configuration polling loop is broken, any further
change to the configuration will not be acted uppon in the nodes
where the loop is broken until the node is restarted. The chances
for exceptions are now greater than before since in one of the
previous commits we started quering the workload prioritization
configuration table with a sensible, shorter timeout.
This change also adds a logger for the workload prioritization
module and some logging mainly arround the configuration polling loop.

Most logs are added in the info level since they are not expected to
happen frequently but when they do we would like to have some
information by default regarding what broke the loop.
2021-04-27 12:29:31 +02:00
Avi Kivity
54b76e82bc Merge "Make migration manager main-local" from Pavel
"
There are few places left that call for migration manager
by global reference. This set patches all those places
and makes the migration manager a service that locally
lives in main(). Surprisingly, the largest changes are to
get rid of global migration manager calls from ... the
migration manager itself.

Two tricks here. First, repair code gets its private global
migration manager pointer. That's not nice, but it aligned
with current repair design -- all its references are now
"global". Some day they all will be moved into sharded
repair service, for now these globals just describe the real
dependencies of the repair code.

Second is storage proxy that needs to call migration manager
to get schema. Proper layering makes migration manager sit
on top of storage proxy, so the direct back-reference is
not nice. To overcome this the proxy gets migration manager's
shared_from_this() pointer and drops all of them on stop.
This makes sure that by the time migration manager stops
no references from proxy exist.

tests: unit(dev), start-stop, start-drain-stop
"

* 'br-turn-migration-manager-local' of https://github.com/xemul/scylla: (21 commits)
  migration_manager: Make it main-local
  tests: Have own migration manager instances
  tests: Use migration_manager from cql_test_env
  migration_manager: Call maybe_sync from this
  migration_manager: Make get_schema_for_... methods
  migration_manager: Hide get_schema_definition
  streaming: Keep migration_manager ptr in rpc lambdas
  storage_proxy: Keep migration_manager ptr in rpc lambdas
  streaming: Get migration_manager shared_ptr in messaging
  storage_proxy: Get migration_manager shared_ptr in messaging
  migration_manager: Make maybe_sync a method
  migration_manager: Open-code merge lambda
  migration_manager: Turn do_announce_new_type non-static
  migration_manager: Make announce() non-static method
  storage_servive: Use local migration manager
  storage_service: Keep migration manager on board
  migration_manager: Use 'this' where appropriate
  repair: Use private migration manager pointer
  repair: Keep private sharded migration manager pointer
  redis: Carry sharded migration manager over init
  ...
2021-04-25 13:29:16 +03:00
Benny Halevy
5ca8f28297 storage_service: load_new_sstables: log success message as info, not warning
Success is important, but nothing to be warned about.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210425070909.476226-1-bhalevy@scylladb.com>
2021-04-25 12:39:47 +03:00
Pavel Emelyanov
13d264d6bd migration_manager: Make it main-local
Now everybody is patched to use component-local instance of migration
manager and its global instance can be moved into main() scope.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-23 17:13:24 +03:00
Pavel Emelyanov
3c3535b4d8 migration_manager: Call maybe_sync from this
The only caller of maybe_sync() method is now the method itself
and can stop using global migration manager instance and switch
to using 'this'.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-23 17:13:24 +03:00
Pavel Emelyanov
6b31c47a75 migration_manager: Make get_schema_for_... methods
These two helpers are now namespace-scoped methods, but both
need the migration manager instance inside. All their callers
are now patched to have the migration manager at hands, so
the helpers can be turned into methods.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-23 17:13:24 +03:00
Pavel Emelyanov
1021a180e7 migration_manager: Hide get_schema_definition
This method is exclusively used inside migration manager code,
so (for now) no use in keeping it exposed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-23 17:13:24 +03:00
Pavel Emelyanov
d76ff4b32f storage_proxy: Keep migration_manager ptr in rpc lambdas
This patch is the bridge between the previous one and the
next one and is quite messy to be merged with either.

No heavy changes -- just copy the migration manager's ptr
onto rpc lambdas. Will be used in the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-23 17:13:24 +03:00
Pavel Emelyanov
a4569a30f3 storage_proxy: Get migration_manager shared_ptr in messaging
The proxy's messaging code uses migration manager to obtain schema.
Since proxy is more low-level service than migration manager, it's
incorrect to make proxy reference the manager directly. Instead,
push the shared_ptr into proxy's messaging code. This kills two
birst with one stone:

1: let proxy use migration manager
2: makes sure that by the time migration manager is stopped
   the proxy's use of this pointer is gone (unregistered from
   rpc)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-23 17:13:24 +03:00
Pavel Emelyanov
b73a93dab7 migration_manager: Make maybe_sync a method
Right now the maybe_sync is namespace-scope function. Turn it into
a migration_manager method so that it can use 'this' instead of
get_local_migration_manager().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-23 17:13:24 +03:00
Pavel Emelyanov
46bf6872d5 migration_manager: Open-code merge lambda
This lambda uses global migration manager instance. Open-coding
this short lambda makes further patching simpler.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-23 17:13:24 +03:00
Pavel Emelyanov
ef20d4ee59 migration_manager: Turn do_announce_new_type non-static
It's the only place that calls recently patched .announce()
method, so instead of grabbing global migration manager,
use 'this'.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-23 17:13:24 +03:00
Pavel Emelyanov
7aa1b5d395 migration_manager: Make announce() non-static method
This method needs to get migration manager instance to call
methods on it, so turn it non-static to have the instance in
'this'. Caller (yes, only one) gets local migration manager
itself, but will be patched soon.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-23 17:13:24 +03:00
Pavel Emelyanov
877ad36424 storage_servive: Use local migration manager
Now when the migration manager is on board storage service
can use it insted of global instance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-23 17:13:24 +03:00
Pavel Emelyanov
b7fe191e3d storage_service: Keep migration manager on board
The storage service needs migration manager to sync schema
on lifecycle notifiers and to stop the guy on drain. So this
patch just pushes the migration manager reference all the
way through the storage service constructor.

Few words about tests. Since now storage service needs the
migration manager in constructor, some tests should take it
from somewhere. The cql_test_env already has (and uses) it,
all the others can just provide a not-started sharded one,
it won't be in use in _those_ tests anyway.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-23 17:13:24 +03:00
Pavel Emelyanov
6d1eede472 migration_manager: Use 'this' where appropriate
Some its non-static method call get_local_migration_manager
instead of using 'this'. None of these places use this to
get cross-shard instance, so it's safe to use 'this' there.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-23 17:13:24 +03:00
Pavel Emelyanov
e7dc059917 migration_manager: Merge migration_task in
The migration_task is the class with the single static method
that's called from a single place in migration manager and
this method calls migration manager back right at once. There's
no much sense in keeping this abstraction, merge it into the
migration manager.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-23 17:13:24 +03:00
Eliran Sinvani
480a12d7b3 Materialized views: fix possibly old views comming from other nodes
Migration manager has a function to get a schema (for read or write),
this function queries a peer node and retrieves the schema from it. One
scenario where it can happen is if an old node, queries an old not fixed
index.
This makes a hole through which views that are only adjusted for reading
can slip through.

Here we plug the hole by fixing such views before they are registered.

Closes #8509
2021-04-22 15:38:10 +02:00
Piotr Sarna
55ae110774 qos: make sure to wait for sl updates on shutdown
The service level controller spawns an updating thread,
which wasn't properly waited for during shutdown.
This behavior is now fixed.
In order to make the shutdown order more standardized,
the operation is split into two phases - draining and stopping.

Tests: manual

Fixes #8468
2021-04-22 09:58:27 +02:00
Asias He
1513de633b repair: Switch to use NODE_OPS_CMD for decommission operation
In commit 323f72e48a (repair: Switch to
use NODE_OPS_CMD for replace operation), we switched replace operation
to use the new NODE_OPS_CMD infrastructure.

In this patch, we continue the work to switch decommission operation to use
NODE_OPS_CMD.

The benefits:

- A UUID is used to identify each node operation across the cluster.

- It is more reliable to detect pending node operations, to avoid
  multiple topology changes at the same time.

- The cluster reverts to a state before the decommission operation
  automatically in case of error. Without this patch, the node to be
  decommissioned will be stuck in decommission status forever until it
  is restarted and goes back to normal status.

- Allows users to pass a list of dead nodes to ignore for decommission
  explicitly.

- The LEAVING gossip status is not needed any more. This is one step
  closer to achieve gossip-less topology change.

- Allows us to trigger of off-strategy easily on the node receiving the
  ranges

Fixes #8471
2021-04-21 20:35:54 +08:00
Avi Kivity
daeddda7cc treewide: remove inclusions of storage_proxy.hh from headers
storage_proxy.hh is huge and includes many headers itself, so
remove its inclusions from headers and re-add smaller headers
where needed (and storage_proxy.hh itself in source files that
need it).

Ref #1.
2021-04-20 21:23:00 +03:00
Avi Kivity
cdf30524f3 storage_proxy: unnest coordinator_query_result
Nested classes cannot be forward declared, and
storage_proxy::coordinator_query_result is used in pagers, where
we'd like to forward-declare it. Unnest it and introduce an alias
for compatibility.
2021-04-20 21:23:00 +03:00
Kamil Braun
617813ba66 sys_dist_ks: new keyspace for system tables with Everywhere strategy
`system_distributed_everywhere` is a new keyspace that uses Everywhere
replication strategy. This is useful, for example, when we want to store
internal data that should be accessible by every node; the data can be
written using CL=ALL (e.g. during node operations such as node
bootstrap, which require all nodes to be alive - at least currently) and
then read by each node locally using CL=ONE (e.g. during node restarts).

Closes #8457
2021-04-19 11:22:57 +03:00