Commit Graph

618 Commits

Author SHA1 Message Date
Asias He
c82250e0cf gossip: Allow deferring advertise of local node to be up
Currently the replacing node sets the status as STATUS_UNKNOWN when it
starts gossip service for the first time before it sets the status to
HIBERNATE to start the replacing operation. This introduces the
following race:

1) Replacing node using the same IP address as the node to be replaced
starts gossip service without setting the gossip STATUS (will be seen as
STATUS_UNKNOWN by other nodes)

2) Replacing node waits for gossip to settle and learns status and
tokens of existing nodes

3) Replacing node announces the HIBERNATE STATUS.

After Step 1 and before Step 3, existing nodes will mark the replacing
node as UP, but will not yet know that it is replacing another node. As a
result, the replacing node will not be excluded from the read replicas
and will be considered a target node to serve CQL reads.

To fix, we make the replacing node avoid responding to echo messages
until it is ready.
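
A minimal sketch of that idea follows, assuming a peer marks a node UP only after a successful echo round trip. The class `echo_responder` and its members are hypothetical names used for illustration; they are not taken from Scylla's gossiper.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical sketch; names are illustrative, not Scylla's gossiper API.
class echo_responder {
    bool _ready = false;  // false until the node has announced its real status
public:
    // Called after the node has announced its status (e.g. HIBERNATE).
    // Expected to be called once.
    void mark_ready() noexcept {
        assert(!_ready);
        _ready = true;
    }

    // Echo handler: fails while the node is not ready, so that peers
    // relying on a successful echo do not mark the node as UP too early.
    void handle_echo() const {
        if (!_ready) {
            throw std::runtime_error("not ready to respond to gossip echo");
        }
    }
};
```

In this sketch the replacing node would call `mark_ready()` only after Step 3, so echoes arriving between Step 1 and Step 3 fail.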

Fixes #7312

Closes #7714
2021-01-26 19:02:11 +01:00
Juliusz Stasiewicz
b150906d39 gossip: Added SNITCH_NAME to application_state
The snitch name needs to be exchanged within the cluster once, in the
shadow round, so that joining nodes cannot use the wrong snitch. The
snitch names are compared on bootstrap and on normal node start.

If the cluster already uses mixed snitches, the upgrade to this
version will fail. In this case the customer needs to add a node with
the correct snitch for every node with the wrong snitch, then take
down the nodes with the wrong snitch, and only then do the upgrade.

Fixes #6832

Closes #7739
2020-12-09 15:45:25 +02:00
Asias He
0a3a2a82e1 api: Add force_remove_endpoint for gossip
It is used to force remove a node from gossip membership if something
goes wrong.

Note: run the force_remove_endpoint api at the same time on _all_ the
nodes in the cluster in order to prevent the removed nodes from coming
back. This is because nodes that have not run the force_remove_endpoint
api cmd can gossip the removed node's information around to other nodes
within 2 * ring_delay (2 * 30 seconds by default).

For instance, in a 3-node cluster where node 3 is decommissioned, to
remove node 3 from gossip membership prior to the auto removal (3 days
by default), run the api cmd on both node 1 and node 2 at the same time.

$ curl -X POST --header "Accept: application/json" "http://127.0.0.1:10000/gossiper/force_remove_endpoint/127.0.0.3"
$ curl -X POST --header "Accept: application/json" "http://127.0.0.2:10000/gossiper/force_remove_endpoint/127.0.0.3"

Then run 'nodetool gossipinfo' on all the nodes to check that the removed
nodes are not present.

Fixes #2134

Closes #5436
2020-11-29 13:58:46 +02:00
Piotr Jastrzebski
d2897d8f8b alternator: guard streams with an experimental flag
Add new alternator-streams experimental flag for
alternator streams control.

CDC becomes GA and won't be guarded by an experimental flag any more.
Alternator Streams stay experimental so now they need to be controlled
by their own experimental flag.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-12 12:36:16 +01:00
Piotr Jastrzebski
e9072542c1 Mark CDC as GA
Enable CDC by default.
Rename CDC experimental feature to UNUSED_CDC to keep accepting cdc
flag.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-12 12:36:13 +01:00
Benny Halevy
a0436ea324 gossiper: convert to shared_token_metadata
get() the latest token_metadata& from the
shared_token_metadata before each use.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
29ed59f8c4 main: start a shared_token_metadata
And use it to get a token_metadata& compatible
with current usage, until the services are converted to
use token_metadata_ptr.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Nadav Har'El
7ff72b0ba5 Merge 'secondary_index: fix returned rows token ordering' from Piotr Grabowski
Fixes returned rows ordering to proper signed token ordering. Before this change, rows were sorted by token, but using unsigned comparison, meaning that negative tokens appeared after positive tokens.
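
The ordering difference can be reproduced in a few lines. The sketch below is illustrative only and does not use Scylla's types: it assumes a token is a plain `int64_t` and that the legacy computation effectively compared its 8 big-endian bytes as unsigned values. `to_big_endian_bytes`, `sort_as_bytes` and `sort_as_signed` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: serialize a token as 8 big-endian bytes, the way a
// bytes-typed column might store it (an assumption for illustration).
std::array<std::uint8_t, 8> to_big_endian_bytes(std::int64_t token) {
    std::array<std::uint8_t, 8> out{};
    auto u = static_cast<std::uint64_t>(token);
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<std::uint8_t>(u & 0xff);
        u >>= 8;
    }
    assert(u == 0);  // all eight bytes were consumed
    return out;
}

// Legacy-style ordering: lexicographic comparison of the serialized bytes,
// which behaves like an unsigned comparison.
std::vector<std::int64_t> sort_as_bytes(std::vector<std::int64_t> tokens) {
    std::sort(tokens.begin(), tokens.end(), [](std::int64_t a, std::int64_t b) {
        return to_big_endian_bytes(a) < to_big_endian_bytes(b);
    });
    return tokens;
}

// Corrected ordering: compare tokens as signed 64-bit integers.
std::vector<std::int64_t> sort_as_signed(std::vector<std::int64_t> tokens) {
    std::sort(tokens.begin(), tokens.end());
    return tokens;
}
```

For the tokens `-5, 3, -1, 7`, the byte-wise sort places the negative tokens after the positive ones, while the signed sort yields the expected ascending order.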

Rename `token_column_computation` to `legacy_token_column_computation` and add some comments describing this computation.

Added (new) `token_column_computation` which returns token as `long_type`, which is sorted using signed comparison - the correct ordering of tokens.

Add new `correct_idx_token_in_secondary_index` feature, which flags that the whole cluster is able to use new `token_column_computation`.

Switch token computation in secondary indexes to (new) `token_column_computation`, which fixes the ordering. This column computation type is only set if cluster supports `correct_idx_token_in_secondary_index` feature to make sure that all nodes will be able to compute new `token_column_computation`. Also old indexes will need to be rebuilt to take advantage of this fix, as new token column computation type is only set for new indexes.

Fix tests according to new token ordering and add one new test to validate this aspect explicitly.

Fixes #7443

Manually tested a scenario in which someone created an index on an old version of Scylla and then migrated to the new Scylla. The old index continued to work properly (but returned rows in the wrong order). Upon dropping and re-creating the index, it still returned the same data, but now in the correct order.

Closes #7534

* github.com:scylladb/scylla:
  tests: add token ordering test of indexed selects
  tests: fix tests according to new token ordering
  secondary_index: use new token_column_computation
  feature: add correct_idx_token_in_secondary_index
  column_computation: add token_column_computation
  token_column_computation: rename as legacy
2020-11-05 18:44:49 +01:00
Piotr Grabowski
6624d933c9 feature: add correct_idx_token_in_secondary_index
Add new correct_idx_token_in_secondary_index feature, which will be used
to determine if all nodes in the cluster support new
token_column_computation. This column computation will replace
legacy_token_column_computation in secondary indexes, which was
incorrect as this column computation produced values that when compared
with unsigned comparison (CQL type bytes comparison) resulted in
different ordering than token signed comparison. See issue:

https://github.com/scylladb/scylla/issues/7443
2020-11-04 12:02:42 +01:00
Benny Halevy
e4614d4836 gossiper: mark trivial methods noexcept
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:47 +02:00
Benny Halevy
1ba4c84ae2 gossiper: get_cluster_name, get_partitioner_name: make noexcept
These methods can return a const sstring& rather than
allocating a sstring. And with that they can be marked noexcept.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:29 +02:00
Benny Halevy
11a8912093 gossiper: get_gossip_status: return string_view and make noexcept
Change get_gossip_status to return string_view,
and with that it can be noexcept now that it doesn't
allocate memory via sstring.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
126e486fde gms/endpoint_state: mark methods using get_status noexcept
Now that get_status returns string_view, just compare it with a const char*
rather than making a sstring out of it, and consequently, can be marked noexcept.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
6b9191b6c2 gms/endpoint_state: get_status: return string_view and make noexcept
get_status doesn't need to allocate a sstring, it can just
return a std::string_view to the status string, if found.
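
As a rough illustration of the pattern (not Scylla's actual class), the sketch below returns a view into a stored string instead of allocating a copy. It assumes `std::string` in place of `sstring`; `endpoint_state_sketch`, its `STATUS` key and `is_shutdown` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for an endpoint state holding a STATUS value.
// std::string replaces sstring only to keep the sketch self-contained.
class endpoint_state_sketch {
    std::map<int, std::string> _application_state;
public:
    static constexpr int STATUS = 0;  // assumed key, for illustration only

    void set_status(std::string value) {
        assert(!value.empty());  // a status is expected to be non-empty
        _application_state[STATUS] = std::move(value);
    }

    // Returns a view into the stored string, or an empty view if absent.
    // Nothing is allocated and comparing int keys does not throw, so the
    // method can be declared noexcept.
    std::string_view get_status() const noexcept {
        auto it = _application_state.find(STATUS);
        if (it == _application_state.end()) {
            return {};
        }
        return it->second;
    }

    // Comparing the view with a literal avoids building a temporary string.
    bool is_shutdown() const noexcept {
        return get_status() == "shutdown";
    }
};
```

The returned view is only valid while the stored string is neither modified nor destroyed, which is the usual trade-off of this approach.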

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
232c665bab gms/endpoint_state: mark get_application_state_ptr and is_cql_ready noexcept
Although std::map::find is not guaranteed to be noexcept,
whether it throws depends on the comparator used, and in this case
comparing application_state is noexcept. Therefore, we can safely mark
get_application_state_ptr noexcept.

is_cql_ready depends on get_application_state_ptr and otherwise
handles any exception thrown by boost::lexical_cast, so it can be marked
noexcept as well.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
5d8e2c038b gms/endpoint_state: mark trivial methods noexcept
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
d4c364507e gms/heart_beat_state: mark methods noexcept
Now that get_next_version() is noexcept,
update_heart_beat can be noexcept too.

All others are trivially noexcept.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
68a2920201 gms/versioned_value: mark trivial methods noexcept
Also, versioned_value::compare_to() can be marked const.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
c295f521b9 gms/version_generator: mark get_next_version noexcept
It is trivially so.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
e28d80ec0c messaging: msg_addr: mark methods noexcept
Based on gms::inet_address.

With that, gossiper::get_msg_addr can be marked noexcept (and const while at it).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
232fc19525 gms/inet_address: mark methods noexcept
Based on the corresponding net::inet_address calls.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Asias He
d47033837a gossiper: Use dedicated gossip scheduling group
Gossip currently runs inside the default (main) scheduling group. It is
fine to run inside the default scheduling group. However, from time to
time we see many tasks in the main scheduling group and we suspect
gossip. It is best to move gossip to a dedicated scheduling group, so
that we can more easily catch bugs that leak tasks to the main group.

After this patch, we can check:

scylla_scheduler_time_spent_on_task_quota_violations_ms{group="gossip",shard="0"}

Fixes: #7154
Tests: unit(dev)
2020-10-29 12:53:37 +02:00
Tomasz Grabiec
14fdd2f501 Merge "Gossip echo message improvement" from Asias
This series improves gossip echo message handling in a loaded cluster.

Refs: #7197

* git://github.com/asias/scylla.git gossip_echo_improve_7197:
  gossiper: Handle echo message on any shard
  gossiper: Increase echo message timeout
  gossiper: Remove unused _last_processed_message_at
2020-09-24 15:13:55 +02:00
Asias He
88b7587755 gossiper: Handle echo message on any shard
The echo message does not need to access gossip internal state, so we can
run it on any shard and avoid forwarding to shard zero.

This makes marking a node up more robust when shard zero is loaded.

There is an argument that we should make the echo message return only
when all shards have responded, so that all shards are known to be live
and responding.

However, in a heavily loaded cluster, one shard might be overloaded on
multiple nodes in the cluster at the same time. If we require an echo
response from all shards, there is a chance the local node will mark all
peer nodes as down. As a result, the whole cluster is down. This is much
worse than not excluding a node with a slow shard from the cluster.

Refs: #7197
2020-09-24 10:10:54 +08:00
Asias He
173d115a64 gossiper: Remove unused _last_processed_message_at
It is not used any more. We can get rid of it.

Refs: #7197
2020-09-24 09:48:54 +08:00
Pavel Emelyanov
a75b048616 gossiper: Unregister verbs if shadow round aborts start
The gossiper verbs are registered in two places -- start_gossiping
and do_shadow_round(). And unregistered in one -- stop_gossiping
iff the start took place. Respectively, there's a chance that after
a shadow round scylla exits without starting gossiping thus leaving
verbs armed.

Fix by unregistering verbs on stop if they are still registered.

fixes: #7262
tests: manual(start, abort start after shadow round), unit(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200921140357.24495-1-xemul@scylladb.com>
2020-09-22 10:18:01 +02:00
Piotr Sarna
dd085b146a gms: add comments for deprecated features
Features which are propagated to other nodes via gossip,
but are assumed by the code to be supported, are now marked
with comments.
2020-09-14 12:59:19 +02:00
Piotr Sarna
defe6f49df gms: remove unused feature bits
Checks for features introduced over 2 years ago were removed
in previous commits, so all that is left is removing the feature
bits themselves. Note that the feature strings are still sent
to other nodes just to be doubly sure, but the new code assumes
that all these features are implicitly enabled.
2020-09-14 12:35:28 +02:00
Piotr Sarna
21a77612b3 gms: add a cluster feature for fixed hashing
The new hashing routine which properly takes null cells
into account is now enabled if the whole cluster is aware of it.
2020-09-10 13:16:44 +02:00
Pavel Emelyanov
812eed27fe code: Force formatting of pointer in .debug and .trace
... and tests. Printing a pointer in logs is considered to be a bad practice,
so the proposal is to keep this explicit (with fmt::ptr) and allow it for
.debug and .trace cases.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-26 20:44:11 +03:00
Asias He
6cadf4e4fa gossip: Apply state for local node in shadow round
We saw errors in killed_wiped_node_cannot_join_test dtest:

  Aug 2020 10:30:43 [node4] Missing: ['A node with address 127.0.76.4 already exists, cancelling join']:

The test does:
  n1, n2, n3, n4
  wipe data on n4
  start n4 again with the same ip address

Without this patch, n4 will bootstrap into the cluster with new tokens. We
should prevent n4 from bootstrapping because there is an existing
node with the same IP address in the cluster.

In shadow round, the local node should apply the application state of
the node with the same ip address. This is useful to detect a node
trying to bootstrap with the same IP address as an existing node.

Tests: bootstrap_test.py
Fixes: #7073
2020-08-25 12:53:59 +03:00
Avi Kivity
0dcb16c061 Merge "Constify access to token_metadata" from Benny
"
We keep references to locator::token_metadata in many places.
Most of them are for read-only access and only a few want
to modify the token_metadata.

Recently, in 94995acedb,
we added yielding loops that access token_metadata in order
to avoid cpu stalls.  To make that possible we need to make
sure the token_metadata object they are traversing won't change
mid-loop.

This series is a first step in ensuring that updates to the shared
token metadata are serialized with respect to reads of it.

Test: unit(dev)
Dtest: bootstrap_test:TestBootstrap.start_stop_test{,_node}, update_cluster_layout_tests.py -a next-gating(dev)
"

* tag 'constify-token-metadata-access-v2' of github.com:bhalevy/scylla:
  api/http_context: keep a const sharded<locator::token_metadata>&
  gossiper: keep a const token_metadata&
  storage_service: separate get_mutable_token_metadata
  range_streamer: keep a const token_metadata&
  storage_proxy: delete unused get_restricted_ranges declaration
  storage_proxy: keep a const token_metadata&
  storage_proxy: get rid of mutable get_token_metadata getter
  database: keep const token_metadata&
  database: keyspace_metadata: pass const locator::token_metadata& around
  everywhere_replication_strategy: move methods out of line
  replication_strategy: keep a const token_metadata&
  abstract_replication_strategy: get_ranges: accept const token_metadata&
  token_metadata: rename calculate_pending_ranges to update_pending_ranges
  token_metadata: mark const methods
  token_ranges: pending_endpoints_for: return empty vector if keyspace not found
  token_ranges: get_pending_ranges: return empty vector if keyspace not found
  token_ranges: get rid of unused get_pending_ranges variant
  replication_strategy: calculate_natural_endpoints: make token_metadata& param const
  token_metadata: add get_datacenter_racks() const variant
2020-08-22 20:47:45 +03:00
Benny Halevy
573142d4c4 gossiper: keep a const token_metadata&
gossiper has no need to change token_metadata.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-20 16:20:34 +03:00
Pavel Emelyanov
4ea63b2211 gossiper: Share the messaging service with snitch
And make snitch use gossiper's messaging, not global

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-19 20:50:52 +03:00
Pavel Emelyanov
65bd54604d gossiper: Use messaging service by reference
Gossiper needs messaging service, the messaging is started before the
gossiper, so we can push the former reference into it.

Gossiper is not stopped for real, neither the messaging service is, so
the memory usage is still safe.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-19 20:50:52 +03:00
Avi Kivity
3b1ff90a1a Merge "Get rid of seed concept in gossip" from Asias
"
gossip: Get rid of seed concept

The concept of seed and the different behaviour between seed nodes and
non seed nodes generate a lot of confusion, complication and errors for
users. For example, how to add a seed node into a cluster, how to
promote a non seed node to a seed node, how to choose seed nodes in a
multiple DC setup, how to edit config files for seeds, and why a seed
node does not bootstrap.

If we remove the concept of seed, it will get much easier for users.
After this series, the seed config option is only used once, when a new
node joins a cluster.

Major changes:

Seed nodes are only used as the initial contact point nodes.

Seed nodes now perform bootstrap. The only exception is the first node
in the cluster.

The unsafe auto_bootstrap option is now ignored.

Gossip shadow round now talks to all nodes instead of just seed nodes.

Refs: #6845
Tests: update_cluster_layout_tests.py + manual test
"

* 'gossip_no_seed_v2' of github.com:asias/scylla:
  gossip: Get rid of seed concept
  gossip: Introduce GOSSIP_GET_ENDPOINT_STATES verb
  gossip: Add do_apply_state_locally helper
  gossip: Do not talk to seed node explicitly
  gossip: Talk to live endpoints in a shuffled fashion
2020-08-17 09:50:51 +03:00
Asias He
d0b3f3dfe8 gossip: Get rid of seed concept
The concept of seed and the different behaviour between seed nodes and
non seed nodes generate a lot of confusion, complication and errors for
users. For example, how to add a seed node into a cluster, how to
promote a non seed node to a seed node, how to choose seed nodes in a
multiple DC setup, how to edit config files for seeds, and why a seed
node does not bootstrap.

If we remove the concept of seed, it will get much easier for users.
After this series, the seed config option is only used once, when a new
node joins a cluster.

Major changes:

- Seed nodes are only used as the initial contact point nodes.

- Seed nodes now perform bootstrap. The only exception is the first node
  in the cluster.

- The unsafe auto_bootstrap option is now ignored.

- Gossip shadow round now attempts to talk to all nodes instead of just seed nodes.

Manual test:

- bootstrap n1, n2, n3  (n1 and n2 are listed as seed, check only n1
  will skip bootstrap, n2 and n3 will bootstrap)
- shutdown n1, n2, n3
- start n2 (check non seed node can boot)
- start n1 (check n1 talks to both n2 and n3)
- start n3 (check n3 talks to both n1 and n2)

Upgrade/Downgrade test:

- Initialize cluster
  Start 3 nodes (n1, n2, n3) using the old version
  n1 and n2 are listed as seed

- Test upgrade starting from seed nodes
  Rolling restart n1 using new version
  Rolling restart n2 using new version
  Rolling restart n3 using new version

- Test downgrade to old version
  Rolling restart n1 using old version
  Rolling restart n2 using old version
  Rolling restart n3 using old version

- Test upgrade starting from non seed nodes
  Rolling restart n3 using new version
  Rolling restart n2 using new version
  Rolling restart n1 using new version

Notes on upgrade procedure:

There is no special procedure needed to upgrade to Scylla without the seed
concept. A rolling upgrade, one node at a time, is good enough.

Fixes: #6845
Tests: ./test.py + update_cluster_layout_tests.py + manual test
2020-08-17 10:35:16 +08:00
Piotr Jastrzebski
c001374636 codebase wide: replace count with contains
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
the `count` function was often used for this in various ways.

`contains` not only expresses the intent of the code better but also
does so in a more uniform way.

This commit replaces all the occurrences of `count` with
`contains`.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>
2020-08-15 20:26:02 +03:00
Asias He
e6ceec1685 gossip: Fix race between shutdown message handler and apply_state_locally
1. The node1 is shutdown
2. The node1 sends shutdown message to node2
3. The node2 receives gossip shutdown message but the handler yields
4. The node1 is restarted
5. The node1 sends new gossip endpoint_state to node2, node2 applies the state
   in apply_state_locally and calls gossiper::handle_major_state_change
   and then calls gossiper::mark_alive
6. The shutdown message handler in step 3 resumes and sets status of node1 to SHUTDOWN
7. The gossiper::mark_alive fiber in step 5 resumes and calls gossiper::real_mark_alive,
   node2 will skip marking node1 as alive because the status of node1 is
   SHUTDOWN. As a result, node1 is alive but it is not marked as UP by node2.

To fix, we serialize the two operations.

Fixes #7032
2020-08-13 11:06:04 +03:00
Avi Kivity
3530e80ce1 Merge "Support md format" from Benny
"
This series adds support for the "md" sstable format.

Support is based on the following:

* do not use clustering based filtering in the presence
  of static row, tombstones.
* Disabling min/max column names in the metadata for
  formats older than "md".
* When updating the metadata, reset and disable min/max
  in the presence of range tombstones (like Cassandra does
  and until we process them accurately).
* Fix the way we maintain min/max column names by:
  keeping whole clustering key prefixes as min/max
  rather than calculating min/max independently for
  each component, like Cassandra does in the "md" format.
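
The difference between the two ways of maintaining min/max can be seen on a toy example. This is a simplified sketch under stated assumptions: a clustering key prefix is modelled as a `std::vector<int>` of equal length, and `prefix`, `whole_prefix_min_max` and `per_component_min_max` are hypothetical names, not Scylla's sstable code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using prefix = std::vector<int>;  // simplified clustering key prefix

// Whole-prefix style: min and max are complete prefixes, compared
// lexicographically.
std::pair<prefix, prefix> whole_prefix_min_max(const std::vector<prefix>& keys) {
    assert(!keys.empty());
    auto [lo, hi] = std::minmax_element(keys.begin(), keys.end());
    return {*lo, *hi};
}

// Per-component style: the min and max of each component are tracked
// independently, so the result may not correspond to any real key.
std::pair<prefix, prefix> per_component_min_max(const std::vector<prefix>& keys) {
    assert(!keys.empty());
    prefix lo = keys.front();
    prefix hi = keys.front();
    for (const auto& k : keys) {
        assert(k.size() == lo.size());  // equal-length prefixes assumed
        for (std::size_t i = 0; i < k.size(); ++i) {
            lo[i] = std::min(lo[i], k[i]);
            hi[i] = std::max(hi[i], k[i]);
        }
    }
    return {lo, hi};
}
```

For the keys `(1, 9)` and `(2, 0)`, the whole-prefix bounds are `(1, 9)` and `(2, 0)`, whereas the per-component bounds are `(1, 0)` and `(2, 9)`, neither of which is an actual key.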

Fixes #4442

Tests: unit(dev), cql_query_test -t test_clustering_filtering* (debug)
md migration_test dtest from git@github.com:bhalevy/scylla-dtest.git migration_test-md-v1
"

* tag 'md-format-v4' of github.com:bhalevy/scylla: (27 commits)
  config: enable_sstables_md_format by default
  test: cql_query_test: add test_clustering_filtering unit tests
  table: filter_sstable_for_reader: allow clustering filtering md-format sstables
  table: create_single_key_sstable_reader: emit partition_start/end for empty filtered results
  table: filter_sstable_for_reader: adjust to md-format
  table: filter_sstable_for_reader: include non-scylla sstables with tombstones
  table: filter_sstable_for_reader: do not filter if static column is requested
  table: filter_sstable_for_reader: refactor clustering filtering conditional expression
  features: add MD_SSTABLE_FORMAT cluster feature
  config: add enable_sstables_md_format
  database: add set_format_by_config
  test: sstable_3_x_test: test both mc and md versions
  test: Add support for the "md" format
  sstables: mx/writer: use version from sstable for write calls
  sstables: mx/writer: update_min_max_components for partition tombstone
  sstables: metadata_collector: support min_max_components for range tombstones
  sstable: validate_min_max_metadata: drop outdated logic
  sstables: rename mc folder to mx
  sstables: may_contain_rows: always true for old formats
  sstables: add may_contain_rows
  ...
2020-08-11 13:29:11 +03:00
Piotr Jastrzebski
80e3923b3c codebase wide: replace find(...) != end() with contains
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
the code pattern looked like:

<collection>.find(<element>) != <collection>.end()

In C++20 the same can be expressed with:

<collection>.contains(<element>)

This is not only more concise but also expresses the intent of the code
more clearly.
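
A small self-contained illustration of the equivalence, not taken from the Scylla tree: `has_element` and `check_equivalence` are hypothetical helpers. The preprocessor guard is only there so the sketch also builds on pre-C++20 compilers, where it falls back to the old pattern.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical helper showing both spellings side by side.
template <typename Collection, typename Key>
bool has_element(const Collection& collection, const Key& element) {
#if __cplusplus >= 202002L
    return collection.contains(element);                  // C++20 spelling
#else
    return collection.find(element) != collection.end();  // older pattern
#endif
}

// For unique-key containers, the count-based check gives the same answer.
template <typename Collection, typename Key>
void check_equivalence(const Collection& collection, const Key& element) {
    assert(has_element(collection, element) == (collection.count(element) > 0));
}
```

For example, `has_element(std::set<int>{1, 2, 3}, 2)` is true and `has_element(std::set<int>{1, 2, 3}, 5)` is false, whichever branch is compiled.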

This commit replaces all the occurrences of the old pattern with the new
approach.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <f001bbc356224f0c38f06ee2a90fb60a6e8e1980.1597132302.git.piotr@scylladb.com>
2020-08-11 13:28:50 +03:00
Benny Halevy
e8d7744040 features: add MD_SSTABLE_FORMAT cluster feature
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Benny Halevy
65239a6e50 config: add enable_sstables_md_format
MD format is disabled by default at this point.

The option extends enable_sstables_mc_format
so that both are needed to be set for supporting
the md format.

The MD_FORMAT cluster feature will be added in
a following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-10 18:53:04 +03:00
Asias He
cd7d64f588 gossip: Introduce GOSSIP_GET_ENDPOINT_STATES verb
The new verb is used to replace the current gossip shadow round
implementation. Current shadow round implementation reuses the gossip
syn and ack async messages, which has plenty of drawbacks. It is hard to
tell whether a specific peer node has responded to the syn message. The
delayed responses from the shadow round can be applied to the normal
gossip states even after the shadow round is done. The syn and ack
message handlers are full of special cases due to the shadow round. All
gossip application states, including the ones that are not relevant, are
sent back. The gossip application states are applied and the gossip
listeners are called as if in normal gossip operation. It is
completely unnecessary to call the gossip listeners in the shadow round.

This patch introduces a new synchronous verb to request exactly the
gossip application states the shadow round needs, and applies the
application states without calling the gossip listeners. This patch
makes the shadow round easier to reason about, more robust and
efficient.

Refs: #6845
Tests: update_cluster_layout_tests.py
2020-07-27 09:15:11 +08:00
Asias He
bebd683177 gossip: Add do_apply_state_locally helper
The code in do_apply_state_locally will be shared in the next patch.

Refs: #6845
Tests: update_cluster_layout_tests.py
2020-07-27 09:00:47 +08:00
Asias He
55271f714e gossip: Do not talk to seed node explicitly
Currently, we talk to a seed node in each gossip round with some
probability, i.e., nr_of_seeds / (nr_of_live_nodes + nr_of_unreachable_nodes)
For example, with 5 seeds in a 50 nodes cluster, the probability is 0.1.

Now that we talk to all live nodes, including the seed nodes, within a
bounded time period, it is no longer necessary to talk to a seed node in
each gossip round.

In order to get rid of the seed concept, do not talk to seed node
explicitly in each gossip round.

This patch is a preparatory patch to remove the seed concept in gossip.

Refs: #6845
Tests: update_cluster_layout_tests.py
2020-07-23 14:24:06 +08:00
Asias He
8e219e10e7 gossip: Talk to live endpoints in a shuffled fashion
Currently, we select a random 10 percent of live nodes to talk with in
each gossip round. There is no upper bound on how long it will take to
talk to all live nodes.

This patch changes the way we select live nodes to talk with as below:

 1) Shuffle all the live endpoints randomly
 2) Split the live endpoints into 10 groups
 3) Talk to one of the groups in each gossip round
 4) Go to step 1 to shuffle again after all groups have been talked with

We keep the randomness of selecting nodes as before, and add determinism:
all live nodes are talked to within a bounded number of rounds.
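
The four steps above can be sketched as follows. This is an illustrative model only, not the gossiper implementation: endpoints are plain strings, and `live_endpoint_picker`, `next_round` and the explicit `seed` parameter are hypothetical. The special handling of newly added nodes is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of the selection scheme; not Scylla's gossiper code.
class live_endpoint_picker {
    std::vector<std::string> _live;      // all live endpoints
    std::vector<std::string> _shuffled;  // not yet talked to in this pass
    std::size_t _groups;
    std::size_t _group_size = 0;
    std::mt19937 _rng;
public:
    live_endpoint_picker(std::vector<std::string> live, std::size_t groups, unsigned seed)
        : _live(std::move(live)), _groups(groups), _rng(seed) {
        assert(_groups > 0);
    }

    // Returns the endpoints to talk to in the next gossip round.
    std::vector<std::string> next_round() {
        if (_shuffled.empty()) {
            // Steps 1 and 2: reshuffle, then size the groups (rounding up).
            _shuffled = _live;
            std::shuffle(_shuffled.begin(), _shuffled.end(), _rng);
            _group_size = (_shuffled.size() + _groups - 1) / _groups;
        }
        // Step 3: take one group off the front of the shuffled list.
        std::size_t n = std::min(_group_size, _shuffled.size());
        auto last = _shuffled.begin() + static_cast<std::ptrdiff_t>(n);
        std::vector<std::string> round(_shuffled.begin(), last);
        _shuffled.erase(_shuffled.begin(), last);
        // Step 4 happens implicitly: once the list is empty, the next call
        // reshuffles.
        return round;
    }
};
```

With 20 live endpoints and 10 groups, each round in this sketch returns 2 endpoints, and 10 consecutive rounds cover every endpoint exactly once before the list is reshuffled.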

In addition, the way newly added nodes are favored is simplified. When a
new live node is added, it is always added to the front of the group, so
it will be talked with in the next gossip round.

This patch is a preparatory patch to remove the seed concept in gossip.

Refs: #6845
Tests: update_cluster_layout_tests.py
2020-07-23 14:23:59 +08:00
Asias He
a19917eb91 gossiper: Drop replacement_quarantine
It is not used any more after "gossiper: Drop unused replaced_endpoint".

Refs #5482
2020-07-06 11:27:55 +03:00
Asias He
2bc73ad290 gossiper: Drop unused replaced_endpoint
It is not used any more after 75cf1d18b5
(storage_service: Unify handling of replaced node removal from gossip)
in the "Make replacing node take writes" series.

Refs #5482
2020-07-06 11:27:55 +03:00
Rafael Ávila de Espíndola
64c8164e6c everywhere: Update to seastar api v4 (when_all_succeed returning a tuple)
We now just need to replace a few calls to then with then_unpack.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200618172100.111147-1-espindola@scylladb.com>
2020-06-23 19:40:18 +03:00