Commit Graph

653 Commits

Author SHA1 Message Date
Piotr Sarna
da67c594c8 gms: add UDA feature
UDA stands for user-defined aggregates and the feature implies
that the whole cluster supports them.
2021-08-13 11:14:12 +02:00
Piotr Jastrzebski
1bdcef6890 features: assume MC_SSTABLE and UNBOUNDED_RANGE_TOMBSTONES are always enabled
These features have been around for over 2 years and every reasonable
deployment should have them enabled.

The only case when those features could be not enabled is when the user
has used enable_sstables_mc_format config flag to disable MC sstable
format. This case has been eliminated by removing
enable_sstables_mc_format config flag.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2021-06-25 10:12:00 +02:00
Asias He
2ad8fb756e gossip: Promote gossip quarantine over log to info level
1) Start node n1, n2, n3

2) Bootstrap n4 and kill n4 in the middle of bootstrap

3) Wipe data on n4 and start n4 again

After step 2, n1, n2 and n3 will remove n4 from gossip after
fat_client_timeout and put n4 in quarantine for quarantine_delay().

If n4 bootstraps again in step 3 before the quarantine finishes, n1, n2
and n3 will ignore gossip updates from n4, and n4 will not learn gossip
updates from the cluster.

After PR #8896, the bootstrap will be rejected.

This patch promotes the gossip quarantine over log to info level, so
that dtest can wait for the log to bootstrap the node again.

Refs #8889
Refs #8890

Closes #8905
2021-06-24 12:51:32 +03:00
Pavel Emelyanov
d606321575 feature: Remove unused friendship with gossiper
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-06-18 20:19:35 +03:00
Asias He
7a32cab524 gossip: Fix use-after-free in real_mark_alive and mark_dead
In commit 11a8912093 (gossiper:
get_gossip_status: return string_view and make noexcept)
get_gossip_status returns a pointer to an endpoint_state in
endpoint_state_map.

After commit 425e3b1182 (gossip: Introduce
direct failure detector), gossiper::mark_dead and gossiper::real_mark_alive
can yield in the middle of the function. It is possible that
endpoint_state can be removed, causing use-after-free to access it.

To fix, make a copy before we yield.

Fixes #8859

Closes #8862
2021-06-16 21:16:26 +02:00
Asias He
c2cfdcd345 gossiper: Set minimum value for quarantine_delay
When a new node bootstraps to join the cluster, it will be set in
bootstrap gossip status. If the node is gone in the middle, the node
will be removed by gossip after the new node fails to update gossip
after fat_client_timeout, which reverts the new node as pending node.

However, if the new node is slow to update gossip and it finishes
bootstrapping after existing nodes have removed the new node after
fat_client_timeout. In handle_state_normal handler, the existing nodes
will fail to find the host id for the new node and throw and in turn
terminate the scylla process.

To mitigate the problem, we set fat_client_timeout which is half of
quarantine_delay to a minimum value if users set a small ring_delay
value.

Refs #8702
Refs #8859

Closes #8860
2021-06-16 09:34:49 +02:00
Asias He
0665d9c346 gossip: Handle nodes removed from live endpoints directly
When a node is removed from the _live_endpoints list directly, e.g., a
node being decommissioned, it is possible the node might not be marked
as down in gossiper::failure_detector_loop_for_node loop before the loop
exits. When the gossiper::failure_detector_loop loop starts again, the
node will not be considered because it is not present in _live_endpoints
list any more. As a result, the node will not be marked as down though
gossiper::failure_detector_loop_for_node loop.

To fix, we mark the nodes that are removed from _live_endpoints
lists as down in the gossiper::failure_detector_loop loop.

Fixes #8712

Closes #8770
2021-06-09 15:02:25 +02:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Asias He
9b902fad79 gossiper: Update timestamp for nodes in ack and ack2 msg handler
In commit 425e3b1182 (gossip: Introduce
direct failure detector), the call to notify_failure_detector inside ack
and ack2 msg handler was removed since there is no need to update the
old failure detector anymore. However, the timestamp for endpoit_state
is also updated inside notify_failure_detector. With the new failure
detector we still need the timestamp for endpoit_state. Otherwise, nodes
might be removed from gossip wrongly.

For example, as we saw in issue #8702:

INFO  2021-05-24 22:45:24,713 [shard 0] gossip - FatClient 127.0.60.2
has been silent for 5000ms, removing from gossip

To fix, update the timestamp as we do before in ack and ack2 msg
handler.

Fixes #8702

Closes #8777
2021-06-06 09:21:23 +03:00
Kamil Braun
2ac9239f6a gms: introduce CDC_GENERATIONS_V2 feature
When a node joins this feature (which it does immediately when upgrading
to a version that has this commit), it says: "I understand the new
generation storage format and the new identifier format". Thus, when the
feature becomes enabled - after all nodes have joined it - it means that
it's safe to create new generations using these new storage/ID formats.
2021-05-25 16:07:23 +02:00
Kamil Braun
4658adbe18 tree-wide: introduce cdc::generation_id_v2
This is a new type of CDC generation identifiers. Compared to old IDs,
additionally to the timestamp it contains an UUID.

These new identifiers will allow a safer and more efficient algorithm of
introducing new generations into a cluster (introduced in a later commit).

For now, nodes keep using the old identifier format when creating new
generations and whenever they learn about a new CDC generation from gossip
they assume that it also is stored in the v1 format. But they do know how
to (de)serialize the second format and how to persist new identifiers in
local tables.
2021-05-24 17:50:21 +02:00
Avi Kivity
50f3bbc359 Merge "treewide: various header cleanups" from Pavel S
"
The patch set is an assorted collection of header cleanups, e.g:
* Reduce number of boost includes in header files
* Switch to forward declarations in some places

A quick measurement was performed to see if these changes
provide any improvement in build times (ccache cleaned and
existing build products wiped out).

The results are posted below (`/usr/bin/time -v ninja dev-build`)
for 24 cores/48 threads CPU setup (AMD Threadripper 2970WX).

Before:

	Command being timed: "ninja dev-build"
	User time (seconds): 28262.47
	System time (seconds): 824.85
	Percent of CPU this job got: 3979%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 12:10.97
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2129888
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 1402838
	Minor (reclaiming a frame) page faults: 124265412
	Voluntary context switches: 1879279
	Involuntary context switches: 1159999
	Swaps: 0
	File system inputs: 0
	File system outputs: 11806272
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

After:

	Command being timed: "ninja dev-build"
	User time (seconds): 26270.81
	System time (seconds): 767.01
	Percent of CPU this job got: 3905%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 11:32.36
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 2117608
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 1400189
	Minor (reclaiming a frame) page faults: 117570335
	Voluntary context switches: 1870631
	Involuntary context switches: 1154535
	Swaps: 0
	File system inputs: 0
	File system outputs: 11777280
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

The observed improvement is about 5% of total wall clock time
for `dev-build` target.

Also, all commits make sure that headers stay self-sufficient,
which would help to further improve the situation in the future.
"

* 'feature/header_cleanups_v1' of https://github.com/ManManson/scylla:
  transport: remove extraneous `qos/service_level_controller` includes from headers
  treewide: remove evidently unneded storage_proxy includes from some places
  service_level_controller: remove extraneous `service/storage_service.hh` include
  sstables/writer: remove extraneous `service/storage_service.hh` include
  treewide: remove extraneous database.hh includes from headers
  treewide: reduce boost headers usage in scylla header files
  cql3: remove extraneous includes from some headers
  cql3: various forward declaration cleanups
  utils: add missing <limits> header in `extremum_tracking.hh`
2021-05-24 14:24:20 +03:00
Asias He
425e3b1182 gossip: Introduce direct failure detector
Currently, gossip uses the updates of the gossip heartbeat from gossip
messages to decide if a node is up or down. This means if a node is
actually down but the gossip messages are delayed in the network, the
marking of node down can be delayed.

For example, a node sends 20 gossip messages in 20 seconds before it
is dead. Each message is delayed 15 seconds by the network for some
reason. A node receives those delayed messages one after another.
Those delayed messages will prevent this node from being marked as down.
Because heartbeat update is received just before the threshold to mark a
node down is triggered which is around 20 seconds by default.

As a result, this node will not be marked as down in 20 * 15 seconds =
300 seconds, much longer than the ~20 seconds node down detection time
in normal cases.

In this patch, a new failure detector is implemented.

- Direct detection

The existing failure detector can get gossip heartbeat updates
indirectly.  For example:

Node A can talk to Node B
Node B can talk to Node C
Node A can not talk to Node C, due to network issues

Node A will not mark Node B to be down because Node A can get heart beat
of Node C from node B indirectly.

This indirect detection is not very useful because when Node A decides
if it should send requests to Node C, the requests from Node A to C will
fail while Node A thinks it can communicate with Node C.

This patch changes the failure detection to be direct. It uses the
existing gossip echo message to detect directly. Gossip echo messages
will be sent to peer nodes periodically. A peer node will be marked as
down if a timeout threshold has been meet.

Since the failure detection is peer to peer, it avoids the delayed
message issue mentioned above.

- Parallel detection

The old failure detector uses shard zero only. This new failure detector
utilizes all the shards to perform the failure detection, each shard
handling a subset of live nodes. For example, if the cluster has 32
nodes and each node has 16 shards, each shard will handle only 2 nodes.
With a 16 nodes cluster, each node has 16 shards, each shard will handle
only one peer node.

A gossip message will be sent to peer nodes every 2 seconds. The extra
echo messages traffic produced compared to the old failure detector is
negligible.

- Deterministic detection

Users can configure the failure_detector_timeout_in_ms to set the
threshold to mark a node down. It is the maximum time between two
successful echo message before gossip marks a node down. It is easier to
understand than the old phi_convict_threshold.

- Compatible

This patch only uses the existing gossip echo message. Nodes with or without
this patch can work together.

Fixes #8488

Closes #8036
2021-05-24 10:47:06 +03:00
Pavel Solodovnikov
fff7ef1fc2 treewide: reduce boost headers usage in scylla header files
`dev-headers` target is also ensured to build successfully.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2021-05-20 01:33:18 +03:00
Kamil Braun
03ad111beb tree-wide: comments on deprecated functions to access global variables
Closes #8665
2021-05-18 11:31:10 +03:00
Eliran Sinvani
5eb84f110e gossiper: remove excess error logging from gossiper
We remove a log of severity error that is later thrown as an
exception, being catched few lines below and then printed out as
a warning.

Fixes #8616

Closes #8617
2021-05-12 15:02:35 +02:00
Asias He
9ea57dff21 gossip: Relax failure detector update
We currently only update the failure detector for a node when a higher
version of application state is received. Since gossip syn messages do
not contain application state, so this means we do not update the
failure detector upon receiving gossip syn messages, even if a message
from peer node is received which implies the peer node is alive.

This patch relaxes the failure detector update rule to update the
failure detector for the sender of gossip messages directly.

Refs #8296

Closes #8476
2021-04-14 13:16:00 +02:00
Avi Kivity
fcc17d43a6 treewide: correct mislicensed source files
alternator/expressions.g had both AGPL and proprietary licensing. The
proprietary one is removed.

gms/inet_address_serializer.hh had only a proprietary license; it is
replaced by the AGPL.

Fixes #8465.

Closes #8466
2021-04-12 17:42:59 +03:00
Kamil Braun
99fd2244a3 tree-wide: introduce cdc::generation_id type
This is a follow-up to the previous commit.

Each CDC generation has a timestamp which denotes a logical point in time
when this generation starts operating. That same timestamp is
used to identify the CDC generation. We use this identification scheme
to exchange CDC generations around the cluster.

However, the fact that a generation's timestamp is used as an ID for
this generation is an implementation detail of the currently used method
of managing CDC generations.

Places in the code that deal with the timestamp, e.g. functions which
take it as an argument (such as handle_cdc_generation) are often
interested in the ID aspect, not the "when does the generation start
operating" aspect. They don't care that the ID is a `db_clock::time_point`.
They may sometimes want to retrieve the time point given the ID (such as
do_handle_cdc_generation when it calls `cdc::metadata::insert`),
but they don't care about the fact that the time point actually IS the ID.

In the future we may actually change the specific type of the ID if we
modify the generation management algorithms.

This commit is an intermediate step that will ease the transition in the
future. It introduces a new type, `cdc::generation_id`. Inside it contains
the timestamp, so:
1. if a piece of code doesn't care about the timestamp, it just passes
   the ID around
2. if it does care, it can simply access it using the `get_ts` function.
   The fact that `get_ts` simply accesses the ID's only field is an
   implementation detail.

Using the occasion, we change the `do_handle_cdc_generation_intercept...`
function to be a standard function, not a coroutine. It turns out that -
depending on the shape of the passed-in argument - the function would
sometimes miscompile (the compiled code would not copy the argument to the
coroutine frame).
2021-04-07 13:47:13 +02:00
Kamil Braun
e486e0f759 tree-wide: rename "cdc streams timestamp" to "cdc generation id"
Each CDC generation always has a timestamp, but the fact that the
timestamp identifies the generation is an implementation detail.
We abstract away from this detail by using a more generic naming scheme:
a generation "identifier" (whatever that is - a timestamp or something
else).

It's possible that a CDC generation will be identified by more than a
timestamp in the (near) future.

The actual string gossiped by nodes in their application state is left
as "CDC_STREAMS_TIMESTAMP" for backward compatibility.

Some stale comments have been updated.
2021-04-06 13:15:31 +02:00
Avi Kivity
40b60e8f09 Merge 'repair: Switch to use NODE_OPS_CMD for replace operation' from Asias He
In commit c82250e0cf (gossip: Allow deferring
advertise of local node to be up), the replacing node is changed to postpone
the responding of gossip echo message to avoid other nodes sending read
requests to the replacing node. It works as following:

1) replacing node does not respond echo message to avoid other nodes to
   mark replacing node as alive

2) replacing node advertises hibernate state so other nodes knows
   replacing node is replacing

3) replacing node responds echo message so other nodes can mark
   replacing node as alive

This is problematic because after step 2, the existing nodes in the
cluster will start to send writes to the replacing node, but at this
time it is possible that existing nodes haven't marked the replacing
node as alive, thus failing the write request unnecessarily.

For instance, we saw the following errors in issue #8013 (Cassandra
stress fails to achieve consistency when only one of the nodes is down)

```
scylla:
[shard 1] consistency - Live nodes 2 do not satisfy ConsistencyLevel (2
required, 1 pending, live_endpoints={127.0.0.2, 127.0.0.1},
pending_endpoints={127.0.0.3}) [shard 0] gossip - Fail to send
EchoMessage to 127.0.0.3: std::runtime_error (Not ready to respond
gossip echo message)

c-s:
java.io.IOException: Operation x10 on key(s) [4c4f4d37324c35304c30]:
Error executing: (UnavailableException): Not enough replicas available
for query at consistency QUORUM (2 required but only 1 alive
```

To solve this problem, we can do the replacing operation in multiple stages.

One solution is to introduce a new gossip status state as proposed
here: gossip: Introduce STATUS_PREPARE_REPLACE #7416

1) replacing node does not respond echo message

2) replacing node advertises prepare_replace state (Remove replacing
   node from natural endpoint, but do not put in pending list yet)

3) replacing node responds echo message

4) replacing node advertises hibernate state (Put replacing node in
   pending list)

Since we now have the node ops verb introduced in
829b4c1438 (repair: Make removenode safe
by default), we can do the multiple stage without introducing a new
gossip status state.

This patch uses the NODE_OPS_CMD infrastructure to implement replace
operation.

Improvements:

1) It solves the race between marking replacing node alive and sending
   writes to replacing node

2) The cluster reverts to a state before the replace operation
   automatically in case of error. As a result, it solves when the
   replacing node fails in the middle of the operation, the repacing
   node will be in HIBERNATE status forever issue.

3) The gossip status of the node to be replaced is not changed until the
   replace operation is successful. HIBERNATE gossip status is not used
   anymore.

4) Users can now pass a list of dead nodes to ignore explicitly.

Fixes #8013

Closes #8330

* github.com:scylladb/scylla:
  repair: Switch to use NODE_OPS_CMD for replace operation
  gossip: Add advertise_to_nodes
  gossip: Add helper to wait for a node to be up
  gossip: Add is_normal_ring_member helper
2021-04-04 12:54:09 +03:00
Asias He
bdb95233e8 gossip: Add advertise_to_nodes
gossiper::advertise_to_nodes() is added to allow respond to gossip echo
message with specified nodes and the current gossip generation number
for the nodes.

This is helpful to avoid the restarted node to be marked as alive during
a pending replace operation.

After this patch, when a node sends a echo message, the gossip
generation number is sent in the echo message. Since the generation
number changes after a restart, the receiver of the echo message can
compare the generation number to tell if the node has restarted.

Refs #8013
2021-04-01 09:38:54 +08:00
Asias He
f690f3ee8e gossip: Add helper to wait for a node to be up
This patch adds gossiper::wait_alive helper to wait for nodes to be up
on all shards.

Refs #8013
2021-04-01 09:38:54 +08:00
Asias He
4f5676630e gossip: Add is_normal_ring_member helper
Check if a node is in NORMAL or SHUTDOWN status which means the node is part of
the token ring from the gossip point of view and operates in normal status or
was in normal status but is shutdown.

Refs #8013
2021-04-01 09:38:54 +08:00
Piotr Jastrzebski
57c7964d6c config: ignore enable_sstables_mc_format flag
Don't allow users to disable MC sstables format any more.
We would like to retire some old cluster features that has been around
for years. Namely MC_SSTABLE and UNBOUNDED_RANGE_TOMBSTONES. To do this
we first have to make sure that all existing clusters have them enabled.
It is impossible to know that unless we stop supporting
enable_sstables_mc_format flag.

Test: unit(dev)

Refs #8352

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>

Closes #8360
2021-03-31 12:23:59 +03:00
Piotr Sarna
e42dee6afb gms: make CORRECT_STATIC_COMPACT_IN_MC ft unconditionally true
The feature is assumed to be true due to being over 2 years old.
It's still advertised in gossip, but it's assumed to always be present.
2021-03-30 09:37:13 +02:00
Piotr Sarna
08c4350968 gms: make TRUNCATION_TABLE feature unconditionally true
Turns out the feature was not used presently.
Historically, the commit which removed the support is
30a700c5b0 .
2021-03-30 09:36:45 +02:00
Piotr Sarna
c070178c7e gms: make ROW_LEVEL_REPAIR feature unconditionally true
The feature is assumed to be true due to being over 2 years old.
It's still advertised in gossip, but it's assumed to always be present.
2021-03-30 09:36:11 +02:00
Asias He
dc40184faa gossip: Handle timeout error in gossiper::do_shadow_round
Currently, the rpc timeout error for the GOSSIP_GET_ENDPOINT_STATES verb
is not handled in gossiper::do_shadow_round. If the
GOSSIP_GET_ENDPOINT_STATES rpc call to any of the remote nodes goes
timeout, gossiper::do_shadow_round will throw an exception and fail the
whole boot up process.

It is fine that some of the remote nodes timeout in shadow round. It is
not a must to talk to all nodes.

This patch fixes an issue we saw recently in our sct tests:

```
INFO    | scylla[1579]: [shard 0] init - Shutting down gossiping
INFO    | scylla[1579]: [shard 0] gossip - gossip is already stopped
INFO    | scylla[1579]: [shard 0] init - Shutting down gossiping was successful
...

ERR     | scylla[1579]: [shard 0] init - Startup failed: seastar::rpc::timeout_error (rpc call timed out)
```

Fixes #8187

Closes #8213
2021-03-08 13:03:41 +01:00
Botond Dénes
5c84aa52db gms: add RANGE_SCAN_DATA_VARIANT cluster feature
To control the transition to the data variant of range scans. As there
is a difference in how the data and mutation variants calculate pages
sizes, the transition to the former has to happen in a controlled
manner, when all nodes in the cluster support it, to avoid artificial
differences in page content and subsequently triggering false-positive
read repair.
2021-03-02 07:53:53 +02:00
Eliran Sinvani
63b794d104 schema: recalculate digest when computed_columns feature is enabled
The schema digest is affected by the computed_columns feature, this
means that we have to recalculate our schema digest when this feature is
enabled.
2021-02-11 13:48:58 +02:00
Piotr Sarna
8f98c0585f failure_detector: add a missing const qualifier
The mean() method is effectively const, so it should be marked as such.
Message-Id: <14dd39e8419136909fcf10508c34de3752faa7fe.1612953601.git.sarna@scylladb.com>
2021-02-10 13:04:37 +02:00
Piotr Sarna
faca59efa6 failure_detector: add getting last update time point
It can be useful to use the information how long ago an endpoint
responded to heartbeat.
2021-02-08 16:45:58 +01:00
Piotr Sarna
d23584c8f7 failure_detector: return arrival samples by const reference
There's no point in always returning the whole map by value - callers
can decide to copy the map of their own if need be.
2021-02-08 11:50:32 +01:00
Piotr Sarna
445e6e44f4 failure_detector: remove unimplemented is_alive method
The method was never implemented, so it makes no sense to keep
it in the header.
2021-02-08 11:49:50 +01:00
Asias He
c82250e0cf gossip: Allow deferring advertise of local node to be up
Currently the replacing node sets the status as STATUS_UNKNOWN when it
starts gossip service for the first time before it sets the status to
HIBERNATE to start the replacing operation. This introduces the
following race:

1) Replacing node using the same IP address of the node to be replaced
starts gossip service without setting the gossip STATUS (will be seen as
STATUS_UNKNOWN by other nodes)

2) Replacing node waits for gossip to settle and learns status and
tokens of existing nodes

3) Replacing node announces the HIBERNATE STATUS.

After Step 1 and before Step 3, existing nodes will mark the replacing
node as UP, but haven't marked the replacing node as doing replacing
yet. As a result, the replacing node will not be excluded from the read
replicas and will be considered a target node to serve CQL reads.

To fix, we make the replacing node avoid responding echo message when it is not
ready.

Fixes #7312

Closes #7714
2021-01-26 19:02:11 +01:00
Juliusz Stasiewicz
b150906d39 gossip: Added SNITCH_NAME to application_state
Snitch name needs to be exchanged within cluster once, on shadow
round, so joining nodes cannot use wrong snitch. The snitch names
are compared on bootstrap and on normal node start.

If the cluster already used mixed snitches, the upgrade to this
version will fail. In this case customer needs to add a node with
correct snitch for every node with the wrong snitch, then put
down the nodes with the wrong snitch and only then do the upgrade.

Fixes #6832

Closes #7739
2020-12-09 15:45:25 +02:00
Asias He
0a3a2a82e1 api: Add force_remove_endpoint for gossip
It is used to force remove a node from gossip membership if something
goes wrong.

Note: run the force_remove_endpoint api at the same time on _all_ the
nodes in the cluster in order to prevent the removed nodes come back.
Becasue nodes without running the force_remove_endpoint api cmd can
gossip around the removed node information to other nodes in 2 *
ring_delay (2 * 30 seconds by default) time.

For instance, in a 3 nodes cluster, node 3 is decommissioned, to remove
node 3 from gossip membership prior the auto removal (3 days by
default), run the api cmd on both node 1 and node 2 at the same time.

$ curl -X POST --header "Accept: application/json"
"http://127.0.0.1:10000/gossiper/force_remove_endpoint/127.0.0.3"
$ curl -X POST --header "Accept: application/json"
"http://127.0.0.2:10000/gossiper/force_remove_endpoint/127.0.0.3"

Then run 'nodetool gossipinfo' on all the nodes to check the removed nodes
are not present.

Fixes #2134

Closes #5436
2020-11-29 13:58:46 +02:00
Piotr Jastrzebski
d2897d8f8b alternator: guard streams with an experimental flag
Add new alternator-streams experimental flag for
alternator streams control.

CDC becomes GA and won't be guarded by an experimental flag any more.
Alternator Streams stay experimental so now they need to be controlled
by their own experimental flag.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-12 12:36:16 +01:00
Piotr Jastrzebski
e9072542c1 Mark CDC as GA
Enable CDC by default.
Rename CDC experimental feature to UNUSED_CDC to keep accepting cdc
flag.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-12 12:36:13 +01:00
Benny Halevy
a0436ea324 gossiper: convert to shared_token_metadata
get() the latest token_metadata& from the
shared_token_metadata before each use.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
29ed59f8c4 main: start a shared_token_metadata
And use it to get a token_metadata& compatible
with current usage, until the services are converted to
use token_metadata_ptr.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Nadav Har'El
7ff72b0ba5 Merge 'secondary_index: fix returned rows token ordering' from Piotr Grabowski
Fixes returned rows ordering to proper signed token ordering. Before this change, rows were sorted by token, but using unsigned comparison, meaning that negative tokens appeared after positive tokens.

Rename `token_column_computation` to `legacy_token_column_computation` and add some comments describing this computation.

Added (new) `token_column_computation` which returns token as `long_type`, which is sorted using signed comparison - the correct ordering of tokens.

Add new `correct_idx_token_in_secondary_index` feature, which flags that the whole cluster is able to use new `token_column_computation`.

Switch token computation in secondary indexes to (new) `token_column_computation`, which fixes the ordering. This column computation type is only set if cluster supports `correct_idx_token_in_secondary_index` feature to make sure that all nodes
will be able to compute new `token_column_computation`. Also old indexes will need to be rebuilt to take advantage of this fix, as new token column computation type is only set for new indexes.

Fix tests according to new token ordering and add one new test to validate this aspect explicitly.

Fixes #7443

Tested manually a scenario when someone created an index on old version of Scylla and then migrated to new Scylla. Old index continued to work properly (but returning in wrong order). Upon dropping and re-creating the index, it still returned the same data, but now in correct order.

Closes #7534

* github.com:scylladb/scylla:
  tests: add token ordering test of indexed selects
  tests: fix tests according to new token ordering
  secondary_index: use new token_column_computation
  feature: add correct_idx_token_in_secondary_index
  column_computation: add token_column_computation
  token_column_computation: rename as legacy
2020-11-05 18:44:49 +01:00
Piotr Grabowski
6624d933c9 feature: add correct_idx_token_in_secondary_index
Add new correct_idx_token_in_secondary_index feature, which will be used
to determine if all nodes in the cluster support new 
token_column_computation. This column computation will replace
legacy_token_column_computation in secondary indexes, which was 
incorrect as this column computation produced values that when compared 
with unsigned comparison (CQL type bytes comparison) resulted in 
different ordering  than token signed comparison. See issue:

https://github.com/scylladb/scylla/issues/7443
2020-11-04 12:02:42 +01:00
Benny Halevy
e4614d4836 gossiper: mark trivial methods noexcept
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:47 +02:00
Benny Halevy
1ba4c84ae2 gossiper: get_cluster_name, get_partitioner_name: make noexcept
These methods can return a const sstring& rather than
allocating a sstring. And with that they can be marked noexcept.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:29 +02:00
Benny Halevy
11a8912093 gossiper: get_gossip_status: return string_view and make noexcept
Change get_gossip_status to return string_view,
and with that it can be noexcept now that it doesn't
allocate memory via sstring.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
126e486fde gms/endpoint_state: mark methods using get_status noexcept
Now that get_status returns string_view, just compare it with a const char*
rather than making a sstring out of it, and consequently, can be marked noexcept.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
6b9191b6c2 gms/endpoint_state: get_status: return string_view and make noexcept
get_status doesn't need to allocate a sstring, it can just
return a std::string_view to the status string, if found.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
232c665bab gms/endpoint_state: mark get_application_state_ptr and is_cql_ready noexcept
Although std::map::find is not guaranteed to be noexcept
it depends on the comperator used and in this case comparing application_state
is noexcept.  Therefore, we can safely mark get_application_state_ptr noexcept.

is_cql_ready depends on get_application_state_ptr and otherwise
handles an exceptions boost::lexical_cast so it can be marked
noexcept as well.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00