scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-30 19:46:48 +00:00

Files

Asias He d71a94a08b gossip: Add tokens and host_id in add_saved_endpoint

Problem:

   Start node 1 2 3
   Shutdown node2
   Shutdown node1 node3
   Start node1 node3
   Try to repalce_address for node 2
   The replace operation fails with the error:
     seastar - Exiting on unhandled exception: std::runtime_error
     (Cannot replace_address node2 because it doesn't exist in gossip)

This is because after all nodes shutdown, the other nodes do not have the
tokens and host_id info of node2 until node2 boots up and talks to the cluster.

If node2 can not boots up for whatever reason, currently the only way to
recover node2 is to `nodetool removenode` and bootstrap node2 again. This will
change tokens in the cluster and cause more data movement than just replacing
node2.

To fix, we add the tokens and host_id gossip application state in add_saved_endpoint
during boot up.

This is pretty safe because the generation for application state added by
add_saved_endpoint is zero, if node2 actually boots, other nodes will update
with node2's version.

Before:
$ curl -X GET --header "Accept: application/json" "http://127.0.0.1:10000/failure_detector/endpoints/" | python -mjson.tool

    {
        "addrs": "127.0.0.2",
        "generation": 0,
        "is_alive": false,
        "update_time": 1523344828953,
        "version": 0
    }

Node 2 can not be replaced.

After:
$ curl -X GET --header "Accept: application/json" "http://127.0.0.1:10000/failure_detector/endpoints/" | python -mjson.tool

    {
        "addrs": "127.0.0.2",
        "application_state": [
            {
                "application_state": 12,
                "value": "31284090-2557-4036-9367-7bb4ef49c35a",
                "version": 2
            },
            {
                "application_state": 13,
                "value": "... a lot of tokens ...",
                "version": 1
            }
        ],
        "generation": 0,
        "is_alive": false,
        "update_time": 1523344828953,
        "version": 0
    }

Node 2 can be replaced.

Tests: dtest/replace_address_test.py
Fixes: #3347
Message-Id: <117fd6649939e0505847335791be8d7a96e7d273.1523346805.git.asias@scylladb.com>

2018-04-10 13:14:31 +02:00

application_state.cc

gossip: Print SCHEMA_TABLES_VERSION correctly

2017-09-26 08:38:28 +02:00

application_state.hh

service: Advertise schema tables format version through gossip

2017-07-07 19:07:59 +02:00

endpoint_state.cc

gms/endpoint_state: Remove get_application_state()

2017-10-11 10:02:32 +01:00

endpoint_state.hh

gossiper: Replicate endpoint_state::is_alive()