scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-28 18:50:53 +00:00

Files

Pavel Emelyanov c4bf532d37 storage_service: Fix race in removenode/force_removenode/other

Here's another theoretical problem, that involves 3 sequential calls
to respectively removenode, force_removenode and some other operation.
Let's walk through them

First goes the removenode:
  run_with_api_lock
    _operation_in_progress = "removenode"
    storage_service::remove_node
      sleep in replicating_nodes.empty() loop

Now the force_removenode can run:

  run_with_no_api_lock
    storage_service::force_removenode
      check _operation_in_progress (not empty)
      _force_remove_completion = true
      sleep in _operation_in_progress.empty loop

Now the 1st call wakes up and:

    if _force_remove_completion == true
      throw <some exception>
  .finally() handler in run_with_api_lock
    _operation_in_progress = <empty>

At this point some other operation may start. Say, drain:

  run_with_api_lock
    _operation_in_progress = "drain"
    storage_service::drain
      ...
      go to sleep somewhere

No let's go back to the 1st op that wakes up from its sleep.
The code it executes is

    while (!ss._operation_in_progress.empty()) {
        sleep_abortable()
    }

and while the drain is running it will never exit.

However (! and this is the core of the race) should the drain
operation happen _before_ the force_removenode, another check
for _operation_in_progress would have made the latter exit with
the "Operation drain is in progress, try again" message.

Fix this inconsistency by making the check for current operation
every wake-up from the sleep_abortable.

Fixes #5591

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

2020-01-14 10:01:06 +02:00

pager

Use threads when executing user functions

2019-11-07 08:41:08 -08:00

paxos

merge: bouncing lwt request to an owning shard

2020-01-14 09:59:59 +02:00

cache_hitrate_calculator.hh

cache_hitrate_calculator: Do not reinvent the peering_sharded_service

2020-01-03 15:48:19 +02:00

client_state.cc

client_state: store _user as optional instead of shared_ptr