scylladb

Avi Kivity 3cb081eb84 Merge " hinted handoff: fix races during shutdown and draining" from Vlad

"
Fix races that may lead to use-after-free events and file system level exceptions
during shutdown and drain.

The root cause of the use-after-free events is that space_watchdog blocks on
end_point_hints_manager::file_update_mutex(). This mutex must stay alive for as
long as it is accessed, even if the corresponding end_point_hints_manager
instance is destroyed in the context of manager::drain_for().
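
The lifetime requirement above can be sketched with standard C++ only. This is a minimal illustration, not the code from this series: `endpoint_manager`, `file_update_mutex_ptr()`, `watchdog_step()` and `drain_then_lock()` are hypothetical names, and `std::mutex` with `std::shared_ptr` stands in for whatever primitives the real code uses. The idea shown is that the waiter holds its own shared owner of the mutex, so destroying the manager cannot free the mutex under it.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for end_point_hints_manager. The mutex is held
// through a shared_ptr so that other holders keep it alive after this
// object has been destroyed.
class endpoint_manager {
    std::shared_ptr<std::mutex> _file_update_mutex = std::make_shared<std::mutex>();
public:
    // Hand out a shared owner rather than a reference into *this.
    std::shared_ptr<std::mutex> file_update_mutex_ptr() const {
        return _file_update_mutex;
    }
};

// Hypothetical watchdog step: owns a reference to the mutex for the whole
// time it is locked. Returns true once the lock was taken and released.
bool watchdog_step(std::shared_ptr<std::mutex> m) {
    std::lock_guard<std::mutex> guard(*m);
    return true;
}

// Simulates the drain path destroying the manager while the watchdog
// still intends to take the mutex.
bool drain_then_lock() {
    auto mgr = std::make_unique<endpoint_manager>();
    std::shared_ptr<std::mutex> m = mgr->file_update_mutex_ptr();
    mgr.reset();                 // the manager is gone ...
    assert(m.use_count() == 1);  // ... but the mutex is still owned here
    return watchdog_step(std::move(m));
}
```

Had `file_update_mutex_ptr()` returned a plain reference to a member, the `mgr.reset()` call would have left the watchdog with a dangling reference, which is the kind of use-after-free described above.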

File system exceptions may occur when space_watchdog attempts to scan a
directory while it's being deleted from the drain_for() context.
If such an exception occurs, generation of new hints is blocked, including
for materialized views, until the next space_watchdog round (in 1s).
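
To illustrate the failure mode, here is a small sketch of a directory scan that reports a vanished directory as "nothing to scan" instead of throwing. It is a generic technique built on the `std::error_code` overloads of `std::filesystem`, and is not necessarily how this series resolves the race; `scan_dir_size()` is a hypothetical helper name.

```cpp
#include <cstdint>
#include <filesystem>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: sums the sizes of regular files directly under `dir`.
// Returns std::nullopt if the directory cannot be opened or iterated, for
// example because it was removed concurrently.
std::optional<std::uintmax_t> scan_dir_size(const fs::path& dir) {
    std::error_code ec;
    fs::directory_iterator it(dir, ec);
    if (ec) {
        return std::nullopt;  // directory is missing or unreadable
    }
    std::uintmax_t total = 0;
    while (it != fs::directory_iterator{}) {
        std::error_code file_ec;
        // A file may disappear between listing and stat; skip it if so.
        if (it->is_regular_file(file_ec) && !file_ec) {
            const std::uintmax_t size = it->file_size(file_ec);
            if (!file_ec) {
                total += size;
            }
        }
        it.increment(ec);
        if (ec) {
            return std::nullopt;  // directory went away mid-scan
        }
    }
    return total;
}
```

A caller could treat `std::nullopt` as "skip this directory in the current round" rather than as an error that blocks hint generation.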

Fixes #4685 and #4836.

Tested as follows:
1) Patched the code to trigger the race with (much) higher
probability and ran a slightly modified hinted handoff replace
dtest with a debug binary 100 times. A side effect of this
testing was the discovery of #4836.
2) Using the same patch as above, verified that there are no crashes and
that nodes survive stop/start sequences (they did not without this series)
in the context of all hinted handoff dtests. Ran the whole set of
tests with a dev binary 10 times.
"

* 'hinted_handoff_race_between_drain_for_and_space_watchdog_no_global_lock-v2' of https://github.com/vladzcloudius/scylla:
hinted handoff: fix a race on a directory removal between space_watchdog and drain_for()
hinted handoff: make taking file_update_mutex safe
db::hints::manager::drain_for(): fix alignment
db::hints::manager: serialize calls to drain_for()
db::hints: cosmetics: indentation and missing method qualifier

2019-10-03 14:38:00 +03:00

File | Last commit | Date
manager.cc | Merge " hinted handoff: fix races during shutdown and draining" from Vlad | 2019-10-03 14:38:00 +03:00
manager.hh | hinted handoff: make taking file_update_mutex safe | 2019-08-20 11:26:19 -04:00
resource_manager.cc | hinted handoff: fix a race on a directory removal between space_watchdog and drain_for() | 2019-08-20 11:46:46 -04:00
resource_manager.hh | db,hints: decouple in-flight hints limits from resource manager | 2019-07-12 19:21:26 +03:00