mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 05:26:58 +00:00

Kamil Braun fd32e2ee10 Merge 'misc_services: fix data race from bad usage of get_next_version' from Piotr Dulikowski

The function `gms::version_generator::get_next_version()` must only be called from shard 0 because it uses a global, unsynchronized counter to issue versions. Notably, the function is used as a default argument for the constructor of `gms::versioned_value`, which in turn is used by shorthand constructors such as `versioned_value::cache_hitrates`, `versioned_value::schema` etc.

The `cache_hitrate_calculator` service runs a periodic job which updates the `CACHE_HITRATES` application state in the local gossiper state. Each time the job is scheduled, it runs on the next shard (it goes through shards in a round-robin fashion). The job uses the `versioned_value::cache_hitrates` shorthand to create a `versioned_value`, therefore risking a data race if it is not currently executing on shard 0.

The PR fixes the race by moving the call to `versioned_value::cache_hitrates` to shard 0. Additionally, in order to help detect similar issues in the future, a check is introduced to `get_next_version` which aborts the process if the function is called on a shard other than 0.
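The guard described above can be illustrated with a minimal, self-contained sketch. It is not the code from the PR: `current_shard` is a hypothetical stand-in for Seastar's `seastar::this_shard_id()`, and every name in the `sketch` namespace is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

namespace sketch {

using version_type = std::int64_t;

// Hypothetical stand-in for seastar::this_shard_id(); set by the caller here.
thread_local unsigned current_shard = 0;

// Unsynchronized counter: only safe if a single shard ever touches it.
static version_type g_version = 0;

// True only on the shard that owns the counter (shard 0).
inline bool on_version_owner_shard() {
    return current_shard == 0;
}

// Issues the next version; aborts the process when called off shard 0.
inline version_type get_next_version() {
    if (!on_version_owner_shard()) {
        std::fprintf(stderr, "get_next_version called on shard %u\n", current_shard);
        std::abort();
    }
    return ++g_version;
}

// Usage on shard 0: successive versions strictly increase.
inline void demo_on_shard0() {
    current_shard = 0;
    version_type a = get_next_version();
    version_type b = get_next_version();
    assert(b > a);
}

} // namespace sketch
```

Aborting rather than returning an error makes a misuse visible immediately in tests, instead of surfacing later as a rare ordering anomaly.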

This may also be a fix for #17493. Because `get_next_version` advances the global counter with a plain, non-atomic increment, a data race can occur if two shards call it concurrently, and as a result shard 0 may get the same or a smaller value on two consecutive calls. The following sequence of events is suspected to occur on node A:

1. Shard 1 calls `get_next_version()`, loads version `v - 1` from the global counter and stores it in a register; the thread is then preempted,
2. Shard 0 executes `add_local_application_state()` which internally calls `get_next_version()`, loads `v - 1`, then stores `v` and uses version `v` to update the application state,
3. Shard 0 executes `add_local_application_state()` again, increments version to `v + 1` and uses it to update the application state,
4. Gossip message handler runs, exchanging application states with node B. It sends its application state to B. Note that the max version of any of the local application states is `v + 1`,
5. Shard 1 resumes and stores version `v` in the global counter,
6. Shard 0 executes `add_local_application_state()` and updates the application state, again with version `v + 1`,
7. After that, node B will never learn about the application state introduced in step 6, because a gossip exchange only sends endpoint states with a version larger than the previously observed max version, which was `v + 1` in step 4.
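The suspected interleaving can be replayed deterministically in a single thread. The sketch below is an illustration under stated assumptions, not Scylla code: `replay_lost_update` and `race_trace` are hypothetical names, and the gossip filter is reduced to a single "newer than the max version B has seen" comparison.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace sketch {

using version_type = std::int64_t;

// Versions handed out to shard 0, in order, plus what node B has observed.
struct race_trace {
    std::vector<version_type> shard0_versions;
    version_type max_seen_by_b = 0;
    bool last_update_sent = false;
};

// Replays steps 1-7 sequentially. `start` is the counter value v - 1.
inline race_trace replay_lost_update(version_type start) {
    version_type counter = start;
    race_trace t;

    // Step 1: shard 1 loads the counter into a "register" and is preempted.
    version_type shard1_reg = counter;

    // Steps 2 and 3: shard 0 increments twice (v, then v + 1).
    t.shard0_versions.push_back(++counter);
    t.shard0_versions.push_back(++counter);

    // Step 4: gossip exchange; B records the max version it has observed.
    t.max_seen_by_b = counter;

    // Step 5: shard 1 resumes and writes back its stale increment (v).
    counter = shard1_reg + 1;

    // Step 6: shard 0 increments again and gets v + 1 a second time.
    version_type reused = ++counter;
    t.shard0_versions.push_back(reused);

    // Step 7: gossip only sends states newer than what B has already seen.
    t.last_update_sent = reused > t.max_seen_by_b;
    return t;
}

// With v = 10, the last update reuses version 11 and is never sent to B.
inline void demo_replay() {
    race_trace t = replay_lost_update(9);
    assert(t.shard0_versions.back() == t.max_seen_by_b);
    assert(!t.last_update_sent);
}

} // namespace sketch
```

Starting from `v - 1 = 9`, shard 0 receives versions 10, 11 and 11, so the third update is indistinguishable from the second as far as node B is concerned.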

Note that the above scenario was _not_ reproduced. However, I managed to observe a race condition by:

1. modifying Scylla to update `CACHE_HITRATES` much more frequently than usual,
2. putting an assertion in `add_local_application_state` which fails if the version returned by `get_next_version` was not larger than the previously returned value,
3. running a test which performs schema changes in a loop.
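The check from the second step amounts to remembering the last version and comparing each new one against it. A minimal sketch follows, assuming a hypothetical helper class `monotonic_version_check`; the actual debugging assertion placed in `add_local_application_state` is not shown in the source.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

namespace sketch {

using version_type = std::int64_t;

// Remembers the last version seen and reports whether each new one is strictly larger.
class monotonic_version_check {
    std::optional<version_type> _last;
public:
    // Returns false when `v` is not larger than the previously observed version.
    bool observe(version_type v) {
        bool ok = !_last || v > *_last;
        _last = v;
        return ok;
    }
};

// A repeated version is flagged; in the experiment this would trip the assertion.
inline void demo_check() {
    monotonic_version_check check;
    assert(check.observe(10));
    assert(check.observe(11));
    assert(!check.observe(11));
}

} // namespace sketch
```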

The assertion from the second point was triggered. While it's hard to tell how likely it is to occur without making updates of cache hitrates more frequent - not to mention the full theorized scenario - for now this is the best lead that we have, and the data race being fixed here is a real bug anyway.

Refs: #17493

Closes scylladb/scylladb#17499

* github.com:scylladb/scylladb:
  version_generator: check that get_next_version is called on shard 0
  misc_services: fix data race from bad usage of get_next_version

2024-02-25 19:35:34 +01:00

.github

.git: skip *.svg when scanning spelling errors

2024-02-08 19:46:54 +02:00

alternator

alternator: add fmt::formatter for alternator::parsed::path

2024-02-22 16:40:01 +02:00

api

api: Reserve vectors in advance

2024-02-20 19:13:05 +03:00

auth

auth: drop const from methods on write path

2024-02-14 13:24:53 +01:00

bin

…

cdc

cdc: s/string_view/std::string_view/

2024-02-22 13:49:19 +02:00

cmake

cmake: add -Wextra to compiling options

2024-02-18 19:21:54 +02:00

compaction

Fix potential data resurrection when another compaction type does cleanup work

2024-02-25 13:08:04 +02:00

conf

Merge 'Add maintenance socket' from Mikołaj Grzebieluch

2023-12-20 19:04:40 +02:00

cql3

interval: rename nonwrapping_interval to interval

2024-02-21 19:43:17 +02:00

data_dictionary

data_dictionary: use fmt::format() when appropriate

2024-02-08 19:44:56 +02:00

Merge 'interval: rename nonwrapping_interval to interval' from Avi Kivity

2024-02-22 14:03:43 +02:00

debug

…

dht

interval: rename nonwrapping_interval to interval

2024-02-21 19:43:17 +02:00

direct_failure_detector

…

dist

scylla_raid_setup: reference xfsprog on the minimal 1024 block size

2024-02-14 08:44:14 +02:00

docs

Merge 'raft topology: enable writes to previous CDC generations' from Patryk Jędrzejczak

2024-02-22 11:41:25 +01:00

exceptions

exceptions: do not include unused headers

2024-02-06 13:16:03 +02:00

gms

Merge 'misc_services: fix data race from bad usage of get_next_version' from Piotr Dulikowski

2024-02-25 19:35:34 +01:00

idl

interval: rename nonwrapping_interval to interval

2024-02-21 19:43:17 +02:00

index

Merge 'scylla-sstable: add support for loading schema of views and indexes' from Botond Dénes

2024-01-24 23:36:54 +02:00

interface

Typos: fix typos in comments

2023-12-02 22:37:22 +02:00

lang

lang: do not include unused headers

2024-02-07 09:27:39 +02:00

licenses

…

locator

locator: fix typo in comment -- s/slecting/selecting/

2024-02-22 13:28:18 +02:00

message

range.hh: retire

2024-02-21 00:24:25 +02:00

mutation

Merge 'mutation: add fmt::formatter for mutation types' from Kefu Chai

2024-02-25 09:48:56 +02:00

mutation_writer

mutation_writer: do not include unused headers

2024-01-24 15:20:02 +02:00

node_ops

token_metadata: drop the template

2023-12-12 23:19:54 +04:00

raft

raft: add fmt::formatter for raft::fsm

2024-02-20 09:02:02 +02:00

readers

interval: rename nonwrapping_interval to interval

2024-02-21 19:43:17 +02:00

redis

transport/controller: pass unix_domain_socket_permissions to generic_server::listen

2024-02-05 14:22:03 +01:00

reloc

…

repair

Merge 'repair: streaming: handle no_such_column_family from remote node' from Aleksandra Martyniuk

2024-02-23 08:25:45 +02:00

replica

mv: adjust the overhead estimation for view updates

2024-02-21 00:05:49 +02:00

rust

rust: update dependencies

2023-12-17 13:20:25 +02:00

schema

treewide: replace seastar::future::get0() with seastar::future::get()

2024-02-02 22:12:57 +08:00

scripts

Typos: fix typos in code

2023-12-13 10:45:21 +02:00

seastar @ 5d3ee98073

Update seastar submodule

2024-02-12 12:21:47 +02:00

service

Merge 'misc_services: fix data race from bad usage of get_next_version' from Piotr Dulikowski

2024-02-25 19:35:34 +01:00

sstables

Merge 'sstables: add fmt::formatter for sstable types' from Kefu Chai

2024-02-23 10:09:26 +02:00

streaming

Merge 'repair: streaming: handle no_such_column_family from remote node' from Aleksandra Martyniuk

2024-02-23 08:25:45 +02:00

swagger-ui @ 12f1da1082

…

tasks

tasks: do not include unused headers

2024-02-02 15:20:40 +01:00

test

Fix potential data resurrection when another compaction type does cleanup work

2024-02-25 13:08:04 +02:00

thrift

interval: rename nonwrapping_interval to interval

2024-02-21 19:43:17 +02:00

tools

tools/scylla-nodetool: make keyspace argument optional for "ring"

2024-02-23 09:25:29 +02:00

tracing

treewide: replace seastar::future::get0() with seastar::future::get()

2024-02-02 22:12:57 +08:00

transport

Merge 'Maintenance socket: set filesystem permissions to 660' from Mikołaj Grzebieluch

2024-02-20 15:09:54 +02:00

types

data_value: delete data_value(T*) constructor

2024-02-11 15:42:55 +02:00

unified

Update unified/build_unified.sh

2023-12-05 15:23:38 +02:00

utils

utils/managed_bytes: add fmt::formatters for managed_bytes and friends

2024-02-23 11:32:41 +08:00

.dockerignore

…

.gitattributes

…

.gitignore

docs: download iam csv files

2023-10-02 12:28:56 +03:00

.gitmodules

…

.gitorderfile

…

.mailmap

…

absl-flat_hash_map.cc

…

absl-flat_hash_map.hh

…

amplify.yml

…

backlog_controller.hh

treewide: apply codespell to the comments in source code

2023-12-20 10:25:03 +02:00

build_mode.hh

…

bytes_ostream.hh

utils: managed_bytes: optimize memory usage for small buffers

2024-02-09 20:56:20 +01:00

bytes.cc

…

bytes.hh

bytes.hh: correct spelling of delimiter and delimited

2023-12-18 20:46:21 +02:00

cache_flat_mutation_reader.hh

cache_flat_mutation_reader: fix a broken iterator validity guarantee in ensure_population_lower_bound()

2023-11-16 19:01:18 +01:00

cache_temperature.hh

…

cartesian_product.hh

…

cell_locking.hh

…

checked-file-impl.hh

…

client_data.cc

…

client_data.hh

…

clocks-impl.cc

clocks-impl: format time_point using fmt

2023-11-22 17:44:07 +02:00

clocks-impl.hh

…

clustering_bounds_comparator.hh

utils/managed_bytes: add fmt::formatters for managed_bytes and friends

2024-02-23 11:32:41 +08:00

clustering_interval_set.hh

…

clustering_key_filter.hh

…

clustering_ranges_walker.hh

…

CMakeLists.txt

build: cmake: add "mode_list" target

2023-12-24 12:35:02 +08:00

collection_mutation.cc

collection_mutation: add formatter for collection_mutation_view::printer

2024-02-13 17:42:25 +02:00

collection_mutation.hh

collection_mutation: add formatter for collection_mutation_view::printer

2024-02-13 17:42:25 +02:00

column_computation.hh

Typos: fix typos in code

2023-12-05 15:18:11 +02:00

combine.hh

…

compound_compat.hh

…

compound.hh

Typos: fix typos in code

2023-12-05 15:18:11 +02:00

compress.cc

./: not include unused headers

2024-01-17 16:30:14 +02:00

compress.hh

./: not include unused headers

2024-01-17 16:30:14 +02:00

concrete_types.hh

use fmt::to_string() for seastar::net::inet_address

2024-02-05 16:56:40 +01:00

configure.py

Merge 'repair: streaming: handle no_such_column_family from remote node' from Aleksandra Martyniuk

2024-02-23 08:25:45 +02:00

CONTRIBUTING.md

…

converting_mutation_partition_applier.cc

…

converting_mutation_partition_applier.hh

…

counters.cc

…

counters.hh

…

coverage_excludes.txt

test.py: support code coverage

2024-01-18 11:11:34 +02:00

coverage_sources.list

configure.py support coverage profiles on standrad build modes

2024-01-18 11:11:34 +02:00

cql_serialization_format.hh

…

db_clock.hh

…

debug.cc

…

debug.hh

…

default.nix

…

Doxyfile

…

duration.cc

Typos: fix typos in code

2023-12-05 15:18:11 +02:00

duration.hh

…

encoding_stats.hh

encoding_state: mark helper methods protected

2023-08-29 15:41:13 +03:00

enum_set.hh

…

fix_system_distributed_tables.py

…

flake.lock

…

flake.nix

…

frozen_schema.cc

…

frozen_schema.hh

…

full_position.hh

…

gc_clock.hh

db: add formatter for gc_clock::time_point

2024-02-11 16:39:25 +02:00

gdbinit

…

gen_segmented_compress_params.py

Typos: fix typos in code

2023-12-13 10:45:21 +02:00

generic_server.cc

transport/controller: pass unix_domain_socket_permissions to generic_server::listen

2024-02-05 14:22:03 +01:00

generic_server.hh

transport/controller: pass unix_domain_socket_permissions to generic_server::listen

2024-02-05 14:22:03 +01:00

HACKING.md

…

hashing_partition_visitor.hh

./: not include unused headers

2024-01-17 16:30:14 +02:00

idl-compiler.py

Typos: fix typos in code

2023-12-13 10:45:21 +02:00

inet_address_vectors.hh

abstract_replication_strategy: calculate_natural_endpoints: make it work with both versions of token_metadata

2023-12-12 23:19:53 +04:00

init.cc

treewide: replace seastar::future::get0() with seastar::future::get()

2024-02-02 22:12:57 +08:00

init.hh

Merge 'Typos: fix typos in code' from Yaniv Kaul

2023-12-06 07:36:41 +02:00

install-dependencies.sh

install-dependencies.sh: remove duplicate python3-pyudev package

2024-02-02 15:20:40 +01:00

install.sh

install.sh: use a temporary file when packaging scylla.yaml

2024-01-01 21:50:29 +02:00

interval.hh

interval: add fmt::formatters for managed_bytes and friends

2024-02-23 10:26:30 +02:00

keys.cc

…

keys.hh

…

LICENSE.AGPL

…

log.hh

…

main.cc

locator: token_metadata: Introduce topology barrier stall detector

2024-02-21 15:05:34 +02:00

map_difference.hh

./: not include unused headers

2024-01-17 16:30:14 +02:00

marshal_exception.hh

…

multishard_mutation_query.cc

interval, multishard_mutation_query: fix typos in comments

2024-02-23 09:06:24 +02:00

multishard_mutation_query.hh

treewide: apply codespell to the comments in source code

2023-12-20 10:25:03 +02:00

mutation_query.cc

mutation_query: reconcilable_result: add merge_disjoint()

2024-02-21 02:08:48 -05:00

mutation_query.hh

mutation_query: reconcilable_result: add merge_disjoint()

2024-02-21 02:08:48 -05:00

noexcept_traits.hh

treewide: replace seastar::future::get0() with seastar::future::get()

2024-02-02 22:12:57 +08:00

NOTICE.txt

…

ORIGIN

…

partition_builder.hh

…

partition_range_compat.hh

interval: rename nonwrapping_interval to interval

2024-02-21 19:43:17 +02:00

partition_slice_builder.cc

…

partition_slice_builder.hh

…

partition_snapshot_reader.hh

…

partition_snapshot_row_cursor.hh

./: not include unused headers

2024-01-17 16:30:14 +02:00

protocol_server.hh

…

querier.cc

Typos: fix typos in code

2023-12-05 15:18:11 +02:00

querier.hh

Typos: fix typos in code

2023-12-05 15:18:11 +02:00

query_id.hh

…

query_ranges_to_vnodes.cc

everywhere: reduce dependencies on i_partitioner.hh

2023-11-05 20:47:44 +02:00

query_ranges_to_vnodes.hh

everywhere: reduce dependencies on i_partitioner.hh

2023-11-05 20:47:44 +02:00

query_result_merger.hh

…

query-request.hh

Merge 'interval: rename nonwrapping_interval to interval' from Avi Kivity

2024-02-22 14:03:43 +02:00

query-result-reader.hh

./: not include unused headers

2024-01-17 16:30:14 +02:00

query-result-set.cc

./: not include unused headers

2024-01-17 16:30:14 +02:00

query-result-set.hh

query-result-set: add formatter for query-result-set.hh types

2024-02-21 17:54:48 +08:00

query-result-writer.hh

query: do not kill unpaged queries when they reach the tombstone-limit

2024-02-12 12:34:04 +02:00

query-result.hh

query-result.hh: add formatter for query::result::printer

2024-02-21 17:57:18 +08:00

query.cc

…

read_context.hh

…

reader_concurrency_semaphore.cc

reader_concurrency_semaphore.cc: move stringstream content instead of copying it

2024-01-31 09:31:50 +02:00

reader_concurrency_semaphore.hh

reader_permit: store schema_ptr instead of raw schema pointer

2024-01-11 08:37:56 +02:00

reader_permit.hh

reader_permit: store schema_ptr instead of raw schema pointer

2024-01-11 08:37:56 +02:00

README.md

…

real_dirty_memory_accounter.hh

…

release.cc

…

release.hh

…

reversibly_mergeable.hh

…

row_cache.cc

Merge 'row_cache: test cache consistency during multi-partition cache updates' from Michał Chojnowski

2024-02-13 17:37:06 +02:00

row_cache.hh

row_cache: use preemption_source in update()

2024-02-07 18:31:36 +01:00

schema_mutations.cc

…

schema_mutations.hh

…

schema_upgrader.hh

…

scylla_post_install.sh

dist: drop legacy control group parameters

2023-12-11 19:38:28 +09:00

scylla-gdb.py

Merge 'interval: rename nonwrapping_interval to interval' from Avi Kivity

2024-02-22 14:03:43 +02:00

SCYLLA-VERSION-GEN

Typos: fix typos in code

2023-12-13 10:45:21 +02:00

seastarx.hh

…

serialization_visitors.hh

…

serializer_impl.hh

./: not include unused headers

2024-01-17 16:30:14 +02:00

serializer.cc

…

serializer.hh

./: not include unused headers

2024-01-17 16:30:14 +02:00

service_permit.hh

…

setup.py

…

shell.nix

…

sstables_loader.cc

sstables_loader: load_new_sstables: auto-enable load-and-stream for tablets

2024-01-16 18:43:52 +02:00

sstables_loader.hh

…

supervisor.hh

…

table_helper.cc

keyspace_metadata: Add default value for new_keyspace's durable_writes

2023-12-26 11:47:37 +03:00

table_helper.hh

Typos: fix typos in code

2023-12-05 15:18:11 +02:00

test.py

test.py: support skipping multiple test patterns

2024-02-13 17:32:03 +02:00

timeout_config.cc

./: not include unused headers

2024-01-17 16:30:14 +02:00

timeout_config.hh

./: not include unused headers

2024-01-17 16:30:14 +02:00

timestamp.hh

…

tombstone_gc_extension.hh

…

tombstone_gc_options.cc

…

tombstone_gc_options.hh

…

tombstone_gc.cc

./: not include unused headers

2024-01-17 16:30:14 +02:00

tombstone_gc.hh

interval: rename nonwrapping_interval to interval

2024-02-21 19:43:17 +02:00

tox.ini

…

ubsan-suppressions.supp

…

unimplemented.cc

unimplemented: add format_as() for unimplemented::cause

2024-01-19 08:38:30 +02:00

unimplemented.hh

./: not include unused headers

2024-01-17 16:30:14 +02:00

validation.cc

…

validation.hh

…

version.hh

…

view_info.hh

everywhere: reduce dependencies on i_partitioner.hh

2023-11-05 20:47:44 +02:00

vint-serialization.cc

./: not include unused headers

2024-01-17 16:30:14 +02:00

vint-serialization.hh

Typos: fix typos in code

2023-12-05 15:18:11 +02:00

zstd.cc

./: not include unused headers

2024-01-17 16:30:14 +02:00

README.md

Scylla

What is Scylla?

Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.

For more information, please see the ScyllaDB web site.

Build Prerequisites

Scylla is fairly fussy about its build environment, requiring a very recent C++20 compiler and very recent versions of many libraries to build. The document HACKING.md includes detailed information on building and developing Scylla, but to get Scylla building quickly on (almost) any build machine, Scylla offers a frozen toolchain. This is a pre-configured Docker image which includes recent versions of all the required compilers, libraries and build tools. Using the frozen toolchain allows you to avoid changing anything in your build machine to meet Scylla's requirements - you just need to meet the frozen toolchain's prerequisites (mostly, Docker or Podman being available).

Building Scylla

Building Scylla with the frozen toolchain dbuild is as easy as:

$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla

For further information, please see:

Developer documentation for more information on building Scylla.
Build documentation on how to build Scylla binaries, tests, and packages.
Docker image build documentation for information on how to build Docker images.

Running Scylla

To start the Scylla server, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1

This will start a Scylla node with one CPU core allocated to it and data files stored in the tmp directory. The --developer-mode option is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance (not relevant on development workstations). Please note that you need to run Scylla with dbuild if you built it with the frozen toolchain.

For more run options, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --help

Testing

See test.py manual.

Scylla APIs and compatibility

By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and Thrift. There is also support for the API of Amazon DynamoDB™, which needs to be enabled and configured in order to be used. For more information on how to enable the DynamoDB™ API in Scylla, and the current compatibility of this feature as well as Scylla-specific extensions, see Alternator and Getting started with Alternator.

Documentation

Documentation can be found here. Seastar documentation can be found here. User documentation can be found here.

Training

Training material and online courses can be found at Scylla University. The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, multi-datacenters and how Scylla integrates with third-party applications.

Contributing to Scylla

If you want to report a bug or submit a pull request or a patch, please read the contribution guidelines.

If you are a developer working on Scylla, please read the developer guidelines.

Contact

The community forum and Slack channel are for users to discuss configuration, management, and operations of the ScyllaDB open source.
The developers mailing list is for developers and people interested in following the development of ScyllaDB to discuss technical topics.

Languages

C++ 72.2%

Python 26.6%

CMake 0.3%

GAP 0.3%

Shell 0.3%