mirror of https://github.com/scylladb/scylladb.git synced 2026-06-02 13:06:57 +00:00

Tomasz Grabiec bbfa52822e row_cache: Switch readers to use per-entry snapshots

Currently, readers always use the latest snapshot. This respects
write atomicity as long as partitions are fully continuous in cache
(as they are now), but will break it once partial population is
allowed.

Consider the following case:

  flush write(ck=1), write(ck=2) -> snapshot_1
  cache reader 1 reads and inserts ck=1 @snapshot_1
  flush write(ck=1), write(ck=2) -> snapshot_2
  cache reader 2 reads and inserts ck=2 @snapshot_2

Because the cache update is not atomic, reader 2 may complete before
the partition has been updated to snapshot_2. In such a case, after
read 2 the partition would contain ck=1 from snapshot_1 and ck=2 from
snapshot_2. That state matches neither snapshot, which could violate
write atomicity.

To solve this problem, we conceptually assign each partition key in
the ring to the snapshot it currently reflects. The update process
gradually converts entries to the new snapshot in ring order. Reads
no longer use the latest snapshot, but rather the snapshot that is
current for their position in the ring.
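A minimal sketch of the per-position snapshot choice described above, assuming ring positions are modeled as plain integers and snapshots as integer IDs. The names `ring_update`, `update_position`, `snapshot_for()`, `advance_to()` and `finish()` are hypothetical and do not come from the ScyllaDB sources; the sketch only illustrates the rule that entries the update has already passed reflect the new snapshot, while the rest still reflect the old one.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical model: a ring position is an integer token, a snapshot an ID.
using ring_position = std::int64_t;
using snapshot_id = unsigned;

// An in-progress cache update that converts entries to the new snapshot
// in ring order.
struct ring_update {
    snapshot_id old_snapshot;
    snapshot_id new_snapshot;
    // Entries at positions <= update_position already reflect new_snapshot.
    // std::nullopt means no entry has been converted yet.
    std::optional<ring_position> update_position;

    // The snapshot the entry at `pos` currently reflects, and therefore the
    // one a reader positioned at `pos` must read from.
    snapshot_id snapshot_for(ring_position pos) const {
        if (update_position && pos <= *update_position) {
            return new_snapshot;
        }
        return old_snapshot;
    }

    // The update process only moves forward in ring order.
    void advance_to(ring_position pos) {
        if (!update_position || pos > *update_position) {
            update_position = pos;
        }
    }

    // Once the update completes, every entry reflects the new snapshot.
    void finish() {
        old_snapshot = new_snapshot;
        update_position.reset();
    }
};
```

For example, with `ring_update u{1, 2, std::nullopt}`, `u.snapshot_for(10)` is 1; after `u.advance_to(20)` it becomes 2, while `u.snapshot_for(30)` stays 1 until the update reaches that position or finishes. A reader moving across the ring therefore switches snapshots automatically when it crosses the update position.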

There is a race between the update process and populating reads.
Since all entries must reflect the new snapshot after the update,
reads using the old snapshot cannot be allowed to insert data which
the update process can no longer reach. Before this patch, this race
was prevented by a phased_barrier: readers kept a
phased_barrier::operation alive between starting a read of a
partition and inserting it into cache, and the cache update waited
for all prior operations before starting. Any later read, which was
not waited for, used the latest snapshot, so the update process
didn't have to fix anything up for such reads.

After this change, later reads cannot always use the latest snapshot;
they have to use the snapshot corresponding to the given entry. So it
is not enough for update() to wait for prior reads in order to
prevent stale populations. The (simple) solution implemented in this
patch is to detect the conflict and abandon population of the given
sub-range. In general, reads are allowed to populate a given range
only if it belongs to a single snapshot.

Note that the range here is not the whole query range. For population
of continuity, it is the range starting after the previous key and
ending after the key being inserted. When populating a partition
entry, the range is a singular range containing only the partition
key. Readers switch to new snapshots automatically as they move
across the ring. It's possible that inserting the partition doesn't
conflict while setting continuity does. In such a case, the entry is
inserted but continuity is not set.
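The conflict rule can be sketched in a simplified model, assuming integer ring positions, integer snapshot IDs, and an update cursor such that entries at positions up to the cursor reflect the new snapshot. All names here (`update_cursor`, `population_decision`, `decide()`) are hypothetical; the message does not spell out how ScyllaDB detects the conflict, so this only illustrates the "single snapshot per range" condition. The partition entry (a singular range) and the continuity range (prev, key] are checked separately, which is how an entry can be inserted while continuity is left unset.

```cpp
#include <cstdint>
#include <optional>

using ring_position = std::int64_t;
using snapshot_id = unsigned;

// Hypothetical update cursor: entries at positions <= converted_up_to
// already reflect new_snapshot; the rest still reflect old_snapshot.
struct update_cursor {
    snapshot_id old_snapshot;
    snapshot_id new_snapshot;
    std::optional<ring_position> converted_up_to;

    snapshot_id snapshot_for(ring_position pos) const {
        return (converted_up_to && pos <= *converted_up_to) ? new_snapshot
                                                            : old_snapshot;
    }
};

struct population_decision {
    bool insert_entry;    // the partition entry may be inserted
    bool set_continuity;  // the range (prev, key] may be marked continuous
};

// `read_snapshot` is the snapshot the reader used to read `key`.
// `prev` is the previous key the reader saw; std::nullopt means the
// continuity range starts at the beginning of the ring.
population_decision decide(const update_cursor& u, snapshot_id read_snapshot,
                           std::optional<ring_position> prev,
                           ring_position key) {
    // Singular range {key}: stale if the cursor has crossed key since the read.
    bool entry_ok = u.snapshot_for(key) == read_snapshot;
    // Range (prev, key] spans two snapshots iff the cursor lies strictly
    // inside it, i.e. prev < cursor < key.
    bool mixed = u.converted_up_to &&
                 (!prev || *prev < *u.converted_up_to) &&
                 *u.converted_up_to < key;
    // If the range is not mixed, all of it reflects snapshot_for(key),
    // which entry_ok has already compared with read_snapshot.
    return {entry_ok, entry_ok && !mixed};
}
```

For instance, a reader that read key 20 from snapshot 1 while the cursor sits at 15 and the previous key is 10 gets `{true, false}`: the entry itself still belongs to snapshot 1, but (10, 20] straddles the cursor, so continuity is not set. If the cursor has already moved past 20, both flags are false and the population is abandoned.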

2017-06-24 18:06:11 +02:00

.github

Make github issue template less shouty

2016-06-01 10:45:04 +03:00

api

api: return correct values for bloom filter statistics

2017-05-21 13:11:22 +03:00

auth

Merge seatar upstream (seastar namespace)

2017-05-21 12:26:15 +03:00

conf

Add a comment experimental line to scylla.yaml

2017-06-22 09:06:19 +03:00

cql3

mutation_partition: Add support for specifying continuity

2017-06-24 18:06:11 +02:00

cache: Remove support for wide partitions

2017-06-24 18:06:11 +02:00

debug

debug: add systemtap script to measure interesting latencies during cache updates.

2017-01-26 22:15:16 -05:00

dht

dht: Add ring_position min()/max()

2017-06-24 18:06:11 +02:00

dist

dist/debian: Debian 9(stretch) support

2017-06-21 15:30:22 +03:00

docs

docs: Fix Docker Hub documentation logo

2017-06-19 13:11:59 +03:00

exceptions

cql3::query_processor: use weak_ptr for passing the prepared statements around

2017-04-12 12:24:03 -04:00

gms

Distribute cache temperature over gossiper.

2017-06-13 09:57:14 +03:00

idl

database: introduce cache_temperature class

2017-06-13 09:57:14 +03:00

index

index: Add secondary_index_manager

2017-05-08 10:03:28 +03:00

interface

thrift: change generated code namespace

2017-05-05 05:26:20 +03:00

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

licenses

Merge "CQL 3.3.1 support" from Pekka

2017-01-09 11:54:45 +02:00

locator

Merge seatar upstream (seastar namespace)

2017-05-21 12:26:15 +03:00

message

messaging_service: connection drop notifier

2017-06-13 09:57:14 +03:00

repair

repair: Repair on all shards

2017-06-14 17:52:49 +08:00

scripts

scripts/scylla_install_pkg: support Debian

2016-12-06 12:06:30 +02:00

seastar @ 9e2b7ecdd7

Merge seastar upstream

2017-06-20 11:01:40 +03:00

service

mutation_partition: Add support for specifying continuity

2017-06-24 18:06:11 +02:00

sstables

Merge "small fixes and cleanup for leveled strategy" from Raphael

2017-06-22 15:45:34 +03:00

streaming

streaming: Do not abort session too early in idle detection

2017-05-24 12:29:50 +03:00

swagger-ui @ 1b212bbe71

Update swagger-ui for local fix (change URL to not to point to pet store)

2015-06-25 14:04:07 +03:00

tests

row_cache: Switch readers to use per-entry snapshots

2017-06-24 18:06:11 +02:00

thrift

thrift/server: Close connections when stopping server

2017-06-02 00:15:20 +02:00

tools/scyllatop

scyllatop: dump all output to stdout instead of running a fancy console interface

2016-08-31 08:31:36 +03:00

tracing

Merge "tracing: tracing spans and time series helper table" from Vlad

2017-05-28 12:01:35 +03:00

transport

cql server: Allow multiple listeners on different ports

2017-05-29 15:53:50 +03:00

utils

utils::timestamped_val: fix the touch() comment

2017-05-26 19:26:56 +03:00

.gitattributes

Add .gitattributes file to classify C++ source

2015-10-05 08:51:51 +02:00

.gitignore

ScyllaTop: top-like tool to see live scylla metrics

2016-02-23 12:32:47 +02:00

.gitmodules

dist: move ComboAMI related code to scylla-ami

2015-09-22 00:17:42 +03:00

.gitorderfile

gitorderfile: make changes into *.py files appear first

2015-05-12 10:13:25 +03:00

atomic_cell_hash.hh

mutation_hasher: handle counter cells properly

2017-02-02 10:35:14 +00:00

atomic_cell_or_collection.hh

atomic_cell: introduce atomic_cell_mutable_view

2017-03-02 09:05:11 +00:00

atomic_cell.hh

database: remove remnants of no longer existing db::serializer.

2017-06-04 13:07:17 +03:00

bytes_ostream.hh

bytes_ostream: make max_chunk_size() an inline function

2016-10-17 11:49:33 +03:00

bytes.cc

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

bytes.hh

Merge seatar upstream (seastar namespace)

2017-05-21 12:26:15 +03:00

caching_options.hh

Merge seatar upstream (seastar namespace)

2017-05-21 12:26:15 +03:00

canonical_mutation.cc

idl/mutation: add counter serialisation logic

2017-02-02 10:35:14 +00:00

canonical_mutation.hh

remove db/serializer.hh includes

2016-03-02 09:07:09 +00:00

cartesian_product.hh

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

cell_locking.hh

cell_locker: add metrics for lock acquisition

2017-03-02 09:05:12 +00:00

checked-file-impl.hh

Merge seastar upstream

2017-03-14 13:38:38 +02:00

clocks-impl.cc

Move common clock implementation helpers

2017-06-23 11:35:35 -04:00

clocks-impl.hh

Move common clock implementation helpers

2017-06-23 11:35:35 -04:00

clustering_bounds_comparator.hh

mutation_partition: Add support for specifying continuity

2017-06-24 18:06:11 +02:00

clustering_key_filter.hh

partition_snapshot_reader: Emit only relevant tombstones

2017-02-13 16:12:15 +01:00

clustering_ranges_walker.hh

clustering_ranges_walker: Introduce contains_tombstone()

2017-04-20 10:54:37 +02:00

CMakeLists.txt

Further improve CMakeLists.txt for CLion

2017-06-23 19:21:28 +02:00

combine.hh

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

compaction_strategy.hh

compaction_strategy: implement resharding strategy for compaction strategies

2017-04-21 17:11:24 -03:00

compatible_ring_position.hh

compatible_ring_position: add function to return token

2016-12-08 14:25:29 -02:00

compound_compat.hh

compound_compat: Accept marker value in serialize_value()

2017-03-28 18:10:39 +02:00

compound.hh

keys: Introduce is_empty() for prefixes

2017-03-28 18:10:39 +02:00

compress.hh

cql3: Disable compression on empty properties

2016-11-09 10:03:59 +02:00

configure.py

Move common clock implementation helpers

2017-06-23 11:35:35 -04:00

CONTRIBUTING.md

CONTRIBUTING.md: add sections for help and issues

2016-11-18 22:21:10 +02:00

converting_mutation_partition_applier.hh

mutation_partition: Add support for specifying continuity

2017-06-24 18:06:11 +02:00

counters.cc

mutation: Make counter cell difference consistent with apply

2017-05-23 13:16:03 +02:00

counters.hh

counters: attempt to apply in place

2017-03-02 09:05:11 +00:00

cql_serialization_format.hh

Replace iostream include with iosfwd in headers

2017-01-17 14:52:44 +02:00

database_fwd.hh

database: keep a pointer to the memtable list in a memtable

2016-11-21 18:18:27 +02:00

database.cc

row_cache: Switch to using snapshot_source

2017-06-24 18:06:11 +02:00

database.hh

row_cache: Switch readers to use per-entry snapshots

2017-06-24 18:06:11 +02:00

db_clock.hh

Seal clock definitions

2017-06-23 11:35:35 -04:00

debug.hh

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

digest_algorithm.hh

storage_proxy: avoid calculating digest when only one replica is contacted

2016-11-17 13:04:30 +02:00

disk-error-handler.cc

db: make it possible to use custom error handler with io checker

2016-10-27 15:54:21 -02:00

disk-error-handler.hh

Merge seatar upstream (seastar namespace)

2017-05-21 12:26:15 +03:00

Doxyfile

docs: exclude dpdk

2015-06-24 13:09:51 +03:00

enum_set.hh

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

fix_system_distributed_tables.py

tracing: introduce a span ID and parent span ID

2017-04-25 21:52:23 -04:00

fnv1a_hasher.hh

add fnv1a hasher

2017-02-02 10:35:14 +00:00

frozen_mutation.cc

idl/mutation: add counter serialisation logic

2017-02-02 10:35:14 +00:00

frozen_mutation.hh

frozen_mutation: avoid buffer linearization and copy

2016-08-22 09:31:33 +01:00

frozen_schema.cc

idl: allow writers to use any output stream

2016-12-22 13:35:04 +01:00

frozen_schema.hh

remove db/serializer.hh includes

2016-03-02 09:07:09 +00:00

gc_clock.hh

Seal clock definitions

2017-06-23 11:35:35 -04:00

hashing_partition_visitor.hh

mutation_partition: Add support for specifying continuity

2017-06-24 18:06:11 +02:00

hashing.hh

Merge seatar upstream (seastar namespace)

2017-05-21 12:26:15 +03:00

idl-compiler.py

idl-compiler: Support optional fields in views

2017-04-25 11:43:04 +02:00

IDL.md

Add an IDL definition file

2016-01-24 12:29:21 +02:00

init.cc

Merge seatar upstream (seastar namespace)

2017-05-21 12:26:15 +03:00

init.hh

init: add a proper message when there is a bad 'seeds' configuration

2017-04-02 10:41:52 +03:00

intrusive_set_external_comparator.hh

intrusive_set_external_comparator: avoid using boost::intrusive::value_traits_pointers

2017-01-10 18:16:56 +02:00

json.hh

Merge seatar upstream (seastar namespace)

2017-05-21 12:26:15 +03:00

keys.cc

partition_key_view: Implement operator<<

2016-09-30 10:54:54 +02:00

keys.hh

compound_view_wrapper: Add tri_compare

2017-05-17 10:33:18 +02:00

LICENSE.AGPL

Add the AGPL license

2015-09-20 10:45:35 +03:00

lister.cc

Merge seatar upstream (seastar namespace)

2017-05-21 12:26:15 +03:00

lister.hh

Merge seatar upstream (seastar namespace)

2017-05-21 12:26:15 +03:00

log.hh

Merge seastar upstream

2016-06-01 18:28:42 +03:00

main.cc

database: reset node's hit rate information on connection drop

2017-06-13 09:57:14 +03:00

map_difference.hh

map_difference: Allow on unordered_map

2016-04-20 09:54:06 +02:00

md5_hasher.hh

md5_hasher: add finalize_array()

2016-03-11 18:27:13 +00:00

memtable.cc

Allow reading exactly desired byte ranges and fast_forward_to

2017-06-19 18:31:32 +03:00

memtable.hh

Allow reading exactly desired byte ranges and fast_forward_to

2017-06-19 18:31:32 +03:00

mutation_compactor.hh

mutation_partion: Use row_tombstone

2017-04-25 11:46:33 +02:00

mutation_partition_applier.hh

mutation_partition: Add support for specifying continuity

2017-06-24 18:06:11 +02:00

mutation_partition_serializer.cc

mutation_partition: Add support for specifying continuity

2017-06-24 18:06:11 +02:00

mutation_partition_serializer.hh

idl: allow writers to use any output stream

2016-12-22 13:35:04 +01:00

mutation_partition_view.cc

mutation_partition: Add support for specifying continuity

2017-06-24 18:06:11 +02:00

mutation_partition_view.hh

idl: switch to utils::input_stream

2016-08-22 09:31:33 +01:00

mutation_partition_visitor.hh

mutation_partition: Add support for specifying continuity

2017-06-24 18:06:11 +02:00

mutation_partition.cc

tests: row_cache: Apply only fully continuous mutations to underlying mutation source

2017-06-24 18:06:11 +02:00

mutation_partition.hh

mutation_partition: Add rows_entry constructor which accepts full contents

2017-06-24 18:06:11 +02:00

mutation_query.cc

mutation_query: to_data_query_result enforces row limit

2016-12-15 10:56:40 +00:00

mutation_query.hh

mutation_query: add an execution stage

2017-03-09 09:27:43 +00:00

mutation_reader.cc

mutation_reader: Introduce make_combined_mutation_source()

2017-06-24 18:06:11 +02:00

mutation_reader.hh

mutation_source: Make copying cheaper

2017-06-24 18:06:11 +02:00

mutation.cc

mutation: Introduce sliced()

2017-06-24 18:06:11 +02:00

mutation.hh

mutation: Introduce sliced()

2017-06-24 18:06:11 +02:00

noexcept_traits.hh

Introduce noexcept_traits

2015-12-07 09:50:27 +01:00

NOTICE.txt

Add NOTICE file as required by the Apache license.

2014-12-24 09:47:18 +02:00

nway_merger.hh

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

ORIGIN

Update ORIGIN for gossip and storage_service

2015-12-01 19:45:04 +08:00

partition_builder.hh

mutation_partition: Add support for specifying continuity

2017-06-24 18:06:11 +02:00

partition_range_compat.hh

Convert to use dht::partition_range_vector and dht::token_range_vector

2016-12-19 14:08:50 +08:00

partition_slice_builder.cc

partition_slice_builder: Add with_ranges()

2017-02-23 18:50:53 +01:00

partition_slice_builder.hh

partition_slice_builder: Add with_ranges()

2017-02-23 18:50:53 +01:00

partition_snapshot_reader.hh

Introduce maybe_merge_versions

2017-06-24 18:06:11 +02:00

partition_version.cc

partition_snapshot: Add getter for range tombstones

2017-06-24 18:06:11 +02:00

partition_version.hh

partition_snapshot: Add const-qualified overload of version()

2017-06-24 18:06:11 +02:00

position_in_partition.hh

position_in_partition: Introduce for_range_start()/for_range_end()

2017-06-24 18:06:11 +02:00

query_result_merger.hh

query_result_merger: Limit rows

2016-12-15 11:00:36 +00:00

query-request.hh

query: Introduce full_clustering_range

2017-02-23 18:50:53 +01:00

query-result-reader.hh

storage_proxy: avoid calculating digest when only one replica is contacted

2016-11-17 13:04:30 +02:00

query-result-set.cc

query: use result_view::consume() where appropriate

2016-08-22 09:31:33 +01:00

query-result-set.hh

Remove exception specifications

2017-05-05 17:02:31 +03:00

query-result-writer.hh

idl: allow writers to use any output stream

2016-12-22 13:35:04 +01:00

query-result.hh

Merge seatar upstream (seastar namespace)

2017-05-21 12:26:15 +03:00

query.cc

query: Introduce full_clustering_range

2017-02-23 18:50:53 +01:00

range_tombstone_list.cc

range_tombstone_list: Introduce slice() working with position range

2017-06-24 18:06:11 +02:00

range_tombstone_list.hh

range_tombstone_list: Introduce slice() working with position range

2017-06-24 18:06:11 +02:00

range_tombstone.cc

range_tombstone: Introduce end_position()

2017-02-13 16:12:16 +01:00

range_tombstone.hh

range_tombstone: Introduce trim_front()

2017-06-24 18:06:11 +02:00

range.hh

Fix use after free in nonwrapping_range::intersection

2017-06-12 15:34:36 +01:00

read_context.hh

row_cache: Introduce read_context

2017-06-24 18:06:11 +02:00

README-DPDK.md

README: fix typos and paramter syntax

2015-06-28 10:24:48 +03:00

README.md

README: Guidelines for contributing

2016-11-16 12:50:02 +02:00

release.cc

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

release.hh

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

reversibly_mergeable.hh

Fix pre-ScyllaDB copyright statements

2016-04-08 08:12:47 +03:00

row_cache.cc

row_cache: Switch readers to use per-entry snapshots

2017-06-24 18:06:11 +02:00

row_cache.hh

row_cache: Switch readers to use per-entry snapshots

2017-06-24 18:06:11 +02:00

schema_builder.hh

schema_tables: Use v3 schema tables and formats

2017-05-10 16:44:48 +00:00

schema_mutations.cc

schema_tables: Use v3 schema tables and formats

2017-05-10 16:44:48 +00:00

schema_mutations.hh

schema_tables: Use v3 schema tables and formats

2017-05-10 16:44:48 +00:00

schema_registry.cc

Merge seatar upstream (seastar namespace)

2017-05-21 12:26:15 +03:00

schema_registry.hh

schema_registry: Don't leak schemas

2017-02-21 09:56:21 +01:00

schema.cc

schema: Lift maybe_quote() into cql3/util

2017-06-15 19:55:52 +00:00

schema.hh

schema: Lift maybe_quote() into cql3/util

2017-06-15 19:55:52 +00:00

scylla-blocktune

blocktune: fix syntax error in exception handling

2016-10-23 16:40:00 +03:00

scylla-gdb.py

gdb: Fix "scylla heapprof" command

2017-06-14 15:41:39 +02:00

scylla-housekeeping

Merge "Adding private repository to housekeeping" from Amnon

2017-05-17 15:56:46 +03:00

SCYLLA-VERSION-GEN

build: improve support for custom builds

2017-01-22 14:56:52 +02:00

seastarx.hh

seastarx: add missing make_shared forward declaration

2017-06-22 18:16:13 +03:00

serialization_visitors.hh

idl: add start_frame() overload for seastar::simple_output_stream

2017-02-27 17:05:58 +00:00

serializer_impl.hh

serializer_impl: add serializer for bool_class<Tag>

2016-12-14 14:10:01 +00:00

serializer.hh

Merge seastar upstream

2016-09-28 17:34:16 +03:00

sstable_mutation_readers.hh

Allow reading exactly desired byte ranges and fast_forward_to

2017-06-19 18:31:32 +03:00

stdx.hh

seastarx: don't make seastar namespace inline

2017-06-22 18:16:13 +03:00

streamed_mutation.cc

range_tombstone_stream: Make printable

2017-06-24 18:06:11 +02:00

streamed_mutation.hh

range_tombstone_stream: Make printable

2017-06-24 18:06:11 +02:00

supervisor.cc

init: move supervisor_notify() out of main.cc

2017-01-06 10:10:55 +00:00

supervisor.hh

Merge seatar upstream (seastar namespace)

2017-05-21 12:26:15 +03:00

test.py

test.py: Ensure view_schema_test runs with only one cpu

2017-05-31 19:17:51 +01:00

timestamp.hh

Seal clock definitions

2017-06-23 11:35:35 -04:00

to_string.hh

Merge seatar upstream (seastar namespace)

2017-05-21 12:26:15 +03:00

tombstone.hh

tombstone: Extract out relational operators

2017-04-25 11:43:04 +02:00

types.cc

intern also tuple and user defined types

2017-06-14 14:41:17 +03:00

types.hh

intern also tuple and user defined types

2017-06-14 14:41:17 +03:00

unimplemented.cc

Merge seatar upstream (seastar namespace)

2017-05-21 12:26:15 +03:00

unimplemented.hh

Merge seatar upstream (seastar namespace)

2017-05-21 12:26:15 +03:00

validation.cc

validation: Add KS validation + convinence methods

2016-04-19 11:49:05 +00:00

validation.hh

validation: Add KS validation + convinence methods

2016-04-19 11:49:05 +00:00

version.hh

Merge seatar upstream (seastar namespace)

2017-05-21 12:26:15 +03:00

view_info.hh

view_info: Store base regular col in the view's PK as column_id

2017-05-17 10:33:18 +02:00

README.md

Scylla

Building Scylla

In addition to the packages required by Seastar, Scylla requires the following packages.

Submodules

Scylla uses submodules, so make sure you pull the submodules first by doing:

git submodule init
git submodule update --init --recursive

Building and Running Scylla on Fedora

Installing required packages:

sudo dnf install yaml-cpp-devel lz4-devel zlib-devel snappy-devel jsoncpp-devel thrift-devel antlr3-tool antlr3-C++-devel libasan libubsan gcc-c++ gnutls-devel ninja-build ragel libaio-devel cryptopp-devel xfsprogs-devel numactl-devel hwloc-devel libpciaccess-devel libxml2-devel python3-pyparsing lksctp-tools-devel protobuf-devel protobuf-compiler systemd-devel libunwind-devel

Build Scylla

./configure.py --mode=release --with=scylla --disable-xen
ninja-build build/release/scylla -j2 # you can use more CPUs if you have plenty of RAM

Run Scylla

./build/release/scylla

Run Scylla with one CPU and ./tmp as the data directory:

./build/release/scylla --datadir tmp --commitlog-directory tmp --smp 1

For more run options:

./build/release/scylla --help

Building Fedora RPM

As a prerequisite, install Mock on your machine:

# Install mock:
sudo yum install mock

# Add user to the "mock" group:
sudo usermod -a -G mock $USER && newgrp mock

Then, to build an RPM, run:

./dist/redhat/build_rpm.sh

The built RPM is stored in the /var/lib/mock/<configuration>/result directory. For example, on Fedora 21 mock reports the following:

INFO: Done(scylla-server-0.00-1.fc21.src.rpm) Config(default) 20 minutes 7 seconds
INFO: Results and/or logs in: /var/lib/mock/fedora-21-x86_64/result

Building Fedora-based Docker image

Build a Docker image with:

cd dist/docker
docker build -t <image-name> .

Run the image with:

docker run -p $(hostname -i):9042:9042 -i -t <image-name>

Contributing to Scylla

Guidelines for contributing