mirror of https://github.com/scylladb/scylladb.git synced 2026-06-06 06:53:12 +00:00

Avi Kivity 5a178ff635 compaction_controller: increase minimum shares to 50 (~5%) for small-data workloads

The workload in #3844 has these characteristics:
 - very small data set size (a few gigabytes per shard)
 - large working set size (all the data, enough for high cache miss rate)
 - high overwrite rate (so a compaction results in 12X data reduction)

As a result, the compaction backlog controller assigns very few shares to
compaction (low data set size -> low backlog), so compaction proceeds very slowly.
Meanwhile, we have tons of cache misses, and each cache miss needs to read from a
large number of sstables (since compaction isn't progressing). The end result is
a high read amplification, and in this test, timeouts.

While we could declare that the scenario is very artificial, there are other
real-world scenarios that could trigger it. Consider a 100% write load
(population phase) followed by 100% read. Towards the end of the last compaction,
the backlog will drop more and more until compaction slows to a crawl, and until
it completes, all the data (for that compaction) will have to be read from its
input sstables, resulting in read amplification.

We should probably have read amplification affect the backlog, but for now the
simpler solution is to increase the minimum shares to 50 so that compaction
always makes forward progress. This will result in higher-than-needed compaction
bandwidth in some low-write-rate scenarios, so we will see fluctuations in request
rate (what the controller was designed to avoid), but these fluctuations will be
limited to 5%.

Since the base class backlog_controller has a fixed (0, 0) point, remove it
and add it to derived classes (setting it to (0, 50) for compaction).
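
A minimal sketch of the idea, not the code in backlog_controller.hh: it assumes the controller maps backlog to shares by piecewise-linear interpolation over control points. All names (control_point, shares_for, compaction_points, zero_floor_points) and every point other than (0, 50) and (0, 0) are hypothetical values chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types and names; illustrative only.
struct control_point {
    double backlog; // controller input
    double shares;  // scheduler shares assigned at that input
};

// Piecewise-linear interpolation over control points sorted by backlog.
// Inputs outside the covered range are clamped to the first/last point.
inline double shares_for(const std::vector<control_point>& points, double backlog) {
    assert(!points.empty());
    if (backlog <= points.front().backlog) {
        return points.front().shares;
    }
    for (std::size_t i = 1; i < points.size(); ++i) {
        const control_point& lo = points[i - 1];
        const control_point& hi = points[i];
        if (backlog <= hi.backlog) {
            // Here lo.backlog < backlog <= hi.backlog, so the divisor is non-zero.
            double t = (backlog - lo.backlog) / (hi.backlog - lo.backlog);
            return lo.shares + t * (hi.shares - lo.shares);
        }
    }
    return points.back().shares;
}

// The base class no longer injects (0, 0); each derived controller
// supplies its own first point. Compaction starts at (0, 50).
inline std::vector<control_point> compaction_points() {
    return {{0.0, 50.0}, {1.5, 100.0}, {30.0, 1000.0}}; // upper points are assumed
}

// A controller that keeps the old behaviour must now state (0, 0) itself.
inline std::vector<control_point> zero_floor_points() {
    return {{0.0, 0.0}, {1.5, 100.0}, {30.0, 1000.0}}; // upper points are assumed
}
```

With these assumed points, an empty backlog yields 50 shares for compaction instead of 0, so a compaction that is nearly finished still makes forward progress.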

Fixes #3844 (or at least improves it).
Message-Id: <20181231162710.29410-1-avi@scylladb.com>

(cherry picked from commit b0980ba7c6)

2019-01-04 13:28:43 +02:00

.github

github: direct users asking questions to our mailing list.

2018-06-21 17:43:23 +03:00

api

api: use longs instead of ints for snapshot sizes

2018-10-12 22:01:59 +03:00

auth

auth: add abort_source to waiting for schema agreement

2018-12-04 14:33:05 +00:00

conf

config: re-add murmur3_ignore_msb_bits to scylla.yaml

2018-10-01 10:01:36 +03:00

cql3

Merge 'Add tests for schema changes' from Paweł

2018-12-18 14:57:50 +00:00

data

data::cell: expose size overhead of external chunks

2018-06-28 18:01:17 +01:00

Merge 'Fixes for the view_update_from_staging_generator' from Duarte

2018-12-29 20:22:54 +02:00

debug

debug: scylla_row_cache_report: Remove duplicated phrase from printout

2018-03-07 11:15:57 +02:00

dht

streaming: Expose reason for streaming

2018-11-15 17:45:31 +02:00

dist

build_ami.sh: need to check out the right branch of scylla-jmx

2018-12-25 12:37:11 +02:00

docs

Add docs/metrics.md - documentation on metrics

2018-09-25 17:51:20 +03:00

exceptions

cql: add read/write failure exceptions

2017-12-05 15:02:17 +02:00

gms

Merge "materialized views: Apply backpressure from view replicas" from Duarte

2018-12-20 19:11:56 +02:00

idl

Merge "materialized views: Apply backpressure from view replicas" from Duarte

2018-12-20 19:11:56 +02:00

imr

imr: detect lsa migrator mismatch

2018-08-01 16:50:58 +01:00

index

index: add target_column getter to index

2018-07-11 18:06:21 +02:00

interface

…

libdeflate @ e7e54eab42

Update libdeflate submodule

2018-12-25 14:41:24 +02:00

licenses

Merge "Optimize checksum computation for the MC sstable format" from Tomek

2018-12-08 13:42:43 +02:00

locator

locator: fix abstract_replication_strategy::get_ranges() and friends violating sort order

2018-10-23 07:36:21 +00:00

message

Merge "materialized views: Apply backpressure from view replicas" from Duarte

2018-12-20 19:11:56 +02:00

repair

streaming: Expose reason for streaming

2018-11-15 17:45:31 +02:00

scripts

scripts: coding style fixes

2018-09-17 18:40:23 +03:00

seastar @ 08f1258fc5

Update seastar submodule

2018-12-17 17:00:14 +02:00

service

Merge "Improve times to start / stop the nodes" from Glauber

2019-01-03 14:56:16 +01:00

sstables

sstables: index_reader: Fix abort when _trust_pi == trust_promoted_index::no

2018-12-24 11:45:14 +02:00

streaming

streaming/stream_session: Only stage sstables for tables with views

2018-12-28 20:52:15 +02:00

swagger-ui @ 1b212bbe71

…

tests

tests: cql_test_env: Start the compaction manager

2019-01-03 14:56:42 +01:00

thrift

thrift: limit message size

2018-10-24 19:32:25 +03:00

tools/scyllatop

scyllatop: more coding style fixes

2018-09-17 18:39:53 +03:00

tracing

tracing: Pass string_view instead of string to add_query

2018-08-13 23:57:37 +01:00

transport

cql3/query_processor: Validate presence of statement values timeously

2018-08-15 10:37:13 +01:00

utils

utils/gz: Fix compilation on non-x86 archs

2018-12-08 13:42:43 +02:00

xxHash @ 744892b802

Add xxhash (fast non-cryptographic hash) as submodule

2018-02-01 00:22:50 +00:00

.gitattributes

…

.gitignore

.gitignore: add resources directory

2018-06-19 16:26:51 +03:00

.gitmodules

Merge "Optimize checksum computation for the MC sstable format" from Tomek

2018-12-08 13:42:43 +02:00

.gitorderfile

…

atomic_cell_hash.hh

atomic_cell: switch to new IMR-based cell reperesentation

2018-05-31 15:51:11 +01:00

atomic_cell_or_collection.hh

atomic_cell: switch to new IMR-based cell reperesentation

2018-05-31 15:51:11 +01:00

atomic_cell.cc

atomic_cell: accept fragmented_temporary_buffer::view values

2018-07-18 12:28:06 +01:00

atomic_cell.hh

atomic_cell: accept fragmented_temporary_buffer::view values

2018-07-18 12:28:06 +01:00

backlog_controller.hh

compaction_controller: increase minimum shares to 50 (~5%) for small-data workloads

2019-01-04 13:28:43 +02:00

bytes_ostream.hh

Merge "Optimize sstable writing of large partitions" from Tomasz

2018-12-21 20:40:35 +02:00

bytes.cc

…

bytes.hh

bytes: Add helper for turning bytes_view into sstring_view.

2018-09-25 17:23:40 -07:00

cache_flat_mutation_reader.hh

flat_mutation_reader: make timeout opt-out rather than opt-in

2018-09-20 11:31:24 +02:00

caching_options.hh

…

canonical_mutation.cc

Swap arguments order of mutation constructor

2018-01-21 12:58:42 +02:00

canonical_mutation.hh

…

cartesian_product.hh

…

cell_locking.hh

cell_locking: Use xxhash instead of fnv1a

2018-08-23 11:21:00 +03:00

checked-file-impl.hh

…

clocks-impl.cc

…

clocks-impl.hh

…

clustering_bounds_comparator.hh

Use std::reference_wrapper instead of a plain reference in bound_view.

2018-06-28 11:24:06 +01:00

clustering_key_filter.hh

clustering_key_filter_ranges: Fix move assignment to avoid undefined behaviour.

2018-08-09 00:53:17 +01:00

clustering_ranges_walker.hh

Merge "Enable sstable_mutation_test with SSTables 3.x." from Vladimir

2018-10-12 17:46:49 +03:00

CMakeLists.txt

Merge "Optimize checksum computation for the MC sstable format" from Tomek

2018-12-08 13:42:43 +02:00

coding-style.md

…

combine.hh

…

compaction_strategy.hh

database: rename column_family to table

2018-06-24 14:54:46 +03:00

compatible_ring_position_view.hh

sstables_set::incremental_selector: use ring_position instead of token

2018-07-04 17:42:33 +03:00

compound_compat.hh

Pass sstable version to describe_type

2018-04-24 11:30:26 +02:00

compound.hh

Attach backtrace to marshal_exception-s thrown from generic functions.

2017-12-05 16:14:55 +01:00

compress.cc

Merge "compress: Restore lz4 as default compressor" from Duarte

2018-11-21 16:45:22 +02:00

compress.hh

Merge "compress: Restore lz4 as default compressor" from Duarte

2018-11-21 16:45:22 +02:00

configure.py

Merge " Extract MC sstable writer to a separate compilation unit" from Tomasz

2018-12-21 20:40:35 +02:00

CONTRIBUTING.md

…

converting_mutation_partition_applier.hh

Merge 'Add tests for schema changes' from Paweł

2018-12-18 14:57:50 +00:00

counters.cc

atomic_cell: switch to new IMR-based cell reperesentation

2018-05-31 15:51:11 +01:00

counters.hh

atomic_cell: switch to new IMR-based cell reperesentation

2018-05-31 15:51:11 +01:00

cql_serialization_format.hh

…

database_fwd.hh

database: rename column_family to table

2018-06-24 14:54:46 +03:00

database.cc

Merge "Improve times to start / stop the nodes" from Glauber

2019-01-03 14:56:16 +01:00

database.hh

Merge "materialized views: Apply backpressure from view replicas" from Duarte

2018-12-20 19:11:56 +02:00

db_clock.hh

…

debug.hh

…

digest_algorithm.hh

service/storage_service: Add and use xxhash feature

2018-02-01 01:02:50 +00:00

digester.hh

row: Use cached hash for hash calculation

2018-02-01 01:02:49 +00:00

dirty_memory_manager.hh

database: guarantee a minimum amount of shares when manual operations are requested.

2018-09-27 15:20:31 +02:00

disk-error-handler.cc

Move thread_local declarations out of main.cc

2017-11-27 20:27:42 +01:00

disk-error-handler.hh

…

Doxyfile

…

duration.cc

…

duration.hh

…

encoding_stats.hh

Fix timestamp_epoch value which was truncated on exceeding int32_t type limit.

2018-05-04 15:45:10 -07:00

enum_set.hh

enum_set: Add iterator

2018-02-14 14:15:59 -05:00

fix_system_distributed_tables.py

fix_system_distributed_tables.sh: adjust newly added 'request_size' and 'response_size' columns

2018-09-19 15:46:11 +01:00

flat_mutation_reader.cc

make_flat_multi_range_reader: add generator overload

2018-09-28 14:27:55 +03:00

flat_mutation_reader.hh

make_flat_multi_range_reader: add documentation

2018-09-28 14:27:55 +03:00

frozen_mutation.cc

flat_mutation_reader: make timeout opt-out rather than opt-in

2018-09-20 11:31:24 +02:00

frozen_mutation.hh

Merge "db/hints: Use frozen_mutation in hinted handoff" from Duarte

2018-12-05 20:14:57 +00:00

frozen_schema.cc

schema_tables: Require context object in schema load path

2018-02-07 10:11:46 +00:00

frozen_schema.hh

schema_tables: Require context object in schema load path

2018-02-07 10:11:46 +00:00

gc_clock.hh

gc_clock: introduce operator<<(ostream&, gc_clock::time_point)

2017-12-06 19:52:32 -02:00

gen_segmented_compress_params.py

…

HACKING.md

HACKING.md: update ./install-dependencies.sh filename

2018-08-01 18:09:29 +03:00

hashing_partition_visitor.hh

range_tombstone: Replace feed_hash() member function with appending_hash

2018-02-01 00:22:50 +00:00

hashing.hh

…

idl-compiler.py

idl-compiler: specify return type of with_serialized_stream() lambdas

2018-08-24 16:07:20 +01:00

IDL.md

…

init.cc

messaging: tag RPC services with scheduling groups

2018-07-13 13:57:08 +02:00

init.hh

messaging: tag RPC services with scheduling groups

2018-07-13 13:57:08 +02:00

install-dependencies.sh

install_dependencies.sh: centos: add systemd-devel

2018-04-26 14:32:36 +03:00

install.sh

Merge "dist: use perftune.py for disks tuning" from Vlad

2018-11-01 19:19:04 +02:00

intrusive_set_external_comparator.hh

intrusive_set_external_comparator: Introduce container_of_only_member()

2018-03-06 11:50:26 +01:00

json.cc

json: add value_to_quoted_string helper function

2018-07-25 13:16:00 +02:00

json.hh

json: add value_to_quoted_string helper function

2018-07-25 13:16:00 +02:00

keys.cc

keys: schema-aware printing of a partition_key

2018-07-17 14:43:12 +03:00

keys.hh

Merge "Support reading range tombstones" from Piotr and Vladimir

2018-08-27 20:43:38 +02:00

LICENSE.AGPL

…

lister.cc

…

lister.hh

…

log.hh

…

main.cc

Merge "Improve times to start / stop the nodes" from Glauber

2019-01-03 14:56:16 +01:00

MAINTAINERS

scripts/find-maintainer: Find subsystem maintainer

2018-01-30 09:42:35 +00:00

map_difference.hh

…

marshal_exception.hh

…

md5_hasher.hh

Fix Scylla compilation with Crypto++ v6.

2018-03-04 10:23:00 +02:00

memtable-sstable.hh

database, sstables, tests: add large_partition_handler

2018-05-04 14:38:13 +02:00

memtable.cc

flat_mutation_reader: make timeout opt-out rather than opt-in

2018-09-20 11:31:24 +02:00

memtable.hh

Merge "Properly writing/reading shadowable deletions with SSTables 3.x." from Vladimir

2018-10-24 19:32:57 +03:00

multishard_mutation_query.cc

multishard_mutation_query: reset failed readers to inexistent state

2018-12-18 14:46:56 +02:00

multishard_mutation_query.hh

database: add query_mutations_on_all_shards()

2018-09-03 10:31:44 +03:00

multishard_writer.cc

flat_mutation_reader: make timeout opt-out rather than opt-in

2018-09-20 11:31:24 +02:00

multishard_writer.hh

multishard_writer: Introduce multishard_writer

2018-06-28 17:20:28 +08:00

mutation_cleaner.hh

Merge "Fix use-after-free when destroying partition_snapshots in the background"from Tomasz

2018-12-28 13:37:29 +02:00

mutation_compactor.hh

mutation_compactor: add detach_state()

2018-09-03 10:31:44 +03:00

mutation_fragment.cc

mutation_fragment: Add range_tombstone_stream::empty() method.

2018-09-25 17:55:52 -07:00

mutation_fragment.hh

mutation_fragment: Add range_tombstone_stream::empty() method.

2018-09-25 17:55:52 -07:00

mutation_partition_applier.hh

treewide: require type for creating collection_mutation_view

2018-05-31 15:51:11 +01:00

mutation_partition_serializer.cc

atomic_cell: introduce fragmented buffer value interface

2018-05-31 15:51:11 +01:00

mutation_partition_serializer.hh

utils: drop data_output

2018-09-18 17:22:59 +01:00

mutation_partition_view.cc

mutation_partition_view: use column_mapping_entry::is_atomic()

2018-06-28 22:16:42 +01:00

mutation_partition_view.hh

mutation_partition_view: pass cell by value to visitor

2018-06-28 22:11:19 +01:00

mutation_partition_visitor.hh

…

mutation_partition.cc

Merge "Fix use-after-free when destroying partition_snapshots in the background"from Tomasz

2018-12-28 13:37:29 +02:00

mutation_partition.hh

mutation_partition: Fix exception safety of row::apply_monotonically()

2018-08-09 15:29:10 +03:00

mutation_query.cc

query-result: Introduce class result_options

2018-02-01 00:22:50 +00:00

mutation_query.hh

Move reconcilable_result_builder declaration to mutation_query.hh

2018-09-03 10:31:44 +03:00

mutation_reader.cc

Merge "Fix deadlocking multishard readers" from Botond

2018-12-08 14:08:46 +02:00

mutation_reader.hh

Merge "Make inactive shard readers evictable" from Botond

2018-12-04 12:13:13 +02:00

mutation_rebuilder.hh

Swap arguments order of mutation constructor

2018-01-21 12:58:42 +02:00

mutation.cc

flat_mutation_reader: make timeout opt-out rather than opt-in

2018-09-20 11:31:24 +02:00

mutation.hh

flat_mutation_reader: make timeout opt-out rather than opt-in

2018-09-20 11:31:24 +02:00

noexcept_traits.hh

…

NOTICE.txt

utils::crc32: add power64 crc32 HW accelerated implementation

2017-12-08 13:38:13 -05:00

ORIGIN

…

partition_builder.hh

mutation_partition_view: pass cell by value to visitor

2018-06-28 22:11:19 +01:00

partition_range_compat.hh

…

partition_slice_builder.cc

…

partition_slice_builder.hh

…

partition_snapshot_reader.hh

mvcc: Use RAII to ensure that partition versions are merged

2018-06-27 21:51:04 +02:00

partition_snapshot_row_cursor.hh

partition_snapshot_row_cursor: initialize _dummy and _continuous

2018-06-02 19:51:36 +01:00

partition_version_list.hh

mvcc: Introduce partition_version_list

2018-05-30 12:18:56 +02:00

partition_version.cc

Merge "Fix use-after-free when destroying partition_snapshots in the background"from Tomasz

2018-12-28 13:37:29 +02:00

partition_version.hh

Merge "Fix use-after-free when destroying partition_snapshots in the background"from Tomasz

2018-12-28 13:37:29 +02:00

position_in_partition.hh

Merge "Make inactive shard readers evictable" from Botond

2018-12-04 12:13:13 +02:00

querier.cc

querier_cache: unregister queriers evicted due to expired TTL

2019-01-03 13:14:02 +02:00

querier.hh

Merge "Make inactive shard readers evictable" from Botond

2018-12-04 12:13:13 +02:00

query_result_merger.hh

…

query-request.hh

Merge "Make inactive shard readers evictable" from Botond

2018-12-04 12:13:13 +02:00

query-result-reader.hh

query::result_view: add get_last_partition_and_clustering_key()

2018-07-26 12:12:08 +01:00

query-result-set.cc

query::result: avoid copying and linearising cell value

2018-06-25 09:21:47 +01:00

query-result-set.hh

…

query-result-writer.hh

query-result: Use digester instead of md5_hasher

2018-02-01 00:22:50 +00:00

query-result.hh

Configure query result memory limiter size limit during object creation

2018-06-11 15:34:13 +03:00

query.cc

…

range_tombstone_list.cc

Merge "Make in-memory partition version merging preemptable" from Tomasz

2018-07-01 15:32:51 +03:00

range_tombstone_list.hh

Merge "Make in-memory partition version merging preemptable" from Tomasz

2018-07-01 15:32:51 +03:00

range_tombstone.cc

Use std::reference_wrapper instead of a plain reference in bound_view.

2018-06-28 11:24:06 +01:00

range_tombstone.hh

Merge "Multiple fixes to tests/normalizing_reader" from Vladimir

2018-09-27 12:51:47 +02:00

range.hh

range: clean the deduced transformed type

2018-05-10 06:22:39 +03:00

read_context.hh

flat_mutation_reader: make timeout opt-out rather than opt-in

2018-09-20 11:31:24 +02:00

reader_concurrency_semaphore.cc

reader_concurrency_semaphore: add consume_resources()

2018-12-18 14:34:33 +02:00

reader_concurrency_semaphore.hh

reader_concurrency_semaphore: use the correct types in the constructor

2018-12-18 14:34:33 +02:00

README-DPDK.md

…

README.md

build_rpm.sh: put temporary mock in build/, not /var/lib.

2018-12-19 12:54:37 +02:00

real_dirty_memory_accounter.hh

cache: real_dirty_memory_accounter: Move unpinning out of the hot path

2018-05-30 14:41:41 +02:00

release.cc

…

release.hh

…

reversibly_mergeable.hh

…

row_cache.cc

flat_mutation_reader: make timeout opt-out rather than opt-in

2018-09-20 11:31:24 +02:00

row_cache.hh

memtable, cache: Run mutation_cleaner worker in its own scheduling group

2018-06-27 21:51:04 +02:00

schema_builder.hh

Merge 'Add tests for schema changes' from Paweł

2018-12-18 14:57:50 +00:00

schema_mutations.cc

schema: persist "view virtual" columns to a separate system table

2018-08-16 15:30:06 +03:00

schema_mutations.hh

schema: persist "view virtual" columns to a separate system table

2018-08-16 15:30:06 +03:00

schema_registry.cc

schema_tables: Require context object in schema load path

2018-02-07 10:11:46 +00:00

schema_registry.hh

schema_tables: Require context object in schema load path

2018-02-07 10:11:46 +00:00

schema_upgrader.hh

atomic_cell: require column_definition for creating atomic_cell views

2018-05-31 15:51:11 +01:00

schema.cc

Merge 'Add tests for schema changes' from Paweł

2018-12-18 14:57:50 +00:00

schema.hh

Merge "Optimize sstable writing of the MC format" from Tomasz

2018-11-24 12:36:40 +02:00

scylla-gdb.py

gdb: Fix scylla heapprof command

2018-07-12 16:51:30 +03:00

scylla-housekeeping

scylla-housekeeping: support new 2018.1 path variation

2018-05-09 15:22:30 +03:00

SCYLLA-VERSION-GEN

Revert "release: prepare for 3.0-rc4"

2019-01-04 12:34:38 +02:00

seastarx.hh

…

serialization_visitors.hh

serialization_visitors: add support for memory_output_stream

2018-09-18 17:22:59 +01:00

serializer_impl.hh

idl: deserialized_bytes_proxy do not assume presence of iterator_type

2018-08-24 16:19:40 +01:00

serializer.hh

idl: serializer: don't assume Iterator::value_type is bytes_view

2018-09-18 11:29:36 +01:00

setup.py

…

stdx.hh

…

supervisor.cc

…

supervisor.hh

…

table_helper.cc

cql: Add schema extensions processing to properties

2018-02-07 10:11:46 +00:00

table_helper.hh

cql3: change cql_statement methods to accept a local storage_proxy

2018-04-16 10:18:28 +02:00

table.cc

Merge "materialized views: Apply backpressure from view replicas" from Duarte

2018-12-20 19:11:56 +02:00

test.py

Merge "Optimize checksum_combine() for CRC32" from Tomek

2018-12-08 13:42:43 +02:00

timeout_config.hh

timeout_config: introduce timeout configuration

2018-04-29 19:52:40 +03:00

timestamp.hh

…

to_string.hh

to_string: Add operator<< overload for std::optional.

2018-08-17 18:20:05 -07:00

tombstone.hh

mutation_partition: Define + operator on tombstones

2018-02-06 14:24:19 +01:00

tox.ini

…

types.cc

types: enable deserializing varint from JSON string

2018-08-21 11:20:11 +01:00

types.hh

cql3: provide to_json_string for optional bytes argument

2018-08-09 18:07:07 +02:00

unimplemented.cc

metadata_type: add Serialization type

2018-04-24 11:30:26 +02:00

unimplemented.hh

metadata_type: add Serialization type

2018-04-24 11:30:26 +02:00

validation.cc

…

validation.hh

…

version.hh

…

view_info.hh

view: add is_index method

2018-06-05 11:10:24 +02:00

vint-serialization.cc

Add signed_vint::serialized_size_from_first_byte

2018-05-09 11:41:00 +02:00

vint-serialization.hh

Add signed_vint::serialized_size_from_first_byte

2018-05-09 11:41:00 +02:00

xx_hasher.hh

digest: Introduce xxHash hash algorithm

2018-02-01 00:22:50 +00:00

README.md

Scylla

Quick-start

$ git submodule update --init --recursive
$ sudo ./install-dependencies.sh
$ ./configure.py --mode=release
$ ninja-build -j4 # Assuming 4 system threads.
$ ./build/release/scylla
$ # Rejoice!

Please see HACKING.md for detailed information on building and developing Scylla.

Running Scylla

Run Scylla

./build/release/scylla

Run Scylla with one CPU and ./tmp as the data directory:

./build/release/scylla --datadir tmp --commitlog-directory tmp --smp 1

For more run options:

./build/release/scylla --help

Building Fedora RPM

As a prerequisite, you need to install Mock on your machine:

# Install mock:
sudo yum install mock

# Add user to the "mock" group:
sudo usermod -a -G mock $USER && newgrp mock

Then, to build an RPM, run:

./dist/redhat/build_rpm.sh

The built RPM is stored in the build/mock/<configuration>/result directory. For example, on Fedora 21 mock reports the following:

INFO: Done(scylla-server-0.00-1.fc21.src.rpm) Config(default) 20 minutes 7 seconds
INFO: Results and/or logs in: build/mock/fedora-21-x86_64/result

Building Fedora-based Docker image

Build a Docker image with:

cd dist/docker
docker build -t <image-name> .

Run the image with:

docker run -p $(hostname -i):9042:9042 -i -t <image-name>

Contributing to Scylla

Guidelines for contributing
