mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 20:16:43 +00:00

Files

Avi Kivity 69684e16d8 Merge 'sstables: add SSTable compression with shared dictionaries ' from Michał Chojnowski

This PR extends Scylla's SSTable compression with the ability to use compression dictionaries shared across compression chunks. This involves several changes:

- We refactor `compression_parameters` and friends (`compressor`, `sstables::local_compression`, `sstables::compression`) to prepare for making the construction of `compressor`s asynchronous, to enable sharing pieces of compressors (the dictionaries) across shards.
- We introduce the notion of "hidden compression options" which are written to `CompressionInfo.db` and used to construct decompressors, like regular options, but don't appear in the schema. (We later stuff the SSTable's dictionary into `CompressionInfo.db` using a sequence of such options).
- We add a cluster feature which guards the creation of dictionary-compressed SSTables.
- We introduce a central "compressor factory" (one instance shared by all shards), which from this point onward is used to construct all `compressor` objects (one per SSTable) used to process the SSTables. When constructing a compressor for writing, it uses the "current"/"recommended" dictionary (which is passed to the factory from the actively-observed contents of the group0-managed `system.dicts`). When constructing a compressor for reading, it uses the dictionary written in the hidden compression options in CompressionInfo.db. And it keeps dictionaries deduplicated, so that each unique live dictionary blob has only one instance in memory, shared across shards.
- We teach the relevant `lz4` and `zstd` compressor wrappers about the dictionaries.
- We add an HTTP API call which samples pieces of the given table (i.e. the Data.db files) from across the cluster, trains a dictionary on it, and publishes it via `system.dicts` as the new current dictionary for that table. (And we add some RPC verbs to support that).
- We add an HTTP API call which estimates the impact of various available compression configurations on the compression ratio.
- We add an autotrainer fiber which periodically retrains dicts for dict-aware tables and publishes them if they seem to be a significant improvement.

Known imperfections:
- The factory currently keeps one dictionary instance on the entire node, but we probably want one copy per NUMA node. I didn't do that because exposing NUMA knowledge to Scylla seems to require some changes in Seastar first.

New feature, no backporting involved.

Closes scylladb/scylladb#23025

* github.com:scylladb/scylladb:
  docs: add user-facing documentation for SSTable compression with shared dicts
  docs/dev: add sstable-compression-dicts.md
  test: add test_sstable_compression_dictionaries_autotrain.py
  test: add test_sstable_compression_dictionaries_basic.py
  test/pylib/rest_client: add `keyspace_upgrade_sstables` helper
  main: run a sstable_dict_autotrainer
  api: add the estimate_compression_ratios API call
  dict_autotrainer: introduce sstable_dict_autotrainer
  db/system_keyspace: add query_dict_timestamp
  compress: add ZstdWithDictsCompressor and LZ4WithDictsCompressor
  main: clean up sstable compression dicts after table drops
  sstables/compress: discard hidden compression options after the decompressor is created
  compress: change compressor_ptr from shared_ptr to unique_ptr
  api: add the retrain_dict API call
  storage_service: add some dict-related routines
  main: in compression_dict_updated_callback, recognize and use SSTable compression dicts
  storage_service: add do_sample_sstables()
  messaging_service: add SAMPLE_SSTABLES and ESTIMATE_SSTABLE_VOLUME verbs
  db/system_keyspace: let `system.dicts` helpers be used for dicts other than the RPC compression dict
  raft/group0_state_machine: on `system.dicts` mutations, pass the affected partition keys to the callback
  database: add sample_data_files()
  database: add take_sstable_set_snapshot()
  compress: teach `lz4_processor` about dictionaries
  compress: teach `zstd_processor` about dictionaries
  sstables: delegate compressor creation to the compressor factory
  sstables: plug an `sstable_compressor_factory` into `sstables_manager`
  sstables: introduce sstable_compressor_factory
  utils/hashers: add get_sha256()
  gms/feature_service: add the SSTABLE_COMPRESSION_DICTS cluster feature
  compress: add hidden dictionary options
  compress: remove `compression_parameters::get_compressor()`
  sstables/compress: remove get_sstable_compressor()
  sstables/compress: move ownership of `compressor` to `sstable::compression`
  compress: remove compressor::option_names()
  compress: clean up the constructor of zstd_processor
  compress: squash zstd.cc into compress.cc
  sstables/compress: break the dependency of `compression_parameters` on `compressor`
  compress.hh: switch compressor::name() from an instance member to a virtual call
  bytes: adapt fmt_hex to std::span<const std::byte>

2025-04-01 12:47:34 +03:00

__init__.py

test.py: Add the possibility to run boost test from pytest

2025-02-07 21:40:25 +01:00

address_map_test.cc

test: address_map: check generation handling during entry addition

2025-01-01 12:43:11 +02:00

advanced_rpc_compressor_test.cc

treewide: use angle brackets for including seastar headers

2025-03-17 10:03:06 +02:00

aggregate_fcts_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

allocation_strategy_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

alternator_unit_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

anchorless_list_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

auth_passwords_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

auth_resource_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

auth_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

aws_error_injection_test.cc

aws creds: add env. and file credentials providers

2025-02-05 14:57:19 +02:00

aws_errors_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

batchlog_manager_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

big_decimal_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

bloom_filter_test.cc

sstables: Make get_filename() return component_name

2025-03-19 13:03:29 +03:00

bptree_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

bptree_validation.hh

test/boost/bptree_validation.hh: add missing include <fmt/format.h>

2025-01-23 06:05:57 -05:00

broken_sstable_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

btree_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

btree_validation.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

bytes_ostream_test.cc

tree: Remove unused boost headers

2025-02-25 10:32:32 +03:00

cache_algorithm_test.cc

test: compile unit tests into a single executable

2024-12-22 19:14:09 +02:00

cache_mutation_reader_test.cc

moved cache files to db

2025-02-04 12:21:31 +03:00

cached_file_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

caching_options_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

canonical_mutation_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

cartesian_product_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

castas_fcts_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

cdc_generation_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

cdc_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

cell_locker_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

checksum_utils_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

chunked_managed_vector_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

chunked_vector_test.cc

mutation,test: replace boost::equal with std::ranges::equal

2025-02-26 14:27:42 +03:00

clustering_ranges_walker_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

CMakeLists.txt

utils: add class pluggable

2025-03-05 08:25:50 +02:00

collection_stress.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

column_mapping_test.cc

test: compile unit tests into a single executable

2024-12-22 19:14:09 +02:00

combined_tests.cc

test: combined_test: relicense

2024-12-25 13:53:54 +02:00

commitlog_cleanup_test.cc

test: compile unit tests into a single executable

2024-12-22 19:14:09 +02:00

commitlog_test.cc

commitlog: Serialize file deletion

2025-03-17 12:09:00 +00:00

compaction_group_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

compound_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

compress_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

config_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

conftest.py

test.py: refactor paths constants and options

2025-03-30 03:19:29 +00:00

continuous_data_consumer_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

counter_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

cql_auth_query_test.cc

tree: Make values mutable to enable move semantics

2025-03-03 13:53:02 +03:00

cql_auth_syntax_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

cql_functions_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

cql_query_group_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

cql_query_large_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

cql_query_like_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

cql_query_test.cc

sstables/compress: break the dependency of compression_parameters on compressor

2025-04-01 00:07:27 +02:00

crc_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

data_listeners_test.cc

test: compile unit tests into a single executable

2024-12-22 19:14:09 +02:00

database_test.cc

test/database: Re-use take_snapshot() helper once more

2025-03-31 13:18:06 +03:00

dict_trainer_test.cc

utils: add dict_trainer

2024-12-23 23:37:02 +01:00

dirty_memory_manager_test.cc

utils: do not include unused headers

2025-01-14 07:56:39 -05:00

double_decker_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

duration_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

dynamic_bitset_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

encrypted_file_test.cc

encrypted_file_impl: Add encrypted_data_sink

2025-03-20 14:54:24 +00:00

encryption_at_rest_test.cc

encryption_at_rest_test: Add verbosity + earlier stream close to proxy

2025-02-17 13:49:43 +00:00

enum_option_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

enum_set_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

error_injection_test.cc

test: compile unit tests into a single executable

2024-12-22 19:14:09 +02:00

estimated_histogram_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

exception_container_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

exceptions_fallback_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

exceptions_optimized_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

exceptions_test.inc.cc

exceptions: Add try_catch_nested to universally handle nested exceptions of the same type.

2025-03-26 11:15:13 +01:00

expr_test.cc

boost/expr_test: add vector expression tests

2025-01-28 21:14:49 +01:00

extensions_test.cc

sstables::file_io_extension: Make sstable argument to "wrap" const

2025-03-20 14:54:09 +00:00

file_stream_test.cc

utils: Add "io-wrappers", useful IO helper types

2025-03-20 14:54:09 +00:00

filtering_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

flush_queue_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

fragmented_temporary_buffer_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

frozen_mutation_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

generic_server_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

gossiping_property_file_snitch_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

group0_cmd_merge_test.cc

test: compile unit tests into a single executable

2024-12-22 19:14:09 +02:00

group0_test.cc

Update seastar submodule

2025-01-08 09:37:16 +02:00

hash_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

hashers_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

hint_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

idl_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

incremental_compaction_test.cc

test/lib: mutation_assertions: deinline

2025-02-25 11:40:54 +01:00

index_reader_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

index_with_paging_test.cc

test: compile unit tests into a single executable

2024-12-22 19:14:09 +02:00

input_stream_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

intrusive_array_test.cc

utils: do not include unused headers

2025-01-14 07:56:39 -05:00

json_cql_query_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

json_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

keys_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

kmip_wrapper.py

tests: Add EAR tests

2025-01-09 10:40:39 +00:00

large_paging_state_test.cc

test: compile unit tests into a single executable

2024-12-22 19:14:09 +02:00

like_matcher_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

limiting_data_source_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

linearizing_input_stream_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

lister_test.cc

utils: do not include unused headers

2025-01-14 07:56:39 -05:00

loading_cache_test.cc

loading_cache_test: test_loading_cache_reload_during_eviction: use manual_clock

2025-03-31 14:53:06 +03:00

locator_topology_test.cc

locator: token_metadata: drop update_host_id() function that does nothing now

2025-01-16 16:37:08 +02:00

log_heap_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

logalloc_standard_allocator_segment_pool_backend_test.cc

…

logalloc_test.cc

utils: do not include unused headers

2025-01-14 07:56:39 -05:00

managed_bytes_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

managed_vector_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

map_difference_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

memtable_test.cc

sstables::file_io_extension: Make sstable argument to "wrap" const

2025-03-20 14:54:09 +00:00

multishard_combining_reader_as_mutation_source_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

multishard_mutation_query_test.cc

db: prevent accidental copies of result_set_row by making it move-only

2025-02-17 09:48:08 +02:00

murmur_hash_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

mutation_fragment_test.cc

utils: do not include unused headers

2025-01-14 07:56:39 -05:00

mutation_query_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

mutation_reader_another_test.cc

moved cache files to db

2025-02-04 12:21:31 +03:00

mutation_reader_test.cc

test: compile unit tests into a single executable

2024-12-22 19:14:09 +02:00

mutation_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

mutation_writer_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

mvcc_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

network_topology_strategy_test.cc

cql3,test: replace boost::range::adjacent_find with std::ranges

2025-03-04 10:08:02 +02:00

nonwrapping_interval_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

observable_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

partitioner_test.cc

utils: do not include unused headers

2025-01-14 07:56:39 -05:00

per_partition_rate_limit_test.cc

test: compile unit tests into a single executable

2024-12-22 19:14:09 +02:00

pluggable_test.cc

treewide: use angle brackets for including seastar headers

2025-03-17 10:03:06 +02:00

pretty_printers_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

querier_cache_test.cc

reader_concurrency_semaphore: set_notify_handler(): disable timeout

2025-02-07 02:31:01 -05:00

query_processor_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

radix_tree_printer.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

radix_tree_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

range_assert.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

range_tombstone_list_assertions.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

range_tombstone_list_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

rate_limiter_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

reader_concurrency_semaphore_test.cc

reader_concurrency_semaphore: register_inactive_read(): handle aborted permit

2025-02-28 01:32:46 -05:00

README.md

test.py: Add the possibility to run boost test from pytest

2025-02-07 21:40:25 +01:00

recent_entries_map_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

repair_test.cc

test: compile unit tests into a single executable

2024-12-22 19:14:09 +02:00

reservoir_sampling_test.cc

utils: introduce reservoir_sampling

2024-12-23 23:37:02 +01:00

restrictions_test.cc

test: compile unit tests into a single executable

2024-12-22 19:14:09 +02:00

result_utils_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

reusable_buffer_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

role_manager_test.cc

test: compile unit tests into a single executable

2024-12-22 19:14:09 +02:00

row_cache_test.cc

tree: replace boost::min_element() with std::ranges::min_element()

2025-02-05 21:54:01 +02:00

rust_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

s3_test.cc

test: Add unit test for newly introduced download source

2025-03-21 12:01:06 +03:00

schema_change_test.cc

sstables, test: migrate from boost::copy() to std::ranges::copy()

2025-02-11 14:55:25 +03:00

schema_changes_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

schema_loader_test.cc

sstables: Make get_filename() return component_name

2025-03-19 13:03:29 +03:00

schema_registry_test.cc

test: add test for schema registry maintaining base info for views

2024-12-30 14:59:06 +01:00

secondary_index_test.cc

cql3: secondary index: Limit the size of partition range vectors

2025-03-10 12:18:42 +02:00

serialization_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

serialized_action_test.cc

utils: phased_barrier: add close() method

2024-12-26 06:54:07 +02:00

service_level_controller_test.cc

test/boost: update service_level_controller_test for workload prio

2025-01-02 07:13:34 +01:00

sessions_test.cc

test: compile unit tests into a single executable

2024-12-22 19:14:09 +02:00

small_vector_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

snitch_reset_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

sorting_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

sstable_3_x_test.cc

sstables/compress: break the dependency of compression_parameters on compressor

2025-04-01 00:07:27 +02:00

sstable_compaction_test.cc

compress: remove compression_parameters::get_compressor()

2025-04-01 00:07:28 +02:00

sstable_conforms_to_mutation_source_test.cc

moved cache files to db

2025-02-04 12:21:31 +03:00

sstable_datafile_test.cc

sstables/compress: break the dependency of compression_parameters on compressor

2025-04-01 00:07:27 +02:00

sstable_directory_test.cc

sstable_directory: Move highest_generation_seen() to distributed_loader.cc

2025-04-01 09:15:14 +03:00

sstable_generation_test.cc

test: ignore unused fmt::to_string() result

2025-03-24 10:19:09 +03:00

sstable_move_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

sstable_mutation_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

sstable_partition_index_cache_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

sstable_resharding_test.cc

treewide: replace boost::irange with std::views::iota where possible

2024-10-03 10:33:33 +03:00

sstable_set_test.cc

replica: service: pass parent info down to storage_group::split

2025-01-10 10:03:08 +01:00

sstable_test.cc

compress: change compressor_ptr from shared_ptr to unique_ptr

2025-04-01 00:07:29 +02:00

sstable_test.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

stall_free_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

statement_restrictions_test.cc

test: compile unit tests into a single executable

2024-12-22 19:14:09 +02:00

storage_proxy_test.cc

test: compile unit tests into a single executable

2024-12-22 19:14:09 +02:00

stream_compressor_test.cc

utils: add stream_compressor

2024-12-23 23:28:12 +01:00

string_format_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

suite.yaml

test/s3: Increase boost/s3_test log levels

2025-03-18 15:59:05 +02:00

summary_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

symmetric_key_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

tablets_test.cc

service: Introduce rack-aware co-location migrations for tablet merge

2025-03-16 22:45:00 +02:00

tagged_integer_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

token_metadata_test.cc

locator: token_metadata: drop update_host_id() function that does nothing now

2025-01-16 16:37:08 +02:00

top_k_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

total_order_check.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

tracing_test.cc

test: compile unit tests into a single executable

2024-12-22 19:14:09 +02:00

transport_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

tree_test_key.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

types_test.cc

test/boost: add vector type cql_env boost tests

2025-01-28 21:14:49 +01:00

unique_view_test.cc

utils: implement drop-in replacement for replacing boost::adaptors::uniqued

2025-01-21 16:24:45 +08:00

user_function_test.cc

test/boost: add vector type cql_env boost tests

2025-01-28 21:14:49 +01:00

user_types_test.cc

test/boost: add vector type cql_env boost tests

2025-01-28 21:14:49 +01:00

utf8_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

UUID_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

view_build_test.cc

test: lib: eventually: make *EVENTUALLY_EQUAL inline functions

2025-01-22 12:47:33 +02:00

view_complex_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

view_schema_ckey_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

view_schema_pkey_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

view_schema_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

vint_serialization_test.cc

test: avoid spaces when defining user-defined literal operator

2025-03-24 10:17:12 +03:00

virtual_reader_test.cc

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

virtual_table_mutation_source_test.cc

utils: do not include unused headers

2025-01-14 07:56:39 -05:00

virtual_table_test.cc

config: specialize config_from_string() for sstring

2025-01-26 15:53:12 +02:00

wasm_alloc_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

wasm_test.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

wrapping_interval_test.cc

utils: do not include unused headers

2025-01-14 07:56:39 -05:00

README.md

Scylla unit tests using C++ and the Boost test framework

The source files in this directory are Scylla unit tests written in C++ using the Boost.Test framework. These unit tests come in three flavors:

1. Some simple tests that check stand-alone C++ functions or classes use Boost's BOOST_AUTO_TEST_CASE.
2. Some tests require Seastar features, and need to be declared with Seastar's extensions to Boost.Test, namely SEASTAR_TEST_CASE.
3. Even more elaborate tests require not just a functioning Seastar environment but also a complete (or partial) Scylla environment. Those tests use the do_with_cql_env() or do_with_cql_env_thread() function to set up a mostly-functioning environment behaving like a single-node Scylla, in which the test can run.

While we have many tests of the third flavor, writing new tests of this type should be reserved for white-box tests: tests where it is necessary to inspect or control Scylla internals that do not have user-facing APIs such as CQL. In contrast, black-box tests (tests that can be written using only user-facing APIs) should be written in one of the newer test frameworks that we offer, such as test/cqlpy or test/alternator (in Python, using the CQL or DynamoDB APIs respectively) or test/cql (using textual CQL commands), or, if more than one Scylla node is needed for a test, using the test/topology* framework.

Running tests

Because these are C++ tests, they need to be compiled before running. To compile a single test executable row_cache_test, use a command like

ninja build/dev/test/boost/row_cache_test

You can also use ninja dev-test to build all C++ tests, or use ninja dev-build to build the C++ tests and also the full Scylla executable (note, however, that the full Scylla executable isn't needed to run Boost tests).

Replace "dev" by "debug" or "release" in the examples above and below to use the "debug" build mode (which, importantly, compiles the test with ASAN and UBSAN enabled and helps catch difficult-to-catch use-after-free bugs) or the "release" build mode (optimized for run speed).

To run an entire test file row_cache_test, including all its test functions, use a command like:

build/dev/test/boost/row_cache_test -- -c1 -m1G

To run a single test function test_reproduce_18045() from the longer test file, use a command like:

build/dev/test/boost/row_cache_test -t test_reproduce_18045 -- -c1 -m1G

In these command lines, the parameters before the -- are passed to Boost.Test, while the parameters after the -- are passed to the test code, and in particular to Seastar. In this example Seastar is asked to run on one CPU (-c1) and use 1G of memory (-m1G) instead of hogging the entire machine. The Boost.Test option -t test_reproduce_18045 asks it to run just this one test function instead of all the test functions in the executable.
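The split between Boost.Test and Seastar arguments can be captured in a small helper. The following is a minimal sketch, not part of the repository: the function name `boost_test_command` and its parameters are hypothetical, and it only assembles the same command lines shown above.

```python
import shlex


def boost_test_command(test_name, mode="dev", test_case=None, cpus=1, memory="1G"):
    """Build the argv for one Boost test executable (hypothetical helper).

    Arguments placed before "--" go to Boost.Test; arguments after it
    are passed to the test code, and in particular to Seastar.
    """
    argv = [f"build/{mode}/test/boost/{test_name}"]
    if test_case is not None:
        # Boost.Test option: run only this test function.
        argv += ["-t", test_case]
    # Seastar options: number of CPUs and amount of memory.
    argv += ["--", f"-c{cpus}", f"-m{memory}"]
    return argv


if __name__ == "__main__":
    # Print the command line for a single test function.
    print(shlex.join(boost_test_command("row_cache_test",
                                        test_case="test_reproduce_18045")))
```

Passing `mode="debug"` or `mode="release"` only changes the build directory in the path, mirroring the substitution described earlier.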

Unfortunately, interrupting a running test with control-C doesn't work. This is a known bug (#5696). Kill a test with SIGKILL (-9) if you need to stop it while it's running.
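When launching a test from a script, the SIGKILL fallback can be automated. The sketch below is an illustration only: `run_with_hard_timeout` is a hypothetical name, and it relies on the standard library's `Popen.kill()`, which sends SIGKILL on POSIX systems.

```python
import subprocess


def run_with_hard_timeout(argv, timeout_s):
    """Run argv and kill it with SIGKILL if it outlives timeout_s seconds.

    Returns the process exit code. On POSIX, a process killed by a
    signal reports the negated signal number, so a SIGKILL shows as -9.
    """
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()         # SIGKILL on POSIX; cannot be caught or ignored
        return proc.wait()  # reap the child and collect its exit status
```

For example, `run_with_hard_timeout(["build/dev/test/boost/row_cache_test", "--", "-c1", "-m1G"], 600)` would give the test ten minutes before killing it; the timeout value is arbitrary.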

Boost tests can also be run using test.py - which is a script that provides a uniform way to run all tests in scylladb.git - C++ tests, Python tests, etc.

Execution with pytest

To run all tests with pytest, execute

pytest test/boost

To execute all tests in one file, provide the path to the source filename as a parameter

pytest test/boost/aggregate_fcts_test.cc

Since it's a normal path, autocompletion works in the terminal out of the box.

To execute only one test function, provide the path to the source file and function name

pytest --mode dev test/boost/aggregate_fcts_test.cc::test_aggregate_avg

To select a specific mode, pass the --mode parameter, for example --mode dev. If the parameter isn't provided, pytest tries to use ninja mode_list to find out the compiled modes.

Parallel execution is controlled by pytest-xdist and the parameter -n auto, which starts the tests with a number of workers equal to the number of CPU cores. A useful command to discover the tests in a file or directory is

pytest --collect-only -q --mode dev test/boost/aggregate_fcts_test.cc

That will return all test functions in the file. To execute only one function from the test, you can pass one of the names printed by the previous command to pytest, but without the mode suffix. For example, the output in the terminal looks something like test/boost/aggregate_fcts_test.cc::test_aggregate_avg.dev. To execute this specific test function, drop the .dev suffix and use the following command

pytest --mode dev test/boost/aggregate_fcts_test.cc::test_aggregate_avg
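Dropping the mode suffix from a collected name can be scripted. This is a minimal sketch under stated assumptions: `strip_mode_suffix` is a hypothetical helper, and the default list of modes contains only the three modes named in this README, so other build modes would have to be added by the caller.

```python
def strip_mode_suffix(node_id, modes=("dev", "debug", "release")):
    """Split a collected test name into (name without suffix, mode).

    Returns (node_id, None) when the name carries no known mode suffix.
    The default mode list is an assumption based on this README.
    """
    for mode in modes:
        suffix = "." + mode
        if node_id.endswith(suffix):
            return node_id[: -len(suffix)], mode
    return node_id, None


if __name__ == "__main__":
    collected = "test/boost/aggregate_fcts_test.cc::test_aggregate_avg.dev"
    name, mode = strip_mode_suffix(collected)
    # Rebuild the command line in the form shown above.
    print(f"pytest --mode {mode} {name}")
```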

Writing tests

Because of the large build time and build size of each separate test executable, it is recommended to put test functions into relatively large source files. But not too large - to keep compilation time of a single source file (during development) at reasonable levels.

When adding new source files in test/boost, don't forget to list the new source file in configure.py and also in CMakeLists.txt. The former is needed by our CI, but the latter is preferred by some developers.
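A quick check for a forgotten build-file entry might look like the sketch below. It is an illustration only: `missing_registrations` is a hypothetical helper, and it assumes a test is referenced in each build file by its base name without the .cc extension. It uses a plain substring search, so a name that is a prefix of another test's name can give a false negative.

```python
def missing_registrations(test_source, build_files):
    """Return the names of build files that do not mention the test.

    test_source: path of the new test, e.g. "test/boost/foo_test.cc".
    build_files: mapping of build file name to its text content.
    Assumption: build files reference a test by its base name without
    the ".cc" extension.
    """
    stem = test_source.rsplit("/", 1)[-1]
    if stem.endswith(".cc"):
        stem = stem[: -len(".cc")]
    return [name for name, text in build_files.items() if stem not in text]
```

In practice the mapping would be filled by reading configure.py and the relevant CMakeLists.txt from the source tree, and an empty result means both files mention the test.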