mirror of https://github.com/scylladb/scylladb.git synced 2026-05-12 19:02:12 +00:00


Avi Kivity bb1867c7c7 Merge 'sstables: Add digest checking in the validation path of the sstable layer' from Nikos Dragazis

This PR builds upon the PR for checksum validation (#20207) to further enhance scrub's corruption detection capabilities by validating digests as well. The digest (full checksum) is the checksum over the entire data, as opposed to per-chunk checksums which apply to individual chunks. Until now, digests were not examined on any code paths. This PR integrates digest checking into the compressed/checksummed data sources as an optional feature and enables it only through the validation path of the sstable layer (`sstable::validate()`). The validation path is used by the following tools:

* scrub in validate mode
* `sstable validate`

All other reads, including normal user reads, are unaffected by this change.
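The following minimal C++ sketch illustrates the difference between per-chunk checksums and a digest. It rests on a few illustrative assumptions: CRC-32 is used for both kinds of checksum, the on-disk layout is not modeled, and the helpers (`crc32_update`, `split_chunks`, `chunk_checksums`, `full_digest`, `chunks_match`) are hypothetical, not Scylla APIs. It shows one kind of damage that per-chunk checks alone cannot see: two chunks swapped together with their checksums.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Reflected CRC-32 (polynomial 0xEDB88320), one bit at a time.
// Feeding a previous result back in as `crc` continues the same checksum,
// so crc32_update(crc32_update(0, a), b) == crc32_update(0, a + b).
std::uint32_t crc32_update(std::uint32_t crc, std::string_view data) {
    crc = ~crc;
    for (unsigned char c : data) {
        crc ^= c;
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Split data into fixed-size chunks; the last chunk may be shorter.
std::vector<std::string> split_chunks(const std::string& data, std::size_t chunk_size) {
    std::vector<std::string> chunks;
    for (std::size_t off = 0; off < data.size(); off += chunk_size) {
        chunks.push_back(data.substr(off, chunk_size));
    }
    return chunks;
}

// Per-chunk checksums: each one covers only its own chunk.
std::vector<std::uint32_t> chunk_checksums(const std::vector<std::string>& chunks) {
    std::vector<std::uint32_t> sums;
    for (const auto& chunk : chunks) {
        sums.push_back(crc32_update(0, chunk));
    }
    return sums;
}

// Digest: a single checksum over all chunks, in order.
std::uint32_t full_digest(const std::vector<std::string>& chunks) {
    std::uint32_t crc = 0;
    for (const auto& chunk : chunks) {
        crc = crc32_update(crc, chunk);
    }
    return crc;
}

// Per-chunk validation: every chunk must match the checksum stored at its index.
bool chunks_match(const std::vector<std::string>& chunks,
                  const std::vector<std::uint32_t>& sums) {
    if (chunks.size() != sums.size()) {
        return false;
    }
    for (std::size_t i = 0; i < chunks.size(); ++i) {
        if (crc32_update(0, chunks[i]) != sums[i]) {
            return false;
        }
    }
    return true;
}
```

If chunks 0 and 1 of `"AAAABBBBCCCC"` (chunk size 4) are swapped along with their checksums, `chunks_match` still returns true, while `full_digest` over the reordered chunks no longer equals the original digest.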

The PR consists of:
* Extensions to the compressed and checksummed data sources to support digest checking. The data sources receive the expected digest as a parameter and calculate the actual digest incrementally across multiple get() calls. The check happens on the get() call that reaches EOF and results in an exception if the digest is invalid. A digest check requires reading the whole file range. Therefore, a partial read or skip() is treated as an internal error.
* A new shareable digest component loaded on demand by the validation code. No lifecycle management.
* Grouping of old scrub/validate tests for compressed and uncompressed SSTables to reduce code duplication.
* scrub/validate tests for SSTables with valid checksums but invalid digests, and SSTables with no digests at all.
* scrub/validate tests with 3.x Cassandra SSTables to ensure compatibility.
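The data-source behaviour described in the first item above can be sketched roughly as follows. This is a simplified, synchronous stand-in, not Scylla's Seastar-based code. Its assumptions: `digest_checking_source` is a hypothetical class that serves pre-split in-memory buffers, CRC-32 is used as the digest, an empty buffer signals EOF, and an SSTable without a digest is modeled as `std::nullopt`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Reflected CRC-32 (polynomial 0xEDB88320); passing the previous result
// as `crc` continues the checksum across calls.
std::uint32_t crc32_update(std::uint32_t crc, std::string_view data) {
    crc = ~crc;
    for (unsigned char c : data) {
        crc ^= c;
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Hypothetical stand-in for a checksummed/compressed data source.
class digest_checking_source {
public:
    digest_checking_source(std::vector<std::string> buffers,
                           std::optional<std::uint32_t> expected_digest)
        : _buffers(std::move(buffers)), _expected_digest(expected_digest) {}

    // Returns the next buffer; an empty string means EOF. The actual digest
    // is accumulated on every call and compared on the call that reaches EOF.
    std::string get() {
        if (_next < _buffers.size()) {
            std::string buf = _buffers[_next++];
            _actual_digest = crc32_update(_actual_digest, buf);
            return buf;
        }
        if (!_checked) {
            _checked = true;
            if (_expected_digest && *_expected_digest != _actual_digest) {
                throw std::runtime_error("digest mismatch");
            }
        }
        return {};
    }

    // Skipping would leave bytes out of the digest, so it is rejected
    // whenever a digest is being verified.
    void skip(std::size_t n_buffers) {
        if (_expected_digest) {
            throw std::logic_error("skip() while verifying a digest");
        }
        _next = std::min(_next + n_buffers, _buffers.size());
    }

private:
    std::vector<std::string> _buffers;
    std::size_t _next = 0;
    std::optional<std::uint32_t> _expected_digest;  // nullopt: nothing to check
    std::uint32_t _actual_digest = 0;
    bool _checked = false;
};
```

Reading `{"1234", "56789"}` with an expected digest of `0xCBF43926` (the CRC-32 of `"123456789"`) succeeds, while any other expected value throws on the call that hits EOF. Passing `std::nullopt` serves the data unchecked and keeps `skip()` available.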

Refs #19058.

New feature, no backport is needed.

Closes scylladb/scylladb#20720

* github.com:scylladb/scylladb:
  test: Test scrub/validate with SSTables from Cassandra
  compaction: Make quarantine optional for perform_sstable_scrub()
  test: Make random schema optional in scrub_test_framework
  test: Add tests for invalid digests
  test: Merge scrub/validate tests for compressed and uncompressed cases
  sstables: Verify digests on validation path
  sstables: Check if digest component exists
  sstables: Add digest in the SSTable components
  sstables: Add digest check in compressed data source
  sstables: Add digest check in checksummed data source

2024-10-09 21:33:08 +03:00

.github

.github: add db to iwyu's CLEANER_DIR

2024-10-04 20:48:18 +08:00

abseil @ d7aaad83b4

…

alternator

alternator: add "dc" and "rack" options to "/localnodes" request

2024-10-07 20:53:47 +03:00

api

api/storage_service: use ranges when handlging restore API

2024-10-07 10:54:37 +03:00

auth

auth: add "IWYU pragma: keep" to keep boost/regex_fwd.hpp

2024-10-07 20:08:05 +03:00

bin

scripts: fix bin/cqlsh shortcut

2024-09-16 09:52:29 +03:00

cdc

treewide: Prefer bytes_fwd.hh over bytes.hh

2024-10-02 07:29:30 +02:00

cmake

cmake/check_headers: correct typos

2024-10-08 09:38:16 +03:00

compaction

Merge 'sstables: Add digest checking in the validation path of the sstable layer' from Nikos Dragazis

2024-10-09 21:33:08 +03:00

conf

config: drop reversed_reads_auto_bypass_cache

2024-08-13 10:02:42 +02:00

cql3

Merge 'cql3: Print arguments and return type without frozen when describing UDF' from Dawid Mędrek

2024-10-08 16:05:28 +03:00

data_dictionary

treewide: accept list of sstables in "restore" API

2024-10-01 23:24:56 +08:00

view: check_needs_view_update_path: get token_metadata_ptr

2024-10-09 20:56:21 +03:00

debug

…

dht

treewide: replace boost::irange with std::views::iota where possible

2024-10-03 10:33:33 +03:00

direct_failure_detector

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

dist

scylla_coredump_setup: fix typos in comment

2024-09-30 13:29:34 +03:00

docs

docs: Fix confgroup links

2024-10-09 20:16:15 +03:00

exceptions

treewide: Prefer bytes_fwd.hh over bytes.hh

2024-10-02 07:29:30 +02:00

gms

treewide: replace boost::irange with std::views::iota where possible

2024-10-03 10:33:33 +03:00

idl

forward_service: rename to mapreduce_service

2024-07-03 19:29:47 +03:00

index

code-cleanup: add missing header guards

2024-07-09 18:31:35 +03:00

lang

treewide: use seastar::format() or fmt::format() explicitly

2024-09-11 23:21:40 +03:00

licenses

…

locator

Merge 'Node replace and remove operations: Add deprecate IP addresses usage warning.' from Sergey Zolotukhin

2024-10-03 11:08:28 +02:00

message

message/messaging_service: guard adding maintenance tenant under cluster feature

2024-09-16 15:34:36 +02:00

mutation

utils/unconst, mutation_partition: switch to ranges

2024-10-07 17:30:12 +03:00

mutation_writer

treewide: replace boost::irange with std::views::iota where possible

2024-10-03 10:33:33 +03:00

node_ops

cmake/check_headers: correct typos

2024-10-08 09:38:16 +03:00

raft

raft: add more information to start_read_barrier error

2024-10-09 16:24:34 +02:00

readers

Update seastar submodule

2024-09-18 13:59:22 +03:00

redis

treewide: replace boost::irange with std::views::iota where possible

2024-10-03 10:33:33 +03:00

reloc

…

repair

repair/row_level: remove reader timeout

2024-10-03 11:26:29 +02:00

replica

view: check_needs_view_update_path: get token_metadata_ptr

2024-10-09 20:56:21 +03:00

rust

rust: disable incremental build for release build

2024-06-20 12:01:14 +03:00

schema

treewide: replace boost::irange with std::views::iota where possible

2024-10-03 10:33:33 +03:00

scripts

[script/pull_github_pr.sh] Check Gating status before merging

2024-10-01 14:46:29 +03:00

seastar @ 3c9c2696a4

Update seastar submodule

2024-09-29 13:47:40 +03:00

service

Merge 'utils: replace dependency on boost ranges with <ranges>' from Avi Kivity

2024-10-09 16:04:48 +03:00

sstables

Merge 'sstables: Add digest checking in the validation path of the sstable layer' from Nikos Dragazis

2024-10-09 21:33:08 +03:00

streaming

view: check_needs_view_update_path: get token_metadata_ptr

2024-10-09 20:56:21 +03:00

swagger-ui @ 12f1da1082

…

tasks

treewide: replace boost::irange with std::views::iota where possible

2024-10-03 10:33:33 +03:00

test

Merge 'sstables: Add digest checking in the validation path of the sstable layer' from Nikos Dragazis

2024-10-09 21:33:08 +03:00

tools

tools: fix typos in the code

2024-10-09 08:18:36 +03:00

tracing

treewide: use seastar::format() or fmt::format() explicitly

2024-09-11 23:21:40 +03:00

transport

treewide: replace boost::irange with std::views::iota where possible

2024-10-03 10:33:33 +03:00

types

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

unified

dist: drop scylla-jmx

2024-09-13 07:59:45 +03:00

utils

Merge 'utils: replace dependency on boost ranges with <ranges>' from Avi Kivity

2024-10-09 16:04:48 +03:00

.clang-format

clang-format: argument and function packing

2024-10-04 14:52:41 +02:00

.dockerignore

…

.gitattributes

gitattributes: Mark swagger .js files as binary

2024-06-19 15:07:56 +03:00

.gitignore

Add .idea folder to .gitignore

2024-09-20 11:49:41 +03:00

.gitmodules

dist: drop scylla-jmx

2024-09-13 07:59:45 +03:00

.gitorderfile

…

.mailmap

…

absl-flat_hash_map.cc

…

absl-flat_hash_map.hh

…

amplify.yml

…

backlog_controller.hh

…

build_mode.hh

…

bytes_fwd.hh

cql3: Refactor description

2024-09-20 14:24:53 +02:00

bytes_ostream.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

bytes.cc

treewide: use seastar::format() or fmt::format() explicitly

2024-09-11 23:21:40 +03:00

bytes.hh

cql3: Refactor description

2024-09-20 14:24:53 +02:00

cache_mutation_reader.hh

tombstone: can_gc_fn: move declaration to compaction_garbage_collector.hh

2024-09-10 19:05:57 +03:00

cache_temperature.hh

…

cartesian_product.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

cell_locking.hh

Merge 'cell_locker: maybe_rehash: ignore allocation failures' from Benny Halevy

2024-08-12 10:54:56 +03:00

checked-file-impl.hh

…

client_data.cc

…

client_data.hh

transport: do not return client_type from cql_server::connection::make_client_key()

2024-06-07 09:23:06 +08:00

clocks-impl.cc

…

clocks-impl.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

clustering_bounds_comparator.hh

clustering_bounds_comparator: drop operator<< for bound_kind

2024-06-11 18:01:06 +02:00

clustering_interval_set.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

clustering_key_filter.hh

clustering_key_filter: unify get_ranges and get_native_ranges

2024-08-13 10:07:12 +02:00

clustering_ranges_walker.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

CMakeLists.txt

build: cmake: use the same options to configure seastar

2024-08-28 06:15:59 +03:00

collection_mutation.cc

compaction: get_max_purgeable_timestamp: use memtable and sstable extended timestamp stats

2024-09-10 19:05:57 +03:00

collection_mutation.hh

tombstone: can_gc_fn: move declaration to compaction_garbage_collector.hh

2024-09-10 19:05:57 +03:00

column_computation.hh

…

combine.hh

…

compound_compat.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

compound.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

compress.cc

…

compress.hh

compress, auth: include used headers

2024-05-30 09:16:23 +03:00

concrete_types.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

configure.py

test: add complete_multipart_upload completion tests

2024-10-01 09:06:24 +03:00

CONTRIBUTING.md

…

converting_mutation_partition_applier.cc

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

converting_mutation_partition_applier.hh

…

counters.cc

treewide: use std::ranges sort functions rather than boost

2024-10-01 14:19:05 +03:00

counters.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

coverage_excludes.txt

…

coverage_sources.list

…

cql_serialization_format.hh

…

db_clock.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

debug.cc

…

debug.hh

…

default.nix

treewide: drop thrift support

2024-06-07 06:44:59 +08:00

Doxyfile

…

duration.cc

…

duration.hh

…

encoding_stats.hh

…

enum_set.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

fix_system_distributed_tables.py

…

flake.lock

…

flake.nix

…

frozen_schema.cc

…

frozen_schema.hh

…

full_position.hh

…

gc_clock.hh

…

gdbinit

…

gen_segmented_compress_params.py

…

generic_server.cc

generic_server: make server::stop() idempotent

2024-08-28 15:54:31 +02:00

generic_server.hh

generic_server: convert connection tracking to seastar::gate

2024-08-28 10:59:44 +02:00

HACKING.md

HACKIGN.md: clarify the use of dbuild when running test.py

2024-09-10 13:40:45 +03:00

hashing_partition_visitor.hh

…

idl-compiler.py

…

inet_address_vectors.hh

…

init.cc

Ignore seed name resolution errors on restart.

2024-08-28 14:01:04 +02:00

init.hh

Ignore seed name resolution errors on restart.

2024-08-28 14:01:04 +02:00

install-dependencies.sh

install-dependencies.sh: update node_exporter to 1.8.2

2024-09-25 18:42:25 +03:00

install.sh

install.sh: fix more incorrect permission on strict umask

2024-09-03 10:37:53 +03:00

interval.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

keys.cc

clustering_bounds_comparator: drop operator<< for bound_kind

2024-06-11 18:01:06 +02:00

keys.hh

…

LICENSE.AGPL

…

log.hh

…

main.cc

group0: Stop group0 if node initialization fails

2024-10-06 17:20:52 +03:00

map_difference.hh

…

marshal_exception.hh

…

multishard_mutation_query.cc

treewide: replace boost::irange with std::views::iota where possible

2024-10-03 10:33:33 +03:00

multishard_mutation_query.hh

…

mutation_query.cc

readers: Use reversed schema and native reversed slices

2024-08-13 10:03:46 +02:00

mutation_query.hh

Fix comments refering to half-reversed (legacy) slices

2024-08-13 10:07:12 +02:00

noexcept_traits.hh

…

NOTICE.txt

…

ORIGIN

…

partition_builder.hh

…

partition_range_compat.hh

…

partition_slice_builder.cc

…

partition_slice_builder.hh

…

partition_snapshot_reader.hh

readers: Use reversed schema and native reversed slices

2024-08-13 10:03:46 +02:00

partition_snapshot_row_cursor.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

protocol_server.hh

protocol_server: Keep scheduling group on board

2024-05-24 17:54:29 +03:00

querier.cc

treewide: use seastar::format() or fmt::format() explicitly

2024-09-11 23:21:40 +03:00

querier.hh

querier: consume_page(): add rate-limiting to tombstone warnings

2024-08-06 08:56:11 -04:00

query_id.hh

…

query_ranges_to_vnodes.cc

…

query_ranges_to_vnodes.hh

…

query_result_merger.hh

…

query-request.hh

Fix comments refering to half-reversed (legacy) slices

2024-08-13 10:07:12 +02:00

query-result-reader.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

query-result-set.cc

…

query-result-set.hh

…

query-result-writer.hh

…

query-result.hh

…

query.cc

query::trim_clustering_row_ranges_to: require reversed schema for native reversed ranges

2024-08-13 10:07:10 +02:00

read_context.hh

readers: Use reversed schema and native reversed slices

2024-08-13 10:03:46 +02:00

reader_concurrency_semaphore.cc

reader_concurrency_semaphore: in stats, fix swapped count_resources and memory_resources

2024-10-09 14:12:01 +03:00

reader_concurrency_semaphore.hh

reader_concurrency_semaphore: test constructor: don't ignore metrics param

2024-08-04 21:14:42 +03:00

reader_permit.hh

…

README.md

README: Update the version of C++ to C++23

2024-08-14 12:06:23 +03:00

real_dirty_memory_accounter.hh

…

release.cc

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

release.hh

release: introduce doc_link()

2024-05-08 09:41:17 -04:00

reversibly_mergeable.hh

…

row_cache.cc

row_cache: coroutinize do_update()

2024-09-21 00:07:02 +02:00

row_cache.hh

treewide: rename flat_mutation_reader_v2 to mutation_reader

2024-06-21 07:12:06 +03:00

schema_mutations.cc

…

schema_mutations.hh

…

schema_upgrader.hh

…

scylla_post_install.sh

…

scylla-gdb.py

scylla-gdb.py: drop compatibility code for EOL releases

2024-10-03 15:42:08 +03:00

SCYLLA-VERSION-GEN

Update ScyllaDB version to: 6.3.0-dev

2024-09-17 13:43:04 +03:00

seastarx.hh

treewide: remove dependency on boost asio address_v4

2024-10-01 14:00:50 +03:00

serialization_visitors.hh

…

serializer_impl.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

serializer.cc

…

serializer.hh

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

service_permit.hh

…

setup.py

…

shell.nix

…

sstables_loader.cc

sstable_loader: Remove unused _snapshot_name from download_task_impl

2024-10-07 10:43:13 +03:00

sstables_loader.hh

Merge 'Do not remove objects from backup storage after restore' from Pavel Emelyanov

2024-10-04 14:59:40 +03:00

supervisor.hh

…

table_helper.cc

treewide: use seastar::format() or fmt::format() explicitly

2024-09-11 23:21:40 +03:00

table_helper.hh

cql3: introduce dialect infrastructure

2024-08-29 21:19:23 +03:00

test.py

test: add complete_multipart_upload completion tests

2024-10-01 09:06:24 +03:00

timeout_config.cc

…

timeout_config.hh

treewide: drop thrift support

2024-06-07 06:44:59 +08:00

timestamp.hh

treewide: de-static namespace scope functions in headers

2024-10-01 14:02:50 +03:00

tombstone_gc_extension.hh

…

tombstone_gc_options.cc

…

tombstone_gc_options.hh

…

tombstone_gc.cc

token: move ordering operator inline

2024-07-20 21:21:42 +03:00

tombstone_gc.hh

tombstone_gc_state: introduce with_commitlog_check_disabled()

2024-09-05 17:25:45 +05:30

tox.ini

…

ubsan-suppressions.supp

…

unimplemented.cc

treewide: drop thrift support

2024-06-07 06:44:59 +08:00

unimplemented.hh

treewide: drop thrift support

2024-06-07 06:44:59 +08:00

validation.cc

…

validation.hh

…

version.hh

…

view_info.hh

mv: delete a partition in a single operation when applicable

2024-07-25 11:12:58 +03:00

vint-serialization.cc

…

vint-serialization.hh

…

zstd.cc

zstd: include external header with brackets

2024-07-04 10:42:29 +03:00

README.md

Scylla

What is Scylla?

Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.

For more information, please see the ScyllaDB web site.

Build Prerequisites

Scylla is fairly fussy about its build environment, requiring a very recent C++23 compiler and recent versions of many libraries to build. The document HACKING.md includes detailed information on building and developing Scylla, but to get Scylla building quickly on (almost) any build machine, Scylla offers a frozen toolchain. This is a pre-configured Docker image which includes recent versions of all the required compilers, libraries and build tools. Using the frozen toolchain allows you to avoid changing anything on your build machine to meet Scylla's requirements - you just need to meet the frozen toolchain's prerequisites (mostly, Docker or Podman being available).

Building Scylla

Building Scylla with the frozen toolchain's dbuild script is as easy as:

$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla

For further information, please see:

* Developer documentation for more information on building Scylla.
* Build documentation on how to build Scylla binaries, tests, and packages.
* Docker image build documentation for information on how to build Docker images.

Running Scylla

To start the Scylla server, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1

This will start a Scylla node with one CPU core allocated to it and data files stored in the tmp directory. The --developer-mode option is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance (these checks are not relevant on development workstations). Please note that if you built Scylla with the frozen toolchain, you also need to run it with dbuild.

For more run options, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --help

Testing

See the test.py manual.

Scylla APIs and compatibility

By default, Scylla is compatible with Apache Cassandra and its API, CQL. There is also support for the API of Amazon DynamoDB™, which needs to be enabled and configured in order to be used. For more information on how to enable the DynamoDB™ API in Scylla, and the current compatibility of this feature as well as Scylla-specific extensions, see Alternator and Getting started with Alternator.

Documentation

Documentation can be found here. Seastar documentation can be found here. User documentation can be found here.

Training

Training material and online courses can be found at Scylla University. The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, multi-datacenters and how Scylla integrates with third-party applications.

Contributing to Scylla

If you want to report a bug or submit a pull request or a patch, please read the contribution guidelines.

If you are a developer working on Scylla, please read the developer guidelines.

Contact

The community forum and Slack channel are for users to discuss configuration, management, and operations of ScyllaDB open source.
The developers mailing list is for developers and people interested in following the development of ScyllaDB to discuss technical topics.

Languages

C++ 72.5%

Python 26.2%

CMake 0.4%

GAP 0.3%

Shell 0.3%