mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 13:37:04 +00:00


Nadav Har'El f604269f0a cql3, secondary index: consistently choose index to use in a query

When a table has secondary indexes on *multiple* columns, and several
such columns are used for filtering in a query, Scylla chooses one
of these indexes as the main driver of the query, and the second
column's restriction is implemented as filtering.

Before this patch, the index to use was chosen fairly randomly, based on
the order of the indexes in the schema. This order may be different in
different coordinators, and may even change across restarts on the same
coordinator. This is not only inconsistent, it can cause outright wrong
results when using *paging* and switching (or restarting) coordinators
in the middle of a paged scan... One coordinator saves one index's key
in the paging state, and then the other coordinator gets this paging
state and wrongly believes it is supposed to be a key of a *different*
index.

The fix in this patch is to pick the index suitable for the first
indexed column mentioned in the query. This has two benefits over
the situation before the patch:

1. The decision of which index to use no longer changes between
   coordinators or across restarts - it just depends on the schema
   and the specific query.

2. Different indexes can have different "specificity" so using one
   or the other can change the query's performance. After this patch,
   the user is in control over which index is used by changing the
   order of terms in the query. A curious user can use tracing to
   check which index was used to implement a particular query.
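
The selection rule described above can be illustrated with a minimal Python
sketch. This is not Scylla's implementation (which lives in the C++ cql3
code); the function name choose_index_column and the column names are
hypothetical, and the sketch only assumes that the query's restricted columns
are available in the order the user wrote them.

```python
from typing import Optional, Sequence, Set


def choose_index_column(restricted_columns: Sequence[str],
                        indexed_columns: Set[str]) -> Optional[str]:
    """Return the first restricted column, in query order, that has an index.

    Hypothetical helper: the result depends only on the query text and on
    which columns are indexed, not on any internal ordering of the indexes.
    """
    for column in restricted_columns:
        if column in indexed_columns:
            return column
    return None  # no usable index; the query would rely on filtering alone


# Two coordinators know the same indexes but may enumerate them differently.
coordinator_a_indexes = {"b", "c"}
coordinator_b_indexes = {"c", "b"}

# Models: WHERE a = ? AND c = ? AND b = ?   (column a has no index)
query_columns = ["a", "c", "b"]

choice_a = choose_index_column(query_columns, coordinator_a_indexes)
choice_b = choose_index_column(query_columns, coordinator_b_indexes)
print(choice_a, choice_b)
```

Both simulated coordinators pick the index on c, because c is the first
indexed column mentioned in the query. Reordering the terms so that b comes
first would select the index on b instead.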

An xfailing test we had for this issue no longer fails, so the "xfail"
marker is removed.

Fixes #7969

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#14450

2024-05-02 19:52:42 +02:00

.github

[github] add action to verify PR tasks was completed

2024-04-25 15:24:22 +03:00

alternator

treewide: include fmt/ranges.h and/or fmt/std.h

2024-04-19 22:56:16 +08:00

api

api/storage_service: convert runtime_error from repair to http error

2024-04-26 14:25:15 +08:00

auth

auth: move fmt::formatter<auth::resource_kind> up

2024-04-23 12:11:17 +03:00

bin

install.sh: use the native nodetool directly

2024-04-25 22:52:00 +03:00

cdc

db: config: make consistent-topology-changes unused

2024-04-25 14:33:21 +02:00

cmake

build: cmake: always rebuild SCYLLA-{PRODUCT,VERSION,RELEASE}-FILE

2024-03-25 10:28:28 +02:00

compaction

Merge 'treewide: drop FMT_DEPRECATED_OSTREAM macro and homebrew range formatters' from Kefu Chai

2024-04-20 22:25:00 +03:00

conf

db: config: make consistent-topology-changes unused

2024-04-25 14:33:21 +02:00

cql3

cql3, secondary index: consistently choose index to use in a query

2024-05-02 19:52:42 +02:00

data_dictionary

cql3: statements: change default tombstone_gc mode for tablets

2024-04-24 10:42:10 +02:00

Merge 'Relax the way view builder code checks if a table exists' from Pavel Emelyanov

2024-05-01 10:14:58 +03:00

debug

…

dht

treewide: include fmt/ranges.h and/or fmt/std.h

2024-04-19 22:56:16 +08:00

direct_failure_detector

…

dist

scylla_setup: Remove jmx and tools packages from being verified

2024-05-02 13:30:50 +03:00

docs

topology_coordinator: Fix synchronization of tablet split with other concurrent ops

2024-04-30 19:23:28 +02:00

exceptions

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

gms

treewide: fix indentation after the previous patch

2024-04-25 14:33:21 +02:00

idl

topology coordinator: drop unused structure

2024-04-21 16:36:07 +03:00

index

Merge 'scylla-sstable: add support for loading schema of views and indexes' from Botond Dénes

2024-01-24 23:36:54 +02:00

interface

Typos: fix typos in comments

2023-12-02 22:37:22 +02:00

lang

treewide: use fmt::to_string() to transform a UUID to std::string

2024-03-26 13:38:37 +08:00

licenses

…

locator

treewide: do not define FMT_DEPRECATED_OSTREAM

2024-04-19 22:57:36 +08:00

message

treewide: include fmt/ranges.h and/or fmt/std.h

2024-04-19 22:56:16 +08:00

mutation

partition_version: move the base class in move ctor

2024-04-28 18:34:45 +02:00

mutation_writer

mutation_writer: do not include unused headers

2024-01-24 15:20:02 +02:00

node_ops

treewide: include fmt/ranges.h and/or fmt/std.h

2024-04-19 22:56:16 +08:00

raft

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

readers

reader: silence false-positive use-after-move warning

2024-04-23 15:47:50 +03:00

redis

redis/server.hh: suppress -Wimplicit-fallthrough from protocol_parser.hh

2024-05-01 18:47:24 +03:00

reloc

reloc: create $BUILDDIR for getting its path

2024-05-01 09:52:17 +03:00

repair

Merge 'api/storage_service: convert runtime_error from repair to http error ' from Kefu Chai

2024-04-26 13:27:51 +03:00

replica

test: Verify tablet cleanup is properly retried on failure

2024-04-30 19:27:17 +02:00

rust

build: cmake: reference build_mode with ${scylla_build_mode_${CMAKE_BUILD_TYPE}}

2024-04-25 10:51:54 +03:00

schema

Merge 'Extend ALTER TABLE ... DROP to allow specifying timestamp of column drop' from Michał Jadwiszczak

2024-04-29 14:05:05 +02:00

scripts

install.sh: use the native nodetool directly

2024-04-25 22:52:00 +03:00

seastar @ b73e5e7d6c

Update seastar submodule

2024-05-02 07:35:42 +03:00

service

proxy: Remove declaration of nonexisting view_update_write_response_handler class

2024-05-01 10:15:41 +03:00

sstables

sstable_directory: Remove _sstable_dir member

2024-05-02 13:12:59 +03:00

streaming

streaming: Fix use after move in fire_stream_event

2024-04-25 16:48:54 +03:00

swagger-ui @ 12f1da1082

…

tasks

tasks: do not include unused headers

2024-02-02 15:20:40 +01:00

test

cql3, secondary index: consistently choose index to use in a query

2024-05-02 19:52:42 +02:00

thrift

treewide: remove {dclocal_,}read_repair_chance options

2024-04-25 17:15:27 +08:00

tools

tools/scylla-nodetool: implement the resetlocalschema command

2024-05-01 08:49:11 +03:00

tracing

build: cmake: link scylla_tracing against scylla-main

2024-05-01 10:08:11 +03:00

transport

repair, transport: s/get0()/get()/

2024-04-23 15:48:54 +03:00

types

types: do not include unused headers

2024-04-23 12:08:23 +03:00

unified

Update unified/build_unified.sh

2023-12-05 15:23:38 +02:00

utils

utils/chunked_vector: fix some typos in comment

2024-05-01 16:38:43 +03:00

.dockerignore

…

.gitattributes

…

.gitignore

toolchain: support building an optimized clang

2024-04-08 22:53:59 +09:00

.gitmodules

…

.gitorderfile

…

.mailmap

…

absl-flat_hash_map.cc

…

absl-flat_hash_map.hh

…

amplify.yml

…

backlog_controller.hh

treewide: apply codespell to the comments in source code

2023-12-20 10:25:03 +02:00

build_mode.hh

…

bytes_ostream.hh

./: not include unused headers

2024-03-20 09:16:46 +02:00

bytes.cc

…

bytes.hh

bytes.hh: stop at '}' in fmt::formatter<fmt_hex>

2024-03-28 08:58:36 +02:00

cache_flat_mutation_reader.hh

cache_flat_mutation_reader: only call get_iterator_in_latest() when pointing at a row

2024-03-27 11:48:42 +01:00

cache_temperature.hh

…

cartesian_product.hh

…

cell_locking.hh

…

checked-file-impl.hh

code: Switch to seastar API level 7

2023-06-06 13:29:16 +03:00

client_data.cc

…

client_data.hh

…

clocks-impl.cc

clocks-impl: format time_point using fmt

2023-11-22 17:44:07 +02:00

clocks-impl.hh

…

clustering_bounds_comparator.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

clustering_interval_set.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

clustering_key_filter.hh

…

clustering_ranges_walker.hh

…

CMakeLists.txt

build: cmake: require {fmt} >= 9.0.0

2024-04-25 16:35:08 +03:00

collection_mutation.cc

collection_mutation: add formatter for collection_mutation_view::printer

2024-02-13 17:42:25 +02:00

collection_mutation.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

column_computation.hh

Typos: fix typos in code

2023-12-05 15:18:11 +02:00

combine.hh

…

compound_compat.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

compound.hh

./: not include unused headers

2024-03-20 09:16:46 +02:00

compress.cc

./: not include unused headers

2024-01-17 16:30:14 +02:00

compress.hh

./: not include unused headers

2024-01-17 16:30:14 +02:00

concrete_types.hh

use fmt::to_string() for seastar::net::inet_address

2024-02-05 16:56:40 +01:00

configure.py

configure.py: revert changing builddir as absolute path

2024-04-29 09:35:21 +03:00

CONTRIBUTING.md

…

converting_mutation_partition_applier.cc

…

converting_mutation_partition_applier.hh

…

counters.cc

counters: move fmt::formatter<counter_{shard,cell}_view>::format() to .cc

2023-05-24 09:36:49 +03:00

counters.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

coverage_excludes.txt

test.py: support code coverage

2024-01-18 11:11:34 +02:00

coverage_sources.list

configure.py support coverage profiles on standrad build modes

2024-01-18 11:11:34 +02:00

cql_serialization_format.hh

…

db_clock.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

debug.cc

…

debug.hh

…

default.nix

…

Doxyfile

…

duration.cc

Typos: fix typos in code

2023-12-05 15:18:11 +02:00

duration.hh

…

encoding_stats.hh

encoding_state: mark helper methods protected

2023-08-29 15:41:13 +03:00

enum_set.hh

…

fix_system_distributed_tables.py

…

flake.lock

…

flake.nix

…

frozen_schema.cc

…

frozen_schema.hh

…

full_position.hh

…

gc_clock.hh

db: add formatter for gc_clock::time_point

2024-02-11 16:39:25 +02:00

gdbinit

…

gen_segmented_compress_params.py

Typos: fix typos in code

2023-12-13 10:45:21 +02:00

generic_server.cc

treewide: do not define FMT_DEPRECATED_OSTREAM

2024-04-19 22:57:36 +08:00

generic_server.hh

transport/controller: pass unix_domain_socket_permissions to generic_server::listen

2024-02-05 14:22:03 +01:00

HACKING.md

…

hashing_partition_visitor.hh

./: not include unused headers

2024-01-17 16:30:14 +02:00

idl-compiler.py

Typos: fix typos in code

2023-12-13 10:45:21 +02:00

inet_address_vectors.hh

abstract_replication_strategy: calculate_natural_endpoints: make it work with both versions of token_metadata

2023-12-12 23:19:53 +04:00

init.cc

treewide: replace seastar::future::get0() with seastar::future::get()

2024-02-02 22:12:57 +08:00

init.hh

Merge 'Typos: fix typos in code' from Yaniv Kaul

2023-12-06 07:36:41 +02:00

install-dependencies.sh

install-dependencies.sh: move cargo out of fedora branch

2024-04-22 15:41:20 +08:00

install.sh

install.sh: use the native nodetool directly

2024-04-25 22:52:00 +03:00

interval.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

keys.cc

clustering_bounds_comparator: add fmt::formtter for bound_{kind,view}

2024-03-11 11:37:48 +02:00

keys.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

LICENSE.AGPL

…

log.hh

…

main.cc

db: config: make consistent-topology-changes unused

2024-04-25 14:33:21 +02:00

map_difference.hh

./: not include unused headers

2024-01-17 16:30:14 +02:00

marshal_exception.hh

./: not include unused headers

2024-03-20 09:16:46 +02:00

multishard_mutation_query.cc

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

multishard_mutation_query.hh

treewide: apply codespell to the comments in source code

2023-12-20 10:25:03 +02:00

mutation_query.cc

mutation_query: reconcilable_result: add merge_disjoint()

2024-02-21 02:08:48 -05:00

mutation_query.hh

treewide: Use partition_slice::is_reversed()

2024-03-13 08:52:46 +02:00

noexcept_traits.hh

treewide: replace seastar::future::get0() with seastar::future::get()

2024-02-02 22:12:57 +08:00

NOTICE.txt

…

ORIGIN

…

partition_builder.hh

…

partition_range_compat.hh

interval: rename nonwrapping_interval to interval

2024-02-21 19:43:17 +02:00

partition_slice_builder.cc

…

partition_slice_builder.hh

…

partition_snapshot_reader.hh

…

partition_snapshot_row_cursor.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

protocol_server.hh

…

querier.cc

Typos: fix typos in code

2023-12-05 15:18:11 +02:00

querier.hh

treewide: Use partition_slice::is_reversed()

2024-03-13 08:52:46 +02:00

query_id.hh

…

query_ranges_to_vnodes.cc

./: not include unused headers

2024-03-20 09:16:46 +02:00

query_ranges_to_vnodes.hh

./: not include unused headers

2024-03-20 09:16:46 +02:00

query_result_merger.hh

…

query-request.hh

query-request: use default-generated operator==

2024-03-07 09:02:42 +03:00

query-result-reader.hh

./: not include unused headers

2024-01-17 16:30:14 +02:00

query-result-set.cc

./: not include unused headers

2024-01-17 16:30:14 +02:00

query-result-set.hh

query-result-set: add formatter for query-result-set.hh types

2024-02-21 17:54:48 +08:00

query-result-writer.hh

./: not include unused headers

2024-03-20 09:16:46 +02:00

query-result.hh

query-result.hh: add formatter for query::result::printer

2024-02-21 17:57:18 +08:00

query.cc

treewide: do not define FMT_DEPRECATED_OSTREAM

2024-04-19 22:57:36 +08:00

read_context.hh

./: not include unused headers

2024-03-20 09:16:46 +02:00

reader_concurrency_semaphore.cc

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

reader_concurrency_semaphore.hh

reader_permit: store schema_ptr instead of raw schema pointer

2024-01-11 08:37:56 +02:00

reader_permit.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

README.md

…

real_dirty_memory_accounter.hh

…

release.cc

…

release.hh

…

reversibly_mergeable.hh

…

row_cache.cc

treewide: include fmt/ranges.h and/or fmt/std.h

2024-04-19 22:56:16 +08:00

row_cache.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

schema_mutations.cc

schema_mutations: add fmt::formatter for schema_mutations

2024-03-15 09:49:56 +02:00

schema_mutations.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

schema_upgrader.hh

…

scylla_post_install.sh

dist: drop legacy control group parameters

2023-12-11 19:38:28 +09:00

scylla-gdb.py

scylla-gdb: access io_queue::_streams and io_queue::_fgs with static_vector

2024-04-04 11:39:10 +03:00

SCYLLA-VERSION-GEN

SCYLLA-VERSION-GEN: warn against using - or _ in custom version names

2024-04-30 18:14:51 +03:00

seastarx.hh

…

serialization_visitors.hh

…

serializer_impl.hh

serializer_impl, sstables: fix build failure due to missing includes

2024-04-23 12:03:51 +03:00

serializer.cc

…

serializer.hh

./: not include unused headers

2024-01-17 16:30:14 +02:00

service_permit.hh

…

setup.py

…

shell.nix

…

sstables_loader.cc

treewide: include fmt/ranges.h and/or fmt/std.h

2024-04-19 22:56:16 +08:00

sstables_loader.hh

…

supervisor.hh

./: not include unused headers

2024-03-20 09:16:46 +02:00

table_helper.cc

keyspace_metadata: Add default value for new_keyspace's durable_writes

2023-12-26 11:47:37 +03:00

table_helper.hh

Typos: fix typos in code

2023-12-05 15:18:11 +02:00

test.py

test.py: add the pytest junit_suite_name parameter

2024-04-15 21:07:00 +03:00

timeout_config.cc

./: not include unused headers

2024-01-17 16:30:14 +02:00

timeout_config.hh

./: not include unused headers

2024-01-17 16:30:14 +02:00

timestamp.hh

…

tombstone_gc_extension.hh

./: not include unused headers

2024-03-20 09:16:46 +02:00

tombstone_gc_options.cc

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

tombstone_gc_options.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

tombstone_gc.cc

cql3: statements: change default tombstone_gc mode for tablets

2024-04-24 10:42:10 +02:00

tombstone_gc.hh

cql3: statements: change default tombstone_gc mode for tablets

2024-04-24 10:42:10 +02:00

tox.ini

…

ubsan-suppressions.supp

…

unimplemented.cc

unimplemented: add format_as() for unimplemented::cause

2024-01-19 08:38:30 +02:00

unimplemented.hh

./: not include unused headers

2024-01-17 16:30:14 +02:00

validation.cc

…

validation.hh

…

version.hh

…

view_info.hh

treewide: replace formatter<std::string_view> with formatter<string_view>

2024-04-19 07:44:07 +03:00

vint-serialization.cc

./: not include unused headers

2024-01-17 16:30:14 +02:00

vint-serialization.hh

Typos: fix typos in code

2023-12-05 15:18:11 +02:00

zstd.cc

./: not include unused headers

2024-01-17 16:30:14 +02:00

README.md

Scylla

What is Scylla?

Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.

For more information, please see the ScyllaDB web site.

Build Prerequisites

Scylla is fairly fussy about its build environment, requiring very recent versions of the C++20 compiler and of many libraries to build. The document HACKING.md includes detailed information on building and developing Scylla, but to get Scylla building quickly on (almost) any build machine, Scylla offers a frozen toolchain. This is a pre-configured Docker image which includes recent versions of all the required compilers, libraries and build tools. Using the frozen toolchain allows you to avoid changing anything in your build machine to meet Scylla's requirements - you just need to meet the frozen toolchain's prerequisites (mostly, Docker or Podman being available).

Building Scylla

Building Scylla with the frozen toolchain dbuild is as easy as:

$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla

For further information, please see:

Developer documentation for more information on building Scylla.
Build documentation on how to build Scylla binaries, tests, and packages.
Docker image build documentation for information on how to build Docker images.

Running Scylla

To start the Scylla server, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1

This will start a Scylla node with one CPU core allocated to it and data files stored in the tmp directory. The --developer-mode option is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance (not relevant on development workstations). Please note that you need to run Scylla with dbuild if you built it with the frozen toolchain.

For more run options, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --help

Testing

See test.py manual.

Scylla APIs and compatibility

By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and Thrift. There is also support for the API of Amazon DynamoDB™, which needs to be enabled and configured in order to be used. For more information on how to enable the DynamoDB™ API in Scylla, and the current compatibility of this feature as well as Scylla-specific extensions, see Alternator and Getting started with Alternator.

Documentation

Documentation can be found here. Seastar documentation can be found here. User documentation can be found here.

Training

Training material and online courses can be found at Scylla University. The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, multi-datacenters and how Scylla integrates with third-party applications.

Contributing to Scylla

If you want to report a bug or submit a pull request or a patch, please read the contribution guidelines.

If you are a developer working on Scylla, please read the developer guidelines.

Contact

The community forum and Slack channel are for users to discuss configuration, management, and operations of open-source ScyllaDB.
The developers mailing list is for developers and people interested in following the development of ScyllaDB to discuss technical topics.

Languages

C++ 72.2%

Python 26.6%

CMake 0.3%

GAP 0.3%

Shell 0.3%