mirror of https://github.com/scylladb/scylladb.git synced 2026-05-29 19:21:01 +00:00


Piotr Sarna cf30d4cbcf Merge 'Secondary index of collection columns' from Nadav Har'El

This pull request introduces global secondary-indexing for non-frozen collections.

The intent is to enable queries such as these:

```
CREATE TABLE test(id int, somemap map<int, int>, somelist list<int>, someset set<int>, PRIMARY KEY(id));
CREATE INDEX ON test(keys(somemap));
CREATE INDEX ON test(values(somemap));
CREATE INDEX ON test(entries(somemap));
CREATE INDEX ON test(values(somelist));
CREATE INDEX ON test(values(someset));

-- an index on test(c) is the same as an index on test(values(c))
CREATE INDEX IF NOT EXISTS ON test(somelist);
CREATE INDEX IF NOT EXISTS ON test(someset);
CREATE INDEX IF NOT EXISTS ON test(somemap);

SELECT * FROM test WHERE someset CONTAINS 7;
SELECT * FROM test WHERE somelist CONTAINS 7;
SELECT * FROM test WHERE somemap CONTAINS KEY 7;
SELECT * FROM test WHERE somemap CONTAINS 7;
SELECT * FROM test WHERE somemap[7] = 7;
```
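Each restriction form in these queries is served by one kind of index. The following is a hypothetical Python model of that correspondence, not Scylla's actual `index_supports_expression` logic. The restriction labels and the function names `effective_target` and `index_can_serve` are invented for illustration, and a real planner must also account for filtering and for other indexes on the table.

```python
from typing import Optional

# Illustrative labels for the restriction forms in the example queries;
# these are not Scylla identifiers.
RESTRICTION_TO_TARGET = {
    "CONTAINS": "values",       # WHERE someset/somelist/somemap CONTAINS 7
    "CONTAINS KEY": "keys",     # WHERE somemap CONTAINS KEY 7
    "SUBSCRIPT EQ": "entries",  # WHERE somemap[7] = 7
}

def effective_target(target: Optional[str]) -> str:
    """An index on the bare column, e.g. ON test(somemap), acts as VALUES(somemap)."""
    return "values" if target is None else target.lower()

def index_can_serve(index_target: Optional[str], restriction: str) -> bool:
    """True if an index with this target can answer the given restriction."""
    return RESTRICTION_TO_TARGET.get(restriction) == effective_target(index_target)

print(index_can_serve(None, "CONTAINS"))          # True
print(index_can_serve("KEYS", "CONTAINS KEY"))    # True
print(index_can_serve("values", "SUBSCRIPT EQ"))  # False
```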

These indexes are built on the familiar materialized views (MVs). Scylla
treats all collections the same way: as a list of (key, value) pairs.
For sets, the value type is a dummy type; for lists, the key type is
TIMEUUID. For the rest of this description I will ignore the differences
between collection types. Suppose that the columns in the base table
were as follows:

```
pkey int, ckey1 int, ckey2 int, somemap map<int, text>, PRIMARY KEY(pkey, ckey1, ckey2)
```

The MV schema is as follows (the names of the columns that do not exist
in the base table are illustrative; the actual names may differ). All of
these columns form the view's primary key.

```
-- for index over entries
indexed_coll (int, text), idx_token long, pkey int, ckey1 int, ckey2 int
-- for index over keys
indexed_coll int, idx_token long, pkey int, ckey1 int, ckey2 int
-- for index over values
indexed_coll text, idx_token long, pkey int, ckey1 int, ckey2 int, coll_keys_for_values_index int
```

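The values index needs the extra column `coll_keys_for_values_index` because values in a collection need not be unique: a map can hold the same value under several keys, and without the collection key those entries would collapse into a single view row.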
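A minimal Python sketch of how one base row could be turned into index-view rows, using the map-only simplification and the column order from the schemas above. It models a view as a set of primary-key tuples, since every view column is a key column. The names `fake_token` and `index_rows` are hypothetical, and the token is a stand-in rather than Scylla's partitioner; the real computation lives in Scylla's C++ code (the commit list mentions `collection_column_computation`).

```python
from typing import Dict, Set

def fake_token(pkey: int) -> int:
    # Hypothetical stand-in for idx_token; Scylla computes the real partition
    # token with its partitioner, not with Python's hash().
    return hash(pkey)

def index_rows(target: str, pkey: int, ckey1: int, ckey2: int,
               somemap: Dict[int, str]) -> Set[tuple]:
    """Return the view rows that one base row produces for a map index.

    Every view column is part of the view's primary key, so a view is
    modeled here as a set of key tuples: two equal tuples are one row.
    """
    base = (fake_token(pkey), pkey, ckey1, ckey2)
    if target == "keys":
        return {(k,) + base for k in somemap}
    if target == "values":
        # Append the map key (coll_keys_for_values_index) so that equal
        # values stored under different keys stay distinct rows.
        return {(v,) + base + (k,) for k, v in somemap.items()}
    if target == "entries":
        return {((k, v),) + base for k, v in somemap.items()}
    raise ValueError(f"unknown index target: {target!r}")

# One base row whose map holds the value "a" under two different keys.
rows = index_rows("values", 0, 0, 0, {1: "a", 2: "a", 3: "b"})
print(len(rows))                        # 3
# Dropping the key column collapses the two "a" rows into one.
print(len({row[:-1] for row in rows}))  # 2
```

Without the key column, deleting key 1 from the map would have to remove the single `"a"` view row that key 2 still relies on.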

Fixes #2962
Fixes #8745
Fixes #10707

This patch does not implement **local** secondary indexes for collection columns: Refs #10713.

Closes #10841

* github.com:scylladb/scylladb:
  test/cql-pytest: un-xfail yet another passing collection-indexing test
  secondary index: fix paging in map value indexing
  test/cql-pytest: test for paging with collection values index
  cql, view: rename and explain bytes_with_action
  cql, index: make collection indexing a cluster feature
  test/cql-pytest: failing tests for oversized key values in MV and SI
  cql: fix secondary index "target" when column name has special characters
  cql, index: improve error messages
  cql, index: fix default index name for collection index
  test/cql-pytest: un-xfail several collecting indexing tests
  test/cql-pytest/test_secondary_index: verify that local index on collection fails.
  docs/design-notes/secondary_index: add `VALUES` to index target list
  test/cql-pytest/test_secondary_index: add randomized test for indexes on collections
  cql-pytest/cassandra_tests/.../secondary_index_test: fix error message in test ported from Cassandra
  cql-pytest/cassandra_tests/.../secondary_index_on_map_entries,select_test: test ported from Cassandra is expected to fail, since Scylla assumes that comparison with null doesn't throw error, just evaluates to false. Since it's not a bug, but expected behavior from the perspective of Scylla, we don't mark it as xfail.
  test/boost/secondary_index_test: update for non-frozen indexes on collections
  test/cql-pytest: Uncomment collection indexes tests that should be working now
  cql, index: don't use IS NOT NULL on collection column
  cql3/statements/select_statement: for index on values of collection, don't emit duplicate rows
  cql/expr/expression, index/secondary_index_manager: needs_filtering and index_supports_expression rewrite to accommodate for indexes over collections
  cql3, index: Use entries() indexes on collections for queries
  cql3, index: Use keys() and values() indexes on collections for queries.
  types/tuple: Use std::begin() instead of .begin() in tuple_type_impl::build_value_fragmented
  cql3/statements/index_target: throw exception to signalize that we didn't miss returning from function
  db/view/view.cc: compute view_updates for views over collections
  view info: has_computed_column_depending_on_base_non_primary_key
  column_computation: depends_on_non_primary_key_column
  schema, index/secondary_index_manager: make schema for index-induced mv
  index/secondary_index_manager: extract keys, values, entries types from collection
  cql3/statements/: validate CREATE INDEX for index over a collection
  cql3/statements/create_index_statement,index_target: rewrite index target for collection
  column_computation.hh, schema.cc: collection_column_computation
  column_computation.hh, schema.cc: compute_value interface refactor
  Cql.g, treewide: support cql syntax `INDEX ON table(VALUES(collection))`

2022-08-16 14:18:51 +02:00

.github

CODEOWNERS: add @psarna and @nyh as owners for docs/alternator

2022-07-16 11:39:04 +03:00

abseil @ 9e408e050f

Update abseil submodule

2022-05-22 23:46:33 +03:00

alternator

query: add tombstone-limit to read-command

2022-08-10 06:01:47 +03:00

api

Reduce the number of per-scheduling group metrics

2022-08-11 13:31:19 +03:00

auth

api: Add API for resetting authorization cache

2022-06-28 19:58:06 -03:00

cdc

query: add tombstone-limit to read-command

2022-08-10 06:01:47 +03:00

compaction

compaction_manager: perform_cleanup, perform_sstable_upgrade: use a lw_shared_ptr for owned token ranges

2022-08-02 08:08:11 +03:00

conf

config: Introduce force_schema_commit_log option

2022-07-06 22:08:56 +02:00

cql3

Merge 'Secondary index of collection columns' from Nadav Har'El

2022-08-16 14:18:51 +02:00

data_dictionary

schema, everywhere: define and use table_id as a strong type

2022-08-08 08:09:41 +03:00

Merge 'Secondary index of collection columns' from Nadav Har'El

2022-08-16 14:18:51 +02:00

debug

…

dht

effective_replication_map: make get_range_addresses asynchronous

2022-08-08 17:31:01 +03:00

direct_failure_detector

treewide: replace parallel_for_each with coroutine::parallel_for_each in coroutines

2022-05-31 09:06:24 +03:00

dist

SCYLLA-VERSION-GEN: use semver-compatible version

2022-07-25 18:06:28 +03:00

docs

Merge 'Secondary index of collection columns' from Nadav Har'El

2022-08-16 14:18:51 +02:00

exceptions

exceptions: Define operator<< for exception_code

2022-06-27 14:49:58 +03:00

gms

cql, index: make collection indexing a cluster feature

2022-08-14 10:29:52 +03:00

idl

everywhere: define locator::host_id as a strong tagged_uuid type

2022-08-12 06:01:44 +03:00

index

cql: fix secondary index "target" when column name has special characters

2022-08-14 10:29:52 +03:00

interface

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

lang

wasm: fix compilation without libwasmtime

2022-08-03 18:16:02 +03:00

libdeflate @ e7e54eab42

…

licenses

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

locator

everywhere: define locator::host_id as a strong tagged_uuid type

2022-08-12 06:01:44 +03:00

message

schema, everywhere: define and use table_schema_version as a strong type

2022-08-08 08:09:45 +03:00

mutation_writer

flat_mutation_reader ist tot

2022-05-31 23:42:34 +03:00

raft

raft read_barrier, retry over intermittent rpc failures

2022-08-11 13:31:19 +03:00

readers

mutation_reader_merger: fix indentation

2022-08-03 14:33:07 +03:00

redis

query: add tombstone-limit to read-command

2022-08-10 06:01:47 +03:00

reloc

SCYLLA-VERSION-GEN: use semver-compatible version

2022-07-25 18:06:28 +03:00

repair

Merge 'Make get_range_addresses async and hold effective_replication_map_ptr around it' from Benny Halevy

2022-08-09 13:25:53 +03:00

replica

database: detach_column_family(): use reader_concurrency_semaphore::evict_inactive_reads_for_table()

2022-08-15 14:16:41 +03:00

rust

tests: add rust example

2022-05-11 16:49:31 +02:00

scripts

scripts: pull_github_pr.sh: support recovering from a failed cherry-pick

2022-07-04 09:26:45 +03:00

seastar @ f9f5228b74

Update seastar submodule

2022-08-01 17:06:28 +03:00

service

storage_proxy: mutate_counters_on_leader: coroutinize

2022-08-14 17:36:58 +03:00

sstables

everywhere: define locator::host_id as a strong tagged_uuid type

2022-08-12 06:01:44 +03:00

streaming

schema, everywhere: define and use table_schema_version as a strong type

2022-08-08 08:09:45 +03:00

swagger-ui @ 12f1da1082

…

test

Merge 'Secondary index of collection columns' from Nadav Har'El

2022-08-16 14:18:51 +02:00

thrift

query: add tombstone-limit to read-command

2022-08-10 06:01:47 +03:00

tools

everywhere: define locator::host_id as a strong tagged_uuid type

2022-08-12 06:01:44 +03:00

tracing

trace-state: Remove unused fields

2022-06-17 15:02:51 +03:00

transport

Fix broken links

2022-06-28 15:19:36 +01:00

types

types/tuple: Use std::begin() instead of .begin() in tuple_type_impl::build_value_fragmented

2022-08-14 10:29:52 +03:00

unified

SCYLLA-VERSION-GEN: use semver-compatible version

2022-07-25 18:06:28 +03:00

utils

Merge 'Make get_range_addresses async and hold effective_replication_map_ptr around it' from Benny Halevy

2022-08-09 13:25:53 +03:00

.dockerignore

…

.gitattributes

gitattributes: Mark *.svg as binary

2022-07-31 15:25:24 +03:00

.gitignore

alternator, db: move the tag code to db/tags

2022-07-25 09:53:33 +02:00

.gitmodules

…

.gitorderfile

…

.mailmap

Add .mailmap

2022-07-04 13:44:28 +03:00

absl-flat_hash_map.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

absl-flat_hash_map.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

atomic_cell_hash.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

atomic_cell_or_collection.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

atomic_cell.cc

atomic_cell: compare_atomic_cell_for_merge: compare ttl if expiry is equal

2022-03-07 11:05:30 +02:00

atomic_cell.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

backlog_controller.hh

backlog_controller: keep scheduling_group by value

2022-08-02 07:38:40 +03:00

bytes_ostream.hh

bytes_ostream: Avoid waste by rounding up allocation size to power-of-two

2022-07-13 16:51:13 +03:00

bytes.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

bytes.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

cache_flat_mutation_reader.hh

Merge 'docs: move docs to docs/dev folder' from David Garcia

2022-07-03 20:37:11 +03:00

cache_temperature.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

caching_options.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

caching_options.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

canonical_mutation.cc

schema, everywhere: define and use table_id as a strong type

2022-08-08 08:09:41 +03:00

canonical_mutation.hh

schema, everywhere: define and use table_id as a strong type

2022-08-08 08:09:41 +03:00

cartesian_product.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

cell_locking.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

checked-file-impl.hh

treewide: use system-#include (angle brackets) for seastar

2022-04-26 14:46:42 +03:00

client_data.cc

client_data: Sanitize connection_notifier

2022-02-18 15:02:26 +03:00

client_data.hh

client_data: Sanitize connection_notifier

2022-02-18 15:02:26 +03:00

clocks-impl.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

clocks-impl.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

clustering_bounds_comparator.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

clustering_interval_set.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

clustering_key_filter.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

clustering_ranges_walker.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

CMakeLists.txt

cql3: Reorganize to_restriction code

2022-07-11 15:47:16 +02:00

collection_mutation.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

collection_mutation.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

column_computation.hh

cql, view: rename and explain bytes_with_action

2022-08-14 10:29:52 +03:00

combine.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

compatible_ring_position.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

compound_compat.hh

compound_compat.hh: add missing methods of iterator

2022-03-08 15:37:03 +02:00

compound.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

compress.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

compress.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

concrete_types.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

configure.py

configure.py: make messaging_service.cc the first source file

2022-08-10 11:18:09 +03:00

CONTRIBUTING.md

Add redirections

2022-06-28 09:39:14 +01:00

converting_mutation_partition_applier.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

converting_mutation_partition_applier.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

counters.cc

everywhere: define locator::host_id as a strong tagged_uuid type

2022-08-12 06:01:44 +03:00

counters.hh

everywhere: define locator::host_id as a strong tagged_uuid type

2022-08-12 06:01:44 +03:00

cql_serialization_format.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

db_clock.hh

gc_clock, db_clock: mark functions noexcept

2022-07-27 13:17:01 +03:00

debug.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

default.nix

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

digest_algorithm.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

digester.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

dirty_memory_manager.cc

logalloc: region: properly track listeners when moved

2022-07-28 11:17:55 +03:00

dirty_memory_manager.hh

logalloc: region: properly track listeners when moved

2022-07-28 11:17:55 +03:00

Doxyfile

…

duration.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

duration.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

encoding_stats.hh

encoding_state: mark functions noexcept

2022-07-27 13:43:17 +03:00

enum_set.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

fix_system_distributed_tables.py

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

frozen_mutation.cc

schema, everywhere: define and use table_schema_version as a strong type

2022-08-08 08:09:45 +03:00

frozen_mutation.hh

schema, everywhere: define and use table_schema_version as a strong type

2022-08-08 08:09:45 +03:00

frozen_schema.cc

idl: make idl headers self-sufficient

2022-08-08 08:02:27 +03:00

frozen_schema.hh

frozen_schema: avoid allocating contiguous memory

2022-02-21 01:39:02 +01:00

full_position.hh

service/storage_proxy: set smallest continue pos as query's continue pos

2022-08-10 06:03:38 +03:00

gc_clock.hh

gc_clock, db_clock: mark functions noexcept

2022-07-27 13:17:01 +03:00

gdbinit

Move dev docs to docs/dev

2022-06-24 18:07:08 +01:00

gen_segmented_compress_params.py

treewide: clean up stray license blurbs

2022-02-13 14:16:16 +02:00

generic_server.cc

generic_server: Gentle iterator

2022-02-18 14:25:08 +03:00

generic_server.hh

generic_server.hh: add missing include

2022-04-04 17:31:55 +03:00

HACKING.md

Move dev docs to docs/dev

2022-06-24 18:07:08 +01:00

hashers.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

hashers.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

hashing_partition_visitor.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

hashing.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

idl-compiler.py

idl-compiler: include serialization impl and visitors in generated dist.impl.hh files

2022-08-08 08:02:27 +03:00

inet_address_vectors.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

init.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

init.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

install-dependencies.sh

Support installing pip provided command symlinks to /usr/bin

2022-07-12 17:26:05 +03:00

install.sh

install.sh: install files with correct permission in strict umask setting

2022-06-20 17:52:03 +03:00

interval.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

intrusive_set_external_comparator.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

keys.cc

replica, partition_snapshot_reader, keys: replace boost::any with std::any

2022-04-28 07:18:53 +03:00

keys.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

LICENSE.AGPL

…

log.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

main.cc

commitlog: Make get_segments_to_replay on-demand

2022-08-11 06:41:23 +00:00

map_difference.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

marshal_exception.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

multishard_mutation_query.cc

query-result-writer: stop when tombstone-limit is reached

2022-08-10 06:03:38 +03:00

multishard_mutation_query.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

mutation_cleaner.hh

db: mutation_cleaner: Enqueue new snapshots at the back

2022-06-28 18:29:29 +03:00

mutation_compactor.hh

mutation_compactor: detach_state(): make it no-op if partition was exhausted

2022-08-02 06:43:24 +03:00

mutation_consumer_concepts.hh

introduce the MutationConsumer concept

2022-02-28 17:11:54 +02:00

mutation_fragment_fwd.hh

flat_mutation_reader: Split readers by file and remove unnecessary includes.

2022-03-14 13:20:25 +02:00

mutation_fragment_stream_validator.hh

mutation_fragment_stream_validator: validate range tombstone changes

2022-03-29 13:19:05 +03:00

mutation_fragment_v2.hh

mutation_fragment_v2: range_tombstone_change: add minimal_memory_usage()

2022-04-28 14:11:51 +03:00

mutation_fragment.cc

position_in_partition: add to_string(partition_region) and parse_partition_region()

2022-06-23 11:19:55 +03:00

mutation_fragment.hh

mutation_fragment.hh: move operator<<(partition_region) to position_in_partition.hh

2022-06-23 11:19:55 +03:00

mutation_partition_serializer.cc

idl: make idl headers self-sufficient

2022-08-08 08:02:27 +03:00

mutation_partition_serializer.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

mutation_partition_view.cc

idl: make idl headers self-sufficient

2022-08-08 08:02:27 +03:00

mutation_partition_view.hh

mutation_partition_view: add accept_gently methods

2022-05-05 13:32:25 +03:00

mutation_partition_visitor.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

mutation_partition.cc

Merge 'Add support for empty replica pages' from Botond Dénes

2022-08-10 13:38:06 +03:00

mutation_partition.hh

Revert "Merge "memtable-sstable: Add compacting reader when flushing memtable." from Mikołaj"

2022-08-09 11:23:29 +03:00

mutation_query.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

mutation_query.hh

query: coroutinize to_data_query_result

2022-05-05 13:32:25 +03:00

mutation_rebuilder.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

mutation_source_metadata.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

mutation.cc

test: mutation: Compare against compacted mutations

2022-06-15 11:30:01 +02:00

mutation.hh

schema, everywhere: define and use table_id as a strong type

2022-08-08 08:09:41 +03:00

noexcept_traits.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

NOTICE.txt

…

ORIGIN

…

partition_builder.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

partition_range_compat.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

partition_slice_builder.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

partition_slice_builder.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

partition_snapshot_reader.hh

memtable: Fix missing range tombstones during reads under certain rare conditions

2022-06-29 19:02:23 +03:00

partition_snapshot_row_cursor.hh

row_cache: Fix missing row if upper bound of population range is evicted and has adjacent dummy

2022-08-09 02:28:56 +02:00

partition_version_list.hh

row_cache: Fix undefined behavior during eviction under some conditions

2022-08-01 23:53:15 +02:00

partition_version.cc

mvcc: Introduce apply_resume to hold state for partition version merging

2022-06-15 11:30:01 +02:00

partition_version.hh

mvcc: Introduce apply_resume to hold state for partition version merging

2022-06-15 11:30:01 +02:00

position_in_partition.hh

row_cache: Fix missing row if upper bound of population range is evicted and has adjacent dummy

2022-08-09 02:28:56 +02:00

protocol_server.hh

compile: Fix headers so that *-headers targets compile cleanly.

2022-03-25 16:19:26 +02:00

querier.cc

querier: querier_cache: remove now unused evict_all_for_table()

2022-08-15 14:16:41 +03:00

querier.hh

querier: querier_cache: remove now unused evict_all_for_table()

2022-08-15 14:16:41 +03:00

query_class_config.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

query_ranges_to_vnodes.cc

storage_proxy: extract query_ranges_to_vnodes_generator to a separate file

2022-02-01 21:14:41 +01:00

query_ranges_to_vnodes.hh

storage_proxy: extract query_ranges_to_vnodes_generator to a separate file

2022-02-01 21:14:41 +01:00

query_result_merger.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

query-request.hh

query: add tombstone-limit to read-command

2022-08-10 06:01:47 +03:00

query-result-reader.hh

idl: make idl headers self-sufficient

2022-08-08 08:02:27 +03:00

query-result-set.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

query-result-set.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

query-result-writer.hh

query-result-writer: stop when tombstone-limit is reached

2022-08-10 06:03:38 +03:00

query-result.hh

service/storage_proxy: set smallest continue pos as query's continue pos

2022-08-10 06:03:38 +03:00

query.cc

query: result_merger::get() don't reset last-pos on short-reads and last pages

2022-08-10 06:01:49 +03:00

range_tombstone_assembler.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

range_tombstone_change_generator.hh

range_tombstone_change_generator: flush(): add end_of_range

2022-04-21 14:37:10 +03:00

range_tombstone_list.cc

range_tombstone_list: Avoid amortized_reserve()

2022-08-09 11:34:16 +03:00

range_tombstone_list.hh

db: range_tombstone_list: Avoid quadratic behavior when applying

2022-08-05 20:34:07 +03:00

range_tombstone_splitter.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

range_tombstone.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

range_tombstone.hh

Move dev docs to docs/dev

2022-06-24 18:07:08 +01:00

range.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

read_context.hh

row_cache: update reader implementations to v2

2022-04-21 14:57:04 +03:00

reader_concurrency_semaphore.cc

reader_concurrency_semaphore: add evict_inactive_reads_for_table()

2022-08-15 14:16:41 +03:00

reader_concurrency_semaphore.hh

reader_concurrency_semaphore: add evict_inactive_reads_for_table()

2022-08-15 14:16:41 +03:00

reader_permit.hh

evicatble_reader: avoid preemption pitfall around waiting for readmission

2022-03-15 14:37:22 +02:00

README.md

Fix broken links

2022-06-28 15:19:36 +01:00

real_dirty_memory_accounter.hh

memtable: move to replica module and namespace

2022-02-23 09:05:16 +02:00

release.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

release.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

reversibly_mergeable.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

row_cache.cc

cache_tracker: Make clear() leave no garbage

2022-08-02 11:02:22 +02:00

row_cache.hh

row_cache: update reader implementations to v2

2022-04-21 14:57:04 +03:00

schema_builder.hh

schema, everywhere: define and use table_id as a strong type

2022-08-08 08:09:41 +03:00

schema_fwd.hh

schema, everywhere: define and use table_schema_version as a strong type

2022-08-08 08:09:45 +03:00

schema_mutations.cc

schema, everywhere: define and use table_schema_version as a strong type

2022-08-08 08:09:45 +03:00

schema_mutations.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

schema_registry.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

schema_registry.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

schema_upgrader.hh

compile: Fix headers so that *-headers targets compile cleanly.

2022-03-25 16:19:26 +02:00

schema.cc

cql, view: rename and explain bytes_with_action

2022-08-14 10:29:52 +03:00

schema.hh

schema, index/secondary_index_manager: make schema for index-induced mv

2022-08-14 10:29:14 +03:00

scylla_post_install.sh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

scylla-gdb.py

main: start compaction_manager as a sharded service

2022-08-02 07:50:15 +03:00

SCYLLA-VERSION-GEN

configure.py: add date-stamp parameter

2022-08-08 17:28:38 +03:00

seastarx.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

serialization_visitors.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

serializer_impl.hh

serializer_impl: generalize (de)serialization of unordered_set

2022-07-18 18:20:33 +02:00

serializer.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

serializer.hh

code: Convert is_integral assertions to concepts

2022-02-24 19:44:29 +03:00

service_permit.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

setup.py

…

shell.nix

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

sstables_loader.cc

schema, everywhere: define and use table_id as a strong type

2022-08-08 08:09:41 +03:00

sstables_loader.hh

schema, everywhere: define and use table_id as a strong type

2022-08-08 08:09:41 +03:00

supervisor.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

table_helper.cc

treewide: replace parallel_for_each with coroutine::parallel_for_each in coroutines

2022-05-31 09:06:24 +03:00

table_helper.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

test.py

test.py: call before/after_test for each test case

2022-08-11 23:39:13 +02:00

timeout_config.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

timeout_config.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

timestamp.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

to_string.hh

to_string: generalize operator<< for unordered_set

2022-07-18 18:20:33 +02:00

tombstone_gc_extension.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

tombstone_gc_options.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

tombstone_gc_options.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

tombstone_gc.cc

schema, everywhere: define and use table_id as a strong type

2022-08-08 08:09:41 +03:00

tombstone_gc.hh

schema, everywhere: define and use table_id as a strong type

2022-08-08 08:09:41 +03:00

tombstone.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

tox.ini

…

types.cc

bit_cast: use std::bit_cast

2022-08-08 08:02:27 +03:00

types.hh

types: publish timestamp_from_string()

2022-08-02 10:33:01 +03:00

ubsan-suppressions.supp

…

unimplemented.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

unimplemented.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

validation.cc

treewide: remove empty comments in top-of-files

2022-05-13 07:11:58 +02:00

validation.hh

treewide: remove empty comments in top-of-files

2022-05-13 07:11:58 +02:00

version.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

view_info.hh

view info: has_computed_column_depending_on_base_non_primary_key

2022-08-14 10:29:14 +03:00

vint-serialization.cc

treewide: remove empty comments in top-of-files

2022-05-13 07:11:58 +02:00

vint-serialization.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

xx_hasher.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

zstd.cc

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

README.md

Scylla

What is Scylla?

Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.

For more information, please see the ScyllaDB web site.

Build Prerequisites

Scylla is fairly fussy about its build environment, requiring a very recent C++20 compiler and very recent versions of many libraries. The document HACKING.md includes detailed information on building and developing Scylla, but to get Scylla building quickly on (almost) any build machine, Scylla offers a frozen toolchain. This is a pre-configured Docker image that includes recent versions of all the required compilers, libraries and build tools. Using the frozen toolchain allows you to avoid changing anything on your build machine to meet Scylla's requirements: you just need to meet the frozen toolchain's prerequisites (mostly, Docker or Podman being available).
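The frozen toolchain's main prerequisite is a container runtime. As a minimal sketch, you can check which runtime is on your PATH before running dbuild. The function name find_container_runtime is hypothetical (it is not part of the Scylla repository), and the docker-then-podman search order is an arbitrary choice that may not match the order dbuild itself uses.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Scylla repository): print the first
# container runtime found on PATH, or fail if neither is installed.
find_container_runtime() {
    for runtime in docker podman; do
        if command -v "$runtime" >/dev/null 2>&1; then
            echo "$runtime"
            return 0
        fi
    done
    echo "neither docker nor podman found on PATH" >&2
    return 1
}

# Usage: report the result without aborting the calling script.
if runtime=$(find_container_runtime); then
    echo "frozen toolchain prerequisite met: $runtime"
else
    echo "install Docker or Podman to use the frozen toolchain"
fi
```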

Building Scylla

Building Scylla with the frozen toolchain's dbuild script is as easy as:

$ git submodule update --init --force --recursive
$ ./tools/toolchain/dbuild ./configure.py
$ ./tools/toolchain/dbuild ninja build/release/scylla

For further information, please see:

Developer documentation for more information on building Scylla.
Build documentation on how to build Scylla binaries, tests, and packages.
Docker image build documentation for information on how to build Docker images.
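The ninja target in the build commands follows a build/<mode>/scylla layout. A minimal sketch of choosing a target by mode, assuming that debug and dev modes exist alongside release (check ./configure.py --help for the actual list); scylla_target is a hypothetical helper, not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper: map a build mode to its ninja target, following the
# build/<mode>/scylla layout of the release example. The accepted modes
# are an assumption; check ./configure.py --help for the real list.
scylla_target() {
    case "$1" in
        debug|dev|release)
            echo "build/$1/scylla"
            ;;
        *)
            echo "unknown build mode: $1" >&2
            return 1
            ;;
    esac
}

# Usage with the frozen toolchain (commented out so the sketch runs anywhere):
# ./tools/toolchain/dbuild ninja "$(scylla_target dev)"
echo "release target: $(scylla_target release)"
```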

Running Scylla

To start the Scylla server, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 --developer-mode 1

This will start a Scylla node with one CPU core allocated to it and data files stored in the tmp directory. The --developer-mode option is needed to disable the various checks Scylla performs at startup to ensure the machine is configured for maximum performance; these checks are not relevant on development workstations. Please note that if you built Scylla with the frozen toolchain, you also need to run it with dbuild.

For more run options, run:

$ ./tools/toolchain/dbuild ./build/release/scylla --help
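The --smp and --workdir options in the developer-mode command control the core count and the data directory. A minimal sketch that parameterizes them; scylla_dev_run_cmd is a hypothetical helper that only prints the command, so copy its output (or pipe it to sh) to actually start a node.

```shell
#!/bin/sh
# Hypothetical helper: print the developer-mode run command from this README
# with a configurable core count (--smp) and data directory (--workdir).
scylla_dev_run_cmd() {
    smp="${1:-1}"        # number of CPU cores to allocate; default 1
    workdir="${2:-tmp}"  # directory for data files; default tmp
    echo "./tools/toolchain/dbuild ./build/release/scylla --workdir $workdir --smp $smp --developer-mode 1"
}

# Usage: print the command for two cores and a data directory named scratch.
scylla_dev_run_cmd 2 scratch
```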

Testing

See the test.py manual.

Scylla APIs and compatibility

By default, Scylla is compatible with Apache Cassandra and its APIs: CQL and Thrift. Scylla also supports the API of Amazon DynamoDB™, which must be enabled and configured before it can be used. For more information on how to enable the DynamoDB™ API in Scylla, the current compatibility of this feature, and Scylla-specific extensions, see Alternator and Getting started with Alternator.
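Enabling the DynamoDB™ API means passing extra options at startup. The following is a hedged sketch, assuming the --alternator-port and --alternator-write-isolation options described in Getting started with Alternator; verify the option names and accepted isolation values there. alternator_args is a hypothetical helper, and port 8000 is only an example.

```shell
#!/bin/sh
# Sketch, assuming the --alternator-port and --alternator-write-isolation
# options from Scylla's Alternator documentation; verify them there first.
alternator_args() {
    port="${1:-8000}"                    # HTTP port for the DynamoDB-compatible API
    isolation="${2:-only_rmw_uses_lwt}"  # write isolation policy (assumed value)
    echo "--alternator-port $port --alternator-write-isolation $isolation"
}

# Usage (commented out): append the options to the developer-mode command.
# ./tools/toolchain/dbuild ./build/release/scylla --workdir tmp --smp 1 \
#     --developer-mode 1 $(alternator_args 8000)
alternator_args
```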

Documentation

Documentation can be found here. Seastar documentation can be found here. User documentation can be found here.

Training

Training material and online courses can be found at Scylla University. The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, multi-datacenters and how Scylla integrates with third-party applications.

Contributing to Scylla

If you want to report a bug or submit a pull request or a patch, please read the contribution guidelines.

If you are a developer working on Scylla, please read the developer guidelines.

Contact

The users mailing list and Slack channel are for users to discuss configuration, management, and operations of open-source ScyllaDB.
The developers mailing list is for developers and people interested in following the development of ScyllaDB to discuss technical topics.

Languages

C++ 72.3%

Python 26.5%

CMake 0.3%

GAP 0.3%

Shell 0.3%