scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 03:20:37 +00:00

Botond Dénes 39bcf99f8e Merge 'Apply hard limit to partition range vectors in secondary index queries' from Nikos Dragazis

Secondary index queries fetch partition keys from the index view and store them in an `std::vector`. The vector size is currently limited by the user's page size and the page memory limit (1MiB). These are not enough to prevent large contiguous allocations (which can lead to stalls).

This series introduces a hard limit to the vector size to ensure it does not exceed the allocator's preferred max contiguous allocation size (128KiB). With the size of each element being 120 bytes, this allows for 1092 partition keys. The limit was set to 1000. Any partitions above this limit are discarded.
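The arithmetic behind the limit can be checked with a short sketch. The constant and function names below are hypothetical and are not taken from the Scylla source; only the numbers (128KiB, 120 bytes, 1000) come from the description above.

```python
# Illustrative sketch only: names are hypothetical, numbers come from the text.
PREFERRED_MAX_CONTIGUOUS_ALLOCATION = 128 * 1024  # 128 KiB
ELEMENT_SIZE = 120                                # bytes per vector element
HARD_LIMIT = 1000                                 # limit chosen by the series

# Largest element count whose backing storage still fits in 128 KiB.
max_elements = PREFERRED_MAX_CONTIGUOUS_ALLOCATION // ELEMENT_SIZE  # 1092


def apply_hard_limit(partition_keys, limit=HARD_LIMIT):
    """Keep at most `limit` keys; return (kept keys, number discarded)."""
    kept = partition_keys[:limit]
    return kept, len(partition_keys) - len(kept)
```

With these numbers, a vector capped at 1000 elements occupies at most 120,000 bytes, which stays below the 131,072-byte threshold.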

Discarding partitions breaks the querier cache on the replicas, causing a performance regression, as can be seen from the following measurements:
```
* Cluster: 3 nodes (local Docker containers), 1 vCPU, 4GB memory, dev mode
* Schema:
  CREATE KEYSPACE ks WITH replication = {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '3'} AND durable_writes = true AND tablets = {'enabled': false};
  CREATE TABLE ks.t1 (pk1 int, pk2 int, ck int, value int, PRIMARY KEY ((pk1, pk2), ck));
  CREATE INDEX t1_pk2_idx ON ks.t1(pk2);
* Query: CONSISTENCY LOCAL_QUORUM; SELECT * FROM ks.t1 where pk2 = 1;

+------------+-------------------+-------------------+
|  Page Size |      Master       |   Vector Limit    |
+============+===================+===================+
|            |   Latency (sec)   |   Latency (sec)   |
+------------+-------------------+-------------------+
|     100    |  5.80 ± 0.13      |  5.64 ± 0.10      |
+------------+-------------------+-------------------+
|    1000    |  4.77 ± 0.07      |  4.62 ± 0.06      |
+------------+-------------------+-------------------+
|    2000    |  4.67 ± 0.07      |  5.13 ± 0.03      |
+------------+-------------------+-------------------+
|    5000    |  4.82 ± 0.09      |  6.25 ± 0.06      |
+------------+-------------------+-------------------+
|   10000    |  4.89 ± 0.36      |  7.52 ± 0.13      |
+------------+-------------------+-------------------+
|     -1     |  4.90 ± 0.67      |  4.79 ± 0.33      |
+------------+-------------------+-------------------+
```
We expect this to be fixed with adaptive paging in a future PR. Until then, users can avoid regressions by adjusting their page size.

Additionally, this series changes the `untyped_result_set` to store rows in a `chunked_vector` instead of an `std::vector`, similarly to the `result_set`. Secondary index queries use an `untyped_result_set` to store the raw result from the index view before processing. With 1MiB results, the `std::vector` would cause a large allocation of this magnitude.
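To illustrate why a chunked container avoids one large allocation, here is a toy Python model. It is not Scylla's `chunked_vector` implementation; the class name, the `chunk_capacity` parameter, and the `largest_chunk` helper are assumptions made for this sketch. The point is only that elements are spread across fixed-capacity chunks, so no single chunk grows with the total size.

```python
class ChunkedList:
    """Toy model of a chunked vector: elements live in fixed-capacity chunks."""

    def __init__(self, chunk_capacity):
        if chunk_capacity <= 0:
            raise ValueError("chunk_capacity must be positive")
        self._chunk_capacity = chunk_capacity
        self._chunks = []  # list of small lists, each at most chunk_capacity long
        self._size = 0

    def append(self, item):
        # Start a new chunk when there is none yet or the last one is full.
        if not self._chunks or len(self._chunks[-1]) == self._chunk_capacity:
            self._chunks.append([])
        self._chunks[-1].append(item)
        self._size += 1

    def __len__(self):
        return self._size

    def __getitem__(self, index):
        if not 0 <= index < self._size:
            raise IndexError(index)
        # Map the flat index to (chunk number, offset inside that chunk).
        chunk, offset = divmod(index, self._chunk_capacity)
        return self._chunks[chunk][offset]

    def largest_chunk(self):
        """Length of the biggest single chunk (the model's 'largest allocation')."""
        return max((len(c) for c in self._chunks), default=0)
```

However many rows are appended, `largest_chunk()` never exceeds `chunk_capacity`, whereas a single flat array would have to grow with the whole result.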

Finally, a unit test is added to reproduce the bug.

Fixes #18536.

The PR fixes stalls of up to 100ms, but there is an easy workaround: adjust the page size. No need to backport.

Closes scylladb/scylladb#22682

* github.com:scylladb/scylladb:
  cql3: secondary index: Limit page size for single-row partitions
  cql3: secondary index: Limit the size of partition range vectors
  cql3: untyped_result_set: Store rows in chunked_vector
  test: Reproduce bug with large allocations from secondary index

2025-03-14 15:06:07 +02:00

alternator

alternator: document the state of tablet support in Alternator

2025-03-14 14:03:15 +03:00

boost

Merge 'Apply hard limit to partition range vectors in secondary index queries' from Nikos Dragazis

2025-03-14 15:06:07 +02:00

broadcast_tables

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

cluster

db/hints: Cancel draining when stopping node

2025-03-13 11:55:15 +02:00

cql

cql: restore validating replication strategies options

2025-02-04 12:27:33 +01:00

cqlpy

Merge 'scylla-sstable: add native S3 support' from Ernest Zaslavsky

2025-03-14 15:05:52 +02:00

ldap

test.py: Add possibility to run ldap tests from pytest

2025-02-07 21:40:28 +01:00

lib

Merge 'Main: stop system_keyspace' from Benny Halevy

2025-03-14 13:23:28 +03:00

manual

gossiper: start using host ids to send messages earlier

2025-03-11 12:09:21 +02:00

nodetool

tools/scylla-nodetool: netstats: don't assume both senders and receivers

2025-02-15 20:32:22 +02:00

perf

test: perf_sstable: close frag_stream before destoying it

2025-03-14 11:12:44 +03:00

pylib

test/ldap: assign non-busy ports to ldap

2025-03-14 11:09:19 +03:00

pylib_test

test.py: Create central conftest.

2024-11-24 20:09:48 +02:00

raft

test: Add the possibility to run raft tests with pytest

2025-02-12 14:10:19 +02:00

redis

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

resource

build: cmake: use wasm32-wasip1 as an alternative of wasm32-wasi

2025-01-16 16:28:29 +03:00

rest_api

test: Add unit test for total/live sstable sizes

2025-03-04 19:52:33 +03:00

scylla_gdb

scylla-gdb.py: add scylla tablet-metadata command

2025-02-11 07:29:46 -05:00

unit

tree: Remove unused boost headers

2025-02-15 20:32:22 +02:00

__init__.py

…

CMakeLists.txt

Introduce LDAP role manager & saslauthd authenticator

2025-01-12 14:50:29 +02:00

conftest.py

test.py: extract prepare dirs and S3 mock steps to test/conftest.py

2025-03-03 13:24:37 +03:00

pytest.ini

test.py: introduce prepare_3_nodes_cluster marker

2025-03-04 10:32:43 +01:00

README.md

…

README.md

Scylla in-source tests.

For details on how to run the tests, see docs/dev/testing.md

Shared C++ utilities and libraries are in lib/; shared Python code is in pylib/.

* alternator - Python tests which connect to a single server and use the DynamoDB API
* unit, boost, raft - unit tests in C++
* cqlpy - Python tests which connect to a single server and use CQL
* topology* - tests that set up clusters and add/remove nodes
* cql - approval tests that use CQL and pre-recorded output
* rest_api - tests for Scylla REST API Port 9000
* scylla-gdb - tests for scylla-gdb.py helper script
* nodetool - tests for C++ implementation of nodetool

If you can use an existing folder, consider adding your test to it. Create a new folder only for a new large category or subsystem, or when the test environment differs significantly from the existing suites: for example, when you plan to start scylladb with a different configuration, intend to add many tests, and would like them to reuse an existing Scylla cluster (clusters can be reused for tests within the same folder).

To add a new folder, create a new directory, then copy a suite.ini from an existing folder into it and edit the copy.
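The step above could be scripted roughly as follows. This is a minimal sketch, not a tool shipped with the repository: the helper name `add_test_folder` and its parameters are hypothetical, and it assumes the suite.ini to start from is copied from an existing suite folder.

```python
import shutil
from pathlib import Path


def add_test_folder(test_root: Path, new_name: str, template_suite: str) -> Path:
    """Create test_root/new_name and seed it with a copy of an existing suite.ini.

    `template_suite` is the name of an existing folder whose suite.ini is used
    as the starting point (an assumption of this sketch).
    """
    template = test_root / template_suite / "suite.ini"
    if not template.is_file():
        raise FileNotFoundError(template)
    new_dir = test_root / new_name
    new_dir.mkdir()  # raises FileExistsError if the folder already exists
    shutil.copy(template, new_dir / "suite.ini")
    return new_dir
```

A call such as `add_test_folder(Path("test"), "my_suite", "cqlpy")` would leave a `test/my_suite/suite.ini` ready to be edited by hand; `my_suite` is a placeholder name.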