scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-12 19:02:12 +00:00

Author	SHA1	Message	Date
Ferenc Szili	d7cfaf3f84	test, simulator: compute load based on tablet size instead of count This patch changes the load balancing simulator so that it computes table load based on tablet sizes instead of tablet count. best_shard_overcommit measured the minimal allowed overcommit in cases where the number of tablets cannot be evenly distributed across all the available shards. This is still the case, but instead of computing it as an integer div_ceil() of the average shard load, it is now computed by allocating the tablet sizes using the largest-tablet-first method. From these, we can get the lowest overcommit for the given set of nodes, shards and tablet sizes.	2026-02-12 12:54:55 +01:00
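The largest-tablet-first allocation described above can be sketched as a greedy placement: sort the sizes in descending order and always give the next tablet to the least-loaded shard. This is an illustration, not the simulator's code. The function names are hypothetical, and "overcommit" is assumed here to mean the most loaded shard's load divided by the perfectly even per-shard load.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <queue>
#include <utility>
#include <vector>

// Largest-tablet-first placement: sort sizes in descending order and always
// hand the next tablet to the currently least-loaded shard.
// Returns the resulting load of every shard.
std::vector<double> place_largest_first(std::vector<double> tablet_sizes, int shard_count) {
    assert(shard_count > 0);
    std::sort(tablet_sizes.begin(), tablet_sizes.end(), std::greater<double>());

    // Min-heap of (load, shard index): the least-loaded shard is on top,
    // ties go to the lower shard index.
    using entry = std::pair<double, int>;
    std::priority_queue<entry, std::vector<entry>, std::greater<entry>> heap;
    for (int shard = 0; shard < shard_count; ++shard) {
        heap.push({0.0, shard});
    }

    std::vector<double> loads(shard_count, 0.0);
    for (double size : tablet_sizes) {
        auto [load, shard] = heap.top();
        heap.pop();
        loads[shard] = load + size;
        heap.push({loads[shard], shard});
    }
    return loads;
}

// Hypothetical overcommit metric: most loaded shard / ideal (even) shard load.
double largest_first_overcommit(const std::vector<double>& tablet_sizes, int shard_count) {
    std::vector<double> loads = place_largest_first(tablet_sizes, shard_count);
    double total = std::accumulate(loads.begin(), loads.end(), 0.0);
    assert(total > 0.0);
    double ideal = total / shard_count;
    return *std::max_element(loads.begin(), loads.end()) / ideal;
}
```

With equal sizes this agrees with the old count-based rule: three tablets of size 1 on two shards give loads {2, 1} and an overcommit of 2 / 1.5 ≈ 1.33, the same as rounding the average of 1.5 tablets per shard up to 2. Largest-first is a greedy heuristic, so it does not always find the true optimum: {5, 4, 3, 3, 3} on two shards yields loads {8, 10}, although {9, 9} is achievable.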
Ferenc Szili	216443c050	test, simulator: generate tablet sizes and update load_stats This change adds a random tablet size generator. The tablet sizes are created in load_stats. Further changes to the load balance simulator: - apply_plan() updates the load_stats after a migration plan is issued by the load balancer, - adds a command line option which controls the tablet size deviation factor.	2026-02-12 12:54:55 +01:00
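One possible shape for a size generator controlled by a deviation factor is sketched below. The uniform distribution, the parameter names and the seed handling are assumptions made for illustration; the commit does not say which distribution the simulator draws from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical tablet size generator. Each size is drawn uniformly from
// [avg_size * (1 - deviation), avg_size * (1 + deviation)), so deviation = 0
// gives identical tablets and larger values spread the sizes further apart.
std::vector<uint64_t> generate_tablet_sizes(std::size_t count, uint64_t avg_size,
                                            double deviation, uint32_t seed) {
    assert(deviation >= 0.0 && deviation < 1.0);
    if (deviation == 0.0) {
        return std::vector<uint64_t>(count, avg_size); // no spread: identical tablets
    }
    std::mt19937 rng(seed); // fixed seed keeps simulator runs reproducible
    const double avg = static_cast<double>(avg_size);
    std::uniform_real_distribution<double> dist(avg * (1.0 - deviation),
                                                avg * (1.0 + deviation));
    std::vector<uint64_t> sizes;
    sizes.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        sizes.push_back(static_cast<uint64_t>(dist(rng)));
    }
    return sizes;
}
```

A fixed seed makes a simulation reproducible, which helps when comparing balancer changes against the same set of tablet sizes.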
Ferenc Szili	e31870a02d	test, simulator: postpone creation of load_stats_ptr With size based load balancing, we will have to move the tablet size in load_stats after each internode migration issued by balance_tablets(). This will be done in a subsequent commit in apply_plan() which is called from rebalance_tablets(). Currently, rebalance_tablets() is passed a load_stats_ptr which is defined as: using load_stats_ptr = lw_shared_ptr<const load_stats>; Because this is a pointer to const, apply_plan() can't modify it. So, we pass a reference to load_stats to rebalance_tablets() and create a load_stats_ptr from it for each call to balance_tablets().	2026-02-12 12:54:55 +01:00
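The ownership change can be illustrated with standard C++. In this sketch `std::shared_ptr` stands in for Seastar's `lw_shared_ptr`, and `load_stats`, `pick_migration()`, `apply_plan()` and `rebalance()` are simplified, hypothetical stand-ins (`apply_plan()` only borrows its name from the commit). Planning reads an immutable snapshot, while applying a plan mutates the caller-owned `load_stats` through a reference.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy stand-in for load_stats: node name -> bytes stored on that node.
struct load_stats {
    std::map<std::string, long> node_load;
};
// Mirrors `lw_shared_ptr<const load_stats>`: holders can read but not modify.
using load_stats_ptr = std::shared_ptr<const load_stats>;

using load_entry = std::pair<const std::string, long>;
bool less_load(const load_entry& a, const load_entry& b) { return a.second < b.second; }

// Planning only reads the stats: pick the most and the least loaded node.
std::pair<std::string, std::string> pick_migration(load_stats_ptr stats) {
    const auto& loads = stats->node_load;
    auto src = std::max_element(loads.begin(), loads.end(), less_load);
    auto dst = std::min_element(loads.begin(), loads.end(), less_load);
    // Modifying stats->node_load here would not compile: the pointee is const.
    return {src->first, dst->first};
}

// Applying a plan moves a tablet's size between nodes, so it needs a
// mutable reference rather than a load_stats_ptr.
void apply_plan(load_stats& stats, const std::string& from, const std::string& to, long tablet_size) {
    stats.node_load[from] -= tablet_size;
    stats.node_load[to] += tablet_size;
}

// The caller owns mutable stats and takes a fresh immutable snapshot for each
// planning round, so every round sees the effect of the previous migration.
void rebalance(load_stats& stats, long tablet_size, int max_rounds) {
    for (int round = 0; round < max_rounds; ++round) {
        load_stats_ptr snapshot = std::make_shared<load_stats>(stats); // converts to pointer-to-const
        auto [from, to] = pick_migration(snapshot);
        // Stop once a move would no longer narrow the gap.
        if (snapshot->node_load.at(from) - snapshot->node_load.at(to) <= tablet_size) {
            break;
        }
        apply_plan(stats, from, to, tablet_size);
    }
}
```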
Avi Kivity	ceec703bb7	Revert "main: test: add future and abort_source to after_init_func" This reverts commit `7bf7ff785a`. The commit tried to add clean shutdown to `scylla perf` paths, but forgot at least `scylla perf-alternator --workload wr` which now crashes on uninitialized `c.as`. Fixes #28473 Closes scylladb/scylladb#28478	2026-02-02 09:22:24 +01:00
Avi Kivity	6676953555	Merge 'test: perf: add option to write results to json in perf-cql-raw and perf-alternator' from Marcin Maliszkiewicz Adds a --json-result option to perf-cql-raw and perf-alternator, the same as perf-simple-query has. It is useful for automating test runs. Related: https://scylladb.atlassian.net/browse/SCYLLADB-434 Backport: no, original benchmark is not backported Closes scylladb/scylladb#28451 * github.com:scylladb/scylladb: test: perf: add example commands to perf-alternator and perf-cql-raw test: perf: add option to write results to json in perf-cql-raw test: perf: add option to write results to json in perf-alternator test: perf: move write_json_result to a common file	2026-02-01 13:57:10 +02:00
Marcin Maliszkiewicz	80e627c64b	test: perf: add example commands to perf-alternator and perf-cql-raw	2026-01-30 08:48:19 +01:00
Marcin Maliszkiewicz	ea29e4963e	test: perf: add option to write results to json in perf-cql-raw	2026-01-29 10:56:03 +01:00
Marcin Maliszkiewicz	d974ee1e21	test: perf: add option to write results to json in perf-alternator	2026-01-29 10:55:52 +01:00
Marcin Maliszkiewicz	a74b442c65	test: perf: move write_json_result to a common file The implementation is going to be shared with perf-alternator and perf-cql-raw.	2026-01-29 10:54:11 +01:00
Tomasz Grabiec	0d090aa47b	tablets: Cache pointer to stats during plan-making Saves on lookup cost, esp. for candidate evaluation. This showed up in perf profile in the past. Also, lays the ground for splitting stats per rack.	2026-01-28 01:32:00 +01:00
Marcin Maliszkiewicz	32543625fc	test: perf: reuse stream id When one request is very slow and the request rate is high, stream ids can in theory collide. This patch avoids that by reusing ids and aborting when there is no free one (unlikely).	2026-01-22 12:26:50 +01:00
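Reusing ids instead of handing them out from a wrapping counter can be modeled as a free list. This is a sketch, not the tool's code: the class name is hypothetical, and it assumes the CQL v3+ framing in which the stream id is a signed 16-bit field whose non-negative values (0 to 32767) are available to client requests. When `acquire()` returns `std::nullopt`, the caller would abort, as the commit describes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stream id pool. In CQL protocol v3+ the stream id is a signed
// 16-bit field; non-negative values (0..32767) are usable by client requests.
class stream_id_pool {
    std::vector<int16_t> _free; // ids not attached to any in-flight request
public:
    explicit stream_id_pool(int size = 32768) {
        assert(size > 0 && size <= 32768);
        _free.reserve(size);
        // Push in reverse so that the first acquire() calls return 0, 1, 2, ...
        for (int id = size - 1; id >= 0; --id) {
            _free.push_back(static_cast<int16_t>(id));
        }
    }

    // Takes a free id, or returns std::nullopt when every id is in flight
    // (the point at which the benchmark would abort).
    std::optional<int16_t> acquire() {
        if (_free.empty()) {
            return std::nullopt;
        }
        int16_t id = _free.back();
        _free.pop_back();
        return id;
    }

    // Makes an id reusable once the response carrying it has arrived.
    void release(int16_t id) { _free.push_back(id); }

    std::size_t available() const { return _free.size(); }
};
```

A slow request only pins its own id; fast requests keep cycling through the remaining ones, so a collision can only happen if the pool is genuinely exhausted.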
Marcin Maliszkiewicz	7bf7ff785a	main: test: add future and abort_source to after_init_func This commit avoids leaking seastar::async future from two benchmark tools: perf-alternator and perf-cql-raw. Additionally it adds abort_source for fast and clean shutdown.	2026-01-22 12:26:50 +01:00
Marcin Maliszkiewicz	0d20300313	test: perf: add option to stress multiple tables in perf-cql-raw	2026-01-22 12:26:50 +01:00
Marcin Maliszkiewicz	a033b70704	test: perf: add perf-cql-raw benchmarking tool The tool supports: - auth or no auth modes - simple read and write workloads - connection pool or connection per request modes - in-process or remote modes, remote may be useful to assess the tool's overhead or to use it as a bigger scale benchmark - uses prepared statements by default - connection only mode, for testing connection storms It could support in the future: - TLS mode - different workloads - shard awareness Example usage: > build/release/scylla perf-cql-raw --workdir /tmp/scylla-data --smp 2 --cpus 0,1 \ --developer-mode 1 --workload read --duration 5 2> /dev/null Running test with config: {workload=read, partitions=10000, concurrency=100, duration=5, ops_per_shard=0} Pre-populated 10000 partitions 97438.42 tps (269.2 allocs/op, 1.1 logallocs/op, 35.2 tasks/op, 113325 insns/op, 80572 cycles/op, 0 errors) 102460.77 tps (261.1 allocs/op, 0.0 logallocs/op, 31.7 tasks/op, 108222 insns/op, 75447 cycles/op, 0 errors) 95707.93 tps (261.0 allocs/op, 0.0 logallocs/op, 31.7 tasks/op, 108443 insns/op, 75320 cycles/op, 0 errors) 102487.87 tps (261.0 allocs/op, 0.0 logallocs/op, 31.7 tasks/op, 107956 insns/op, 74320 cycles/op, 0 errors) 100409.60 tps (261.0 allocs/op, 0.0 logallocs/op, 31.7 tasks/op, 108337 insns/op, 75262 cycles/op, 0 errors) throughput: mean= 99700.92 standard-deviation=3039.28 median= 100409.60 median-absolute-deviation=2759.85 maximum=102487.87 minimum=95707.93 instructions_per_op: mean= 109256.53 standard-deviation=2281.39 median= 108337.37 median-absolute-deviation=1034.83 maximum=113324.69 minimum=107955.97 cpu_cycles_per_op: mean= 76184.36 standard-deviation=2493.46 median= 75320.20 median-absolute-deviation=922.09 maximum=80572.19 minimum=74320.00	2026-01-22 12:26:50 +01:00
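The throughput summary in the example output can be reproduced from the five per-interval tps samples. The formulas in this sketch are inferred by matching the printed numbers, not taken from the tool's source: the standard deviation is the sample (n - 1) variant, and `median-absolute-deviation` matches the median of absolute deviations from the mean (measuring from the median instead would give 2078.27 for these samples, not the printed 2759.85).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct summary {
    double mean, stddev, median, mad, max, min;
};

// Median of a copy of the input (odd count: middle element; even: mean of the two middle ones).
double median_of(std::vector<double> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    std::size_t n = v.size();
    return n % 2 == 1 ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

summary summarize(const std::vector<double>& xs) {
    assert(xs.size() >= 2); // the sample standard deviation divides by n - 1
    const double n = static_cast<double>(xs.size());
    const double mean = std::accumulate(xs.begin(), xs.end(), 0.0) / n;

    double squares = 0.0;
    std::vector<double> abs_dev;
    for (double x : xs) {
        squares += (x - mean) * (x - mean);
        abs_dev.push_back(std::fabs(x - mean)); // distance from the mean, not the median
    }
    return summary{
        mean,
        std::sqrt(squares / (n - 1)),
        median_of(xs),
        median_of(abs_dev),
        *std::max_element(xs.begin(), xs.end()),
        *std::min_element(xs.begin(), xs.end()),
    };
}
```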
Marcin Maliszkiewicz	1318ff5a0d	test: perf: move cut_arg helper func to common code It will be reused later.	2026-01-19 14:33:10 +01:00
Ferenc Szili	621cb19045	load_sketch: use tablet sizes in load computation This commit changes load_sketch so that it computes node and shard load based on tablet sizes instead of tablet count.	2025-12-27 10:37:23 +01:00
Aleksandra Martyniuk	d66a36058b	service: pass topology and system_keyspace to load_balancer ctor Pass a pointer to service::topology and db::system_keyspace to load balancer. It will be used in the following patches to create rack_list_colocation plan.	2025-12-16 13:25:38 +01:00
Tomasz Grabiec	721434054b	perf-row-cache-update: Add scenario with large tombstone covering many rows Fills memtable with rows and a tombstone which deletes all rows which are already in cache. Similar to raft log workload, but more extreme. With -c1 -m4G, observed really bad performance: update: 1711.976196 [ms], preemption: {count: 22603, 99%: 0.943127 [ms], max: 1494.571776 [ms]}, cache: 2148/2906 [MB], alloc/comp: 1334/869 [MB] (amp: 0.651), pr/me/dr 1062186/0/1062187 cache: 2148/2906 [MB], memtable: 738/1024 [MB], alloc/comp: 993/0 [MB] (amp: 0.000) This means that the max reactor stall during cache update was 1.5 [s], with 0.7 GB of memtables and 2.1 GB in cache.	2025-12-06 01:03:09 +01:00
Botond Dénes	6ee0f1f3a7	Merge 'replica/table: add a metric for hypothetical total file size without compression' from Michał Chojnowski This patch adds a metric for pre-compression size of sstable files. This patch adds a per-table metric `scylla_column_family_total_disk_space_before_compression`, which measures the hypothetical total size of sstables on disk, if Data.db was replaced with an uncompressed equivalent. As for the implementation: Before the patch, tables and sstable sets are already tracking their total physical file size. Whenever sstables are added or removed, the size delta is propagated from the sstable up through sstable sets into table_stats. To implement the new metric, we turn the size delta that is getting passed around from a one-dimensional to a two-dimensional value, which includes both the physical and the pre-compression size. New functionality, no backport needed. Closes scylladb/scylladb#26996 * github.com:scylladb/scylladb: replica/table: add a metric for hypothetical total file size without compression replica/table: keep track of total pre-compression file size	2025-11-20 09:10:38 +02:00
Michał Chojnowski	1cfce430f1	replica/table: keep track of total pre-compression file size Every table and sstable set keeps track of the total file size of contained sstables. Due to a feature request, we also want to keep track of the hypothetical file size if Data files were uncompressed, to add a metric that shows the compression ratio of sstables. We achieve this by replacing the relevant `uint_64 bytes_on_disk` counters everywhere with a struct that contains both the actual (post-compression) size and the hypothetical pre-compression size. This patch isn't supposed to change any observable behavior. In the next patch, we will use these changes to add a new metric.	2025-11-13 00:49:57 +01:00
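Turning the one-dimensional size delta into a two-dimensional one can be sketched as a small value type that is added and subtracted as sstables come and go. The names `disk_space` and `table_size_tracker` are hypothetical; the commit only states that the `bytes_on_disk` counters become a struct holding both the actual (post-compression) size and the hypothetical pre-compression size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-dimensional size delta.
struct disk_space {
    int64_t physical = 0;           // actual bytes on disk (post-compression)
    int64_t before_compression = 0; // bytes if Data.db were stored uncompressed

    disk_space& operator+=(const disk_space& o) {
        physical += o.physical;
        before_compression += o.before_compression;
        return *this;
    }
    disk_space& operator-=(const disk_space& o) {
        physical -= o.physical;
        before_compression -= o.before_compression;
        return *this;
    }
};

// A table (or sstable set) applies the same delta in both dimensions
// whenever an sstable is added or removed.
struct table_size_tracker {
    disk_space total;

    void on_sstable_added(const disk_space& s) { total += s; }
    void on_sstable_removed(const disk_space& s) { total -= s; }

    // Post-compression size over pre-compression size; 1.0 when empty.
    double compression_ratio() const {
        return total.before_compression == 0
            ? 1.0
            : static_cast<double>(total.physical) / total.before_compression;
    }
};
```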
Dario Mirovic	549e6307ec	audit: unify `create_audit` and `start_audit` There is no need to have `create_audit` separate from `start_audit`. `create_audit` just stores the passed parameters, while `start_audit` does the actual initialization and startup work. Refs #26022	2025-11-06 03:05:06 +01:00
Botond Dénes	24c6476f73	mutation/mutation_compactor: add tombstone_gc_state to query ctor So tombstones can be purged correctly based on the tombstone gc mode. Currently, if repair-mode is used, tombstones are not purged at all, which can lead to purged tombstones being re-replicated to replicas which already purged them via read-repair. This is not a correctness problem: tombstones are not included in the data query result or digest, so these purgeable tombstones are only a nuisance for read repair, where they can create extra differences between replicas. Note that for read repair to trigger, some difference other than in purgeable tombstones has to exist, because, as mentioned above, these are not included in digests. Fixes: scylladb/scylladb#24332 Closes scylladb/scylladb#26351	2025-10-12 17:48:15 +03:00
Andrzej Jackowski	14081d0727	generic_server: transport: start using `sl:driver` for new connections Before this change, new connections were handled in a default scheduling group (`main`), because before the user is authenticated we do not know which service level should be used. With the new `sl:driver` service level, creation of new connections can be moved to `sl:driver`. We switch the service level as early as possible, in `do_accepts`. It is possible that `sl:driver` does not exist yet, for instance in specific upgrade cases, or if it was removed. Therefore, we also switch to `sl:driver` after a connection is accepted. Refs: scylladb/scylladb#24411	2025-10-08 08:25:12 +02:00
Pavel Emelyanov	6ad8dc4a44	Merge 'root,replica: mv querier to replica/' from Botond Dénes The querier object is a confusing one. Based on its name it should be in the query/ module and it is already in the query namespace. The query namespace is used for symbols which span the coordinator and replica, or that are mostly coordinator side. The querier is mainly in this namespace due to its similar name and because at the time it was introduced, namespace replica didn't exist yet. But this is a mistake which confuses people. The querier is actually a completely replica-side logic, implementing the caching of the readers on the replica. Move it to the replica module and namespace to make this more clear. Code cleanup, no backport. Closes scylladb/scylladb#26280 * github.com:scylladb/scylladb: replica: move querier code to replica namespace root,replica: mv querier to replica/	2025-10-06 08:26:05 +03:00
Avi Kivity	15fa1c1c7e	Merge 'sstables/trie: translate all key cells in one go, not lazily' from Michał Chojnowski Applying lazy evaluation to the BTI encoding of clustering keys was probably a bad default. The possible benefits are dubious (because it's quite likely that the laziness won't allow us to avoid that much work), but the overhead needed to implement the laziness is large and immediate. In this patch we get rid of the laziness. We rewrite lazy_comparable_bytes_from_clustering_position and lazy_comparable_bytes_from_ring_position so that they perform the key translation eagerly, all components to a single bytes_ostream in one synchronous call. perf_bti_key_translation (microbenchmark added in this series, 1 iteration is 100 translations of a clustering key with 8 cells of int32_type): ``` Before: test iterations median mad min max allocs tasks inst cycles lcb_mismatch_test.lcb_mismatch 9233 109.930us 0.000ns 109.930us 109.930us 4356.000 0.000 2615394.3 614709.6 After: test iterations median mad min max allocs tasks inst cycles lcb_mismatch_test.lcb_mismatch 50952 19.487us 0.000ns 19.487us 19.487us 198.000 0.000 603120.1 109042.9 ``` Enhancement, backport not required. Closes scylladb/scylladb#26302 * github.com:scylladb/scylladb: sstables/trie: BTI-translate the entire partition key at once sstables/trie: avoid an unnecessary allocation of std::generator in last_block_offset() sstables/trie: perform the BTI-encoding of position_in_partition eagerly types/comparable_bytes: add comparable_bytes_from_compound test/perf: add perf_bti_key_translation	2025-10-01 14:59:06 +03:00
Benny Halevy	b17a36c071	tablets: read_tablet_mutations: use unfreeze_and_split_gently Split the tablets mutations by number of rows, based on `min_tablets_in_mutation` (currently calibrated to 1024), similar to the splitting done in `storage_service::merge_topology_snapshot`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-09-30 17:15:41 +03:00
Benny Halevy	3c07e0e877	perf-tablets: change default tables and tablets-per-table tablets-per-table must be a power of 2, so round up 10000 to 16K. Also, reduce the number of tables to have a total of about 100K tablets, otherwise we hit the maximum commitlog mutation size limit in save_tablet_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-09-30 17:07:06 +03:00
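Rounding a tablet count up to the next power of two, as done for the 10000 default, takes a short loop. The helper name below is hypothetical; in C++20 the standard `std::bit_ceil` from `<bit>` does the same job for unsigned integers.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two that is >= n (n >= 1).
uint64_t round_up_to_power_of_2(uint64_t n) {
    assert(n >= 1 && n <= (uint64_t(1) << 63)); // larger n would overflow
    uint64_t p = 1;
    while (p < n) {
        p <<= 1; // double until p reaches or exceeds n
    }
    return p;
}
```

For example, `round_up_to_power_of_2(10000)` returns 16384, the 16K value mentioned above, while a value that is already a power of two is returned unchanged.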
Benny Halevy	2c3fb341e9	perf-tablets: abort on unhandled exception Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-09-30 17:07:06 +03:00
Ernest Zaslavsky	debc756794	treewide: Move transport related files to a `transport` directory As requested in #22112, moved the files and fixed other includes and the build system. Moved files: - generic_server.hh - generic_server.cc - protocol_server.hh Fixes: #22112 This is a cleanup, no need to backport Closes scylladb/scylladb#25090	2025-09-29 11:46:06 +03:00
Botond Dénes	2b4a140610	replica: move querier code to replica namespace The query namespace is used for symbols which span the coordinator and replica, or that are mostly coordinator side. The querier is mainly in this namespace due to its similar name, but this is a mistake which confuses people. Now that the code was moved to replica/, also fix the namespace to be namespace replica.	2025-09-29 06:44:52 +03:00
Botond Dénes	ee3d2f5b43	root,replica: mv querier to replica/ The querier object is a confusing one. Based on its name it should be in the query/ module and it is already in the query namespace. But this is actually a completely replica-side logic, implementing the caching of the readers on the replica. Move it to the replica module to make this more clear.	2025-09-29 06:33:53 +03:00
Michał Chojnowski	3703197c4c	test/perf: add perf_bti_key_translation Add a microbenchmark for translating keys to BTI encoding.	2025-09-29 04:08:00 +02:00
Botond Dénes	1999d8e3d3	compaction: remove using namespace {compaction,sstables} Some files in compaction/ have using namespace {compaction,sstables} clauses, some even in headers. This is considered bad practice and muddies the namespace use. Remove them.	2025-09-25 15:03:57 +03:00
Botond Dénes	86ed627fc4	compaction: move code to namespace compaction The namespace usage in this directory is very inconsistent, with files and classes scattered in: * global namespace * namespace compaction * namespace sstables There are even cases where all three are used in the same file. This code used to live in sstables/ and some of it still retains namespace sstables as a heritage of that time. The mismatch between the dir (future module) and the namespace used is confusing, so finish the migration and move all code in compaction/ to namespace compaction too. This patch, although large, is mechanical, and only the following kinds of changes are made: * replace namespace sstable {} with namespace compaction {} * add namespace compaction {} * drop/add sstables:: * drop/add compaction:: * move around forward-declarations so they are in the correct namespace context This refactoring revealed some awkward leftover coupling between sstables and compaction, in sstables/sstable_set.cc, where the make_sstable_set() methods of compaction strategies are implemented.	2025-09-25 15:03:56 +03:00
Tomasz Grabiec	2b03a69065	test: perf: perf-load-balancing: Add parallel-scaleout scenario Simulates rebalancing on a single scale-out involving simultaneous addition of multiple nodes per rack. Default parameters create a cluster with 2 racks, 70 tables, 256 tablets/table, 10 nodes, 88 shards/node. Adds 6 nodes in parallel (3 per rack). Current result on my laptop: testlog - Rebalance took 21.874 [s] after 82 iteration(s)	2025-09-23 00:31:31 +02:00
Tomasz Grabiec	0dcaaa061e	test: perf: perf-load-balancing: Convert to tool_app_template To support sub-commands for testing different scenarios. The current scenario is given the name "rolling-add-dec".	2025-09-23 00:30:38 +02:00
Avi Kivity	1258e7c165	Revert "Merge 'transport: service_level_controller: create and use `driver` service level' from Andrzej Jackowski" This reverts commit `fe7e63f109`, reversing changes made to `b5f3f2f4c5`. It is causing test.py failures around cqlpy. Fixes #26163 Closes scylladb/scylladb#26174	2025-09-22 09:32:46 +03:00
Dawid Mędrek	0d2560c07f	test/perf/tablet_load_balancing.cc: Create nodes within one DC In `789a4a1ce7`, we adjusted the test file to work with the configuration option `rf_rack_valid_keyspaces`. Part of the commit was making the two tables used in the test replicate in separate data centers. Unfortunately, that destroyed the point of the test because the tables no longer competed for resources. We fix that by enforcing the same replication factor for both tables. We still accept different values of replication factor when provided manually by the user (by `--rf1` and `--rf2` commandline options). Scylla won't allow for creating RF-rack-invalid keyspaces, but there's no reason to take away the flexibility the user of the test already has. Fixes scylladb/scylladb#26026 Closes scylladb/scylladb#26115	2025-09-21 21:36:43 +02:00
Pavel Emelyanov	a1ea553fe1	code: Replace distributed<> with sharded<> The latter is recommended in seastar, and the former was left as a compatibility alias. Latest seastar explicitly marks it as deprecated so once the submodule is updated, compilation logs will explode. Most of the patch is generated with for f in $(git grep -l '\<distributed<[A-Za-z0-9:_]>') ; do sed -e 's/\<distributed<\([A-Za-z0-9:_]\)>/sharded<\1>/g' -i $f; done for f in $(git grep -l distributed.hh); do sed -e 's/distributed.hh/sharded.hh/' -i $f ; done and a small manual change in test/perf/perf.hh Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#26136	2025-09-19 12:22:51 +02:00
Avi Kivity	fe7e63f109	Merge 'transport: service_level_controller: create and use `driver` service level' from Andrzej Jackowski This patch series: - Increases the number of allowed scheduling groups to allow creation of `sl:driver` - Implements `create_driver_service_level` that creates `sl:driver` with shares=200 if it wasn't already created - Implements creation of `sl:driver` for new systems and tests in `raft_initialize_discovery_leader` - Modifies `topology_coordinator` to create `sl:driver` after upgrades. - Implements using `sl:driver` for new connections in `transport/server` - Adds to `transport/server` recognition of driver's control connections and forcing them to keep using `sl:driver`. - Adds tests to verify the new functionality - Modifies existing tests to let them pass after `sl:driver` is added - Modifies the documentation to contain new `sl:driver` The changes were evaluated by a test with the following scenario ([test_connections-sl-driver.py](https://github.com/user-attachments/files/22021273/test_connections-sl-driver.py)): - Start ScyllaDB with one node - Create 1000 keyspaces, 1 table in each keyspace - Start `cassandra-stress` (`-rate threads=50 -mode native cql3`) - Run connection storm with 1000 sessions (100 python processes, 10 sessions each) The maximum latency during the connection storm dropped from 224.94ms to 41.43ms (those numbers are averages from 20 test executions, where max latency was in [140ms, 361ms] before the change and [31.4ms, 61.5ms] after). The snippet of cassandra-stress output from the moment of connection storm: Before: ``` type total ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb ... 
total, 789206, 85887, 85887, 85887, 0.6, 0.3, 2.0, 2.0, 2.5, 5.0, 9.0, 0.09679, 0, 0, 0, 0, 0, 0 total, 909322, 120116, 120116, 120116, 0.4, 0.2, 1.9, 2.0, 2.1, 3.1, 10.0, 0.09053, 0, 0, 0, 0, 0, 0 total, 964392, 55070, 55070, 55070, 0.9, 0.4, 2.0, 4.5, 7.7, 18.9, 11.0, 0.09203, 0, 0, 0, 0, 0, 0 total, 975705, 11313, 11313, 11313, 4.4, 3.5, 6.5, 24.5, 82.7, 83.0, 12.0, 0.11713, 0, 0, 0, 0, 0, 0 total, 987548, 11843, 11843, 11843, 4.2, 3.5, 6.5, 33.7, 48.6, 51.5, 13.0, 0.13366, 0, 0, 0, 0, 0, 0 total, 995422, 7874, 7874, 7874, 6.3, 4.0, 7.7, 85.6, 112.9, 113.5, 14.0, 0.14753, 0, 0, 0, 0, 0, 0 total, 1007228, 11806, 11806, 11806, 4.3, 3.5, 6.5, 29.1, 43.8, 87.1, 15.0, 0.15598, 0, 0, 0, 0, 0, 0 total, 1012840, 5612, 5612, 5612, 8.2, 5.0, 11.5, 121.8, 166.6, 170.1, 16.0, 0.16535, 0, 0, 0, 0, 0, 0 total, 1016186, 3346, 3346, 3346, 13.4, 7.4, 20.1, 204.9, 207.6, 210.4, 17.0, 0.17405, 0, 0, 0, 0, 0, 0 total, 1025462, 9276, 9276, 9276, 6.3, 3.9, 9.6, 74.6, 206.8, 210.0, 18.0, 0.17800, 0, 0, 0, 0, 0, 0 total, 1035979, 10517, 10517, 10517, 4.8, 3.5, 6.7, 38.5, 82.6, 83.0, 19.0, 0.18120, 0, 0, 0, 0, 0, 0 total, 1047488, 11509, 11509, 11509, 4.3, 3.5, 6.0, 32.6, 72.3, 74.0, 20.0, 0.18334, 0, 0, 0, 0, 0, 0 total, 1077456, 29968, 29968, 29968, 1.7, 1.6, 2.9, 3.6, 7.0, 8.2, 21.0, 0.17943, 0, 0, 0, 0, 0, 0 total, 1105490, 28034, 28034, 28034, 1.8, 1.8, 3.5, 4.6, 5.3, 13.8, 22.0, 0.17609, 0, 0, 0, 0, 0, 0 total, 1132221, 26731, 26731, 26731, 1.9, 1.8, 3.8, 5.2, 8.4, 11.1, 23.0, 0.17314, 0, 0, 0, 0, 0, 0 total, 1162149, 29928, 29928, 29928, 1.7, 1.7, 3.0, 4.5, 8.0, 9.1, 24.0, 0.16950, 0, 0, 0, 0, 0, 0 ... ``` After: ``` type total ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb ... 
total, 822863, 94379, 94379, 94379, 0.5, 0.3, 2.0, 2.0, 2.1, 3.7, 9.0, 0.06669, 0, 0, 0, 0, 0, 0 total, 937337, 114474, 114474, 114474, 0.4, 0.2, 2.0, 2.0, 2.1, 3.4, 10.0, 0.06301, 0, 0, 0, 0, 0, 0 total, 986630, 49293, 49293, 49293, 1.0, 1.0, 2.0, 2.1, 17.9, 19.0, 11.0, 0.07318, 0, 0, 0, 0, 0, 0 total, 1026734, 40104, 40104, 40104, 1.2, 1.0, 2.0, 2.2, 6.3, 7.1, 12.0, 0.08410, 0, 0, 0, 0, 0, 0 total, 1066124, 39390, 39390, 39390, 1.3, 1.0, 2.0, 2.2, 2.6, 3.4, 13.0, 0.09108, 0, 0, 0, 0, 0, 0 total, 1103082, 36958, 36958, 36958, 1.3, 1.1, 2.1, 2.5, 3.1, 4.2, 14.0, 0.09643, 0, 0, 0, 0, 0, 0 total, 1141987, 38905, 38905, 38905, 1.3, 1.0, 2.0, 2.4, 11.4, 12.7, 15.0, 0.09894, 0, 0, 0, 0, 0, 0 total, 1180023, 38036, 38036, 38036, 1.3, 1.0, 2.0, 3.7, 5.6, 7.1, 16.0, 0.10070, 0, 0, 0, 0, 0, 0 total, 1216481, 36458, 36458, 36458, 1.4, 1.0, 2.1, 3.6, 4.7, 5.0, 17.0, 0.10210, 0, 0, 0, 0, 0, 0 total, 1256819, 40338, 40338, 40338, 1.2, 1.0, 2.0, 2.2, 3.5, 5.4, 18.0, 0.10173, 0, 0, 0, 0, 0, 0 total, 1295122, 38303, 38303, 38303, 1.3, 1.0, 2.0, 2.4, 21.0, 21.1, 19.0, 0.10136, 0, 0, 0, 0, 0, 0 total, 1334743, 39621, 39621, 39621, 1.3, 1.0, 2.0, 2.3, 3.3, 4.0, 20.0, 0.10055, 0, 0, 0, 0, 0, 0 total, 1375579, 40836, 40836, 40836, 1.2, 1.0, 2.0, 2.1, 3.4, 5.7, 21.0, 0.09927, 0, 0, 0, 0, 0, 0 total, 1415576, 39997, 39997, 39997, 1.2, 1.0, 2.0, 2.3, 3.2, 4.1, 22.0, 0.09807, 0, 0, 0, 0, 0, 0 total, 1449268, 33692, 33692, 33692, 1.5, 1.4, 2.5, 3.2, 4.2, 5.6, 23.0, 0.09800, 0, 0, 0, 0, 0, 0 total, 1471873, 22605, 22605, 22605, 2.2, 2.0, 4.8, 5.9, 7.0, 7.9, 24.0, 0.10015, 0, 0, 0, 0, 0, 0 ... ``` Fixes: https://github.com/scylladb/scylladb/issues/24411 This is a new feature, so no backport needed. 
Closes scylladb/scylladb#25412 * github.com:scylladb/scylladb: docs: workload-prioritization: add driver service level test: add test to verify use of `sl:driver` transport: use `sl:driver` to handle driver's control connections transport: whitespace only change in update_scheduling_group transport: call update_scheduling_group for non-auth connections generic_server: transport: start using `sl:driver` for new connections test: add test_desc_* for driver service level test: service_levels: add tests for sl:driver creation and removal test: add reload_raft_topology_state() to ScyllaRESTAPIClient service_level_controller: automatically create `sl:driver` service_level_controller: methods to create driver service level service_level_controller: handle special sl:driver in DESC output topology_coordinator: add service_level_controller reference system_keyspace: add service_level_driver_created test: add MAX_USER_SERVICE_LEVELS	2025-09-18 19:45:17 +03:00
Andrzej Jackowski	1ad483749a	generic_server: transport: start using `sl:driver` for new connections Before this change, new connections were handled in a default scheduling group (`main`), because before the user is authenticated we do not know which service level should be used. With the new `sl:driver` service level, creation of new connections can be moved to `sl:driver`. We switch the service level as early as possible, in `do_accepts`. It is possible that `sl:driver` does not exist yet, for instance in specific upgrade cases, or if it was removed. Therefore, we also switch to `sl:driver` after a connection is accepted. Refs: scylladb/scylladb#24411	2025-09-18 09:29:29 +02:00
Ernest Zaslavsky	ddf2588985	treewide: Move replica related files to `replica` directory As requested in #22099, moved the files and fixed other includes and build system. Moved files: - cache_temperature.hh - cell_locking.hh Fixes: #22099 Closes scylladb/scylladb#25079	2025-09-18 08:00:35 +03:00
Dawid Mędrek	789a4a1ce7	test/perf: Adjust tablet_load_balancing.cc to RF-rack-validity We modify the logic to make sure that all of the keyspaces that the test creates are RF-rack-valid. For that, we distribute the nodes across two DCs and as many racks as the provided replication factor. That may have an effect on the load balancing logic, but since this is a performance test and since tablet load balancing is still taking place, it should be acceptable. This commit also finishes work in adjusting perf tests to pass with the `rf_rack_valid_keyspaces` configuration option enabled. The remaining tests either don't attempt to create keyspaces or they already create RF-rack-valid keyspaces. We don't need to explicitly enable the configuration option. It's already enabled by default by `cql_test_config`. The reason why we haven't run into any issue because of that is that performance tests are not part of our CI. Fixes scylladb/scylladb#25127 Closes scylladb/scylladb#25728	2025-09-09 12:46:46 +03:00
Radosław Cybulski	7b3d42f83e	Remove unused boost macro definitions Closes scylladb/scylladb#25742	2025-09-03 10:06:33 +03:00
Dawid Mędrek	fc50e9d0a4	test/perf: Require smp=1 in perf_cache_eviction Trying to run the test with more than one shard results in a failure when generating sharding metadata: ``` ERROR 2025-08-27 16:00:17,551 [shard 0:main] table - Memtable flush failed due to: std::runtime_error (Failed to generate sharding metadata for /tmp/scylla-c9fa42fe/ks/cf-2938a030834e11f0a561ffa33feb022d/me-3gt6_12wh_1gifk2ijgeu1ovc1m5-big-Data.db). Aborting ``` Let's require that the test be run with a single shard. Closes scylladb/scylladb#25703	2025-09-01 08:59:35 +03:00
Raphael S. Carvalho	2c4a9ba70c	treewide: Rename table_state to compaction_group_view Since table_state is a view to a compaction group, it makes sense to rename it accordingly. With upcoming incremental repair, each replica::compaction_group will actually be two compaction groups, so there will be two views for each replica::compaction_group. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2025-08-08 06:51:28 +03:00
Benny Halevy	dee0d7ffbf	locator: tablets: get rid of synchronous mutate_tablet_map It is currently used only by tests that could very well do with mutate_tablet_map_async. This will simplify the following patch to prevent accidental copy of the tablet_map, providing explicit clone/clone_gently methods. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-07-22 15:03:02 +03:00
Avi Kivity	6fce817aa8	Merge 'Atomic in-memory schema changes application' from Marcin Maliszkiewicz This change is preparing ground for state update unification for raft bound subsystems. It introduces schema_applier which in the future will become generic interface for applying mutations in raft. Pulling database::apply() out of schema merging code will allow to batch changes to subsystems. Future generic code will first call prepare() on all implementations, then single database::apply() and then update() on all implementations, then on each shard it will call commit() for all implementations, without preemption so that the change is observed as atomic across all subsystems, and then post_commit(). Backport: no, it's a new feature Fixes: https://github.com/scylladb/scylladb/issues/19649 Fixes https://github.com/scylladb/scylladb/issues/24531 Closes scylladb/scylladb#24886 [avi: adjust for std::vector<mutations> -> utils::chunked_vector<mutations>] * github.com:scylladb/scylladb: test: add type creation to test_snapshot storage_service: always wake up load balancer on update tablet metadata db: schema_applier: call destroy also when exception occurs db: replica: simplify seeding ERM during shema change db: remove cleanup from add_column_family db: abort on exception during schema commit phase db: make user defined types changes atomic replica: db: make keyspace schema changes atomic db: atomically apply changes to tables and views replica: make truncate_table_on_all_shards get whole schema from table_shards service: split update_tablet_metadata into two phases service: pull out update_tablet_metadata from migration_listener db: service: add store_service dependency to schema_applier service: simplify load_tablet_metadata and update_tablet_metadata db: don't perform move on tablet_hint reference replica: split add_column_family_and_make_directory into steps replica: db: split drop_table into steps db: don't move map references in merge_tables_and_views() db: 
introduce commit_on_shard function db: access types during schema merge via special storage replica: make non-preemptive keyspace create/update/delete functions public replica: split update keyspace into two phases replica: split creating keyspace into two functions db: rename create_keyspace_from_schema_partition db: decouple functions and aggregates schema change notification from merging code db: store functions and aggregates change batch in schema_applier db: decouple tables and views schema change notifications from merging code db: store tables and views schema diff in schema_applier db: decouple user type schema change notifications from types merging code service: unify keyspace notification functions arguments db: replica: decouple keyspace schema change notifications to a separate function db: add class encapsulating schema merging	2025-07-13 20:47:55 +03:00
Benny Halevy	3feb759943	everywhere: use utils::chunked_vector for list of mutations
Currently, we use std::vector<*mutation> to keep a list of mutations for processing. This can lead to large allocations, e.g. when the vector size is a function of the number of tables. Use a chunked vector instead to prevent oversized allocations.
`perf-simple-query --smp 1` results obtained for fixed 400MHz frequency and PGO disabled:
Before (read path):
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
89055.97 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39417 insns/op, 18003 cycles/op, 0 errors)
103372.72 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39380 insns/op, 17300 cycles/op, 0 errors)
98942.27 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39413 insns/op, 17336 cycles/op, 0 errors)
103752.93 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39407 insns/op, 17252 cycles/op, 0 errors)
102516.77 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39403 insns/op, 17288 cycles/op, 0 errors)
throughput: mean= 99528.13 standard-deviation=6155.71 median= 102516.77 median-absolute-deviation=3844.59 maximum=103752.93 minimum=89055.97
instructions_per_op: mean= 39403.99 standard-deviation=14.25 median= 39406.75 median-absolute-deviation=9.30 maximum=39416.63 minimum=39380.39
cpu_cycles_per_op: mean= 17435.81 standard-deviation=318.24 median= 17300.40 median-absolute-deviation=147.59 maximum=18002.53 minimum=17251.75
```
After (read path):
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
59755.04 tps ( 66.2 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39466 insns/op, 22834 cycles/op, 0 errors)
71854.16 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39417 insns/op, 17883 cycles/op, 0 errors)
82149.45 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 39411 insns/op, 17409 cycles/op, 0 errors)
49640.04 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.3 tasks/op, 39474 insns/op, 19975 cycles/op, 0 errors)
54963.22 tps ( 66.1 allocs/op, 0.0 logallocs/op, 14.3 tasks/op, 39474 insns/op, 18235 cycles/op, 0 errors)
throughput: mean= 63672.38 standard-deviation=13195.12 median= 59755.04 median-absolute-deviation=8709.16 maximum=82149.45 minimum=49640.04
instructions_per_op: mean= 39448.38 standard-deviation=31.60 median= 39466.17 median-absolute-deviation=25.75 maximum=39474.12 minimum=39411.42
cpu_cycles_per_op: mean= 19267.01 standard-deviation=2217.03 median= 18234.80 median-absolute-deviation=1384.25 maximum=22834.26 minimum=17408.67
```
`perf-simple-query --smp 1 --write` results obtained for fixed 400MHz frequency and PGO disabled:
Before (write path):
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=write, query_single_key=no, counters=no}
Disabling auto compaction
63736.96 tps ( 59.4 allocs/op, 16.4 logallocs/op, 14.3 tasks/op, 49667 insns/op, 19924 cycles/op, 0 errors)
64109.41 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 49992 insns/op, 20084 cycles/op, 0 errors)
56950.47 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50005 insns/op, 20501 cycles/op, 0 errors)
44858.42 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50014 insns/op, 21947 cycles/op, 0 errors)
28592.87 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50027 insns/op, 27659 cycles/op, 0 errors)
throughput: mean= 51649.63 standard-deviation=15059.74 median= 56950.47 median-absolute-deviation=12087.33 maximum=64109.41 minimum=28592.87
instructions_per_op: mean= 49941.18 standard-deviation=153.76 median= 50005.24 median-absolute-deviation=73.01 maximum=50027.07 minimum=49667.05
cpu_cycles_per_op: mean= 22023.01 standard-deviation=3249.92 median= 20500.74 median-absolute-deviation=1938.76 maximum=27658.75 minimum=19924.32
```
After (write path):
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=write, query_single_key=no, counters=no}
Disabling auto compaction
53395.93 tps ( 59.4 allocs/op, 16.5 logallocs/op, 14.3 tasks/op, 50326 insns/op, 21252 cycles/op, 0 errors)
46527.83 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50704 insns/op, 21555 cycles/op, 0 errors)
55846.30 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50731 insns/op, 21060 cycles/op, 0 errors)
55669.30 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50735 insns/op, 21521 cycles/op, 0 errors)
52130.17 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 50757 insns/op, 21334 cycles/op, 0 errors)
throughput: mean= 52713.91 standard-deviation=3795.38 median= 53395.93 median-absolute-deviation=2955.40 maximum=55846.30 minimum=46527.83
instructions_per_op: mean= 50650.57 standard-deviation=182.46 median= 50731.38 median-absolute-deviation=84.09 maximum=50756.62 minimum=50325.87
cpu_cycles_per_op: mean= 21344.42 standard-deviation=202.86 median= 21334.00 median-absolute-deviation=176.37 maximum=21554.61 minimum=21060.24
```
Fixes #24815
Improvement for rare corner cases. No backport required.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#24919	2025-07-13 19:13:11 +03:00
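To illustrate why a chunked container avoids oversized allocations, here is a minimal C++ sketch. It is not ScyllaDB's `utils::chunked_vector`; the class name `simple_chunked_vector` and the 128 KiB default chunk limit are assumptions for the example. Elements are stored in fixed-capacity chunks, so no single element buffer exceeds the limit (unless one element is itself larger), however many mutations are appended. Only the outer array of chunk handles grows with the element count, and it holds one small handle per chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical illustration, not ScyllaDB's utils::chunked_vector.
template <typename T, std::size_t MaxChunkBytes = 128 * 1024>
class simple_chunked_vector {
public:
    // Elements per chunk; at least one even if T is larger than the limit.
    static constexpr std::size_t chunk_elems =
        std::max<std::size_t>(1, MaxChunkBytes / sizeof(T));

    void push_back(T value) {
        if (_chunks.empty() || _chunks.back().size() == chunk_elems) {
            _chunks.emplace_back();
            // Reserve the whole chunk up front: the inner vector never
            // reallocates, so elements already stored never move.
            _chunks.back().reserve(chunk_elems);
        }
        _chunks.back().push_back(std::move(value));
        ++_size;
    }

    T& operator[](std::size_t i) {
        assert(i < _size);
        return _chunks[i / chunk_elems][i % chunk_elems];
    }
    const T& operator[](std::size_t i) const {
        assert(i < _size);
        return _chunks[i / chunk_elems][i % chunk_elems];
    }

    std::size_t size() const noexcept { return _size; }
    std::size_t chunk_count() const noexcept { return _chunks.size(); }

private:
    std::vector<std::vector<T>> _chunks;  // one small handle per chunk
    std::size_t _size = 0;
};
```

The trade-off against `std::vector` is that storage is no longer contiguous: indexing needs a division and a modulo by the chunk size, and there is no single `data()` pointer covering the whole sequence.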
Marcin Maliszkiewicz	15b4db47c7	storage_service: always wake up load balancer on update tablet metadata Lack of wakeup is error-prone, as it relies on a wakeup occurring elsewhere.	2025-07-10 10:46:55 +02:00


495 Commits