Commit Graph

450 Commits

Author SHA1 Message Date
Raphael S. Carvalho
2c4a9ba70c treewide: Rename table_state to compaction_group_view
Since table_state is a view to a compaction group, it makes sense
to rename it as so.

With upcoming incremental repair, each replica::compaction_group
will be actually two compaction groups, so there will be two
views for each replica::compaction_group.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-08-08 06:51:28 +03:00
Benny Halevy
dee0d7ffbf locator: tablets: get rid of synchronous mutate_tablet_map
It is currently used only by tests that could very well
do with mutate_tablet_map_async.

This will simplify the following patch to prevent
accidental copy of the tablet_map, provding explicit
clone/clone_gently methods.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-07-22 15:03:02 +03:00
Avi Kivity
6fce817aa8 Merge 'Atomic in-memory schema changes application' from Marcin Maliszkiewicz
This change is preparing ground for state update unification for raft bound subsystems. It introduces schema_applier which in the future will become generic interface for applying mutations in raft.

Pulling database::apply() out of schema merging code will allow to batch changes to subsystems. Future generic code will first call prepare() on all implementations, then single database::apply() and then update() on all implementations, then on each shard it will call commit() for all implementations, without preemption so that the change is observed as atomic across all subsystems, and then post_commit().

Backport: no, it's a new feature

Fixes: https://github.com/scylladb/scylladb/issues/19649
Fixes https://github.com/scylladb/scylladb/issues/24531

Closes scylladb/scylladb#24886

[avi: adjust for std::vector<mutations> -> utils::chunked_vector<mutations>]

* github.com:scylladb/scylladb:
  test: add type creation to test_snapshot
  storage_service: always wake up load balancer on update tablet metadata
  db: schema_applier: call destroy also when exception occurs
  db: replica: simplify seeding ERM during shema change
  db: remove cleanup from add_column_family
  db: abort on exception during schema commit phase
  db: make user defined types changes atomic
  replica: db: make keyspace schema changes atomic
  db: atomically apply changes to tables and views
  replica: make truncate_table_on_all_shards get whole schema from table_shards
  service: split update_tablet_metadata into two phases
  service: pull out update_tablet_metadata from migration_listener
  db: service: add store_service dependency to schema_applier
  service: simplify load_tablet_metadata and update_tablet_metadata
  db: don't perform move on tablet_hint reference
  replica: split add_column_family_and_make_directory into steps
  replica: db: split drop_table into steps
  db: don't move map references in merge_tables_and_views()
  db: introduce commit_on_shard function
  db: access types during schema merge via special storage
  replica: make non-preemptive keyspace create/update/delete functions public
  replica: split update keyspace into two phases
  replica: split creating keyspace into two functions
  db: rename create_keyspace_from_schema_partition
  db: decouple functions and aggregates schema change notification from merging code
  db: store functions and aggregates change batch in schema_applier
  db: decouple tables and views schema change notifications from merging code
  db: store tables and views schema diff in schema_applier
  db: decouple user type schema change notifications from types merging code
  service: unify keyspace notification functions arguments
  db: replica: decouple keyspace schema change notifications to a separate function
  db: add class encapsulating schema merging
2025-07-13 20:47:55 +03:00
Benny Halevy
3feb759943 everywhere: use utils::chunked_vector for list of mutations
Currently, we use std::vector<*mutation> to keep
a list of mutations for processing.
This can lead to large allocation, e.g. when the vector
size is a function of the number of tables.

Use a chunked vector instead to prevent oversized allocations.

`perf-simple-query --smp 1` results obtained for fixed 400MHz frequency
and PGO disabled:

Before (read path):
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...

89055.97 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39417 insns/op,   18003 cycles/op,        0 errors)
103372.72 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39380 insns/op,   17300 cycles/op,        0 errors)
98942.27 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39413 insns/op,   17336 cycles/op,        0 errors)
103752.93 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39407 insns/op,   17252 cycles/op,        0 errors)
102516.77 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39403 insns/op,   17288 cycles/op,        0 errors)
throughput:
	mean=   99528.13 standard-deviation=6155.71
	median= 102516.77 median-absolute-deviation=3844.59
	maximum=103752.93 minimum=89055.97
instructions_per_op:
	mean=   39403.99 standard-deviation=14.25
	median= 39406.75 median-absolute-deviation=9.30
	maximum=39416.63 minimum=39380.39
cpu_cycles_per_op:
	mean=   17435.81 standard-deviation=318.24
	median= 17300.40 median-absolute-deviation=147.59
	maximum=18002.53 minimum=17251.75
```

After (read path)
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
59755.04 tps ( 66.2 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39466 insns/op,   22834 cycles/op,        0 errors)
71854.16 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39417 insns/op,   17883 cycles/op,        0 errors)
82149.45 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   39411 insns/op,   17409 cycles/op,        0 errors)
49640.04 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.3 tasks/op,   39474 insns/op,   19975 cycles/op,        0 errors)
54963.22 tps ( 66.1 allocs/op,   0.0 logallocs/op,  14.3 tasks/op,   39474 insns/op,   18235 cycles/op,        0 errors)
throughput:
	mean=   63672.38 standard-deviation=13195.12
	median= 59755.04 median-absolute-deviation=8709.16
	maximum=82149.45 minimum=49640.04
instructions_per_op:
	mean=   39448.38 standard-deviation=31.60
	median= 39466.17 median-absolute-deviation=25.75
	maximum=39474.12 minimum=39411.42
cpu_cycles_per_op:
	mean=   19267.01 standard-deviation=2217.03
	median= 18234.80 median-absolute-deviation=1384.25
	maximum=22834.26 minimum=17408.67
```

`perf-simple-query --smp 1 --write` results obtained for fixed 400MHz frequency
and PGO disabled:

Before (write path):
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=write, query_single_key=no, counters=no}
Disabling auto compaction
63736.96 tps ( 59.4 allocs/op,  16.4 logallocs/op,  14.3 tasks/op,   49667 insns/op,   19924 cycles/op,        0 errors)
64109.41 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   49992 insns/op,   20084 cycles/op,        0 errors)
56950.47 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50005 insns/op,   20501 cycles/op,        0 errors)
44858.42 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50014 insns/op,   21947 cycles/op,        0 errors)
28592.87 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50027 insns/op,   27659 cycles/op,        0 errors)
throughput:
	mean=   51649.63 standard-deviation=15059.74
	median= 56950.47 median-absolute-deviation=12087.33
	maximum=64109.41 minimum=28592.87
instructions_per_op:
	mean=   49941.18 standard-deviation=153.76
	median= 50005.24 median-absolute-deviation=73.01
	maximum=50027.07 minimum=49667.05
cpu_cycles_per_op:
	mean=   22023.01 standard-deviation=3249.92
	median= 20500.74 median-absolute-deviation=1938.76
	maximum=27658.75 minimum=19924.32
```

After (write path)
```
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=write, query_single_key=no, counters=no}
Disabling auto compaction
53395.93 tps ( 59.4 allocs/op,  16.5 logallocs/op,  14.3 tasks/op,   50326 insns/op,   21252 cycles/op,        0 errors)
46527.83 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50704 insns/op,   21555 cycles/op,        0 errors)
55846.30 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50731 insns/op,   21060 cycles/op,        0 errors)
55669.30 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50735 insns/op,   21521 cycles/op,        0 errors)
52130.17 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   50757 insns/op,   21334 cycles/op,        0 errors)
throughput:
	mean=   52713.91 standard-deviation=3795.38
	median= 53395.93 median-absolute-deviation=2955.40
	maximum=55846.30 minimum=46527.83
instructions_per_op:
	mean=   50650.57 standard-deviation=182.46
	median= 50731.38 median-absolute-deviation=84.09
	maximum=50756.62 minimum=50325.87
cpu_cycles_per_op:
	mean=   21344.42 standard-deviation=202.86
	median= 21334.00 median-absolute-deviation=176.37
	maximum=21554.61 minimum=21060.24
```

Fixes #24815

Improvement for rare corner cases. No backport required

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#24919
2025-07-13 19:13:11 +03:00
Marcin Maliszkiewicz
15b4db47c7 storage_service: always wake up load balancer on update tablet metadata
Lack of wakeup is error-prone, as it relies on a wakeup occurring
elsewhere.
2025-07-10 10:46:55 +02:00
Marcin Maliszkiewicz
847d7f4a3a service: simplify load_tablet_metadata and update_tablet_metadata
- remove load_tablet_metadata(), instead we add wake_up_load_balancer flag
to update_tablet_metadata(), it reduces number of public functions and
also serves as a comment (removed comment with very similar meaning)

- reimplement the code to not use mutate_token_metadata(), this way
it's more readable and it's also needed as we'll split
update_tablet_metadata() in following commits so that we can have
subroutine which doesn't yield (for ensuring atomicity)
2025-07-10 10:40:43 +02:00
Michael Litvak
ddf02c9489 tablets: replace all_tables method
The method all_tables in tablet_metadata is used for iterating over all
tables in the tablet metadata with their tablet maps.

Now that we have co-located tables we need to make the distinction on
which tables we want to iterate over. In some cases we want to iterate
over each group of co-located tables, treating them as one unit, and in
other cases we want to iterate over all tables, doesn't matter if they
are part of a co-located group and have a base table.

We replace all_tables with new methods that can be used for each of the
cases.
2025-07-01 13:20:18 +03:00
Avi Kivity
cd79a8fc25 Revert "Merge 'Atomic in-memory schema changes application' from Marcin Maliszkiewicz"
This reverts commit 0b516da95b, reversing
changes made to 30199552ac. It breaks
cluster.random_failures.test_random_failures.test_random_failures
in debug mode (at least).

Fixes #24513
2025-06-16 22:38:12 +03:00
Tomasz Grabiec
0b516da95b Merge 'Atomic in-memory schema changes application' from Marcin Maliszkiewicz
This change is preparing ground for state update unification for raft bound subsystems. It introduces schema_applier which in the future will become generic interface for applying mutations in raft.

Pulling `database::apply()` out of schema merging code will allow to batch changes to subsystems. Future generic code will first call `prepare()` on all implementations, then single  `database::apply()` and then `update()` on all implementations, then on each shard it will call `commit()` for all implementations, without preemption so that the change is observed as atomic across all subsystems, and then `post_commit()`.

Backport: no, it's a new feature

Fixes: https://github.com/scylladb/scylladb/issues/19649

Closes scylladb/scylladb#20853

* github.com:scylladb/scylladb:
  storage_service: always wake up load balancer on update tablet metadata
  db: schema_applier: call destroy also when exception occurs
  db: replica: simplify seeding ERM during shema change
  db: remove cleanup from add_column_family
  db: abort on exception during schema commit phase
  db: make user defined types changes atomic
  replica: db: make keyspace schema changes atomic
  db: atomically apply changes to tables and views
  replica: make truncate_table_on_all_shards get whole schema from table_shards
  service: split update_tablet_metadata into two phases
  service: pull out update_tablet_metadata from migration_listener
  db: service: add store_service dependency to schema_applier
  service: simplify load_tablet_metadata and update_tablet_metadata
  db: don't perform move on tablet_hint reference
  replica: split add_column_family_and_make_directory into steps
  replica: db: split drop_table into steps
  db: don't move map references in merge_tables_and_views()
  db: introduce commit_on_shard function
  db: access types during schema merge via special storage
  replica: make non-preemptive keyspace create/update/delete functions public
  replica: split update keyspace into two phases
  replica: split creating keyspace into two functions
  db: rename create_keyspace_from_schema_partition
  db: decouple functions and aggregates schema change notification from merging code
  db: store functions and aggregates change batch in schema_applier
  db: decouple tables and views schema change notifications from merging code
  db: store tables and views schema diff in schema_applier
  db: decouple user type schema change notifications from types merging code
  service: unify keyspace notification functions arguments
  db: replica: decouple keyspace schema change notifications to a separate function
  db: add class encapsulating schema merging
2025-06-10 13:45:32 +02:00
Ernest Zaslavsky
30199552ac s3_client: Mitigate connection exhaustion in download_source
The existing `download_source` implementation optimizes performance
by keeping the connection to S3 open and draining data directly from
the socket. While this eliminates the overhead (60-100ms) of repeatedly
establishing new connections, it leads to rapid exhaustion of client-
side connections.

On a single shard, two `mx_readers` for load and stream are enough to
trigger this issue. Since each client typically holds two connections,
readers keeping index and data sources open can cause deadlocks where
processes stall due to unavailable connections.

Introduce `chunked_download_source`, a new S3 download method built on
`download_source`, to dynamically manage connections:

- Buffers data in 5MiB chunks using a producer-consumer model
- Closes connections once buffers reach capacity, returning them to
  the pool for other clients
- Uses a filling fiber that resumes fetching once buffers are
  consumed from the queue

Performance remains comparable to `download_source`, achieving
95MiB/s for sequential 1GiB downloads from S3. However, preloading
large chunks may cause read amplification.

Fixes: https://github.com/scylladb/scylladb/issues/23785

Closes scylladb/scylladb#23880
2025-06-10 12:58:24 +03:00
Marcin Maliszkiewicz
2090e44283 storage_service: always wake up load balancer on update tablet metadata
Lack of wakeup is error-prone, as it relies on a wakeup occurring
elsewhere.
2025-06-06 08:50:34 +02:00
Marcin Maliszkiewicz
1c8fd3a65d service: simplify load_tablet_metadata and update_tablet_metadata
- remove load_tablet_metadata(), instead we add wake_up_load_balancer flag
to update_tablet_metadata(), it reduces number of public functions and
also serves as a comment (removed comment with very similar meaning)

- reimplement the code to not use mutate_token_metadata(), this way
it's more readable and it's also needed as we'll split
update_tablet_metadata() in following commits so that we can have
subroutine which doesn't yield (for ensuring atomicity)
2025-06-06 08:50:33 +02:00
Avi Kivity
5e764d1de2 Merge 'Drop v2 and flat from reader and related names' from Botond Dénes
Following a number of similar code cleanup PR, this one aims to be the last one, definitely dropping flat from all reader and related names.
Similarly, v2 is also dropped from reader names, although it still persists in mutation_fragment_v2, mutation_v2 and related names. This won't change in the foreseeable future, as we don't have plans to drop mutation (the v1 variant).
The changes in this PR are entirely mechanical, mostly just search-and-replace.

Code cleanup, no backport required.

Closes scylladb/scylladb#24087

* github.com:scylladb/scylladb:
  test/boost/mutation_reader_another_test: drop v2 from reader and related names
  test/boost/mutation_reader: s/puppet_reader_v2/puppet_reader/
  test/boost/sstable_datafile_test: s/sstable_reader_v2/sstable_mutation_reader/
  test/boost/mutation_test: s/consumer_v2/consumer/
  test/lib/mutation_reader_assertions: s/flat_reader_assertions_v2/mutation_reader_assertions/
  readers/mutation_readers: s/generating_reader_v2/generating_reader/
  readers/mutation_readers: s/delegating_reader_v2/delegating_reader/
  readers/mutation_readers: s/empty_flat_reader_v2/empty_mutation_reader/
  readers/mutation_source: s/make_reader_v2/make_mutation_reader/
  readers/mutation_source: s/flat_reader_v2_factory_type/mutation_reader_factory/
  readers/mutation_reader: s/reader_consumer_v2/mutation_reader_consumer/
  mutation/mutation_compactor: drop v2 from compactor and related names
  replica/table: s/make_reader_v2/make_mutation_reader/
  mutation_writer: s/bucket_writer_v2/bucket_writer/
  readers/queue: drop v2 from reader and related names
  readers/multishard: drop v2 from reader and related names
  readers/evictable: drop v2 from reader and related names
  readers/multi_range: remove flat from name
2025-05-11 22:22:35 +03:00
Botond Dénes
b5170e27d0 replica/table: s/make_reader_v2/make_mutation_reader/ 2025-05-09 07:53:29 -04:00
Michał Chojnowski
8649adafa8 test: switch uses of make_sstable_compressor_factory() to a seastar::thread-dependent version
In next patches, make_sstable_compressor_factory() will have to
disappear.
In preparation for that, we switch to a seastar::thread-dependent
replacement.
2025-05-07 14:43:04 +02:00
Raphael S. Carvalho
c77f710a0c sstables: Fix quadratic space complexity in partitioned_sstable_set
Interval map is very susceptible to quadratic space behavior when
it's flooded with many entries overlapping all (or most of)
intervals, since each such entry will have presence on all
intervals it overlaps with.

A trigger we observed was memtable flush storm, which creates many
small "L0" sstables that spans roughly the entire token range.

Since we cannot rely on insertion order, solution will be about
storing sstables with such wide ranges in a vector (unleveled).

There should be no consequence for single-key reads, since upper
layer applies an additional filtering based on token of key being
queried.
And for range scans, there can be an increase in memory usage,
but not significant because the sstables span an wide range and
would have been selected in the combined reader if the range of
scan overlaps with them.

Anyway, this is a protection against storm of memtable flushes
and shouldn't be the common scenario.

It works both with tablets and vnodes, by adjusting the token
range spanned by compaction group accordingly.

Fixes #23634.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-04-29 15:47:33 -03:00
Avi Kivity
0206da5232 Merge 'readers: strip "flat" and "v2" from names' from Botond Dénes
Continue the effort of normalizing reader names, stripping legacy qualifying terms like "flat" and "v2".
Flat and v2 readers are the default now, we only need to add qualifying terms to readers which are different than the normal.
One such reader remains: `make_generating_reader_v1()`.

This PR contains mostly mechanical changes, done with a sed script. Commits which only contain such mechanical renames are marked as such in the commitlog.

Code cleanup, no backport needed.

Closes scylladb/scylladb#23767

* github.com:scylladb/scylladb:
  readers: mv reversing_v2.hh reversing.hh
  readers: mv generating_v2.hh generating.hh
  tree: s/make_generating_reader_v2/make_generating_reader/
  readers: mv from_mutations_v2.hh from_mutations.hh
  tree: s/make_mutation_reader_from_mutations_v2/make_mutation_reader_from_mutations/s
  readers: mv from_fragments_v2.hh from_fragments.hh
  readers: mv forwardable_v2.hh forwardable.hh
  readers: mv empty_v2.hh empty.hh
  tree: s/make_empty_flat_reader_v2/make_empty_mutation_reader/
  readers/empty_v2.hh: replace forward declarations with include of fwd header
  readers/mutation_reader_fwd.hh: forward declare reader_permit
  readers: mv delegating_v2.hh delegating.hh
  readers/delegating_v2.hh: move reader definition to _impl.hh file
2025-04-16 20:21:51 +03:00
Pavel Emelyanov
8b2cababb6 generic_server: Don't mess with db::config
The db::config is top-level configuration of scylla, we generally try to
avoid using it even in scylla components: each uses its own config
initialized by the service creator out of the db::config itself. The
generic_server is not an exception, all the more so, it already has its
own config.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23705
2025-04-16 17:02:30 +03:00
Botond Dénes
c29c696780 readers: mv from_mutations_v2.hh from_mutations.hh
Completely mechanical change.
2025-04-16 04:46:08 -04:00
Botond Dénes
b104862702 tree: s/make_mutation_reader_from_mutations_v2/make_mutation_reader_from_mutations/s
Completely mechanical change.
2025-04-16 04:46:07 -04:00
Botond Dénes
a9d75c4f9d readers: mv empty_v2.hh empty.hh
Completely mechanical change.
2025-04-16 04:32:56 -04:00
Botond Dénes
05829f98f3 tree: s/make_empty_flat_reader_v2/make_empty_mutation_reader/
Completely mechanical change.
2025-04-16 04:32:56 -04:00
Avi Kivity
ed3e4f33fd Merge 'generic_server: throttle and shed incoming connections according to semaphore limit' from Marcin Maliszkiewicz
Adds new live updatable config: uninitialized_connections_semaphore_cpu_concurrency.

It should help to reduce cpu usage by limiting cpu concurrency for new connections.  As a last resort when those connections are waiting for initial processing too long (over 1m) they are shed.

New connections_shed and connections_blocked metrics are added for tracking.

Testing:
 - manually via simple program creating high number of connection and constantly re-connecting
 - added benchmark

Following are benchmark results:

Before:
```
> build/release/test/perf/perf_generic_server --smp=1
170101.41 tps ( 13.1 allocs/op,   0.0 logallocs/op,   7.0 tasks/op,    4695 insns/op,    3178 cycles/op,        0 errors)
[...]
throughput: mean=173850.06 standard-deviation=1844.48 median=174509.66 median-absolute-deviation=874.23 maximum=175087.49 minimum=170588.54
instructions_per_op: mean=4725.59 standard-deviation=13.35 median=4729.38 median-absolute-deviation=12.49 maximum=4738.61 minimum=4709.96
  cpu_cycles_per_op: mean=3135.08 standard-deviation=32.13 median=3122.68 median-absolute-deviation=22.29 maximum=3179.38 minimum=3103.15
```

After:
```
> build/release/test/perf/perf_generic_server --smp=1
167373.19 tps ( 13.1 allocs/op,   0.0 logallocs/op,   7.0 tasks/op,    4821 insns/op,    3371 cycles/op,        0 errors)
[...]
throughput:
  mean=   171199.55 standard-deviation=2484.58
  median= 171667.06 median-absolute-deviation=2087.63
  maximum=173689.11 minimum=167904.76
instructions_per_op:
  mean=   4801.90 standard-deviation=16.54
  median= 4796.78 median-absolute-deviation=9.32
  maximum=4830.71 minimum=4789.81
cpu_cycles_per_op:
  mean=   3245.26 standard-deviation=32.28
  median= 3230.44 median-absolute-deviation=16.52
  maximum=3297.39 minimum=3215.62
```

The patch adds around 67 insns/op so it's effect on performance should be negligible.

Fixes: https://github.com/scylladb/scylladb/issues/22844

Closes scylladb/scylladb#22828

* github.com:scylladb/scylladb:
  transport: move on_connection_close into connection destructor
  test: perf: make aggregated_perf_results formatting more human readable
  transport: add blocked and shed connection metrics
  generic_server: throttle and shed incoming connections according to semaphore limit
  generic_server: add data source and sink wrappers bookkeeping network IO
  generic_server: coroutinize part of server::do_accepts
  test: add benchmark for generic_server
  test: perf: add option to count multiple ops per time_parallel iteration
  generic_server: add semaphore for limiting new connections concurrency
  generic_server: add config to the constructor
  generic_server: add on_connection_ready handler
2025-04-09 21:41:38 +03:00
Marcin Maliszkiewicz
619944555f test: perf: make aggregated_perf_results formatting more human readable
Before:
throughput: mean=170728.58 standard-deviation=1921.76 median=171084.16 median-absolute-deviation=1501.58 maximum=172913.36 minimum=167288.97
instructions_per_op: mean=4685.89 standard-deviation=12.46 median=4683.92 median-absolute-deviation=9.68 maximum=4706.53 minimum=4666.70
cpu_cycles_per_op: mean=3090.94 standard-deviation=52.69 median=3103.43 median-absolute-deviation=24.55 maximum=3192.99 minimum=3003.00

After:
throughput:
	mean=   168224.81 standard-deviation=854.48
	median= 168829.02 median-absolute-deviation=604.21
	maximum=168829.02 minimum=167620.60
instructions_per_op:
	mean=   4837.02 standard-deviation=20.89
	median= 4851.79 median-absolute-deviation=14.77
	maximum=4851.79 minimum=4822.24
cpu_cycles_per_op:
	mean=   3271.42 standard-deviation=46.29
	median= 3304.16 median-absolute-deviation=32.73
	maximum=3304.16 minimum=3238.69
2025-04-09 10:49:20 +02:00
Marcin Maliszkiewicz
719d04d501 test: add benchmark for generic_server
Changes in configure.py are needed becuase we don't want to embed
this benchmark in scylla binary as perf_simple_query or perf_alternator,
it doesn't directly translate to Scylla performance but we want to use
aggregated_perf_results for precise cpu measurements so we need
different dependecies.
2025-04-09 10:48:42 +02:00
Marcin Maliszkiewicz
b957cedace test: perf: add option to count multiple ops per time_parallel iteration 2025-04-09 10:30:58 +02:00
Marcin Maliszkiewicz
b94acfb37b test: remove alternator code from perf-simple-query
This kind of benchmark was superseded by perf-alternator
which has more options, workflows and most importantly
measures overhead of http server layer (including json parsing).

There is no need to maintain additional code in perf-simple-query.

Closes scylladb/scylladb#23474
2025-04-06 18:15:16 +03:00
Botond Dénes
fcdae20fd1 Merge 'Add tablet enforcing option' from Benny Halevy
This series add a new config option: `tablets_mode_for_new_keyspaces` that replaces the existing
`enable_tablets` option. It can be set to the following values:
    disabled: New keyspaces use vnodes by default, unless enabled by the tablets={'enabled':true} option
    enabled:  New keyspaces use tablets by default, unless disabled by the tablets={'disabled':true} option
    enforced: New keyspaces must use tablets. Tablets cannot be disabled using the CREATE KEYSPACE option

`tablets_mode_for_new_keyspaces=disabled` or `tablets_mode_for_new_keyspaces=enabled` control whether
tablets are disabled or enabled by default for new keyspaces, respectively.
In either cases, tablets can be opted-in or out using the `tablets={'enabled':...}`
keyspace option, when the keyspace is created.

`tablets_mode_for_new_keyspaces=enforced` enables tablets by default for new keyspaces,
like `tablets_mode_for_new_keyspaces=enabled`.
However, it does not allow to opt-out when creating
new keyspaces by setting `tablets = {'enabled': false}`

Refs scylladb/scylla-enterprise#4355

* Requires backport to 2025.1

Closes scylladb/scylladb#22273

* github.com:scylladb/scylladb:
  boost/tablets_test: verify failure to create keyspace with tablets and non network replication strategy
  tablets: enforce tablets using tablets_mode_for_new_keyspaces=enforced config option
  db/config: add tablets_mode_for_new_keyspaces option
2025-04-03 16:32:19 +03:00
Botond Dénes
a0d8102a1f replica/memtable: s/make_flat_reader/make_mutation_reader/
Following the recent refactoring of removing "flat" and "v2" from reader
names, replacing all the fully qualified names with simply "mutation_reader".

Closes scylladb/scylladb#23346
2025-04-01 17:58:13 +03:00
Benny Halevy
c62865df90 db/config: add tablets_mode_for_new_keyspaces option
The new option deprecates the existing `enable_tablets` option.
It will be extended in the next patch with a 3rd value: "enforced"
while will enable tablets by default for new keyspace but
without the posibility to opt out using the `tablets = {'enabled':
false}` keyspace schema option.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-03-24 14:54:45 +02:00
Pavel Emelyanov
ca3b604afa test: Extend s3-perf test with stream download one
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-21 12:01:07 +03:00
Pavel Emelyanov
283e8e0706 test/perf: Tune-up s3 test options parsing
Rename the `--upload bool` into `--operation string` one, so that new
tests can be added in the future. Also rename run_download() to
run_contiguous_get() because this is what the internals of this method
do -- just GET contiguous ranges sequentially.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-21 12:01:07 +03:00
Kefu Chai
c37149d106 test: stop using seastar::at_exit()
seastar::at_exit() was marked deprecated recently. so let's use
the recommended approach to perform cleanups.

following tests were updated in this changes

- scylla perf-tablets: tested with
  scylla perf-tablets
- scylla perf-row-cache-update: tested with
  scylla perf-row-cache-update
- scylla perf-fast-forward: tested with
  scylla perf-fast-forward --populate --run-tests small-partition-skips \
    --smp 1
  scylla perf-fast-forward --run-tests small-partition-skips \
    --smp 1
- scylla perf-load-balancing: tested with
  scylla perf-load-balancing --nodes 3 --tablets1 16 --tablets2 16 --rf1 3 --rf2 3 --shards 16
- unit/row_cache_stress_test: tested with
  row_cache_stress_test --seconds 10
- perf/perf_cache_eviction: tested with
  ./perf_cache_eviction --seconds 1 --smp 1
- perf/perf_row_cache_reads: tested with
  ./perf_row_cache_reads

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23356
2025-03-20 17:44:57 +03:00
Botond Dénes
afa305ffb4 Merge 'perf/perf_sstable: stop using at_exit() ' from Kefu Chai
`seastar::at_exit()` was marked deprecated recently. so let's use the recommended approach to perform cleanups.

---

it's a cleanup, hence no need to backport.

Closes scylladb/scylladb#23253

* github.com:scylladb/scylladb:
  perf/perf_sstable: fix the indent
  perf/perf_sstable: stop using at_exit()
2025-03-17 15:30:10 +02:00
Kefu Chai
a82cfbecad test: perf_sstable: close frag_stream before destoying it
the underlying reader should be closed before being destroyed. otherwise
we'd have following failure when testing the "full_scan_streaming":

```
$ scylla perf-sstable --parallelism 1 --iterations 20 --partitions 20 --testdir /tmp/sstable --mode full_scan_streaming

ERROR 2025-03-13 15:04:26,321 [shard  0:main] mutation_reader - N8sstables2mx27mx_sstable_full_scan_readerE [0x60015a36b650]: permit *.*:test: was not closed before destruction, at: 0x235931e 0x2359470 0x239deb3 0x62a1ed3 0x89fd156 0x89c3fba 0x22a6ed3 0x22a8fea 0x22aae17 0x22a9928 0x26bb7d0 0x26bbe3e 0x89bca67 0x246bd8d /lib64/libc.so.6+0x3247 /lib64/libc.so.6+0x330a 0x1657774
------
   seastar::internal::coroutine_traits_base<double>::promise_type
```

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23270
2025-03-14 11:12:44 +03:00
Pavel Emelyanov
f50bcbf4d0 test/perf/s3: Don't forget to stop sharded<tester> on error
In case invoke_on_all(tester::start) throws, the sharded<tester>
instance remains non-stopped and calltrace is reported on test stop. Not
nice, fix it so that sharded<> thing is stopped in any case.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23244
2025-03-13 09:54:09 +02:00
Kefu Chai
68fc067106 perf/perf_sstable: fix the indent
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-03-12 19:00:50 +08:00
Kefu Chai
4f62f79622 perf/perf_sstable: stop using at_exit()
seastar::at_exit() was marked deprecated recently. so let's use
the recommended approach to perform cleanups.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-03-12 19:00:50 +08:00
Tomasz Grabiec
dfc9101dfd test: perf_load_balancing: Set node capacity
Otherwise, load balancer will not make any plan once it becomes
capacity-aware.
2025-03-06 13:35:37 +01:00
Tomasz Grabiec
6169401dbc test: perf_load_balancing: Convert to topology_builder
The test no longer worked becuase load balancer requires proper schema
in the database now. Convert to topology_builder which builds topology
in the database and create schema in the database (which needs proper
topology).
2025-03-06 13:35:37 +01:00
Kefu Chai
6e4cb20a69 tree: implement boost::accumulate with std::ranges library
Replace boost::accumulate() calls with std::ranges facilities. This
change reduces external dependencies and modernizes the codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23062
2025-02-26 23:22:02 +02:00
Tomasz Grabiec
94b5165ac7 tablets: Use scheduler's make_sizing_plan() to decide about tablet count of a new table
This makes decisions made by the scheduler consistent with decisions
made on table creation, with regard to tablet count.

We want to avoid over-allocation of tablets when table is created,
which would then be reduced by the scheduler's scaling logic. Not just
to avoid wasteful migrations post table creation, but to respect the
per-shard goal. To respect the per-shard goal, the algorithm will no
longer be as simple as looking at hints, and we want to share the
algorithm between the scheduler and initial tablet allocator. So
invoke the scheduler to get the tablet count when table is created.
2025-02-19 14:40:07 +01:00
Botond Dénes
57a06a4c35 Merge 'Enhance s3 client perf test with "uploading" facility and related tunables' from Pavel Emelyanov
The existing test measures latencies of object GET-s. That's nice (though incomplete), but we want to measure upload performance. Here it is.

refs: #22460

Closes scylladb/scylladb#22480

* github.com:scylladb/scylladb:
  test/perf/s3: Add --part-size-mb option for upload test
  test/perf/s3: Add uploading test
  test/perf/s3: Some renames not to be download-centric
  test/perf/s3: Make object/file name configurable
  test/perf/s3: Configure maximum number of sockets
  test/perf/s3: Remove parallelizm
  s3/client: Make http client connections limit configurable
2025-02-17 09:46:11 +02:00
Kefu Chai
7ff0d7ba98 tree: Remove unused boost headers
This commit eliminates unused boost header includes from the tree.

Removing these unnecessary includes reduces dependencies on the
external Boost.Adapters library, leading to faster compile times
and a slightly cleaner codebase.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22857
2025-02-15 20:32:22 +02:00
Pavel Emelyanov
8f61d26007 test/perf/s3: Add --part-size-mb option for upload test
Test now uses default internal part size, but for performance
comparisons its good to make it configurable.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 16:27:26 +03:00
Pavel Emelyanov
6211b39f4b test/perf/s3: Add uploading test
The test picks up a file and uploads it into the bucket, then prints the
time it took and uploading speed. For now it's enough, with existing S3
latencies more timing details can be obtained by turning on trace
logging on s3 logger.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 16:27:26 +03:00
Pavel Emelyanov
0919a70ac8 test/perf/s3: Some renames not to be download-centric
Now this test is all about reading objects. Rename some bits in it so
that they can be re-used by future uploading test as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 16:27:26 +03:00
Pavel Emelyanov
24c194dcf3 test/perf/s3: Make object/file name configurable
Now the download test first creates a temporary object and then reads
data from it. It's good to have an option to download pre-existing file.
This option will also be used for uploading test (next patches)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 16:27:25 +03:00
Pavel Emelyanov
6b27642a79 test/perf/s3: Configure maximum number of sockets
Add the --sockets NR option that limits the number of sockets the
underlying http client is configured to have.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 16:27:25 +03:00
Pavel Emelyanov
230d4d7c5e test/perf/s3: Remove parallelizm
The test spawns several fibers that read the same file in parallel.
There's not much point in it, just makes the code harder to maintain.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 16:27:25 +03:00