Commit Graph

349 Commits

Author SHA1 Message Date
Avi Kivity
318278ff92 Merge 'tablets: reload only changed metadata' from Botond Dénes
Currently, each change to tablet metadata triggers a full metadata reload from disk. This is very wasteful, especially if the metadata change affects only a single row in the `system.tablets` table. This is the case when the tablet load balancer triggers a migration, this will affect a single row in the table, but today will trigger a full reload.
We expect tablet count to potentially grow to thousands and beyond and the overhead of this full reload can become significant.
This PR makes tablet metadata reload partial, instead of reloading all metadata on topology or schema changes, reload only the partitions that are affected by the change. Copy the rest from the in-memory state.
This is done with two passes: first the change mutations are scanned and a hint is produced. This hint is then passed down to the reload code, which will use it to only reload parts (rows/partitions) of the metadata that has actually changed.

The performance difference between full reload and partial reload is quite drastic:
```
INFO  2024-07-25 05:06:27,347 [shard 0:stat] testlog - Tablet metadata reload:
full      616.39ms
partial     0.18ms
```
This was measured with the modified (by this PR) `perf_tablets`, which creates 100 tables, each with 2K tablets. The test was modified to change a single tablet, then do a full and partial reload respectively, measuring the time it takes for reach.

Fixes: #15294

New feature, no backport needed.

Closes scylladb/scylladb#15541

* github.com:scylladb/scylladb:
  test/perf/perf_tablets: add tablet metadata reload perf measurement
  test/boost/tablets_test: add test for partial tablet metadata updates
  db/schema_tables: pass tablet hint to update_tablet_metadata()
  service/storage_service: load_tablet_metadata(): add hint parameter
  service/migration_listener: update_tablet_metadata(): add hint parameter
  service/raft/group0_state_machine: provide tablet change hint on topology change
  service/storage_service: topology_state_load(): allow providing change hint
  replica/tablets: add update_tablet_metadata()
  replica/tablets: fix indentation
  replica/tablets: extract tablet_metadata builder logic
  replica/tablets: add get_tablet_metadata_change_hint() and update_tablet_metadata_change_hint()
  locator/tablets: add tablet_map::clear_tablet_transition_info()
  locator/tablets: make tablet_metadata cheap to copy
  mutation/canonical_mutation: add key()
2024-08-11 21:27:18 +03:00
Botond Dénes
2b2db510b7 test/perf/perf_tablets: add tablet metadata reload perf measurement
Measure reload perf of full reload vs. partial reload, after changing a
single tablet.
While at it, modify the `--tablets-per-table` parameter, so that it has
a default parameter which works OOTB. The previous default was both too
large (causing oversized commitlog entry errors) and not a power of two.
2024-08-11 09:53:19 -04:00
Botond Dénes
0254cfc7d3 locator/tablets: make tablet_metadata cheap to copy
Keep lw_shared_ptr<tablet_map> in the tablet map and use COW semantics.
To prevent accidental changes to shared tablet_map instances, all
modifications to a tablet_map have to go through a new
`mutate_tablet_map()` method, which implements the copy-modify-swap
idiom.
2024-08-11 09:52:37 -04:00
Botond Dénes
1f4b9a5300 Merge 'compaction: drop compaction executors' possibility to bypass task manager' from Aleksandra Martyniuk
If parent_info argument of compaction_manager::perform_compaction
is std::nullopt, then created compaction executor isn't tracked by task
manager. Currently, all compaction operations should by visible in task
manager.

Modify split methods to keep split executor in task manager. Get rid of
the option to bypass task manager.

Closes scylladb/scylladb#19995

* github.com:scylladb/scylladb:
  compaction: replace optional<task_info> with task_info param
  compaction: keep split executor in task manager
2024-08-11 10:26:43 +03:00
Avi Kivity
aa1270a00c treewide: change assert() to SCYLLA_ASSERT()
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.

Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.

To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.

[1] 66ef711d68

Closes scylladb/scylladb#20006
2024-08-05 08:23:35 +03:00
Aleksandra Martyniuk
c456a43173 compaction: replace optional<task_info> with task_info param
compaction_manager::perform_compaction does not create task manager
task for compaction if parent_info is set to std::nullopt. Currently,
we always want to create task manager task for compaction.

Remove optional from task info parameters which start compaction.
Track all compactions with task manager.
2024-08-02 14:38:46 +02:00
Tomasz Grabiec
28de5231f4 test: perf: tablet_load_balancing: Test with higher shard and tablet counts
We have up to 200 shards in production, so test this to catch
performance issues.
2024-07-31 12:57:15 +02:00
Tomasz Grabiec
56801b7cb7 test: perf: tablet_load_balancing: Report load balancer stats 2024-07-31 12:57:15 +02:00
Tomasz Grabiec
8f3b623144 test: perf: tablet_load_balancing: Report total tablet counts 2024-07-31 11:38:17 +02:00
Tomasz Grabiec
662a0ff038 test: perf: tablet_load_balancing: Print run parameters in the single simulation case too 2024-07-31 11:38:16 +02:00
Tomasz Grabiec
a040404875 test: perf: tablet_load_balancing: Report time it took to schedule migrations 2024-07-31 11:38:16 +02:00
Tomasz Grabiec
71b8d6b7aa test: perf: tablet_load_balancing: Print warnings about bad overcommit 2024-07-31 11:26:11 +02:00
Tomasz Grabiec
0d50a028a5 test: perf: tablet_load_balancing: Allow running a single simulation 2024-07-31 11:26:11 +02:00
Tomasz Grabiec
3f3660c3fe test: perf: tablet_load_balancing: Report best possible shard overcommit 2024-07-31 11:26:11 +02:00
Tomasz Grabiec
c89a320925 test: perf: tablet_load_balancing: Report global shard overcommit
Rather than maximum per-node shard overcommit. Global shard overcommit
is a better metric since we want to equalize global load not just
per-node load.
2024-07-31 11:26:11 +02:00
Botond Dénes
53a6ec05ed Merge 'replica: remove rwlock for protecting iteration over storage group map' from Raphael "Raph" Carvalho
rwlock was added to protect iterations against concurrent updates to the map.

the updates can happen when allocating a new tablet replica or removing an old one (tablet cleanup).

the rwlock is very problematic because it can result in topology changes blocked, as updating token metadata takes the exclusive lock, which is serialized with table wide ops like split / major / explicit flush (and those can take a long time).

to get rid of the lock, we can copy the storage group map and guard individual groups with a gate (not a problem since map is expected to have a maximum of ~100 elements). so cleanup can close that gate (carefully closed after stopping individual groups such that migrations aren't blocked by long-running ops like major), and ongoing iterations (e.g. triggered by nodetool flush) can skip a group that was closed, as such a group is being migrated out.

Fixes #18821.

```
WRITE
=====

./build/release/scylla perf-simple-query --smp 1 --memory 2G --initial-tablets 10 --tablets --write

- BEFORE

65559.52 tps ( 59.6 allocs/op,  16.4 logallocs/op,  14.3 tasks/op,   52841 insns/op,   30946 cycles/op,        0 errors)
67408.05 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53018 insns/op,   30874 cycles/op,        0 errors)
67714.72 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53026 insns/op,   30881 cycles/op,        0 errors)
67825.57 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53015 insns/op,   30821 cycles/op,        0 errors)
67810.74 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53009 insns/op,   30828 cycles/op,        0 errors)

         throughput: mean=67263.72 standard-deviation=967.40 median=67714.72 median-absolute-deviation=547.02 maximum=67825.57 minimum=65559.52
instructions_per_op: mean=52981.61 standard-deviation=79.09 median=53014.96 median-absolute-deviation=36.54 maximum=53025.79 minimum=52840.56
  cpu_cycles_per_op: mean=30869.90 standard-deviation=50.23 median=30874.06 median-absolute-deviation=42.11 maximum=30945.94 minimum=30820.89

- AFTER
65448.76 tps ( 59.5 allocs/op,  16.4 logallocs/op,  14.3 tasks/op,   52788 insns/op,   31013 cycles/op,        0 errors)
67290.83 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53025 insns/op,   30950 cycles/op,        0 errors)
67646.81 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53025 insns/op,   30909 cycles/op,        0 errors)
67565.90 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53058 insns/op,   30951 cycles/op,        0 errors)
67537.32 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   52983 insns/op,   30963 cycles/op,        0 errors)

         throughput: mean=67097.93 standard-deviation=931.44 median=67537.32 median-absolute-deviation=467.97 maximum=67646.81 minimum=65448.76
instructions_per_op: mean=52975.85 standard-deviation=108.07 median=53024.55 median-absolute-deviation=49.45 maximum=53057.99 minimum=52788.49
  cpu_cycles_per_op: mean=30957.17 standard-deviation=37.43 median=30951.31 median-absolute-deviation=7.51 maximum=31013.01 minimum=30908.62

READ
=====

./build/release/scylla perf-simple-query --smp 1 --memory 2G --initial-tablets 10 --tablets

- BEFORE

79423.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41840 insns/op,   26820 cycles/op,        0 errors)
81076.70 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41837 insns/op,   26583 cycles/op,        0 errors)
80927.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41829 insns/op,   26629 cycles/op,        0 errors)
80539.44 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41841 insns/op,   26735 cycles/op,        0 errors)
80793.10 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41864 insns/op,   26662 cycles/op,        0 errors)

         throughput: mean=80551.99 standard-deviation=661.12 median=80793.10 median-absolute-deviation=375.37 maximum=81076.70 minimum=79423.36
instructions_per_op: mean=41842.20 standard-deviation=13.26 median=41840.14 median-absolute-deviation=5.68 maximum=41864.50 minimum=41829.29
  cpu_cycles_per_op: mean=26685.88 standard-deviation=93.31 median=26662.18 median-absolute-deviation=56.47 maximum=26820.08 minimum=26582.68

- AFTER
79464.70 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41799 insns/op,   26761 cycles/op,        0 errors)
80954.58 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41803 insns/op,   26605 cycles/op,        0 errors)
81160.90 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41811 insns/op,   26555 cycles/op,        0 errors)
81263.10 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41814 insns/op,   26527 cycles/op,        0 errors)
81162.97 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   41806 insns/op,   26549 cycles/op,        0 errors)

         throughput: mean=80801.25 standard-deviation=755.54 median=81160.90 median-absolute-deviation=361.72 maximum=81263.10 minimum=79464.70
instructions_per_op: mean=41806.47 standard-deviation=5.85 median=41806.05 median-absolute-deviation=4.05 maximum=41813.86 minimum=41799.36
  cpu_cycles_per_op: mean=26599.22 standard-deviation=94.84 median=26554.54 median-absolute-deviation=50.51 maximum=26761.06 minimum=26527.05
```

Closes scylladb/scylladb#19469

* github.com:scylladb/scylladb:
  replica: remove rwlock for protecting iteration over storage group map
  replica: get rid of fragile compaction group intrusive list
2024-07-12 15:45:36 +03:00
Raphael S. Carvalho
ad5c5bca5f replica: get rid of fragile compaction group intrusive list
It was added to make integration of storage groups easier, but it's
complicated since it's another source of truth and we could have
problems if it becomes inconsistent with the group map.

Fixes #18506.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-07-09 16:53:35 -03:00
Patryk Wrobel
a89e3d10af code-cleanup: add missing header guards
The following command had been executed to get the
list of headers that did not contain '#pragma once':
'grep -rnw . -e "#pragma once" --include *.hh -L'

This change adds missing include guard to headers
that did not contain any guard.

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#19626
2024-07-09 18:31:35 +03:00
Benny Halevy
3f23016cc0 perf-simple-query: add mean and standard deviation stats
Currently, perf-simple-query summarizes the statistics
only for the throughput, printing the median,
median absolute deviation, minimum, and maximum.
But the throughput put is typically highly variable
and its median is noisy.

This patch calculates also the mean and standard deviation
and does that also for instructions_per_op and cpu_cycles_per_op
to present a fuller picture of the performance metrics.

Output example:
```
random-seed=3383668492
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
95613.97 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42456 insns/op,   22117 cycles/op,        0 errors)
97538.45 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42454 insns/op,   22094 cycles/op,        0 errors)
95883.37 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42438 insns/op,   22268 cycles/op,        0 errors)
96791.45 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42433 insns/op,   22256 cycles/op,        0 errors)
97894.71 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42420 insns/op,   22010 cycles/op,        0 errors)

         throughput: mean=96744.39 standard-deviation=996.89 median=96791.45 median-absolute-deviation=861.02 maximum=97894.71 minimum=95613.97
instructions_per_op: mean=42440.08 standard-deviation=14.99 median=42437.59 median-absolute-deviation=13.58 maximum=42456.15 minimum=42420.10
  cpu_cycles_per_op: mean=22148.98 standard-deviation=110.43 median=22117.04 median-absolute-deviation=106.89 maximum=22267.70 minimum=22010.42
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#19450
2024-06-25 12:25:59 +03:00
Kefu Chai
7b10cc8079 treewide: include seastar headers with brackets
this change was created in the same spirit of ebff5f5d.

despite that we include Seastar as a submodule, Seastar is not a
part of scylla project. so we'd better include its headers using
brackets.

ebff5f5d addressed this cosmetic issue a while back. but probably
clangd's header-insertion helped some of contributor to insert
the missing headers with `"`. so this style of `include` returned
to the tree with these new changes.

unfortunately, clangd does not allow us to configure the style
of `include` at the time of writing.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19406
2024-06-21 19:20:27 +03:00
Kefu Chai
987fd59f21 test: correct some misspellings
fix a typo in source code. this typo was identified by codespell.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19412
2024-06-21 19:16:11 +03:00
Avi Kivity
fdc1449392 treewide: rename flat_mutation_reader_v2 to mutation_reader
flat_mutation_reader_v2 was introduced in a pair of commits in 2021:

  e3309322c3 "Clone flat_mutation_reader related classes into v2 variants"
  08b5773c12 "Adapt flat_mutation_reader_v2 to the new version of the API"

as a replacement for flat_mutation_reader, using range_tombstone_change
instead of range_tombstone to represent represent range tombstones. See
those commits for more information.

The transition was incremental; the last use of the original
flat_mutation_reader was removed in 2022 in commit

  026f8cc1e7 "db: Use mutation_partition_v2 in mvcc"

In turn, flat_mutation_reader was introduced in 2017 in commit

  748205ca75 "Introduce flat_mutation_reader"

To transition from a mutation_reader that nested rows within
a partition in a separate stream, to a flat reader that streamed
partitions and rows in the same stream.

Here, we reclaim the original name and rename the awkward
flat_mutation_reader_v2 to mutation_reader.

Note that mutation_fragment_v2 remains since we still use the original
for compatibilty, sometimes.

Some notes about the transition:

 - files were also renamed. In one case (flat_mutation_reader_test.cc), the
   rename target already existed, so we rename to
    mutation_reader_another_test.cc.

 - a namespace 'mutation_reader' with two definitions existed (in
   mutation_reader_fwd.hh). Its contents was folded into the mutation_reader
   class. As a result, a few #includes had to be adjusted.

Closes scylladb/scylladb#19356
2024-06-21 07:12:06 +03:00
Benny Halevy
fb3db7d81f perf-simple-query: add cpu_cycles / op metric
Example output:
```
bhalevy@[] scylla$ build/release/scylla perf-simple-query --default-log-level=error -c 1 --duration 10
random-seed=4058714023
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
86912.75 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42346 insns/op,   22811 cycles/op,        0 errors)
91348.29 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42306 insns/op,   22362 cycles/op,        0 errors)
87965.84 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42338 insns/op,   22966 cycles/op,        0 errors)
90793.67 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42351 insns/op,   22783 cycles/op,        0 errors)
90104.27 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42358 insns/op,   22875 cycles/op,        0 errors)
90397.13 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42355 insns/op,   22735 cycles/op,        0 errors)
89142.39 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42363 insns/op,   22996 cycles/op,        0 errors)
90410.40 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   42363 insns/op,   22725 cycles/op,        0 errors)
88173.10 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42366 insns/op,   23160 cycles/op,        0 errors)
88416.51 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42379 insns/op,   23102 cycles/op,        0 errors)

median 90104.26849997675
median absolute deviation: 1244.02
maximum: 91348.29
minimum: 86912.75
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#18818
2024-06-14 07:42:09 +03:00
Tomasz Grabiec
7b1eea794b test: perf: Add test for tablet load balancer effectiveness 2024-06-02 14:23:00 +02:00
Pavel Emelyanov
83d491af02 config: Remove experimental TABLETS feature
... and replace it with boolean enable_tablets option. All the places
in the code are patched to check the latter option instead of the former
feature.

The option is OFF by default, but the default scylla.yaml file sets this
to true, so that newly installed clusters turn tablets ON.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18898
2024-05-30 18:03:51 +03:00
Kefu Chai
222dbf2ce4 test/boost: include test/lib/test_utils.hh
this change was created in the same spirit of 505900f18f. because
we are deprecating the operator<< for vector and unorderd_map in
Seastar, some tests do not compile anymore if we disable these
operators. so to be prepared for the change disabling them, let's
include test/lib/test_utils.hh for accessing the printer dedicated
for Boost.test. and also '#include <fmt/ranges.h>' when necessary,
because, in order to format the ranges using {fmt}, we need to
use fmt/ranges.h.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-05-26 12:32:43 +08:00
Kefu Chai
759156b56d test: perf: alternator: mark format string as constexpr
before this change, we use `update_item_suffix` as a format string
fed to `format(...)`, which is resolved to `seastar::format()`.
but with a patch which migrates the `seastar::format()` to the backend
with compile-time format check, the caller sites using `format()` would
fail to build, because `update_item_suffix` is not a `constexpr`:
```
/home/kefu/.local/bin/clang++ -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -isystem /home/kefu/dev/scylladb/build/rust -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT test/perf/CMakeFiles/test-perf.dir/RelWithDebInfo/perf_alternator.cc.o -MF test/perf/CMakeFiles/test-perf.dir/RelWithDebInfo/perf_alternator.cc.o.d -o test/perf/CMakeFiles/test-perf.dir/RelWithDebInfo/perf_alternator.cc.o -c /home/kefu/dev/scylladb/test/perf/perf_alternator.cc
/home/kefu/dev/scylladb/test/perf/perf_alternator.cc:249:69: error: call to consteval function 'fmt::basic_format_string<char, const char (&)[1]>::basic_format_string<const char *, 0>' is not a constant expression
  249 |     return make_request(cli, "UpdateItem", prefix + seastar::format(update_item_suffix, ""));
      |                                                                     ^
/usr/include/fmt/core.h:2776:67: note: read of non-constexpr variable 'update_item_suffix' is not allowed in a constant expression
 2776 |   FMT_CONSTEVAL FMT_INLINE basic_format_string(const S& s) : str_(s) {
      |                                                                   ^
/home/kefu/dev/scylladb/test/perf/perf_alternator.cc:249:69: note: in call to 'basic_format_string<const char *, 0>(update_item_suffix)'
  249 |     return make_request(cli, "UpdateItem", prefix + seastar::format(update_item_suffix, ""));
      |                                                                     ^~~~~~~~~~~~~~~~~~
/home/kefu/dev/scylladb/test/perf/perf_alternator.cc:198:6: note: declared here
  198 | auto update_item_suffix = R"(
      |      ^
```

so, to prepare the change switching to compile-time format checking,
let's mark this variable `static constexpr`. this is also more correct,
as this variable is

* a compile time constant, and
* is not shared across different compilation units.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18685
2024-05-16 15:18:42 +03:00
Avi Kivity
c8cc47df2d Merge 'replica: allocate storage groups dynamically' from Aleksandra Martyniuk
Allocate storage groups dynamically, i.e.:
- on table creation allocate only storage groups that are on this
  shard;
- allocate a storage group for tablet that is moved to this shard;
- deallocate storage group for tablet that is moved out of this shard.

Output of `./build/release/scylla perf-simple-query -c 1 --random-seed=2248493992` before change:
```
random-seed=2248493992
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
64933.90 tps ( 63.2 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42163 insns/op,        0 errors)
65865.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42155 insns/op,        0 errors)
66649.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42176 insns/op,        0 errors)
67029.60 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42176 insns/op,        0 errors)
68361.21 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42166 insns/op,        0 errors)

median 66649.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42176 insns/op,        0 errors)
median absolute deviation: 784.00
maximum: 68361.21
minimum: 64933.90
```

Output of `./build/release/scylla perf-simple-query -c 1 --random-seed=2248493992` after change:
```
random-seed=2248493992
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
63744.12 tps ( 63.2 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42153 insns/op,        0 errors)
66613.16 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42153 insns/op,        0 errors)
69667.39 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42184 insns/op,        0 errors)
67824.78 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42180 insns/op,        0 errors)
67244.21 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42174 insns/op,        0 errors)

median 67244.21 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42174 insns/op,        0 errors)
median absolute deviation: 631.05
maximum: 69667.39
minimum: 63744.12
```

Fixes: #16877.

Closes scylladb/scylladb#17664

* github.com:scylladb/scylladb:
  test: add test for back and forth tablets migration
  replica: allocate storage groups dynamically
  replica: refresh snapshot in compaction_group::cleanup
  replica: add rwlock to storage_group_manager
  replica: handle reads of non-existing tablets gracefully
  service: move to cleanup stage if allow_write_both_read_old fails
  replica: replace table::as_table_state
  compaction: pass compaction group id to reshape_compaction_group
  replica: open code get_compaction_group in perform_cleanup_compaction
  replica: drop single_compaction_group_if_available
2024-05-12 21:22:02 +03:00
Aleksandra Martyniuk
532653f118 replica: replace table::as_table_state
Replace table::as_table_state with table::try_get_table_state_with_static_sharding
which throws if a table does not use static sharding.
2024-05-10 14:56:38 +02:00
Marcin Maliszkiewicz
a1099791c4 test: perf: alternator: add option to skip data pre-population 2024-05-09 13:59:17 +02:00
Marcin Maliszkiewicz
fd416fac3b perf-alternator-workloads: add operations-per-shard option 2024-05-09 13:59:13 +02:00
Marcin Maliszkiewicz
5b8acf182a test: perf: add global secondary indexes write workload for alternator 2024-05-09 13:59:08 +02:00
Marcin Maliszkiewicz
43a64ac558 test: perf: add option to continue after failed request 2024-05-09 13:59:03 +02:00
Marcin Maliszkiewicz
70b5b5024b test: perf: add read modify write workload for alternator (lwt) 2024-05-09 13:58:58 +02:00
Marcin Maliszkiewicz
5b8e554431 test: perf: add scan workload for alternator 2024-05-09 13:58:54 +02:00
Marcin Maliszkiewicz
55030b1550 test: perf: add end-to-end benchmark for alternator
The code is based on similar idea as perf_simple_query. The main differences are:
- it starts full scylla process
- communicates with alternator via http (localhost)
- uses richer table schema with all dynamoDB types instead of only strings

Testing code runs in the same process as scylla so we can easily get various perf counters (tps, instr, allocation, etc).

Results on my machine (with 1 vCPU):
> ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload read --duration 10 2> /dev/null
...
median 23402.59616090321
median absolute deviation: 598.77
maximum: 24014.41
minimum: 19990.34

> ./build/release/scylla perf-alternator-workloads --workdir ~/tmp --smp 1 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write --duration 10 2> /dev/null
...
median 16089.34211320635
median absolute deviation: 552.65
maximum: 16915.95
minimum: 14781.97

The above seem more realistic than results from perf_simple_query which are 96k and 49k tps (per core).
2024-05-09 13:58:40 +02:00
Marcin Maliszkiewicz
6152223890 test: perf: extract result aggregation logic to a separate struct
It will be reused later by a new tool.
2024-05-09 13:58:29 +02:00
Benny Halevy
e5ca65f78b test/perf: report also log_allocations/op
Currently perf-simple-query --write ignores
log allocations that happen on the memtable
apply path.

This change adds tracking and accounting
of the number of log allocation,
and reporting of thereof.

For reference, here's the output of
build/release/scylla perf-simple-query --write --default-log-level=error --random-seed=1 -c 1
```
random-seed=1
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=write, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
78073.55 tps ( 59.4 allocs/op,  16.3 logallocs/op,  14.3 tasks/op,   52991 insns/op,        0 errors)
77263.59 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53282 insns/op,        0 errors)
79913.07 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53295 insns/op,        0 errors)
79554.32 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53284 insns/op,        0 errors)
79151.53 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53289 insns/op,        0 errors)

median 79151.53 tps ( 59.3 allocs/op,  16.0 logallocs/op,  14.3 tasks/op,   53289 insns/op,        0 errors)
median absolute deviation: 761.54
maximum: 79913.07
minimum: 77263.59
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-05-02 18:42:41 +03:00
Patryk Jędrzejczak
3a34bb18cd db: config: make consistent-topology-changes unused
We make the `consistent-topology-changes` experimental feature
unused and assumed to be true in 6.0. We remove code branches that
executed if `consistent-topology-changes` was disabled.
2024-04-25 14:33:21 +02:00
Kefu Chai
a439ebcfce treewide: include fmt/ranges.h and/or fmt/std.h
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting the container types, like vector, map
optional and variant using {fmt} instead of the homebrew
formatter based on operator<<.
with this change, the changes adding fmt::formatter and
the changes using ostream formatter explicitly, we are
allowed to drop `FMT_DEPRECATED_OSTREAM` macro.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:56:16 +08:00
Kefu Chai
168ade72f8 treewide: replace formatter<std::string_view> with formatter<string_view>
in in {fmt} before v10, it provides the specialization of `fmt::formatter<..>`
for `std::string_view` as well as the specialization of `fmt::formatter<..>`
for `fmt::string_view` which is an implementation builtin in {fmt} for
compatibility of pre-C++17. and this type is used even if the code is
compiled with C++ stadandard greater or equal to C++17. also, before v10,
the `fmt::formatter<std::string_view>::format()` is defined so it accepts
`std::string_view`. after v10, `fmt::formatter<std::string_view>` still
exists, but it is now defined using `format_as()` machinery, so it's
`format()` method does not actually accept `std::string_view`, it
accepts `fmt::string_view`, as the former can be converted to
`fmt::string_view`.

this is why we can inherit from `fmt::formatter<std::string_view>` and
use `formatter<std::string_view>::format(foo, ctx);` to implement the
`format()` method with {fmt} v9, but we cannot do this with {fmt} v10,
and we would have following compilation failure:

```
FAILED: service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o
/home/kefu/.local/bin/clang++ -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -MF service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o.d -o service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -c /home/kefu/dev/scylladb/service/topology_state_machine.cc
/home/kefu/dev/scylladb/service/topology_state_machine.cc:254:41: error: no matching member function for call to 'format'
  254 |     return formatter<std::string_view>::format(it->second, ctx);
      |            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
/usr/include/fmt/core.h:2759:22: note: candidate function template not viable: no known conversion from 'seastar::basic_sstring<char, unsigned int, 15>' to 'const fmt::basic_string_view<char>' for 1st argument
 2759 |   FMT_CONSTEXPR auto format(const T& val, FormatContext& ctx) const
      |                      ^      ~~~~~~~~~~~~
```

because the inherited `format()` method actually comes from
`fmt::formatter<fmt::string_view>`. to reduce the confusion, in this
change, we just inherit from `fmt::format<string_view>`, where
`string_view` is actually `fmt::string_view`. this follows
the document at
https://fmt.dev/latest/api.html#formatting-user-defined-types,
and since there is less indirection under the hood -- we do not
use the specialization created by `FMT_FORMAT_AS` which inherit
from `formatter<fmt::string_view>`, hopefully this can improve
the compilation speed a little bit. also, this change addresses
the build failure with {fmt} v10.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18299
2024-04-19 07:44:07 +03:00
Avi Kivity
72bbe75d5b Merge 'Fix node replace with tablets for RF=N' from Tomasz Grabiec
This PR fixes a problem with replacing a node with tablets when
RF=N. Currently, this will fail because tablet replica allocation for
rebuild will not be able to find a viable destination, as the replacing node
is not considered to be a candidate. It cannot be a candidate because
replace rolls back on failure and we cannot roll back after tablets
were migrated.

The solution taken here is to not drain tablet replicas from replaced
node during topology request but leave it to happen later after the
replaced node is in left state and replacing node is in normal state.

The replacing node waits for this draining to be complete on boot
before the node is considered booted.

Fixes https://github.com/scylladb/scylladb/issues/17025

Nodes in the left state will be kept in tablet replica sets for a while after node
replace is done, until the new replica is rebuilt. So we need to know
about those node's location (dc, rack) for two reasons:

 1) algorithms which work with replica sets filter nodes based on their location. For example materialized views code which pairs base replicas with view replicas filters by datacenter first.

 2) tablet scheduler needs to identify each node's location in order to make decisions about new replica placement.

It's ok to not know the IP, and we don't keep it. Those nodes will not
be present in the IP-based replica sets, e.g. those returned by
get_natural_endpoints(), only in host_id-based replica
sets. storage_proxy request coordination is not affected.

Nodes in the left state are still not present in token ring, and not
considered to be members of the ring (datacanter endpoints excludes them).

In the future we could make the change even more transparent by only
loading locator::node* for those nodes and keeping node* in tablet replica sets.

Currently left nodes are never removed from topology, so will
accumulate in memory. We could garbage-collect them from topology
coordinator if a left node is absent in any replica set. That means we
need a new state - left_for_real.

Closes scylladb/scylladb#17388

* github.com:scylladb/scylladb:
  test: py: Add test for view replica pairing after replace
  raft, api: Add RESTful API to query current leader of a raft group
  test: test_tablets_removenode: Verify replacing when there is no spare node
  doc: topology-on-raft: Document replace behavior with tablets
  tablets, raft topology: Rebuild tablets after replacing node is normal
  tablets: load_balancer: Access node attributes via node struct
  tablets: load_balancer: Extract ensure_node()
  mv: Switch to using host_id-based replica set
  effective_replication_map: Introduce host_id-based get_replicas()
  raft topology: Keep nodes in the left state to topology
  tablets: Introduce read_required_hosts()
2024-03-18 16:16:08 +02:00
Kefu Chai
fe28aac440 test/perf: add fmt::formatter for perf_result_with_aio_writes
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `perf_result_with_aio_writes`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17849
2024-03-18 12:53:39 +02:00
Tomasz Grabiec
f7851696fa tablets: Introduce read_required_hosts()
Will be used by topology loading code to determine which hosts are
needed in topology, even if they're in the left state. We want to load
only left nodes if they are referenced by any tablet, which may happen
temporarily until the replacement replica is rebuilt.
2024-03-15 11:05:29 +01:00
Botond Dénes
9f97d21339 Merge 'Enhance perf-simple-query test' from Pavel Emelyanov
While measuring #17149 with this test some changes were applied, here they are

- keep initial_tablets number in output json's parameters section
- disable auto compaction
- add control over the amount of sstables generated for --bypass-cache case

Closes scylladb/scylladb#17473

* github.com:scylladb/scylladb:
  perf_simple_query: Add --memtable-partitions option
  perf_simple_query: Disable auto compaction
  perf_simple_query: Keep number of initial tablets in output json
2024-03-08 15:21:04 +02:00
Kefu Chai
2ccd9e695d test/perf: add fmt::formatters for scheduling_latency_measurer and perf_result
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* scheduling_latency_measurer
* perf_result

and drop their operator<<:s

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-23 10:17:50 +08:00
Pavel Emelyanov
027282ee07 perf_simple_query: Add --memtable-partitions option
There's the --partitions one that specifies how many partitions the test
would generate before measuring. When --bypass-cache option is in use,
thus making the test alway engage sstables readers, it makes sense to
add some control over sstables granularity. The new option suggests that
during population phase, memtable gets flushed every $this-number
partitions, not just once at the end (and unknown amount of times in the
middle because of dirty memory limit).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 14:44:17 +03:00
Pavel Emelyanov
fd4c2e607e perf_simple_query: Disable auto compaction
Usually a perf test doesn't expect that some activity runs in the
background without controls. Compaction is one of a kind, so it makes
sense to keep it off while running the measurement.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 14:43:23 +03:00
Pavel Emelyanov
74899f71de perf_simple_query: Keep number of initial tablets in output json
When producing the output json file, keep how many initial tablets were
requested (if at all) next to other workload parameters

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-22 14:42:39 +03:00
Avi Kivity
51df8b9173 interval: rename nonwrapping_interval to interval
Our interval template started life as `range`, and was supported
wrapping to follow Cassandra's convention of wrapping around the
maximum token.

We later recognized that an interval type should usually be non-wrapping
and split it into wrapping_range and nonwrapping_range, with `range`
aliasing wrapping_range to preserve compatibility.

Even later, we realized the name was already taken by C++ ranges and
so renamed it to `interval`. Given that intervals are usually non-wrapping,
the default `interval` type is non-wrapping.

We can now simplify it further, recognizing that everyone assumes
that an interval is non-wrapping and so doesn't need the
nonwrapping_interval_designation. We just rename nonwrapping_interval
to `interval` and remove the type alias.
2024-02-21 19:43:17 +02:00