Commit Graph

1246 Commits

Author SHA1 Message Date
Botond Dénes
0d7335dd27 test/lib/random_schema: remove assert on min number of regular columns
It is legal for a schema to have 0 regular columns, so remove the assert
on the schema specification's regular column count.
2024-06-13 01:32:17 -04:00
Avi Kivity
7b301f0cb9 Merge 'Encapsulate wasm and lua management in lang::manager service' from Pavel Emelyanov
After wasm udf appeared, code in main, create_function_statement and schema_tables got some involvements into details of wasm engine management. Also, even prior to this, there was duplication in how function context is created by statement code and schema_tables code.

This PR generalizes function context creation and encapsulates the management in sharded<lang::manager> service. Also it removes the wasm::startup_context thing and makes wasm start/stop be "classical" (see #2737)

Closes scylladb/scylladb#19166

* github.com:scylladb/scylladb:
  code: Enlighten wasm headers usage
  lang: Unfriend wasm context from manager
  lang, cql3, schema_tables: Don't mess with db::config
  lang: Don't use db::config to create lua context
  lang: Don't use db::config to create wasm context
  lang: Drop manager::precompile() method
  cql3, schema_tables: Generalize function creation
  wasm: Replace startup_context with wasm_config
  lang: Add manager::start() method
  lang: Move manager to lang namespace
  lang: Move wasm::manager to its .cc/.hh files
2024-06-09 19:32:26 +03:00
Gleb Natapov
34cf5c81f6 group0, topology coordinator: run group0 and the topology coordinator in gossiper scheduling group
Currently they both run in streaming group and it may become busy during
repair/mv building and affect group0 functionality. Move it to the
gossiper group where it should have more time to run.

Fixes scylladb/scylladb#18863

Closes scylladb/scylladb#19138
2024-06-07 15:31:44 +02:00
Pavel Emelyanov
b854bf4b83 lang: Don't use db::config to create lua context
Similarly to previous patch, lua context needs db::config for creation.
It's better to get the configurables via lang::manager::config.

One thing to note -- lua config carries updateable_values on board, but
respective db::config options and _not_ LiveUpdate-able, so the lua
config could just use simple data types. This patch keeps updateable
values intact for brevity.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 13:07:05 +03:00
Pavel Emelyanov
783ccc0a74 lang: Don't use db::config to create wasm context
The managerr needs to get two "fuel" configurables from db::config in
order to create context. Instead of carrying db config from callers,
keep the options on existing lang::manager::config and use them.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 13:07:05 +03:00
Pavel Emelyanov
fe7ff7172d wasm: Replace startup_context with wasm_config
The lang::manager starts with the help of a context because it needs to
have std::shared_ptr<> pointg to cross-shard shared wasm engine and
runner thread. For that a context is created in advance, that then helps
sharing the engine and runner across manager instances.

This patch removes the "context" and replaces it with classical
manager::config. With it, it's lang::manager who's now responsible for
initializing itself.

In order to have cross-shard engine and thread pointers, the start()
method uses invoke_on_others() facility to share the pointer.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 12:35:57 +03:00
Pavel Emelyanov
0dad72b736 lang: Add manager::start() method
Just like any other sharded<> service, the lang::manager now starts and
stops in a classical sequence of

  await sharded<manager>::start()
  defer([] { await sharded<manager>::stop() })
  await sharded<manager>::invoke_on_all(&manager::start)

For now the method is no-op, next patches will start using it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 12:35:57 +03:00
Pavel Emelyanov
f950469af5 lang: Move manager to lang namespace
And, while at it, rename local variable to refer to it to as "manager"
not "wasm". Query processor and database also have getters named
"wasm()", these are not renamed yet to keep patch smaller (and those
getters are going to be reworked further anyway).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 12:35:57 +03:00
Pavel Emelyanov
1dec79e97d lang: Move wasm::manager to its .cc/.hh files
It's going to become a facade in front of both -- wasm and lua, so keep
it in files with language independent names.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-07 12:35:57 +03:00
Avi Kivity
cd553848c1 Merge 'auth-v2: use a single transaction in auth related statements ' from Marcin Maliszkiewicz
Due to gradual raft introduction into statements code in cases when single statement modified more than one table or mutation producing function was composed out of simpler ones we violated transactional logic and statement execution was not atomic as whole.

This patch changes that, so now either all changes resulting from statement execution are applied or none. Affected statements types are:
- schema modification
- auth modifications
- service levels modifications

Fixes https://github.com/scylladb/scylladb/issues/17738

Closes scylladb/scylladb#17910

* github.com:scylladb/scylladb:
  raft: rename mutations_collector to group0_batch
  raft: rename announce to commit
  cql3: raft: attach description to each mutations collector group
  auth: unify mutations_generator type
  auth: drop redundant 'this' keyword
  auth: remove no longer used code from standard_role_manager::legacy_modify_membership
  cql3: auth: use mutation collector for service levels statements
  cql3: auth: use mutation collector for alter role
  cql3: auth: use mutation collector for grant role and revoke role
  cql3: auth: use mutation collector for drop role and auto-revoke
  auth: add refactored modify_membership func in standard_role_manager
  auth: implement empty revoke_all in allow_all_authorizer
  auth: drop request_execution_exception handling from default_authorizer::revoke_all
  Revert "Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks"
  cql3: auth: use mutation collector for grant and revoke permissions
  cql3: extract changes_tablets function in alter_keyspace_statement
  cql3: auth: use mutation collector for create role statement
  auth: move create_role code into service
  auth: add a way to announce mutations having only client_state ref
  auth: add collect_mutations common helper
  auth: remove unused header in common.hh
  auth: add class for gathering mutations without immediate announce
  auth: cql3: use auth facade functions consistently on write path
  auth: remove unused is_enforcing function
2024-06-06 17:31:26 +03:00
Marcin Maliszkiewicz
63e6334a64 raft: rename mutations_collector to group0_batch 2024-06-06 13:26:34 +02:00
Kamil Braun
57e810c852 Merge 'Serialize repair with tablet migration' from Tomasz Grabiec
We want to exclude repair with tablet migrations to avoid races
between repair reads and writes with replica movement. Repair is not
prepared to handle topology transitions in the middle.

One reason why it's not safe is that repair may successfully write to
a leaving replica post streaming phase and consider all replicas to be
repaired, but in fact they are not, the new replica would not be
repaired.

Other kinds of races could result in repair failures. If repair writes
to a leaving replica which was already cleaned up, such writes will
fail, causing repair to fail.

Excluding works by keeping effective_replication_map_ptr in a version
which doesn't have table's tablets in transitions. That prevents later
transitions from starting because topology coordinator's barrier will
wait for that erm before moving to a stage later than
allow_write_both_read_old, so before any requests start using the new
topology. Also, if transitions are already running, repair waits for
them to finish.

A blocked tablet migration (e.g. due to down node) will block repair,
whereas before it would fail. Once admin resolves the cause of blocked migration,
repair will continue.

Fixes #17658.
Fixes #18561.

Closes scylladb/scylladb#18641

* github.com:scylladb/scylladb:
  test: pylib: Do not block async reactor while removing directories
  repair: Exclude tablet migrations with tablet repair
  repair_service: Propagate topology_state_machine to repair_service
  main, storage_service: Move topology_state_machine outside storage_service
  storage_srvice, toplogy: Extract topology_state_machine::await_quiesced()
  tablet_scheduler: Make disabling of balancing interrupt shuffle mode
  tablet_scheduler: Log whether balancing is considered as enabled
2024-06-06 11:27:03 +02:00
Tomasz Grabiec
c45ce41330 main, storage_service: Move topology_state_machine outside storage_service
It will be propagated to repair_service to avoid cyclic dependency:

storage_service <-> repair_service
2024-06-05 16:11:22 +02:00
Pavel Emelyanov
dcc083110d gossiper: Stop using db::config
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-04 20:19:47 +03:00
Marcin Maliszkiewicz
ac0e164a6b raft: rename announce to commit
Old wording was derived from existing code which
originated from schema code. Name commit better
describes what we do here.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
370a5b547e cql3: raft: attach description to each mutations collector group
This description is readable from raft log table.
Previously single description was provided for the whole
announce call but since it can contain mutations from
various subsystems now description was moved to
add_mutation(s)/add_generator function calls.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
a88b7fc281 cql3: auth: use mutation collector for service levels statements
This is done to achieve single transaction semantics.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
2a6cfbfb33 cql3: auth: use mutation collector for create role statement
This is done to achieve single transaction semantics.

grant_permissions_to_creator is logically part of create role
but its change will be included in following commits
as it spans multiple usages.

Additinally we disabled rollback during create role as
it won't work and is not needed with single transaction logic.
2024-06-04 15:43:04 +02:00
Tomasz Grabiec
7b1eea794b test: perf: Add test for tablet load balancer effectiveness 2024-06-02 14:23:00 +02:00
Pavel Emelyanov
e74a4b038f Merge 'tablets: alter keyspace' from Piotr Smaron
This change supports changing replication factor in tablets-enabled keyspaces.
This covers both increasing and decreasing the number of tablets replicas through
first building topology mutations (`alter_keyspace_statement.cc`) and then
tablets/topology/schema mutations (`topology_coordinator.cc`).
For the limitations of the current solution, please see the docs changes attached to this PR.

Fixes: #16129

Closes scylladb/scylladb#16723

* github.com:scylladb/scylladb:
  test: Do not check tablets mutations on nodes that don't have them
  test: Fix the way tablets RF-change test parses mutation_fragments
  test/tablets: Unmark RF-changing test with xfail
  docs: document ALTER KEYSPACE with tablets
  Return response only when tablets are reallocated
  cql-pytest: Verify RF is changes by at most 1 when tablets on
  cql3/alter_keyspace_statement: Do not allow for change of RF by more than 1
  Reject ALTER with 'replication_factor' tag
  Implement ALTER tablets KEYSPACE statement support
  Parameterize migration_manager::announce by type to allow executing different raft commands
  Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks
  Extend system.topology with 3 new columns to store data required to process alter ks global topo req
  Allow query_processor to check if global topo queue is empty
  Introduce new global topo `keyspace_rf_change` req
  New raft cmd for both schema & topo changes
  Add storage service to query processor
  tablets: tests for adding/removing replicas
  tablet_allocator: make load_balancer_stats_manager configurable by name
2024-05-29 14:17:51 +03:00
Kefu Chai
e42d83dc46 treewide: include used headers
before this change, we rely on `seastar/util/std-compat.hh` to
include the used headers provided by stdandard library. this was
necessary before we moved to a C++20 compliant standard library
implementation. but since Seastar has dropped C++17 support. its
`seastar/util/std-compat.hh` is not responsible for providing these
headers anymore.

so, in this change, we include the used headers directly instead
of relying on `seastar/util/std-compat.hh`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18883
2024-05-27 17:34:38 +03:00
Piotr Smaron
cb40f13831 Add storage service to query processor
Query processor needs to access storage service to check if global
topology request is still ongoing and to be able to wait until it
completes.
2024-05-27 12:48:44 +02:00
Paweł Zakrzewski
c888945354 tablets: tests for adding/removing replicas
Note we're suppressing a UBSanitizer overflow error in UTs. That's
because our linter complains about a possible overflow, which never
happens, but tests are still failing because of it.
2024-05-27 12:48:44 +02:00
Botond Dénes
47dbf23773 Merge 'Rework view services and system-distributed-keyspace dependencies' from Pavel Emelyanov
The system-distributed-keyspace and view-update-generator often go in pair, because streaming, repair and sstables-loader (via distributed-loader) need them booth to check if sstable is staging and register it if it's such. The check is performed by messing directly with system_distributed.view_build_status table, and the registration happens via view-update-generator.

That's not nice, other services shouldn't know that view status is kept in system table. Also view-update-generator is a service to generae and push view updates, the fact that it keeps staging sstables list is the implementation detail.

This PR replaces dependencies on the mentioned pair of services with the single dependency on view-builder (repair, sstables-loader and stream-manager are enlightened) and hides the view building-vs-staging details inside the view_builder.

Along the way, some simplification of repair_writer_impl class is done.

Closes scylladb/scylladb#18706

* github.com:scylladb/scylladb:
  stream_manager: Remove system_distributed_keyspace and view_update_generator
  repair: Remove system_distributed_keyspace and view_update_generator
  streaming: Remove system_distributed_keyspace and view_update_generator
  sstables_loader: Remove system_distributed_keyspace and view_update_generator
  distributed_loader: Remove system_distributed_keyspace and view_update_generator
  view: Make register_staging_sstable() a method of view_builder
  view: Make check_view_build_ongoing() helper a method of view_builder
  streaming: Proparage view_builder& down to make_streaming_consumer()
  repair: Keep view_builder& on repair_writer_impl
  distributed_loader: Propagate view_builder& via process_upload_dir()
  stream_manager: Add view builder dependency
  repair_service: Add view builder dependency
  sstables_loader: Add view_bulder dependency
  main: Start sstables loader later
  repair: Remove unwanted local references from repair_meta
2024-05-27 10:51:11 +03:00
Kefu Chai
46d993a283 test: revert 4c1b6f04
in 4c1b6f04, we added a concept for fmt::is_formattable<>. but it
was not ncessary. the fmt::is_formattable<> trait was enough. the
reason 4c1b6f04 was actually a leftover of a bigger change which
tried to add trait for the cases where fmt::is_formattable<> was
not able to cover. but that was based on the wrong impression that
fmt::is_formattable<> should be able to work with container types
without including, for instance `fmt/ranges.h`. but in 222dbf2c,
we include `fmt/ranges.h` in tests, where the range-alike formatter
is used, that enables `fmt::is_formattable<>` to tell that container
types are formattable.

in short, 4c1b6f04 was created based on a misunderstanding, and
it was a reduced type trait, which is proved to be not necessary.

so, in this change, it is dropped. but the type constraints is
preserved to make the build failure more explicit, if the fallback
formatter does not match with the type to be formatted by Boost.test.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18879
2024-05-27 10:14:59 +03:00
Kefu Chai
4c1b6f0476 test: define a more generic boost_test_print_type
fmt::is_formattable<T>::value is false, even if

* T is a container of U, and
* fmt::is_formattable<U>, and
* U can be formatted using fmt::formatter

so, we have to define a more generic boost_test_print_type()
for the all types supported by {fmt}. it will help us to ditch the
operator<< for vector and unordered_map in Seastar, and allow us
to use the fmt::formatter specialization of the element
types.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-05-26 12:32:43 +08:00
Pavel Emelyanov
8906126a2c stream_manager: Remove system_distributed_keyspace and view_update_generator
Now all the code is happy with view_builder and can be shortened

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:41:56 +03:00
Pavel Emelyanov
d917b06857 stream_manager: Add view builder dependency
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:32:28 +03:00
Botond Dénes
5e41dd28c7 Merge 'Sanitize sl controller draining' from Pavel Emelyanov
The sl-controller is stopped in three steps. The first (and instantly the second) is unsubscribing from lifecycle notification and draining. The third is stop itself. First two steps are "out of order" as compared to the desired start-stop sequence of any service, this patch fixes these steps.

After this PR the drain_on_shutdown() (the call that drains the node upon stop) finally becomes clean and tidy and is no longer accompanied by ad-hoc fellow drains/stops/aborts/whatever.

refs: #2737

Closes scylladb/scylladb#18731

* github.com:scylladb/scylladb:
  sl_controller: Remove drain() method
  sl_controller: Move abort kicking into do_abort()
  main,sl_controller: Subscribe for early abort
  main: Unsubscribe sl controller next to subscribing
2024-05-21 17:16:23 +03:00
Kefu Chai
86b988a70b test/lib: do not use variable which could be moved away
C++ standard does not define the order in which the parameters
passed to a function are evaluated. so in theory, in
```c++
reusable_sst(sst->get_schema(), std::move(sst));
```
`std::move(sst)` could be evaluated before `sst->get_schema`.
but please note, `std::move(sst)` does not move `sst`
away, it merely cast `sst` to a rvalue reference, it is
`reusable_sst()` which *could* move `sst` away by
consuming it. so following call is much more dangerous
than the above one:
```c++
reusable_sst(sst->get_schema(), modify_sst(std::move(sst)))
```
nevertheless, this usage is still confusing. so instead
of passing a copy of `sst` to `reusable_sst`.

this change is inspired by clang-tidy, it warns like:

```
Warning: /home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:25: warning: 'sst' used after it was moved [bugprone-use-after-move]
  397 |     return reusable_sst(sst->get_schema(), std::move(sst));
      |                         ^
/home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:44: note: move occurred here
  397 |     return reusable_sst(sst->get_schema(), std::move(sst));
      |                                            ^
/home/runner/work/scylladb/scylladb/test/lib/test_services.cc:397:25: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated
  397 |     return reusable_sst(sst->get_schema(), std::move(sst));
      |
```

per the analysis above, this is a false alarm.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18775
2024-05-21 10:02:10 +03:00
Pavel Emelyanov
8d4c8711fa main,sl_controller: Subscribe for early abort
There's stop-signal in main that fires an abort source on stop. Lots of
other services are subscribed in it, add the sl-controller too. For now
it's a no-op, but next patches will make use of it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-20 21:26:31 +03:00
Botond Dénes
e1c4e6c151 Merge 'sstables_manager: use maintenance scheduling group to run components reload fiber' from Lakshmi Narayanan Sreethar
PR https://github.com/scylladb/scylladb/pull/18186 introduced a fiber that reloads reclaimed bloom filters when memory becomes available. Use maintenance scheduling group to run that fiber instead of running it in the main scheduling group.

Fixes #18675

Closes scylladb/scylladb#18721

* github.com:scylladb/scylladb:
  sstables_manager: use maintenance scheduling group to run components reload fiber
  sstables_manager: add member to store maintenance scheduling group
2024-05-20 16:38:42 +03:00
Avi Kivity
52fe351c31 Merge 'Balance tablets within nodes (intra-node migration)' from Tomasz Grabiec
This is needed to avoid severe imbalance between shards which can
happen when some table grows and is split. The inter-node balance can
be equal, so inter-node migration cannot fix the imbalance. Also, if RF=N
then there is not even a possibility of moving tablets around to fix the imbalance.
The only way to bring the system to balance is to move tablets within the nodes.

The system is not prepared for intra-node migration currently. Request coordination
is host-based, while for intra-node migration it should be (also) shard-based.
The solution employed here is to keep the coordination between nodes as-is,
and for intra-node migration storage_proxy-level coordinator is not aware of
the migration (no pending host). The replica-side request handler will be a
second-level coordinator which routes requests to shards, similar to how
the first-level coordinator routes them to hosts.

Tablet sharder is adjusted to handle intra-migration where a tablet
can have two replicas on the same host. For reads, sharder uses the
read selector to resolve the conflict. For writes, the write selector
is used.

The old shard_of() API is kept to represent shard for reads, and new
method is introduced to query the shards for writing:
shard_for_writes(). All writers should be switched to that API, which
is not done in this patch yet.

The request handler on replica side acts as a second-level
coordinator, using sharder to determine routing to shards. A given
sharder has a scope of a single topology version, a single
effective_replication_map_ptr, which should be kept alive during
writes.

perf-simple-query test results show no signs of regression:

Command: perf-simple-query -c1 -m1G --write --tablets --duration=10

Before:

> 83294.81 tps ( 59.5 allocs/op,  14.3 tasks/op,   53725 insns/op,        0 errors)
> 87756.72 tps ( 59.5 allocs/op,  14.3 tasks/op,   54049 insns/op,        0 errors)
> 86428.47 tps ( 59.6 allocs/op,  14.3 tasks/op,   54208 insns/op,        0 errors)
> 86211.38 tps ( 59.7 allocs/op,  14.3 tasks/op,   54219 insns/op,        0 errors)
> 86559.89 tps ( 59.6 allocs/op,  14.3 tasks/op,   54188 insns/op,        0 errors)
> 86609.39 tps ( 59.6 allocs/op,  14.3 tasks/op,   54117 insns/op,        0 errors)
> 87464.06 tps ( 59.5 allocs/op,  14.3 tasks/op,   54039 insns/op,        0 errors)
> 86185.43 tps ( 59.6 allocs/op,  14.3 tasks/op,   54169 insns/op,        0 errors)
> 86254.71 tps ( 59.6 allocs/op,  14.3 tasks/op,   54139 insns/op,        0 errors)
> 83395.35 tps ( 60.2 allocs/op,  14.4 tasks/op,   54693 insns/op,        0 errors)
>
> median 86428.47 tps ( 59.6 allocs/op,  14.3 tasks/op,   54208 insns/op,        0 errors)
> median absolute deviation: 243.04
> maximum: 87756.72
> minimum: 83294.81
>

After:

> 85523.06 tps ( 59.5 allocs/op,  14.3 tasks/op,   53872 insns/op,        0 errors)
> 89362.47 tps ( 59.6 allocs/op,  14.3 tasks/op,   54226 insns/op,        0 errors)
> 88167.55 tps ( 59.7 allocs/op,  14.3 tasks/op,   54400 insns/op,        0 errors)
> 87044.40 tps ( 59.7 allocs/op,  14.3 tasks/op,   54310 insns/op,        0 errors)
> 88344.50 tps ( 59.6 allocs/op,  14.3 tasks/op,   54289 insns/op,        0 errors)
> 88355.06 tps ( 59.6 allocs/op,  14.3 tasks/op,   54242 insns/op,        0 errors)
> 88725.46 tps ( 59.6 allocs/op,  14.3 tasks/op,   54230 insns/op,        0 errors)
> 88640.08 tps ( 59.6 allocs/op,  14.3 tasks/op,   54210 insns/op,        0 errors)
> 90306.31 tps ( 59.4 allocs/op,  14.3 tasks/op,   54043 insns/op,        0 errors)
> 87343.62 tps ( 59.8 allocs/op,  14.3 tasks/op,   54496 insns/op,        0 errors)
>
> median 88355.06 tps ( 59.6 allocs/op,  14.3 tasks/op,   54242 insns/op,        0 errors)
> median absolute deviation: 1007.41
> maximum: 90306.31
> minimum: 85523.06

Command (reads): perf-simple-query -c1 -m1G  --tablets --duration=10

Before:

> 95860.18 tps ( 63.1 allocs/op,  14.1 tasks/op,   42476 insns/op,        0 errors)
> 97537.69 tps ( 63.1 allocs/op,  14.1 tasks/op,   42454 insns/op,        0 errors)
> 97549.23 tps ( 63.1 allocs/op,  14.1 tasks/op,   42470 insns/op,        0 errors)
> 97511.29 tps ( 63.1 allocs/op,  14.1 tasks/op,   42470 insns/op,        0 errors)
> 97227.32 tps ( 63.1 allocs/op,  14.1 tasks/op,   42471 insns/op,        0 errors)
> 94031.94 tps ( 63.1 allocs/op,  14.1 tasks/op,   42441 insns/op,        0 errors)
> 96978.04 tps ( 63.1 allocs/op,  14.1 tasks/op,   42462 insns/op,        0 errors)
> 96401.70 tps ( 63.1 allocs/op,  14.1 tasks/op,   42473 insns/op,        0 errors)
> 96573.77 tps ( 63.1 allocs/op,  14.1 tasks/op,   42440 insns/op,        0 errors)
> 96340.54 tps ( 63.1 allocs/op,  14.1 tasks/op,   42468 insns/op,        0 errors)
>
> median 96978.04 tps ( 63.1 allocs/op,  14.1 tasks/op,   42462 insns/op,        0 errors)
> median absolute deviation: 571.20
> maximum: 97549.23
> minimum: 94031.94
>

After:

> 99794.67 tps ( 63.1 allocs/op,  14.1 tasks/op,   42471 insns/op,        0 errors)
> 101244.99 tps ( 63.1 allocs/op,  14.1 tasks/op,   42472 insns/op,        0 errors)
> 101128.37 tps ( 63.1 allocs/op,  14.1 tasks/op,   42485 insns/op,        0 errors)
> 101065.27 tps ( 63.1 allocs/op,  14.1 tasks/op,   42465 insns/op,        0 errors)
> 101212.98 tps ( 63.1 allocs/op,  14.1 tasks/op,   42456 insns/op,        0 errors)
> 101413.31 tps ( 63.1 allocs/op,  14.1 tasks/op,   42463 insns/op,        0 errors)
> 101464.92 tps ( 63.1 allocs/op,  14.1 tasks/op,   42466 insns/op,        0 errors)
> 101086.74 tps ( 63.1 allocs/op,  14.1 tasks/op,   42488 insns/op,        0 errors)
> 101559.09 tps ( 63.1 allocs/op,  14.1 tasks/op,   42468 insns/op,        0 errors)
> 100742.58 tps ( 63.1 allocs/op,  14.1 tasks/op,   42491 insns/op,        0 errors)
>
> median 101212.98 tps ( 63.1 allocs/op,  14.1 tasks/op,   42456 insns/op,        0 errors)
> median absolute deviation: 200.33
> maximum: 101559.09
> minimum: 99794.67
>

Fixes #16594

Closes scylladb/scylladb#18026

* github.com:scylladb/scylladb:
  Implement fast streaming for intra-node migration
  test: tablets_test: Test sharding during intra-node migration
  test: tablets_test: Check sharding also on the pending host
  test: py: tablets: Test writes concurrent with migration
  test: py: tablets: Test crash during intra-node migration
  api, storage_service: Introduce API to wait for topology to quiesce
  dht, replica: Remove deprecated sharder APIs
  test: Avoid using deprecated sharded API
  db: do_apply_many() avoid deprecated sharded API
  replica: mutation_dump: Avoid deprecated sharder API
  repair: Avoid deprecated sharder API
  table: Remove optimization which returns empty reader when key is not owned by the shard
  dht: is_single_shard: Avoid deprecated sharder API
  dht: split_range_to_single_shard: Work with static_sharder only
  dht: ring_position_range_sharder: Avoid deprecated sharder APIs
  dht: token: Avoid use of deprecated sharder API by switching to static_sharder
  selective_token_sharder: Avoid use of deprecated sharder API
  docs: Document tablet sharding vs tablet replica placement
  readers/multishard.cc: use shard_for_reads() instead of shard_of()
  multishard_mutation_query.cc: use shard_for_reads() instead of shard_of()
  storage_proxy: Extract common code to apply mutations on many shards according to sharder
  storage_proxy: Prepare per-partition rate-limiting for intra-node migration
  storage_proxy: Avoid shard_of() use in mutate_counter_on_leader_and_replicate()
  storage_proxy: Prepare mutate_hint() for intra-node tablet migration
  commitlog_replayer: Avoid deprecated sharder::shard_of()
  lwt: Avoid deprecated sharder::shard_of()
  compaction: Avoid deprecated sharder::shard_of()
  dht: Extract dht::static_sharder
  replica: Deprecate table::shard_of()
  locator: Deprecate effective_replication_map::shard_of()
  dht: Deprecate old sharder API: shard_of/next_shard/token_for_next_shard
  tests: tablets: py: Add intra-node migration test
  tests: tablets: Test that drained nodes are not balanced internally
  tests: tablets: Add checks of replica set validity to test_load_balancing_with_random_load
  tests: tablets: Verify that disabling balancing results in no intra-node migrations
  tests: tablets: Check that nodes are internally balanced
  tests: tablets: Improve debuggability by showing which rows are missing
  tablets, storage_service: Support intra-node migration in move_tablet() API
  tablet_allocator: Generate intra-node migration plan
  tablet_allocator: Extract make_internode_plan()
  tablet_allocator: Maintain candidate list and shard tablet count for target nodes
  tablet_allocator: Lift apply_load/can_accept_load lambdas to member functions
  tablets, streaming: Implement tablet streaming for intra-node migration
  dht, auto_refreshing_sharder: Allow overriding write selector
  multishard_writer: Handle intra-node migration
  storage_proxy: Handle intra-node tablet migration for writes
  tablets: Get rid of tablet_map::get_shard()
  tablets: Avoid tablet_map::get_shard in cleanup
  tablets: test: Use sharder instead of tablet_map::get_shard()
  tablets: tablet_sharder: Allow working with non-local host
  sharding: Prepare for intra-node-migration
  docs: Document sharder use for tablets
  tablets: Introduce tablet transition kind for intra-node migration
  tests: tablets: Fix use-after-move of skiplist in rebalance_tablets()
  sstables, gdb: Track readers in a linked list
  raft topology: Fix global token metadata barrier to not fence ahead of what is drained
2024-05-20 16:13:01 +03:00
Kefu Chai
40ce52c3cc test: use generic boost_test_print_type()
in this change, we trade the `boost_test_print_type()` overloads
for the generic template of `boost_test_print_type()`, except for
those in the very small tests, which presumably want to keep
themselves relative self-contained.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18727
2024-05-20 12:56:20 +03:00
Lakshmi Narayanan Sreethar
79f6746298 sstables_manager: add member to store maintenance scheduling group
Store that maintenance scheduling group inside the sstables_manager. The
next patch will use this to run the components reloader fiber.

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-19 15:23:45 +05:30
Tomasz Grabiec
32a191384a test: Avoid using deprecated sharded API
There is not tablet migration in unit tests, so shard_of() can be
safely replaced with shard_for_reads(). Even if it's used for writes.
2024-05-16 00:28:47 +02:00
Tomasz Grabiec
9da3bd84c7 dht: Extract dht::static_sharder
Before the patch, dht::sharder could be instantiated and it would
behave like a static sharder. This is not safe with regards to
extensions of the API because if a derived implementation forgets to
override some method, it would incorrectly default to the
implementation from static sharder. Better to fail the compilation in
this case, so extract static sharder logic to dht::static_sharder
class and make all methods in dht::sharder pure virtual.

This also allows us to have algorithms indicate that they only work
with static sharder by accepting the type, and have compile-time
safety for this requirement.

schema::get_sharder() is changed to return the static_sharder&.
2024-05-16 00:28:47 +02:00
Pavel Emelyanov
634c066c43 service_level_controller: Add dependency on shared_token_metadata
The controller needs to access topology, so it needs the token metadata
at hand.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-14 15:43:01 +03:00
Avi Kivity
cc8b4e0630 batchlog_manager, test: initialize delay configuration
In b4e66ddf1d (4.0) we added a new batchlog_manager configuration
named delay, but forgot to initialize it in cql_test_env. This somehow
worked, but doesn't with clang 18.

Fix it by initializing to 0 (there isn't a good reason to delay it).
Also provide a default to make it safer.

Closes scylladb/scylladb#18572
2024-05-13 07:57:35 +03:00
Avi Kivity
c8cc47df2d Merge 'replica: allocate storage groups dynamically' from Aleksandra Martyniuk
Allocate storage groups dynamically, i.e.:
- on table creation allocate only storage groups that are on this
  shard;
- allocate a storage group for tablet that is moved to this shard;
- deallocate storage group for tablet that is moved out of this shard.

Output of `./build/release/scylla perf-simple-query -c 1 --random-seed=2248493992` before change:
```
random-seed=2248493992
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
64933.90 tps ( 63.2 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42163 insns/op,        0 errors)
65865.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42155 insns/op,        0 errors)
66649.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42176 insns/op,        0 errors)
67029.60 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42176 insns/op,        0 errors)
68361.21 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42166 insns/op,        0 errors)

median 66649.36 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42176 insns/op,        0 errors)
median absolute deviation: 784.00
maximum: 68361.21
minimum: 64933.90
```

Output of `./build/release/scylla perf-simple-query -c 1 --random-seed=2248493992` after change:
```
random-seed=2248493992
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
63744.12 tps ( 63.2 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42153 insns/op,        0 errors)
66613.16 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42153 insns/op,        0 errors)
69667.39 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42184 insns/op,        0 errors)
67824.78 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42180 insns/op,        0 errors)
67244.21 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42174 insns/op,        0 errors)

median 67244.21 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42174 insns/op,        0 errors)
median absolute deviation: 631.05
maximum: 69667.39
minimum: 63744.12
```

Fixes: #16877.

Closes scylladb/scylladb#17664

* github.com:scylladb/scylladb:
  test: add test for back and forth tablets migration
  replica: allocate storage groups dynamically
  replica: refresh snapshot in compaction_group::cleanup
  replica: add rwlock to storage_group_manager
  replica: handle reads of non-existing tablets gracefully
  service: move to cleanup stage if allow_write_both_read_old fails
  replica: replace table::as_table_state
  compaction: pass compaction group id to reshape_compaction_group
  replica: open code get_compaction_group in perform_cleanup_compaction
  replica: drop single_compaction_group_if_available
2024-05-12 21:22:02 +03:00
Aleksandra Martyniuk
532653f118 replica: replace table::as_table_state
Replace table::as_table_state with table::try_get_table_state_with_static_sharding
which throws if a table does not use static sharding.
2024-05-10 14:56:38 +02:00
Lakshmi Narayanan Sreethar
4d22c4b68b sstable_datafile_test: add testcase to test reclaim during reload
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 19:57:40 +05:30
Lakshmi Narayanan Sreethar
69b2a127b0 sstable_datafile_test: add test to verify reclaimed components reload
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2024-05-09 17:48:58 +05:30
Botond Dénes
155332ebf8 Merge 'Drain view_builder in generic drain (again)' from Pavel Emelyanov
Some time ago #16558 was merged that moved view builder drain into generic drain. After this merge dtests started to fail from time to time, so the PR was reverted (see #18278). In #18295 the hang was found. View builder drain was moved from "before stopping messaging service to "after" it, and view update write handlers in proxy hanged for hard-coded timeout of 5 minutes without being aborted. Tests don't wait for 5 minutes and kill scylla, then complain about it and fail.

This PR brings back the original PR as well as the necessary fix that cancels view update write handlers on stop.

Closes scylladb/scylladb#18408

* github.com:scylladb/scylladb:
  Reapply "Merge 'Drain view_builder in generic drain' from ScyllaDB"
  view: Abort pending view updates when draining
2024-05-09 08:26:44 +03:00
Botond Dénes
96a7ed7efb Merge 'sstables: add dead row count when issuing warning to system.large_partitions' from Ferenc Szili
This is the second half of the fix for issue #13968. The first half is already merged with PR #18346

Scylla issues warnings for partitions containing more rows than a configured threshold. The warning is issued by inserting a row into the `system.large_partitions` table. This row contains the information about the partition for which the warning is issued: keyspace, table, sstable, partition key and size, compaction time and the number of rows in the partition. A previous PR #18346 also added range tombstone count to this row.

This change adds a new counter for dead rows to the large_partitions table.

This change also adds cluster feature protection for writing into these new counters. This is needed in case a cluster is in the process of being upgraded to this new version, after which an upgraded node writes data with the new schema into `system.large_partitions`, and finally a node is then rolled back to an old version. This node will then revert the schema to the old version, but the written sstables will still contain data with the new counters, causing any readers of this table to throw errors when they encounter these cells.

This is an enhancement, and backporting is not needed.

Fixes #13968

Closes scylladb/scylladb#18458

* github.com:scylladb/scylladb:
  sstable: added test for counting dead rows
  sstable: added docs for system.large_partitions.dead_rows
  sstable: added cluster feature for dead rows and range tombstones
  sstable: write dead_rows count to system.large_partitions
  sstable: added counter for dead rows
2024-05-09 08:26:43 +03:00
Kamil Braun
03818c4aa9 direct_failure_detector: increase ping timeout and make it tunable
The direct failure detector design is simplistic. It sends pings
sequentially and times out listeners that reached the threshold (i.e.
didn't hear from a given endpoint for too long) in-between pings.

Given the sequential nature, the previous ping must finish so the next
ping can start. We timeout pings that take too long. The timeout was
hardcoded and set to 300ms. This is too low for wide-area setups --
latencies across the Earth can indeed go up to 300ms. 3 subsequent timed
out pings to a given node were sufficient for the Raft listener to "mark
server as down" (the listener used a threshold of 1s).

Increase the ping timeout to 600ms which should be enough even for
pinging the opposite side of Earth, and make it tunable.

Increase the Raft listener threshold from 1s to 2s. Without the
increased threshold, one timed out ping would be enough to mark the
server as down. Increasing it to 2s requires 3 timed out pings which
makes it more robust in presence of transient network hiccups.

In the future we'll most likely want to decrease the Raft listener
threshold again, if we use Raft for data path -- so leader elections
start quickly after leader failures. (Faster than 2s). To do that we'll
have to improve the design of the direct failure detector.

Ref: scylladb/scylladb#16410
Fixes: scylladb/scylladb#16607

---

I tested the change manually using `tc qdisc ... netem delay`, setting
network delay on local setup to ~300ms with jitter. Without the change,
the result is as observed in scylladb/scylladb#16410: interleaving
```
raft_group_registry - marking Raft server ... as dead for Raft groups
raft_group_registry - marking Raft server ... as alive for Raft groups
```
happening once every few seconds. The "marking as dead" happens whenever
we get 3 subsequent failed pings, which is happens with certain (high)
probability depending on the latency jitter. Then as soon as we get a
successful ping, we mark server back as alive.

With the change, the phenomenon no longer appears.

Closes scylladb/scylladb#18443
2024-05-07 23:40:23 +02:00
Ferenc Szili
60bf846f68 sstable: added test for counting dead rows 2024-05-07 15:44:33 +02:00
Kefu Chai
5ca9a46a91 test/lib: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18515
2024-05-05 23:31:48 +03:00
Pavel Emelyanov
67736b5cd3 Reapply "Merge 'Drain view_builder in generic drain' from ScyllaDB"
This reverts commit 9c2a836607.
2024-05-02 08:16:14 +03:00
Kamil Braun
d8313dda43 Merge 'db: config: move consistent-topology-changes out of experimental and make it the default for new clusters' from Patryk Jędrzejczak
We move consistent cluster management out of experimental and
make it the default for new clusters in 6.0. In code, we make the
`consistent-topology-changes` flag unused and assumed to be true.

In 6.0, the topology upgrade procedure will be manual and
voluntary, so some clusters will still be using the gossip-based
topology even though they support the raft-based topology.
Therefore, we need to continue testing the gossip-based topology.
This is possible by using the `force-gossip-topology-changes` flag
introduced in scylladb/scylladb#18284.

Ref scylladb/scylladb#17802

Closes scylladb/scylladb#18285

* github.com:scylladb/scylladb:
  docs: raft.rst: update after removing consistent-topology-changes
  treewide: fix indentation after the previous patch
  db: config: make consistent-topology-changes unused
  test: lib: single_node_cql_env: restart a node in noninitial run_in_thread calls
  test: test_read_required_hosts: run with force-gossip-topology-changes
  storage_service: join_cluster: replace force_gossip_based_join with force-gossip-topology-changes
  storage_service: join_token_ring: fix finish_setup_after_join calls
2024-04-26 14:45:29 +02:00