Commit Graph

960 Commits

Author SHA1 Message Date
Avi Kivity
0d52f0684a Merge 'Sanitize gossiper API endpoints management' from Pavel Emelyanov
Gossiper has two blocs of endpoints, both are registered in legacy/random place in main. This PR moves them next to gossiper start and adds unregistration for both.

refs: #2737

Closes scylladb/scylladb#19425

* github.com:scylladb/scylladb:
  api: Remove dedicated failure_detector registration method
  api: Move failure_detector endpoints set/unset to gossiper
  api: Unset failure detector endpoints method
  api: (Un)Register gossiper API in correct place
  api: Unset gossiper endpoints on stop
  asi: Coroutinize set_server_gossip()
2024-06-23 19:35:11 +03:00
Pavel Emelyanov
d8009ed843 api/cache_service: Don't use database to perform map+reduce on
The sharded<database> is used as a map_reduce0() method provider,
there's no real need in database itself. Simple smp::map_reduce()
would work just as good.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#19364
2024-06-21 19:47:25 +03:00
Pavel Emelyanov
755be887a6 api: Remove dedicated failure_detector registration method
It's now empty and can be dropped

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-21 19:30:54 +03:00
Pavel Emelyanov
2bfa1b3832 api: Move failure_detector endpoints set/unset to gossiper
These two api functions both need gossiper service and only it, and thus
should have set/unset calls next to each other. It's worth putting them
into a single place

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-21 19:30:54 +03:00
Pavel Emelyanov
88a6094121 api: Unset failure detector endpoints method
There's one more set of endpoints that need gossiper -- the
failure_detector ones. They are registered, but not unregistered, so
here's the method to do it. It's not called by any code yet, because
next patch would need to rework the caller anyway.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-21 19:30:53 +03:00
Pavel Emelyanov
19f3a9805a api: Unset gossiper endpoints on stop
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-21 19:30:53 +03:00
Pavel Emelyanov
c7547b9c7e asi: Coroutinize set_server_gossip()
One of the next patches will add more async calls here, so not to
create then-chains, convert it into a coroutine

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-21 19:30:53 +03:00
Avi Kivity
02cf17f4dc Merge 'Sanitize load_meter API handlers management' from Pavel Emelyanov
The service in question is pretty small one, but it has its API endpoint that lives in /storage_service group. Currently when a service starts and has any endpoints that depend on it, the endpoint registration should follow it (#2737). Here's the PR that does it for load meter. Another goal of this change is that http context now has one less dependency onboard.

Closes scylladb/scylladb#19390

* github.com:scylladb/scylladb:
  api: Remove ctx->load_meter dependency
  api: Use local load_meter reference in handlers
  api: Fix indentation after previous patch
  api: Coroutinize load_meter::get_load_map handler
  api: Move load meter handlers
  api: Add set/unset methods for load_meter
2024-06-20 17:07:19 +03:00
Pavel Emelyanov
873d76c02b api: Remove ctx->load_meter dependency
Now the API uses captured reference and the explicit dependency is not
needed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-20 12:38:28 +03:00
Pavel Emelyanov
d85e70ef98 api: Use local load_meter reference in handlers
Now it uses ctx.lm dependency, but the idiomatic way for API is to use
the argument one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-20 12:37:48 +03:00
Pavel Emelyanov
bc5e360066 api: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-20 12:37:39 +03:00
Pavel Emelyanov
e54f651beb api: Coroutinize load_meter::get_load_map handler
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-20 12:37:18 +03:00
Pavel Emelyanov
40c178bee2 api: Move load meter handlers
Now they are in storage service set/unset helper, but there's the
dedicated set/unset pair for meter's enpoints.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-20 12:36:38 +03:00
Pavel Emelyanov
724d62aa87 api: Add set/unset methods for load_meter
The meter is pretty small sevice and its API is also tiny. Still, it's a
standalone top-level service, and its API should come next to it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-20 12:35:58 +03:00
Aleksandra Martyniuk
3463f495b1 tasks: fix tasks abort
Currently if task_manager::task::impl::abort preempts before children
are recursively aborted and then the task gets unregistered, we hit
use after free since abort uses children vector which is no
longer alive.

Modify abort method so that it goes over all tasks in task manager
and aborts those with the given parent.

Fixes: #19304.
2024-06-18 13:39:29 +02:00
Nadav Har'El
4faceeaa33 Merge 'treewide: drop thrift support' from Kefu Chai
thrift support was deprecated since ScyllaDB 5.2

> Thrift API - legacy ScyllaDB (and Apache Cassandra) API is
> deprecated and will be removed in followup release. Thrift has
> been disabled by default.

so let's drop it. in this change,

* thrift protocol support is dropped
* all references to thrift support in document are dropped
* the "thrift_version" column in system.local table is preserved for backward compatibility, as we could load from an existing system.local table which still contains this clolumn, so we need to write this column as well.
* "/storage_service/rpc_server" is only preserved for backward compatibility with java-based nodetool.

Fixes #3811
Fixes #18416
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

- [x] not a fix, no need to backport

Closes scylladb/scylladb#18453

* github.com:scylladb/scylladb:
  config: expand on rpc_keepalive's description
  api: s/rpc/thrift/
  db/system_keyspace: drop thrift_version from system.local table
  transport: do not return client_type from cql_server::connection::make_client_key()
  treewide: drop thrift support
2024-06-17 22:36:49 +03:00
Aleksandra Martyniuk
fb3153d253 api: task_manager: delete module from full_task_status
Delete module field from full_task_status as it is unused.

Closes scylladb/scylladb#18853
2024-06-17 09:03:19 +03:00
Kefu Chai
c03141b4b2 api: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-13 09:32:51 +08:00
Botond Dénes
feea609e37 api/error_injection: add getter for error_injection
Allow external code to obtain information about an error injection
point, including whether it is enabled, and importantly, what its
parameters are. Together with the `set_parameter()` added in the
previous patch, this allows tests to read out the values of internal
parameters, via a set_parameter() injection point.
2024-06-11 04:17:48 -04:00
Kefu Chai
c75442bc2a api: s/rpc/thrift/
replace all occurrences of "rpc" in function names and debugging
messages to "thrift", as "rpc" is way too general, and since we
are removing "thrift" support, let's take this opportunity to
use a more specific name.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-07 09:23:10 +08:00
Kefu Chai
ad649be1bf treewide: drop thrift support
thrift support was deprecated since ScyllaDB 5.2

> Thrift API - legacy ScyllaDB (and Apache Cassandra) API is
> deprecated and will be removed in followup release. Thrift has
> been disabled by default.

so let's drop it. in this change,

* thrift protocol support is dropped
* all references to thrift support in document are dropped
* the "thrift_version" column in system.local table is
  preserved for backward compatibility, as we could load
  from an existing system.local table which still contains
  this clolumn, so we need to write this column as well.
* "/storage_service/rpc_server" is only preserved for
  backward compatibility with java-based nodetool.
* `rpc_port` and `start_rpc` options are preserved, but
  they are marked as "Unused". so that the new release
  of scylladb can consume existing scylla.yaml configurations
  which might contain these settings. by making them
  deprecated, user will be able get warned, and update
  their configurations before we actually remove them
  in the next major release.

Fixes #3811
Fixes #18416
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-07 06:44:59 +08:00
Aleksandra Martyniuk
30f97ea133 tasks: test: modify test_task methods
Wait until the task is done in test_task::finish_failed and
test_task::finish to ensure that it is folded into its parent.
2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk
c1b2b8cb2c api: task_manager: do not unregister task in /task_manager/wait_task/
If /task_manager/wait_task/ unregisters the task, then there is no
way to examine children failures, since their statuses can be checked
only through their parent.
2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk
e6c50ad2d0 tasks: fold finished tasks info their parents
Currently, when a child task is unregistered, it is still kept by its parent. This leads
to excessive memory usage, especially when the tasks are configured to be kept in task
manager after they are finished (task_ttl_in_seconds).

Introduce task_essentials struct which keeps only data necesarry for task manager API.
When a task which has a parent is finished, a foreign pointer to it in its parent is replaced
with respective task_essentials. Once a parent task is finished it is also folded into
its parent (if it has one). Children details of a folded task are lost, unless they
(or some of their subtrees) failed. That is, when a task is finished, we keep:
- a root task (until it is unregistered);
- task_essentials of root's direct children;
- a path (of task_essentials) from root to each failed task (so that the reason
  of a failure could be examined).
2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk
6add9edf8a tasks: change _children type
Keep task children in a map. It's a preparation for further changes.
2024-05-31 10:27:09 +02:00
Kefu Chai
e70b116333 api/api-doc/utils: fix a typo in description
s/mintues/minutes/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18869
2024-05-27 14:15:23 +03:00
Pavel Emelyanov
d86a8252d4 api: Don't switch sched group to start/stop protocol servers
All the protocol servers implementations now maintain scheduling group
on their own, so the API handler can stop caring

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-24 18:00:01 +03:00
Pavel Emelyanov
b24fb8dc87 inet_address: Remove to_sstring() in favor of fmt::to_string
The existing inet_address::to_string() calls fmt::format("{}", *this)
anyway. However, the to_string() method is declared in .cc file, while
form formatter is in the header and is equipeed with constexprs so
that converting an address to string is done as much as possible
compile-time.

Also, though minor, fmt::to_string(foo) is believed to be even faster
than fmt::format("{}", foo).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18712
2024-05-21 09:43:08 +03:00
Botond Dénes
f239339a29 Merge 'Improve modularity of some per-table API endpoints' from Pavel Emelyanov
There's a set of API endpoints that toggle per-table auto-compaction and tombstone-gc booleans. They all live in two different .cc files under api/ directory and duplicate code of each other. This PR generalizes those handlers, places them next to each other, fixes leak on stop and, as a nice side effect, enlightens database.hh header.

Closes scylladb/scylladb#18703

* github.com:scylladb/scylladb:
  api,database: Move auto-compaction toggle guard
  api: Move some table manipulation helpers from storage_service
  api: Move table-related calls from storage_service domain
  api: Reimplement some endpoints using existing helpers
  api: Lost unset of tombstone-gc endpoints
2024-05-20 18:01:54 +03:00
Avi Kivity
52fe351c31 Merge 'Balance tablets within nodes (intra-node migration)' from Tomasz Grabiec
This is needed to avoid severe imbalance between shards which can
happen when some table grows and is split. The inter-node balance can
be equal, so inter-node migration cannot fix the imbalance. Also, if RF=N
then there is not even a possibility of moving tablets around to fix the imbalance.
The only way to bring the system to balance is to move tablets within the nodes.

The system is not prepared for intra-node migration currently. Request coordination
is host-based, while for intra-node migration it should be (also) shard-based.
The solution employed here is to keep the coordination between nodes as-is,
and for intra-node migration storage_proxy-level coordinator is not aware of
the migration (no pending host). The replica-side request handler will be a
second-level coordinator which routes requests to shards, similar to how
the first-level coordinator routes them to hosts.

Tablet sharder is adjusted to handle intra-migration where a tablet
can have two replicas on the same host. For reads, sharder uses the
read selector to resolve the conflict. For writes, the write selector
is used.

The old shard_of() API is kept to represent shard for reads, and new
method is introduced to query the shards for writing:
shard_for_writes(). All writers should be switched to that API, which
is not done in this patch yet.

The request handler on replica side acts as a second-level
coordinator, using sharder to determine routing to shards. A given
sharder has a scope of a single topology version, a single
effective_replication_map_ptr, which should be kept alive during
writes.

perf-simple-query test results show no signs of regression:

Command: perf-simple-query -c1 -m1G --write --tablets --duration=10

Before:

> 83294.81 tps ( 59.5 allocs/op,  14.3 tasks/op,   53725 insns/op,        0 errors)
> 87756.72 tps ( 59.5 allocs/op,  14.3 tasks/op,   54049 insns/op,        0 errors)
> 86428.47 tps ( 59.6 allocs/op,  14.3 tasks/op,   54208 insns/op,        0 errors)
> 86211.38 tps ( 59.7 allocs/op,  14.3 tasks/op,   54219 insns/op,        0 errors)
> 86559.89 tps ( 59.6 allocs/op,  14.3 tasks/op,   54188 insns/op,        0 errors)
> 86609.39 tps ( 59.6 allocs/op,  14.3 tasks/op,   54117 insns/op,        0 errors)
> 87464.06 tps ( 59.5 allocs/op,  14.3 tasks/op,   54039 insns/op,        0 errors)
> 86185.43 tps ( 59.6 allocs/op,  14.3 tasks/op,   54169 insns/op,        0 errors)
> 86254.71 tps ( 59.6 allocs/op,  14.3 tasks/op,   54139 insns/op,        0 errors)
> 83395.35 tps ( 60.2 allocs/op,  14.4 tasks/op,   54693 insns/op,        0 errors)
>
> median 86428.47 tps ( 59.6 allocs/op,  14.3 tasks/op,   54208 insns/op,        0 errors)
> median absolute deviation: 243.04
> maximum: 87756.72
> minimum: 83294.81
>

After:

> 85523.06 tps ( 59.5 allocs/op,  14.3 tasks/op,   53872 insns/op,        0 errors)
> 89362.47 tps ( 59.6 allocs/op,  14.3 tasks/op,   54226 insns/op,        0 errors)
> 88167.55 tps ( 59.7 allocs/op,  14.3 tasks/op,   54400 insns/op,        0 errors)
> 87044.40 tps ( 59.7 allocs/op,  14.3 tasks/op,   54310 insns/op,        0 errors)
> 88344.50 tps ( 59.6 allocs/op,  14.3 tasks/op,   54289 insns/op,        0 errors)
> 88355.06 tps ( 59.6 allocs/op,  14.3 tasks/op,   54242 insns/op,        0 errors)
> 88725.46 tps ( 59.6 allocs/op,  14.3 tasks/op,   54230 insns/op,        0 errors)
> 88640.08 tps ( 59.6 allocs/op,  14.3 tasks/op,   54210 insns/op,        0 errors)
> 90306.31 tps ( 59.4 allocs/op,  14.3 tasks/op,   54043 insns/op,        0 errors)
> 87343.62 tps ( 59.8 allocs/op,  14.3 tasks/op,   54496 insns/op,        0 errors)
>
> median 88355.06 tps ( 59.6 allocs/op,  14.3 tasks/op,   54242 insns/op,        0 errors)
> median absolute deviation: 1007.41
> maximum: 90306.31
> minimum: 85523.06

Command (reads): perf-simple-query -c1 -m1G  --tablets --duration=10

Before:

> 95860.18 tps ( 63.1 allocs/op,  14.1 tasks/op,   42476 insns/op,        0 errors)
> 97537.69 tps ( 63.1 allocs/op,  14.1 tasks/op,   42454 insns/op,        0 errors)
> 97549.23 tps ( 63.1 allocs/op,  14.1 tasks/op,   42470 insns/op,        0 errors)
> 97511.29 tps ( 63.1 allocs/op,  14.1 tasks/op,   42470 insns/op,        0 errors)
> 97227.32 tps ( 63.1 allocs/op,  14.1 tasks/op,   42471 insns/op,        0 errors)
> 94031.94 tps ( 63.1 allocs/op,  14.1 tasks/op,   42441 insns/op,        0 errors)
> 96978.04 tps ( 63.1 allocs/op,  14.1 tasks/op,   42462 insns/op,        0 errors)
> 96401.70 tps ( 63.1 allocs/op,  14.1 tasks/op,   42473 insns/op,        0 errors)
> 96573.77 tps ( 63.1 allocs/op,  14.1 tasks/op,   42440 insns/op,        0 errors)
> 96340.54 tps ( 63.1 allocs/op,  14.1 tasks/op,   42468 insns/op,        0 errors)
>
> median 96978.04 tps ( 63.1 allocs/op,  14.1 tasks/op,   42462 insns/op,        0 errors)
> median absolute deviation: 571.20
> maximum: 97549.23
> minimum: 94031.94
>

After:

> 99794.67 tps ( 63.1 allocs/op,  14.1 tasks/op,   42471 insns/op,        0 errors)
> 101244.99 tps ( 63.1 allocs/op,  14.1 tasks/op,   42472 insns/op,        0 errors)
> 101128.37 tps ( 63.1 allocs/op,  14.1 tasks/op,   42485 insns/op,        0 errors)
> 101065.27 tps ( 63.1 allocs/op,  14.1 tasks/op,   42465 insns/op,        0 errors)
> 101212.98 tps ( 63.1 allocs/op,  14.1 tasks/op,   42456 insns/op,        0 errors)
> 101413.31 tps ( 63.1 allocs/op,  14.1 tasks/op,   42463 insns/op,        0 errors)
> 101464.92 tps ( 63.1 allocs/op,  14.1 tasks/op,   42466 insns/op,        0 errors)
> 101086.74 tps ( 63.1 allocs/op,  14.1 tasks/op,   42488 insns/op,        0 errors)
> 101559.09 tps ( 63.1 allocs/op,  14.1 tasks/op,   42468 insns/op,        0 errors)
> 100742.58 tps ( 63.1 allocs/op,  14.1 tasks/op,   42491 insns/op,        0 errors)
>
> median 101212.98 tps ( 63.1 allocs/op,  14.1 tasks/op,   42456 insns/op,        0 errors)
> median absolute deviation: 200.33
> maximum: 101559.09
> minimum: 99794.67
>

Fixes #16594

Closes scylladb/scylladb#18026

* github.com:scylladb/scylladb:
  Implement fast streaming for intra-node migration
  test: tablets_test: Test sharding during intra-node migration
  test: tablets_test: Check sharding also on the pending host
  test: py: tablets: Test writes concurrent with migration
  test: py: tablets: Test crash during intra-node migration
  api, storage_service: Introduce API to wait for topology to quiesce
  dht, replica: Remove deprecated sharder APIs
  test: Avoid using deprecated sharded API
  db: do_apply_many() avoid deprecated sharded API
  replica: mutation_dump: Avoid deprecated sharder API
  repair: Avoid deprecated sharder API
  table: Remove optimization which returns empty reader when key is not owned by the shard
  dht: is_single_shard: Avoid deprecated sharder API
  dht: split_range_to_single_shard: Work with static_sharder only
  dht: ring_position_range_sharder: Avoid deprecated sharder APIs
  dht: token: Avoid use of deprecated sharder API by switching to static_sharder
  selective_token_sharder: Avoid use of deprecated sharder API
  docs: Document tablet sharding vs tablet replica placement
  readers/multishard.cc: use shard_for_reads() instead of shard_of()
  multishard_mutation_query.cc: use shard_for_reads() instead of shard_of()
  storage_proxy: Extract common code to apply mutations on many shards according to sharder
  storage_proxy: Prepare per-partition rate-limiting for intra-node migration
  storage_proxy: Avoid shard_of() use in mutate_counter_on_leader_and_replicate()
  storage_proxy: Prepare mutate_hint() for intra-node tablet migration
  commitlog_replayer: Avoid deprecated sharder::shard_of()
  lwt: Avoid deprecated sharder::shard_of()
  compaction: Avoid deprecated sharder::shard_of()
  dht: Extract dht::static_sharder
  replica: Deprecate table::shard_of()
  locator: Deprecate effective_replication_map::shard_of()
  dht: Deprecate old sharder API: shard_of/next_shard/token_for_next_shard
  tests: tablets: py: Add intra-node migration test
  tests: tablets: Test that drained nodes are not balanced internally
  tests: tablets: Add checks of replica set validity to test_load_balancing_with_random_load
  tests: tablets: Verify that disabling balancing results in no intra-node migrations
  tests: tablets: Check that nodes are internally balanced
  tests: tablets: Improve debuggability by showing which rows are missing
  tablets, storage_service: Support intra-node migration in move_tablet() API
  tablet_allocator: Generate intra-node migration plan
  tablet_allocator: Extract make_internode_plan()
  tablet_allocator: Maintain candidate list and shard tablet count for target nodes
  tablet_allocator: Lift apply_load/can_accept_load lambdas to member functions
  tablets, streaming: Implement tablet streaming for intra-node migration
  dht, auto_refreshing_sharder: Allow overriding write selector
  multishard_writer: Handle intra-node migration
  storage_proxy: Handle intra-node tablet migration for writes
  tablets: Get rid of tablet_map::get_shard()
  tablets: Avoid tablet_map::get_shard in cleanup
  tablets: test: Use sharder instead of tablet_map::get_shard()
  tablets: tablet_sharder: Allow working with non-local host
  sharding: Prepare for intra-node-migration
  docs: Document sharder use for tablets
  tablets: Introduce tablet transition kind for intra-node migration
  tests: tablets: Fix use-after-move of skiplist in rebalance_tablets()
  sstables, gdb: Track readers in a linked list
  raft topology: Fix global token metadata barrier to not fence ahead of what is drained
2024-05-20 16:13:01 +03:00
Pavel Emelyanov
31d05925cc api,database: Move auto-compaction toggle guard
Toggling per-table auto-compaction enabling bit is guarded with
on-database boolean and raii guard. It's only used by a single
api/column_family.cc file, so it can live there.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-16 14:42:51 +03:00
Pavel Emelyanov
a43b178f72 api: Move some table manipulation helpers from storage_service
Continuation of the previous patch -- helpers toggling tombstone_gc and
auto_compaction on tables should live in the same file that uses them.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-16 14:42:50 +03:00
Pavel Emelyanov
862fcd7bc7 api: Move table-related calls from storage_service domain
The storage_service/(enable|disable)_(tombstone_gc|auto_compaction)
endpoints are not handled by storage_service _service_ and should rather
live in the column_family/ domain which is handler by replica::database.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-16 14:42:50 +03:00
Pavel Emelyanov
ba53283d21 api: Reimplement some endpoints using existing helpers
The (enable|disable)_(tombstone_gc|auto_compaction) endpoints living in
column_family domain can benefit from the helpers that do the same in
the storage_service domain. The "difference" is that c.f. endpoints do
it per-table, while s.s. ones operate on a vector of tables, so the
former is a corner case of the latter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-16 14:42:50 +03:00
Pavel Emelyanov
231ffa623c api: Lost unset of tombstone-gc endpoints
On stop all endpoints must be unregistered, these three are lost

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-16 14:42:50 +03:00
Tomasz Grabiec
7956a2991e api, storage_service: Introduce API to wait for topology to quiesce 2024-05-16 00:28:47 +02:00
Botond Dénes
fd25bb6f9f api/storage_service: add tablet support for /storage_service/tokens_endpoint
Add a keyspace and cf parameter. When specified, the endpoint will
return token -> primary replica mapping for the table's tablet tokens,
not the vnodes.
2024-05-13 07:09:20 -04:00
Botond Dénes
0438febdc9 Merge 'alternator: fix REST API access to an Alternator LSI' from Nadav Har'El
The name of the Scylla table backing an Alternator LSI looks like `basename:!lsiname`. Some REST API clients (including Scylla Manager) when they send a "!" character in the REST API request path may decide to "URL encode" it - convert it to `%21`.

Because of a Seastar bug (https://github.com/scylladb/seastar/issues/725) Scylla's REST API server forgets to do the URL decoding on the path part of the request, which leads to the REST API request failing to address the LSI table.

The first patch in this PR fixes the bug by using a new Seastar API introduced in https://github.com/scylladb/seastar/pull/2125 that does the URL decoding as appropriate. The second patch in the PR is a new test for this bug, which fails without the fix, and passes afterwards.

Fixes #5883.

Closes scylladb/scylladb#18286

* github.com:scylladb/scylladb:
  test/alternator: test addressing LSI using REST API
  REST API: stop using deprecated, buggy, path parameter
2024-05-09 08:26:43 +03:00
Kefu Chai
0b0e661a85 build: bring abseil submodule back
because of https://bugzilla.redhat.com/show_bug.cgi?id=2278689,
the rebuilt abseil package provided by fedora has different settings
than the ones if the tree is built with the sanitizer enabled. this
inconsistency leads to a crash.

to address this problem, we have to reinstate the abseil submodule, so
we can built it with the same compiler options with which we build the
tree.

in this change

* Revert "build: drop abseil submodule, replace with distribution abseil"
* update CMake building system with abseil header include settings
* bump up the abseil submodule to the latest LTS branch of abseil:
  lts_2024_01_16
* update scylla-gdb.py to adapt to the new structure of
  flat_hash_map

This reverts commit 8635d24424.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18511
2024-05-05 23:31:09 +03:00
Nadav Har'El
1aacfdf460 REST API: stop using deprecated, buggy, path parameter
The API req->param["name"] to access parameters in the path part of the
URL was buggy - it forgot to do URL decoding and the result of our use
of it in Scylla was bugs like #5883 - where special characters in certain
REST API requests got botched up (encoded by the client, then not
decoded by the server).

The solution is to replace all uses of req->param["name"] by the new
req->get_path_param("name"), which does the decoding correctly.

Unfortunately we needed to change 104 (!) callers in this patch, but the
transformation is mostly mechanical and there is no functional changes in
this patch. Another set of changes was to bring req, not req->param, to
a few functions that want to get the path param.

This patch avoids the numerous deprecation warnings we had before, and
more importantly, it fixes #5883.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-05-02 12:33:46 +03:00
Kefu Chai
0bbaded4ce api/storage_service: convert runtime_error from repair to http error
in `set_repair()`, despite that the repair is performed asynchronously,
we check the options specified by client immediately, and throw
`std::runtime_error`, if any of them is not supported.

before this change, these unhandled exceptions are translated to HTTP
500 error but the underlying HTTP router. but this is misleading, as
these errors are caused by client, not server. and the error message
is missing in the HTTP error message when performing the translation.

in this change, we handle the `runtime_error`, and translate them
into `httpd::bad_param_exception`, so that the client can have
HTTP 400 (Bad Request) instead of HTTP 500 (Internal Server Error),
and with informative error message.

for instance, if we apply repair with "small_table_optimization" enabled
on a keyspace with tablets enabled. we should have an HTTP error 400
with "The small_table_optimization option is not supported for tablet repair"
as the body of the error. this would much more helpful.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-26 14:25:15 +08:00
Kefu Chai
d737ba1ab2 api/storage_service: coroutinize set_repair()
before this change, `set_repair()` uses a lambda for handling
the client-side requests. and this works great. but the underlying
`repair_start()` throws if any of the given options is not sane.
and we don't handle any of these throw exceptions in `set_repair()`,
from client's point of view, it would get an HTTP 500 error code,
which implies an "Internal Server Error". but actually, we should
blame the client for the error, not the server.

so, to prepare the error handling, let's take the opportunity to
coroutinize the lambda handling the request, so that we can handle
the exception in a more elegant way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-26 14:24:03 +08:00
Pavel Emelyanov
ae4c1c44ec snapshot: Get per-table snapshot size under snapshot lock
Walking per-table snapshot directory without lock is racy. There's
snapshot-ctl locking that's used to get db-wide snapshot details, it
should be used to get per-table snapshot details too

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-25 10:05:51 +03:00
Pavel Emelyanov
186b36165e snapshot: Move per-table snap API to other snapshot endpoints
So that they are collected in one place and to facilitate next patch
that's going to use snapshot-ctl for per-table API too

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-25 10:05:01 +03:00
Botond Dénes
572003c469 Merge 'Cleanup the way snapshot details are propagated via API' from Pavel Emelyanov
There's a database::get_snapshot_details() method that returns collection of all snapshots for all ks.cf out there and there are several *snapshot_details* aux structures around it. This PR keeps only one "details" and cleans up the way it propagates from database up to the respective API calls.

Closes scylladb/scylladb#18317

* github.com:scylladb/scylladb:
  snapshot_ctl: Brush up true_snapshots_size() internals
  snapshot_ctl: Remove unused details struct
  snapshot_ctl: No double recoding of details
  database,snapshots: Move database::snapshot_details into snapshot_ctl
  database,snapshots: Make database::get_snapshot_details() return map, not vector
  table,snapshots: Move table::snapshot_details into snapshot_ctl
2024-04-23 16:28:25 +03:00
Pavel Emelyanov
e8f10be12e snapshot_ctl: No double recoding of details
Currently database::get_snapshot_details() returns a collection of
snapshots. The snapshot_ctl converts this collection into similarly
looking one with slightly different structures inside. The resulting
collection is converted one more time on the API layer into another
similarly looking map.

This patch removes the intermediate conversion.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-19 20:04:32 +03:00
Kefu Chai
a439ebcfce treewide: include fmt/ranges.h and/or fmt/std.h
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting the container types, like vector, map
optional and variant using {fmt} instead of the homebrew
formatter based on operator<<.
with this change, the changes adding fmt::formatter and
the changes using ostream formatter explicitly, we are
allowed to drop `FMT_DEPRECATED_OSTREAM` macro.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:56:16 +08:00
Kamil Braun
eb9ba914a3 Merge 'Set dc and rack in gossiper when loaded from system.peers and load the ignored nodes state for replace' from Benny Halevy
The problem this series solves is correctly ignoring DOWN nodes state
when replacing a node.

When a node is replaced and there are other nodes that are down, the
replacing node is told to ignore those DOWN nodes using the
`ignore_dead_nodes_for_replace` option.

Since the replacing node is bootstrapping it starts with an empty
system.peers table so it has no notion about any node state and it
learns about all other nodes via gossip shadow round done in
`storage_service::prepare_replacement_info`.

Normally, since the DOWN nodes to ignore already joined the ring, the
remaining node will have their endpoint state already in gossip, but if
the whole cluster was restarted while those DOWN nodes did not start,
the remaining nodes will only have a partial endpoint state from them,
which is loaded from system.peers.

Currently, the partial endpoint state contains only `HOST_ID` and
`TOKENS`, and in particular it lacks `STATUS`, `DC`, and `RACK`.

The first part of this series loads also `DC` and `RACK` from
system.peers to make them available to the replacing node as they are
crucial for building a correct replication map with network topology
replication strategy.

But still, without a `STATUS` those nodes are not considered as normal
token owners yet, and they do not go through handle_state_normal which
adds them to the topology and token_metadata.

The second part of this series uses the endpoint state retrieved in the
gossip shadow round to explicitly add the ignored nodes' state to
topology (including dc and rack) and token_metadata (tokens) in
`prepare_replacement_info`.  If there are more DOWN nodes that are not
explicitly ignored replace will fail (as it should).

Fixes scylladb/scylladb#15787

Closes scylladb/scylladb#15788

* github.com:scylladb/scylladb:
  storage_service: join_token_ring: load ignored nodes state if replacing
  storage_service: replacement_info: return ignore_nodes state
  locator: host_id_or_endpoint: keep value as variant
  gms: endpoint_state: add getters for host_id, dc_rack, and tokens
  storage_service: topology_state_load: set local STATUS state using add_saved_endpoint
  gossiper: add_saved_endpoint: set dc and rack
  gossiper: add_saved_endpoint: fixup indentation
  gossiper: add_saved_endpoint: make host_id mandatory
  gossiper: add load_endpoint_state
  gossiper: start_gossiping: log local state
2024-04-16 10:27:36 +02:00
Pavel Emelyanov
05c4042511 api/lsa: Don't use database to perform invoke-on-all
The sharded<database> is used as a invoke_in_all() method provider,
there's no real need in database itself. Simple smp::invoke_on_all()
would work just as good.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18221
2024-04-16 07:12:40 +03:00
Pavel Emelyanov
f3edde7d2e api: Qualify callback commitlog* argument with const
There's a helper map-reducer that accepts a function to call on
commitlog. All callers accumulate statistics with it, so the commitlog
argument is const pointer.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18238
2024-04-16 07:02:31 +03:00