Commit Graph

951 Commits

Author SHA1 Message Date
Raphael S. Carvalho
39eb44dfa0 replica: get rid of fragile compaction group intrusive list
It was added to make integration of storage groups easier, but it's
complicated since it's another source of truth and we could have
problems if it becomes inconsistent with the group map.

Fixes #18506.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit ad5c5bca5f)
2024-08-13 12:26:11 -03:00
Emil Maskovsky
b4745406da raft: Fix crash in leader_host API handler
The leader_host API handler was eventually using the `req` unique_ptr
after it has been already destroyed (passed down to the future lambda
by reference). This was causing an occassional crash in some tests.

Reworked the leader_host handler to use the req only outside of the
future lambda.

Also updated the code to handle the possibility that the non-default
leader group (other than Group 0) might reside on a different shard
than the shard 0 - using the same concept of calling on all shards via
`invoke_on_all()` as done for the other requests.

Fixes scylladb/scylladb#19714

(cherry picked from commit b2db8f4b9b)

Closes scylladb/scylladb#19742
2024-07-16 13:29:37 +02:00
Pavel Emelyanov
941ec80a00 api: Close response stream of get_compaction_history()
The function must close the stream even if it throws along the way.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit b4f9387a9d)
2024-07-03 18:30:17 +00:00
Pavel Emelyanov
ab5041cb03 api: Flush output stream in get_compaction_history() call
It's currently implicitly flushed on its close, but in that case close
can throw while flusing. Next patch wants close not to throw and that's
possible if flushing the stream in advance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit 6d4ba98796)
2024-07-03 18:30:17 +00:00
Pavel Emelyanov
009f5eb69e api: Coroutinize get_compaction_history inner function
The handler returns a function which is then invoked with output_stream
argument to render the json into. This function is converted into
coroutine. It has yet another inner lambda that's passed into
compaction_manager::get_compaction_history() as consumer lambda. It's
coroutinized too.

The indentation looks weird as preparation for future patching.
Hopefullly it's still possible to understand what's going on.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit acb351f4ee)
2024-07-03 18:30:17 +00:00
Pavel Emelyanov
20b99246fd Merge '[Backport 6.0] Close output stream in task manager's API get_tasks handler' from ScyllaDB
If client stops reading response early, the server-side stream throws but must be closed anyway. Seen in another endpoint and fixed by #19541

(cherry picked from commit 4897d8f145)

(cherry picked from commit 986a04cb11)

(cherry picked from commit 1be8b2fd25)

 Refs #19542

Closes scylladb/scylladb#19562

* github.com:scylladb/scylladb:
  api: Fix indentation after previous patch
  api: Close response stream on error
  api: Flush response output stream before closing
2024-07-01 10:47:30 +03:00
Pavel Emelyanov
4e17a5a1c2 api: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit 1839030e3b)
2024-06-30 19:20:11 +00:00
Pavel Emelyanov
c5c168a1db api: Close output_stream on error
If the get_snapshot_details() lambda throws, the output stream remains
non-closed which is bad. Close it regardless of what.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit a0c1552cea)
2024-06-30 19:20:10 +00:00
Pavel Emelyanov
09272d2478 api: Flush response output stream before closing
Otherwise close() may throw and this is what next patch will want not to
happen.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit d1fd886608)
2024-06-30 19:20:10 +00:00
Pavel Emelyanov
1e7f377b0a api: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit 1be8b2fd25)
2024-06-30 19:19:52 +00:00
Pavel Emelyanov
b038177f19 api: Close response stream on error
The handler's lambda is called with && stream object and must close the
stream on its own regardless of what.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit 986a04cb11)
2024-06-30 19:19:52 +00:00
Pavel Emelyanov
426bc6a4e1 api: Flush response output stream before closing
The .close() method flushes the stream, but it may throw doing it. Next
patch will want .close() not to throw, for that stream must be flushed
explicitly before closing.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit 4897d8f145)
2024-06-30 19:19:52 +00:00
Aleksandra Martyniuk
634c0d44ef tasks: fix tasks abort
Currently if task_manager::task::impl::abort preempts before children
are recursively aborted and then the task gets unregistered, we hit
use after free since abort uses children vector which is no
longer alive.

Modify abort method so that it goes over all tasks in task manager
and aborts those with the given parent.

Fixes: #19304.
(cherry picked from commit 3463f495b1)
2024-06-20 14:47:14 +00:00
Botond Dénes
bb18a8152e api/error_injection: add getter for error_injection
Allow external code to obtain information about an error injection
point, including whether it is enabled, and importantly, what its
parameters are. Together with the `set_parameter()` added in the
previous patch, this allows tests to read out the values of internal
parameters, via a set_parameter() injection point.

(cherry picked from commit feea609e37)
2024-06-11 17:32:37 +00:00
Aleksandra Martyniuk
1d34da21a9 tasks: test: modify test_task methods
Wait until the task is done in test_task::finish_failed and
test_task::finish to ensure that it is folded into its parent.

(cherry picked from commit 30f97ea133)
2024-06-04 14:42:09 +00:00
Aleksandra Martyniuk
377bc345f1 api: task_manager: do not unregister task in /task_manager/wait_task/
If /task_manager/wait_task/ unregisters the task, then there is no
way to examine children failures, since their statuses can be checked
only through their parent.

(cherry picked from commit c1b2b8cb2c)
2024-06-04 14:42:09 +00:00
Aleksandra Martyniuk
cb242ad48c tasks: fold finished tasks info their parents
Currently, when a child task is unregistered, it is still kept by its parent. This leads
to excessive memory usage, especially when the tasks are configured to be kept in task
manager after they are finished (task_ttl_in_seconds).

Introduce task_essentials struct which keeps only data necesarry for task manager API.
When a task which has a parent is finished, a foreign pointer to it in its parent is replaced
with respective task_essentials. Once a parent task is finished it is also folded into
its parent (if it has one). Children details of a folded task are lost, unless they
(or some of their subtrees) failed. That is, when a task is finished, we keep:
- a root task (until it is unregistered);
- task_essentials of root's direct children;
- a path (of task_essentials) from root to each failed task (so that the reason
  of a failure could be examined).

(cherry picked from commit e6c50ad2d0)
2024-06-04 14:42:09 +00:00
Aleksandra Martyniuk
baf0385728 tasks: change _children type
Keep task children in a map. It's a preparation for further changes.

(cherry picked from commit 6add9edf8a)
2024-06-04 14:42:08 +00:00
Pavel Emelyanov
b24fb8dc87 inet_address: Remove to_sstring() in favor of fmt::to_string
The existing inet_address::to_string() calls fmt::format("{}", *this)
anyway. However, the to_string() method is declared in .cc file, while
form formatter is in the header and is equipeed with constexprs so
that converting an address to string is done as much as possible
compile-time.

Also, though minor, fmt::to_string(foo) is believed to be even faster
than fmt::format("{}", foo).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18712
2024-05-21 09:43:08 +03:00
Botond Dénes
f239339a29 Merge 'Improve modularity of some per-table API endpoints' from Pavel Emelyanov
There's a set of API endpoints that toggle per-table auto-compaction and tombstone-gc booleans. They all live in two different .cc files under api/ directory and duplicate code of each other. This PR generalizes those handlers, places them next to each other, fixes leak on stop and, as a nice side effect, enlightens database.hh header.

Closes scylladb/scylladb#18703

* github.com:scylladb/scylladb:
  api,database: Move auto-compaction toggle guard
  api: Move some table manipulation helpers from storage_service
  api: Move table-related calls from storage_service domain
  api: Reimplement some endpoints using existing helpers
  api: Lost unset of tombstone-gc endpoints
2024-05-20 18:01:54 +03:00
Avi Kivity
52fe351c31 Merge 'Balance tablets within nodes (intra-node migration)' from Tomasz Grabiec
This is needed to avoid severe imbalance between shards which can
happen when some table grows and is split. The inter-node balance can
be equal, so inter-node migration cannot fix the imbalance. Also, if RF=N
then there is not even a possibility of moving tablets around to fix the imbalance.
The only way to bring the system to balance is to move tablets within the nodes.

The system is not prepared for intra-node migration currently. Request coordination
is host-based, while for intra-node migration it should be (also) shard-based.
The solution employed here is to keep the coordination between nodes as-is,
and for intra-node migration storage_proxy-level coordinator is not aware of
the migration (no pending host). The replica-side request handler will be a
second-level coordinator which routes requests to shards, similar to how
the first-level coordinator routes them to hosts.

Tablet sharder is adjusted to handle intra-migration where a tablet
can have two replicas on the same host. For reads, sharder uses the
read selector to resolve the conflict. For writes, the write selector
is used.

The old shard_of() API is kept to represent shard for reads, and new
method is introduced to query the shards for writing:
shard_for_writes(). All writers should be switched to that API, which
is not done in this patch yet.

The request handler on replica side acts as a second-level
coordinator, using sharder to determine routing to shards. A given
sharder has a scope of a single topology version, a single
effective_replication_map_ptr, which should be kept alive during
writes.

perf-simple-query test results show no signs of regression:

Command: perf-simple-query -c1 -m1G --write --tablets --duration=10

Before:

> 83294.81 tps ( 59.5 allocs/op,  14.3 tasks/op,   53725 insns/op,        0 errors)
> 87756.72 tps ( 59.5 allocs/op,  14.3 tasks/op,   54049 insns/op,        0 errors)
> 86428.47 tps ( 59.6 allocs/op,  14.3 tasks/op,   54208 insns/op,        0 errors)
> 86211.38 tps ( 59.7 allocs/op,  14.3 tasks/op,   54219 insns/op,        0 errors)
> 86559.89 tps ( 59.6 allocs/op,  14.3 tasks/op,   54188 insns/op,        0 errors)
> 86609.39 tps ( 59.6 allocs/op,  14.3 tasks/op,   54117 insns/op,        0 errors)
> 87464.06 tps ( 59.5 allocs/op,  14.3 tasks/op,   54039 insns/op,        0 errors)
> 86185.43 tps ( 59.6 allocs/op,  14.3 tasks/op,   54169 insns/op,        0 errors)
> 86254.71 tps ( 59.6 allocs/op,  14.3 tasks/op,   54139 insns/op,        0 errors)
> 83395.35 tps ( 60.2 allocs/op,  14.4 tasks/op,   54693 insns/op,        0 errors)
>
> median 86428.47 tps ( 59.6 allocs/op,  14.3 tasks/op,   54208 insns/op,        0 errors)
> median absolute deviation: 243.04
> maximum: 87756.72
> minimum: 83294.81
>

After:

> 85523.06 tps ( 59.5 allocs/op,  14.3 tasks/op,   53872 insns/op,        0 errors)
> 89362.47 tps ( 59.6 allocs/op,  14.3 tasks/op,   54226 insns/op,        0 errors)
> 88167.55 tps ( 59.7 allocs/op,  14.3 tasks/op,   54400 insns/op,        0 errors)
> 87044.40 tps ( 59.7 allocs/op,  14.3 tasks/op,   54310 insns/op,        0 errors)
> 88344.50 tps ( 59.6 allocs/op,  14.3 tasks/op,   54289 insns/op,        0 errors)
> 88355.06 tps ( 59.6 allocs/op,  14.3 tasks/op,   54242 insns/op,        0 errors)
> 88725.46 tps ( 59.6 allocs/op,  14.3 tasks/op,   54230 insns/op,        0 errors)
> 88640.08 tps ( 59.6 allocs/op,  14.3 tasks/op,   54210 insns/op,        0 errors)
> 90306.31 tps ( 59.4 allocs/op,  14.3 tasks/op,   54043 insns/op,        0 errors)
> 87343.62 tps ( 59.8 allocs/op,  14.3 tasks/op,   54496 insns/op,        0 errors)
>
> median 88355.06 tps ( 59.6 allocs/op,  14.3 tasks/op,   54242 insns/op,        0 errors)
> median absolute deviation: 1007.41
> maximum: 90306.31
> minimum: 85523.06

Command (reads): perf-simple-query -c1 -m1G  --tablets --duration=10

Before:

> 95860.18 tps ( 63.1 allocs/op,  14.1 tasks/op,   42476 insns/op,        0 errors)
> 97537.69 tps ( 63.1 allocs/op,  14.1 tasks/op,   42454 insns/op,        0 errors)
> 97549.23 tps ( 63.1 allocs/op,  14.1 tasks/op,   42470 insns/op,        0 errors)
> 97511.29 tps ( 63.1 allocs/op,  14.1 tasks/op,   42470 insns/op,        0 errors)
> 97227.32 tps ( 63.1 allocs/op,  14.1 tasks/op,   42471 insns/op,        0 errors)
> 94031.94 tps ( 63.1 allocs/op,  14.1 tasks/op,   42441 insns/op,        0 errors)
> 96978.04 tps ( 63.1 allocs/op,  14.1 tasks/op,   42462 insns/op,        0 errors)
> 96401.70 tps ( 63.1 allocs/op,  14.1 tasks/op,   42473 insns/op,        0 errors)
> 96573.77 tps ( 63.1 allocs/op,  14.1 tasks/op,   42440 insns/op,        0 errors)
> 96340.54 tps ( 63.1 allocs/op,  14.1 tasks/op,   42468 insns/op,        0 errors)
>
> median 96978.04 tps ( 63.1 allocs/op,  14.1 tasks/op,   42462 insns/op,        0 errors)
> median absolute deviation: 571.20
> maximum: 97549.23
> minimum: 94031.94
>

After:

> 99794.67 tps ( 63.1 allocs/op,  14.1 tasks/op,   42471 insns/op,        0 errors)
> 101244.99 tps ( 63.1 allocs/op,  14.1 tasks/op,   42472 insns/op,        0 errors)
> 101128.37 tps ( 63.1 allocs/op,  14.1 tasks/op,   42485 insns/op,        0 errors)
> 101065.27 tps ( 63.1 allocs/op,  14.1 tasks/op,   42465 insns/op,        0 errors)
> 101212.98 tps ( 63.1 allocs/op,  14.1 tasks/op,   42456 insns/op,        0 errors)
> 101413.31 tps ( 63.1 allocs/op,  14.1 tasks/op,   42463 insns/op,        0 errors)
> 101464.92 tps ( 63.1 allocs/op,  14.1 tasks/op,   42466 insns/op,        0 errors)
> 101086.74 tps ( 63.1 allocs/op,  14.1 tasks/op,   42488 insns/op,        0 errors)
> 101559.09 tps ( 63.1 allocs/op,  14.1 tasks/op,   42468 insns/op,        0 errors)
> 100742.58 tps ( 63.1 allocs/op,  14.1 tasks/op,   42491 insns/op,        0 errors)
>
> median 101212.98 tps ( 63.1 allocs/op,  14.1 tasks/op,   42456 insns/op,        0 errors)
> median absolute deviation: 200.33
> maximum: 101559.09
> minimum: 99794.67
>

Fixes #16594

Closes scylladb/scylladb#18026

* github.com:scylladb/scylladb:
  Implement fast streaming for intra-node migration
  test: tablets_test: Test sharding during intra-node migration
  test: tablets_test: Check sharding also on the pending host
  test: py: tablets: Test writes concurrent with migration
  test: py: tablets: Test crash during intra-node migration
  api, storage_service: Introduce API to wait for topology to quiesce
  dht, replica: Remove deprecated sharder APIs
  test: Avoid using deprecated sharded API
  db: do_apply_many() avoid deprecated sharded API
  replica: mutation_dump: Avoid deprecated sharder API
  repair: Avoid deprecated sharder API
  table: Remove optimization which returns empty reader when key is not owned by the shard
  dht: is_single_shard: Avoid deprecated sharder API
  dht: split_range_to_single_shard: Work with static_sharder only
  dht: ring_position_range_sharder: Avoid deprecated sharder APIs
  dht: token: Avoid use of deprecated sharder API by switching to static_sharder
  selective_token_sharder: Avoid use of deprecated sharder API
  docs: Document tablet sharding vs tablet replica placement
  readers/multishard.cc: use shard_for_reads() instead of shard_of()
  multishard_mutation_query.cc: use shard_for_reads() instead of shard_of()
  storage_proxy: Extract common code to apply mutations on many shards according to sharder
  storage_proxy: Prepare per-partition rate-limiting for intra-node migration
  storage_proxy: Avoid shard_of() use in mutate_counter_on_leader_and_replicate()
  storage_proxy: Prepare mutate_hint() for intra-node tablet migration
  commitlog_replayer: Avoid deprecated sharder::shard_of()
  lwt: Avoid deprecated sharder::shard_of()
  compaction: Avoid deprecated sharder::shard_of()
  dht: Extract dht::static_sharder
  replica: Deprecate table::shard_of()
  locator: Deprecate effective_replication_map::shard_of()
  dht: Deprecate old sharder API: shard_of/next_shard/token_for_next_shard
  tests: tablets: py: Add intra-node migration test
  tests: tablets: Test that drained nodes are not balanced internally
  tests: tablets: Add checks of replica set validity to test_load_balancing_with_random_load
  tests: tablets: Verify that disabling balancing results in no intra-node migrations
  tests: tablets: Check that nodes are internally balanced
  tests: tablets: Improve debuggability by showing which rows are missing
  tablets, storage_service: Support intra-node migration in move_tablet() API
  tablet_allocator: Generate intra-node migration plan
  tablet_allocator: Extract make_internode_plan()
  tablet_allocator: Maintain candidate list and shard tablet count for target nodes
  tablet_allocator: Lift apply_load/can_accept_load lambdas to member functions
  tablets, streaming: Implement tablet streaming for intra-node migration
  dht, auto_refreshing_sharder: Allow overriding write selector
  multishard_writer: Handle intra-node migration
  storage_proxy: Handle intra-node tablet migration for writes
  tablets: Get rid of tablet_map::get_shard()
  tablets: Avoid tablet_map::get_shard in cleanup
  tablets: test: Use sharder instead of tablet_map::get_shard()
  tablets: tablet_sharder: Allow working with non-local host
  sharding: Prepare for intra-node-migration
  docs: Document sharder use for tablets
  tablets: Introduce tablet transition kind for intra-node migration
  tests: tablets: Fix use-after-move of skiplist in rebalance_tablets()
  sstables, gdb: Track readers in a linked list
  raft topology: Fix global token metadata barrier to not fence ahead of what is drained
2024-05-20 16:13:01 +03:00
Pavel Emelyanov
31d05925cc api,database: Move auto-compaction toggle guard
Toggling per-table auto-compaction enabling bit is guarded with
on-database boolean and raii guard. It's only used by a single
api/column_family.cc file, so it can live there.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-16 14:42:51 +03:00
Pavel Emelyanov
a43b178f72 api: Move some table manipulation helpers from storage_service
Continuation of the previous patch -- helpers toggling tombstone_gc and
auto_compaction on tables should live in the same file that uses them.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-16 14:42:50 +03:00
Pavel Emelyanov
862fcd7bc7 api: Move table-related calls from storage_service domain
The storage_service/(enable|disable)_(tombstone_gc|auto_compaction)
endpoints are not handled by storage_service _service_ and should rather
live in the column_family/ domain which is handler by replica::database.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-16 14:42:50 +03:00
Pavel Emelyanov
ba53283d21 api: Reimplement some endpoints using existing helpers
The (enable|disable)_(tombstone_gc|auto_compaction) endpoints living in
column_family domain can benefit from the helpers that do the same in
the storage_service domain. The "difference" is that c.f. endpoints do
it per-table, while s.s. ones operate on a vector of tables, so the
former is a corner case of the latter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-16 14:42:50 +03:00
Pavel Emelyanov
231ffa623c api: Lost unset of tombstone-gc endpoints
On stop all endpoints must be unregistered, these three are lost

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-16 14:42:50 +03:00
Tomasz Grabiec
7956a2991e api, storage_service: Introduce API to wait for topology to quiesce 2024-05-16 00:28:47 +02:00
Botond Dénes
fd25bb6f9f api/storage_service: add tablet support for /storage_service/tokens_endpoint
Add a keyspace and cf parameter. When specified, the endpoint will
return token -> primary replica mapping for the table's tablet tokens,
not the vnodes.
2024-05-13 07:09:20 -04:00
Botond Dénes
0438febdc9 Merge 'alternator: fix REST API access to an Alternator LSI' from Nadav Har'El
The name of the Scylla table backing an Alternator LSI looks like `basename:!lsiname`. Some REST API clients (including Scylla Manager) when they send a "!" character in the REST API request path may decide to "URL encode" it - convert it to `%21`.

Because of a Seastar bug (https://github.com/scylladb/seastar/issues/725) Scylla's REST API server forgets to do the URL decoding on the path part of the request, which leads to the REST API request failing to address the LSI table.

The first patch in this PR fixes the bug by using a new Seastar API introduced in https://github.com/scylladb/seastar/pull/2125 that does the URL decoding as appropriate. The second patch in the PR is a new test for this bug, which fails without the fix, and passes afterwards.

Fixes #5883.

Closes scylladb/scylladb#18286

* github.com:scylladb/scylladb:
  test/alternator: test addressing LSI using REST API
  REST API: stop using deprecated, buggy, path parameter
2024-05-09 08:26:43 +03:00
Kefu Chai
0b0e661a85 build: bring abseil submodule back
because of https://bugzilla.redhat.com/show_bug.cgi?id=2278689,
the rebuilt abseil package provided by fedora has different settings
than the ones if the tree is built with the sanitizer enabled. this
inconsistency leads to a crash.

to address this problem, we have to reinstate the abseil submodule, so
we can built it with the same compiler options with which we build the
tree.

in this change

* Revert "build: drop abseil submodule, replace with distribution abseil"
* update CMake building system with abseil header include settings
* bump up the abseil submodule to the latest LTS branch of abseil:
  lts_2024_01_16
* update scylla-gdb.py to adapt to the new structure of
  flat_hash_map

This reverts commit 8635d24424.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18511
2024-05-05 23:31:09 +03:00
Nadav Har'El
1aacfdf460 REST API: stop using deprecated, buggy, path parameter
The API req->param["name"] to access parameters in the path part of the
URL was buggy - it forgot to do URL decoding and the result of our use
of it in Scylla was bugs like #5883 - where special characters in certain
REST API requests got botched up (encoded by the client, then not
decoded by the server).

The solution is to replace all uses of req->param["name"] by the new
req->get_path_param("name"), which does the decoding correctly.

Unfortunately we needed to change 104 (!) callers in this patch, but the
transformation is mostly mechanical and there is no functional changes in
this patch. Another set of changes was to bring req, not req->param, to
a few functions that want to get the path param.

This patch avoids the numerous deprecation warnings we had before, and
more importantly, it fixes #5883.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-05-02 12:33:46 +03:00
Kefu Chai
0bbaded4ce api/storage_service: convert runtime_error from repair to http error
in `set_repair()`, despite that the repair is performed asynchronously,
we check the options specified by client immediately, and throw
`std::runtime_error`, if any of them is not supported.

before this change, these unhandled exceptions are translated to HTTP
500 error but the underlying HTTP router. but this is misleading, as
these errors are caused by client, not server. and the error message
is missing in the HTTP error message when performing the translation.

in this change, we handle the `runtime_error`, and translate them
into `httpd::bad_param_exception`, so that the client can have
HTTP 400 (Bad Request) instead of HTTP 500 (Internal Server Error),
and with informative error message.

for instance, if we apply repair with "small_table_optimization" enabled
on a keyspace with tablets enabled. we should have an HTTP error 400
with "The small_table_optimization option is not supported for tablet repair"
as the body of the error. this would much more helpful.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-26 14:25:15 +08:00
Kefu Chai
d737ba1ab2 api/storage_service: coroutinize set_repair()
before this change, `set_repair()` uses a lambda for handling
the client-side requests. and this works great. but the underlying
`repair_start()` throws if any of the given options is not sane.
and we don't handle any of these throw exceptions in `set_repair()`,
from client's point of view, it would get an HTTP 500 error code,
which implies an "Internal Server Error". but actually, we should
blame the client for the error, not the server.

so, to prepare the error handling, let's take the opportunity to
coroutinize the lambda handling the request, so that we can handle
the exception in a more elegant way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-26 14:24:03 +08:00
Pavel Emelyanov
ae4c1c44ec snapshot: Get per-table snapshot size under snapshot lock
Walking per-table snapshot directory without lock is racy. There's
snapshot-ctl locking that's used to get db-wide snapshot details, it
should be used to get per-table snapshot details too

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-25 10:05:51 +03:00
Pavel Emelyanov
186b36165e snapshot: Move per-table snap API to other snapshot endpoints
So that they are collected in one place and to facilitate next patch
that's going to use snapshot-ctl for per-table API too

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-25 10:05:01 +03:00
Botond Dénes
572003c469 Merge 'Cleanup the way snapshot details are propagated via API' from Pavel Emelyanov
There's a database::get_snapshot_details() method that returns collection of all snapshots for all ks.cf out there and there are several *snapshot_details* aux structures around it. This PR keeps only one "details" and cleans up the way it propagates from database up to the respective API calls.

Closes scylladb/scylladb#18317

* github.com:scylladb/scylladb:
  snapshot_ctl: Brush up true_snapshots_size() internals
  snapshot_ctl: Remove unused details struct
  snapshot_ctl: No double recoding of details
  database,snapshots: Move database::snapshot_details into snapshot_ctl
  database,snapshots: Make database::get_snapshot_details() return map, not vector
  table,snapshots: Move table::snapshot_details into snapshot_ctl
2024-04-23 16:28:25 +03:00
Pavel Emelyanov
e8f10be12e snapshot_ctl: No double recoding of details
Currently database::get_snapshot_details() returns a collection of
snapshots. The snapshot_ctl converts this collection into similarly
looking one with slightly different structures inside. The resulting
collection is converted one more time on the API layer into another
similarly looking map.

This patch removes the intermediate conversion.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-19 20:04:32 +03:00
Kefu Chai
a439ebcfce treewide: include fmt/ranges.h and/or fmt/std.h
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting the container types, like vector, map
optional and variant using {fmt} instead of the homebrew
formatter based on operator<<.
with this change, the changes adding fmt::formatter and
the changes using ostream formatter explicitly, we are
allowed to drop `FMT_DEPRECATED_OSTREAM` macro.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:56:16 +08:00
Kamil Braun
eb9ba914a3 Merge 'Set dc and rack in gossiper when loaded from system.peers and load the ignored nodes state for replace' from Benny Halevy
The problem this series solves is correctly ignoring DOWN nodes state
when replacing a node.

When a node is replaced and there are other nodes that are down, the
replacing node is told to ignore those DOWN nodes using the
`ignore_dead_nodes_for_replace` option.

Since the replacing node is bootstrapping it starts with an empty
system.peers table so it has no notion about any node state and it
learns about all other nodes via gossip shadow round done in
`storage_service::prepare_replacement_info`.

Normally, since the DOWN nodes to ignore already joined the ring, the
remaining node will have their endpoint state already in gossip, but if
the whole cluster was restarted while those DOWN nodes did not start,
the remaining nodes will only have a partial endpoint state from them,
which is loaded from system.peers.

Currently, the partial endpoint state contains only `HOST_ID` and
`TOKENS`, and in particular it lacks `STATUS`, `DC`, and `RACK`.

The first part of this series loads also `DC` and `RACK` from
system.peers to make them available to the replacing node as they are
crucial for building a correct replication map with network topology
replication strategy.

But still, without a `STATUS` those nodes are not considered as normal
token owners yet, and they do not go through handle_state_normal which
adds them to the topology and token_metadata.

The second part of this series uses the endpoint state retrieved in the
gossip shadow round to explicitly add the ignored nodes' state to
topology (including dc and rack) and token_metadata (tokens) in
`prepare_replacement_info`.  If there are more DOWN nodes that are not
explicitly ignored replace will fail (as it should).

Fixes scylladb/scylladb#15787

Closes scylladb/scylladb#15788

* github.com:scylladb/scylladb:
  storage_service: join_token_ring: load ignored nodes state if replacing
  storage_service: replacement_info: return ignore_nodes state
  locator: host_id_or_endpoint: keep value as variant
  gms: endpoint_state: add getters for host_id, dc_rack, and tokens
  storage_service: topology_state_load: set local STATUS state using add_saved_endpoint
  gossiper: add_saved_endpoint: set dc and rack
  gossiper: add_saved_endpoint: fixup indentation
  gossiper: add_saved_endpoint: make host_id mandatory
  gossiper: add load_endpoint_state
  gossiper: start_gossiping: log local state
2024-04-16 10:27:36 +02:00
Pavel Emelyanov
05c4042511 api/lsa: Don't use database to perform invoke-on-all
The sharded<database> is used as a invoke_in_all() method provider,
there's no real need in database itself. Simple smp::invoke_on_all()
would work just as good.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18221
2024-04-16 07:12:40 +03:00
Pavel Emelyanov
f3edde7d2e api: Qualify callback commitlog* argument with const
There's a helper map-reducer that accepts a function to call on
commitlog. All callers accumulate statistics with it, so the commitlog
argument is const pointer.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18238
2024-04-16 07:02:31 +03:00
Pavel Emelyanov
8bad828208 api: Add method to delete replica from tablet
Copied from the add_replica counterpart

TODO: Generalize common parts of move_tablet and add_|del_tablet_replica

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-15 16:31:07 +03:00
Benny Halevy
7c2bd8dc34 locator: host_id_or_endpoint: keep value as variant
Rather than allowing to keep both
host_id and endpoint, keep only one of them
and provide resolve functions that use the
token_metadata to resolve the host_id into
an inet_address or vice verse.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:25:50 +03:00
Tomasz Grabiec
0c74c2c12f Merge 'Extend tablet_transition_kind::rebuild to rebuild tablet to new replica' from Pavel Emelyanov
When altering rf for a keyspace, all tablets in this ks will get more replicas. Part of this process is rebuilding tablets' onto new node(s). This PR extends the tablets transition code to support rebuilding of tablet on new replica.

fixes: #18030

Closes scylladb/scylladb#18082

* github.com:scylladb/scylladb:
  test: Check data presense as well
  test: Test how tablets are copied between nodes
  test: Add sanity test for tablet migration
  api: Add method to add replica to a tablet
  tablet: Make leaving replica optional
2024-04-05 12:51:10 +02:00
Pavel Emelyanov
2a98e95cd0 api: Coroutinize API get_snapshot_details handler
Now it's possible to understand what it does

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18190
2024-04-04 22:20:28 +03:00
Kefu Chai
64b8bb239f api/storage_service: throw if table is not found when move tablets
`database::find_column_family()` throws no_such_column_family
if an unknown ks.cf is fed to it. and we call into this function
without checking for the existence of ks.cf first. since
"/storage_service/tablets/move" is a public interface, we should
translate this error to a better http error.

in this change, we check for the existence of the given ks.cf, and
throw an exception so that it can be caught by seastar::httpd::routers,
and converted to an HTTP error.

Fixes #17198
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17217
2024-04-04 11:23:52 +03:00
Pavel Emelyanov
79ad760e95 api: Add method to add replica to a tablet
The new API submits rebuild transition with new replicas set to be old
(current) replicas plus the provided one. It looks and acts like the
move_tablet API call with several changes:

- lacks the "source" replica argument
- submits "rebuild" transition kind
- cross racks checks are not performed

The 'force' argument is inherited from move_tablet, but is unused now
and is left for future.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-04 09:22:16 +03:00
Benny Halevy
1272d736c0 api: storage_service: upgrade_to_raft_topology: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-02 20:02:51 +03:00
Benny Halevy
31026ae27f api: storage_service: upgrade_to_raft_topology: add logging
Upgrading raft topology is an important api call
that should be logged.

When failed, it is also important to log the
exception to get better visibility into why
the call failed.

Indentation will be fixed in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-02 20:02:49 +03:00
Pavel Emelyanov
67c2a06493 api: Rename (un)set_server_load_sstable -> (un)set_server_column_family
The method sets up column family API, not load-sstables one

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18022
2024-03-26 12:16:08 +02:00