204 Commits

Author SHA1 Message Date
Botond Dénes
bb18a8152e api/error_injection: add getter for error_injection
Allow external code to obtain information about an error injection
point, including whether it is enabled, and importantly, what its
parameters are. Together with the `set_parameter()` added in the
previous patch, this allows tests to read out the values of internal
parameters, via a set_parameter() injection point.

(cherry picked from commit feea609e37)
2024-06-11 17:32:37 +00:00
Avi Kivity
52fe351c31 Merge 'Balance tablets within nodes (intra-node migration)' from Tomasz Grabiec
This is needed to avoid severe imbalance between shards, which can
happen when some table grows and is split. The nodes can be balanced with
each other, in which case inter-node migration cannot fix the imbalance. Also,
if RF=N, there is no way to move tablets between nodes at all.
The only way to bring the system to balance is to move tablets within the nodes.

The system is currently not prepared for intra-node migration. Request coordination
is host-based, while for intra-node migration it should (also) be shard-based.
The solution employed here is to keep the coordination between nodes as-is:
for intra-node migration, the storage_proxy-level coordinator is not aware of
the migration (there is no pending host). The replica-side request handler acts as a
second-level coordinator which routes requests to shards, similar to how
the first-level coordinator routes them to hosts.

The tablet sharder is adjusted to handle intra-node migration, where a tablet
can have two replicas on the same host. For reads, the sharder uses the
read selector to resolve the conflict. For writes, the write selector
is used.

The old shard_of() API is kept to represent the shard for reads, and a new
method, shard_for_writes(), is introduced to query the shards for writing.
All writers should be switched to that API, which is not done in this
patch yet.

The request handler on the replica side acts as a second-level
coordinator, using the sharder to determine routing to shards. A given
sharder is scoped to a single topology version, i.e. a single
effective_replication_map_ptr, which should be kept alive for the
duration of the write.
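The read/write split described above can be sketched as a toy model in Python (the language of the project's test/*.py suites). Everything here is hypothetical: TabletOnHost, ReadSelector, WriteSelector and route_write are illustrative names, not the real dht/locator types, and the selector values are an assumed simplification of the tablet transition stages. What it shows is that reads always resolve to exactly one shard, while writes may have to be fanned out to both the leaving and the pending shard on the same host.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class ReadSelector(Enum):
    PREVIOUS = "previous"  # reads still served by the leaving shard
    NEXT = "next"          # reads served by the pending shard


class WriteSelector(Enum):
    PREVIOUS = "previous"
    BOTH = "both"          # writes applied on both shards during streaming
    NEXT = "next"


@dataclass
class TabletOnHost:
    """Hypothetical model of one tablet's placement on a single host."""
    shard: int                           # shard owning the tablet before migration
    pending_shard: Optional[int] = None  # destination of an intra-node migration
    read_selector: ReadSelector = ReadSelector.PREVIOUS
    write_selector: WriteSelector = WriteSelector.PREVIOUS


def shard_for_reads(t: TabletOnHost) -> int:
    """Reads go to exactly one shard; the read selector resolves the conflict."""
    if t.pending_shard is not None and t.read_selector is ReadSelector.NEXT:
        return t.pending_shard
    return t.shard


def shard_for_writes(t: TabletOnHost) -> List[int]:
    """Writes may have to be applied on both shards while the tablet migrates."""
    if t.pending_shard is None or t.write_selector is WriteSelector.PREVIOUS:
        return [t.shard]
    if t.write_selector is WriteSelector.NEXT:
        return [t.pending_shard]
    return [t.shard, t.pending_shard]


def route_write(t: TabletOnHost, mutation: str) -> Dict[int, str]:
    """Second-level coordination on the replica: fan one write out to shards."""
    return {shard: mutation for shard in shard_for_writes(t)}
```

For a tablet moving from shard 2 to shard 5 with the write selector set to BOTH, reads still resolve to shard 2 while route_write returns one copy of the mutation per shard.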

perf-simple-query test results show no signs of regression:

Command: perf-simple-query -c1 -m1G --write --tablets --duration=10

Before:

> 83294.81 tps ( 59.5 allocs/op,  14.3 tasks/op,   53725 insns/op,        0 errors)
> 87756.72 tps ( 59.5 allocs/op,  14.3 tasks/op,   54049 insns/op,        0 errors)
> 86428.47 tps ( 59.6 allocs/op,  14.3 tasks/op,   54208 insns/op,        0 errors)
> 86211.38 tps ( 59.7 allocs/op,  14.3 tasks/op,   54219 insns/op,        0 errors)
> 86559.89 tps ( 59.6 allocs/op,  14.3 tasks/op,   54188 insns/op,        0 errors)
> 86609.39 tps ( 59.6 allocs/op,  14.3 tasks/op,   54117 insns/op,        0 errors)
> 87464.06 tps ( 59.5 allocs/op,  14.3 tasks/op,   54039 insns/op,        0 errors)
> 86185.43 tps ( 59.6 allocs/op,  14.3 tasks/op,   54169 insns/op,        0 errors)
> 86254.71 tps ( 59.6 allocs/op,  14.3 tasks/op,   54139 insns/op,        0 errors)
> 83395.35 tps ( 60.2 allocs/op,  14.4 tasks/op,   54693 insns/op,        0 errors)
>
> median 86428.47 tps ( 59.6 allocs/op,  14.3 tasks/op,   54208 insns/op,        0 errors)
> median absolute deviation: 243.04
> maximum: 87756.72
> minimum: 83294.81
>

After:

> 85523.06 tps ( 59.5 allocs/op,  14.3 tasks/op,   53872 insns/op,        0 errors)
> 89362.47 tps ( 59.6 allocs/op,  14.3 tasks/op,   54226 insns/op,        0 errors)
> 88167.55 tps ( 59.7 allocs/op,  14.3 tasks/op,   54400 insns/op,        0 errors)
> 87044.40 tps ( 59.7 allocs/op,  14.3 tasks/op,   54310 insns/op,        0 errors)
> 88344.50 tps ( 59.6 allocs/op,  14.3 tasks/op,   54289 insns/op,        0 errors)
> 88355.06 tps ( 59.6 allocs/op,  14.3 tasks/op,   54242 insns/op,        0 errors)
> 88725.46 tps ( 59.6 allocs/op,  14.3 tasks/op,   54230 insns/op,        0 errors)
> 88640.08 tps ( 59.6 allocs/op,  14.3 tasks/op,   54210 insns/op,        0 errors)
> 90306.31 tps ( 59.4 allocs/op,  14.3 tasks/op,   54043 insns/op,        0 errors)
> 87343.62 tps ( 59.8 allocs/op,  14.3 tasks/op,   54496 insns/op,        0 errors)
>
> median 88355.06 tps ( 59.6 allocs/op,  14.3 tasks/op,   54242 insns/op,        0 errors)
> median absolute deviation: 1007.41
> maximum: 90306.31
> minimum: 85523.06

Command (reads): perf-simple-query -c1 -m1G --tablets --duration=10

Before:

> 95860.18 tps ( 63.1 allocs/op,  14.1 tasks/op,   42476 insns/op,        0 errors)
> 97537.69 tps ( 63.1 allocs/op,  14.1 tasks/op,   42454 insns/op,        0 errors)
> 97549.23 tps ( 63.1 allocs/op,  14.1 tasks/op,   42470 insns/op,        0 errors)
> 97511.29 tps ( 63.1 allocs/op,  14.1 tasks/op,   42470 insns/op,        0 errors)
> 97227.32 tps ( 63.1 allocs/op,  14.1 tasks/op,   42471 insns/op,        0 errors)
> 94031.94 tps ( 63.1 allocs/op,  14.1 tasks/op,   42441 insns/op,        0 errors)
> 96978.04 tps ( 63.1 allocs/op,  14.1 tasks/op,   42462 insns/op,        0 errors)
> 96401.70 tps ( 63.1 allocs/op,  14.1 tasks/op,   42473 insns/op,        0 errors)
> 96573.77 tps ( 63.1 allocs/op,  14.1 tasks/op,   42440 insns/op,        0 errors)
> 96340.54 tps ( 63.1 allocs/op,  14.1 tasks/op,   42468 insns/op,        0 errors)
>
> median 96978.04 tps ( 63.1 allocs/op,  14.1 tasks/op,   42462 insns/op,        0 errors)
> median absolute deviation: 571.20
> maximum: 97549.23
> minimum: 94031.94
>

After:

> 99794.67 tps ( 63.1 allocs/op,  14.1 tasks/op,   42471 insns/op,        0 errors)
> 101244.99 tps ( 63.1 allocs/op,  14.1 tasks/op,   42472 insns/op,        0 errors)
> 101128.37 tps ( 63.1 allocs/op,  14.1 tasks/op,   42485 insns/op,        0 errors)
> 101065.27 tps ( 63.1 allocs/op,  14.1 tasks/op,   42465 insns/op,        0 errors)
> 101212.98 tps ( 63.1 allocs/op,  14.1 tasks/op,   42456 insns/op,        0 errors)
> 101413.31 tps ( 63.1 allocs/op,  14.1 tasks/op,   42463 insns/op,        0 errors)
> 101464.92 tps ( 63.1 allocs/op,  14.1 tasks/op,   42466 insns/op,        0 errors)
> 101086.74 tps ( 63.1 allocs/op,  14.1 tasks/op,   42488 insns/op,        0 errors)
> 101559.09 tps ( 63.1 allocs/op,  14.1 tasks/op,   42468 insns/op,        0 errors)
> 100742.58 tps ( 63.1 allocs/op,  14.1 tasks/op,   42491 insns/op,        0 errors)
>
> median 101212.98 tps ( 63.1 allocs/op,  14.1 tasks/op,   42456 insns/op,        0 errors)
> median absolute deviation: 200.33
> maximum: 101559.09
> minimum: 99794.67
>
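To make the summary lines easier to check, the sketch below recomputes them from the per-run samples above. The rule it uses is an assumption inferred from the numbers, not taken from the perf-simple-query source: with an even number of runs, the printed median is the upper middle element, sorted(samples)[n // 2], and the median absolute deviation is computed the same way over |x - median|. With that rule all four printed medians are reproduced exactly and the MADs match to within display rounding.

```python
from typing import List, Tuple


def summarize(samples: List[float]) -> Tuple[float, float]:
    """Return (median, median absolute deviation) using the upper-median rule."""
    ordered = sorted(samples)
    median = ordered[len(ordered) // 2]
    deviations = sorted(abs(x - median) for x in samples)
    return median, deviations[len(deviations) // 2]


writes_before = [83294.81, 87756.72, 86428.47, 86211.38, 86559.89,
                 86609.39, 87464.06, 86185.43, 86254.71, 83395.35]
writes_after = [85523.06, 89362.47, 88167.55, 87044.40, 88344.50,
                88355.06, 88725.46, 88640.08, 90306.31, 87343.62]
reads_before = [95860.18, 97537.69, 97549.23, 97511.29, 97227.32,
                94031.94, 96978.04, 96401.70, 96573.77, 96340.54]
reads_after = [99794.67, 101244.99, 101128.37, 101065.27, 101212.98,
               101413.31, 101464.92, 101086.74, 101559.09, 100742.58]

if __name__ == "__main__":
    for label, before, after in [("writes", writes_before, writes_after),
                                 ("reads", reads_before, reads_after)]:
        (m0, mad0), (m1, mad1) = summarize(before), summarize(after)
        change = (m1 / m0 - 1) * 100
        print(f"{label}: median {m0:.2f} -> {m1:.2f} tps ({change:+.1f}%), "
              f"MAD {mad0:.2f} -> {mad1:.2f}")
```

With these samples the median throughput rises from 86428.47 to 88355.06 tps for writes (about 2.2%) and from 96978.04 to 101212.98 tps for reads (about 4.4%).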

Fixes #16594

Closes scylladb/scylladb#18026

* github.com:scylladb/scylladb:
  Implement fast streaming for intra-node migration
  test: tablets_test: Test sharding during intra-node migration
  test: tablets_test: Check sharding also on the pending host
  test: py: tablets: Test writes concurrent with migration
  test: py: tablets: Test crash during intra-node migration
  api, storage_service: Introduce API to wait for topology to quiesce
  dht, replica: Remove deprecated sharder APIs
  test: Avoid using deprecated sharded API
  db: do_apply_many() avoid deprecated sharded API
  replica: mutation_dump: Avoid deprecated sharder API
  repair: Avoid deprecated sharder API
  table: Remove optimization which returns empty reader when key is not owned by the shard
  dht: is_single_shard: Avoid deprecated sharder API
  dht: split_range_to_single_shard: Work with static_sharder only
  dht: ring_position_range_sharder: Avoid deprecated sharder APIs
  dht: token: Avoid use of deprecated sharder API by switching to static_sharder
  selective_token_sharder: Avoid use of deprecated sharder API
  docs: Document tablet sharding vs tablet replica placement
  readers/multishard.cc: use shard_for_reads() instead of shard_of()
  multishard_mutation_query.cc: use shard_for_reads() instead of shard_of()
  storage_proxy: Extract common code to apply mutations on many shards according to sharder
  storage_proxy: Prepare per-partition rate-limiting for intra-node migration
  storage_proxy: Avoid shard_of() use in mutate_counter_on_leader_and_replicate()
  storage_proxy: Prepare mutate_hint() for intra-node tablet migration
  commitlog_replayer: Avoid deprecated sharder::shard_of()
  lwt: Avoid deprecated sharder::shard_of()
  compaction: Avoid deprecated sharder::shard_of()
  dht: Extract dht::static_sharder
  replica: Deprecate table::shard_of()
  locator: Deprecate effective_replication_map::shard_of()
  dht: Deprecate old sharder API: shard_of/next_shard/token_for_next_shard
  tests: tablets: py: Add intra-node migration test
  tests: tablets: Test that drained nodes are not balanced internally
  tests: tablets: Add checks of replica set validity to test_load_balancing_with_random_load
  tests: tablets: Verify that disabling balancing results in no intra-node migrations
  tests: tablets: Check that nodes are internally balanced
  tests: tablets: Improve debuggability by showing which rows are missing
  tablets, storage_service: Support intra-node migration in move_tablet() API
  tablet_allocator: Generate intra-node migration plan
  tablet_allocator: Extract make_internode_plan()
  tablet_allocator: Maintain candidate list and shard tablet count for target nodes
  tablet_allocator: Lift apply_load/can_accept_load lambdas to member functions
  tablets, streaming: Implement tablet streaming for intra-node migration
  dht, auto_refreshing_sharder: Allow overriding write selector
  multishard_writer: Handle intra-node migration
  storage_proxy: Handle intra-node tablet migration for writes
  tablets: Get rid of tablet_map::get_shard()
  tablets: Avoid tablet_map::get_shard in cleanup
  tablets: test: Use sharder instead of tablet_map::get_shard()
  tablets: tablet_sharder: Allow working with non-local host
  sharding: Prepare for intra-node-migration
  docs: Document sharder use for tablets
  tablets: Introduce tablet transition kind for intra-node migration
  tests: tablets: Fix use-after-move of skiplist in rebalance_tablets()
  sstables, gdb: Track readers in a linked list
  raft topology: Fix global token metadata barrier to not fence ahead of what is drained
2024-05-20 16:13:01 +03:00
Tomasz Grabiec
7956a2991e api, storage_service: Introduce API to wait for topology to quiesce 2024-05-16 00:28:47 +02:00
Botond Dénes
fd25bb6f9f api/storage_service: add tablet support for /storage_service/tokens_endpoint
Add keyspace and cf parameters. When they are specified, the endpoint
returns the token -> primary replica mapping for the table's tablet tokens
instead of the vnode tokens.
2024-05-13 07:09:20 -04:00
Pavel Emelyanov
8bad828208 api: Add method to delete replica from tablet
Copied from the add_replica counterpart

TODO: Generalize common parts of move_tablet and add_|del_tablet_replica

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-15 16:31:07 +03:00
Pavel Emelyanov
79ad760e95 api: Add method to add replica to a tablet
The new API submits a rebuild transition whose new replica set is the old
(current) replica set plus the provided replica. It looks and acts like the
move_tablet API call, with several changes:

- it lacks the "source" replica argument
- it submits the "rebuild" transition kind
- cross-rack checks are not performed

The 'force' argument is inherited from move_tablet, but is currently unused
and is kept for future use.
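A sketch of how the add-replica call (and the delete-replica counterpart from the later commit) derives the new replica set, assuming a replica is identified by a (host, shard) pair. Replica, add_replica_transition and del_replica_transition are hypothetical names, and the duplicate-host check is an assumed sanity check, not something stated in the commit message.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Replica:
    host: str   # host_id of the node (hypothetical representation)
    shard: int


def add_replica_transition(current: List[Replica], new: Replica) -> List[Replica]:
    """New replica set for an add-replica request: current replicas plus `new`.

    Submitted as a "rebuild" transition: there is no source replica, and no
    cross-rack checks are performed.
    """
    # Assumed sanity check: a host holds at most one replica of a tablet.
    if any(r.host == new.host for r in current):
        raise ValueError(f"host {new.host} already holds a replica of this tablet")
    return current + [new]


def del_replica_transition(current: List[Replica], victim: Replica) -> List[Replica]:
    """New replica set for a delete-replica request: current replicas minus `victim`."""
    if victim not in current:
        raise ValueError(f"{victim} is not a replica of this tablet")
    return [r for r in current if r != victim]
```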

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-04 09:22:16 +03:00
Avi Kivity
72bbe75d5b Merge 'Fix node replace with tablets for RF=N' from Tomasz Grabiec
This PR fixes a problem with replacing a node with tablets when
RF=N. Currently, this fails because tablet replica allocation for
rebuild cannot find a viable destination, as the replacing node
is not considered a candidate. It cannot be a candidate because
replace rolls back on failure, and we cannot roll back after tablets
have been migrated.

The solution taken here is to not drain tablet replicas from the replaced
node during the topology request, but to leave that for later, after the
replaced node is in the left state and the replacing node is in the normal state.

The replacing node waits for this draining to complete on boot
before it is considered booted.

Fixes https://github.com/scylladb/scylladb/issues/17025

Nodes in the left state will be kept in tablet replica sets for a while after node
replace is done, until the new replica is rebuilt. So we need to know
those nodes' locations (dc, rack) for two reasons:

 1) algorithms which work with replica sets filter nodes based on their location. For example, the materialized views code, which pairs base replicas with view replicas, filters by datacenter first.

 2) the tablet scheduler needs to identify each node's location in order to make decisions about new replica placement.

It's ok to not know the IP, and we don't keep it. Those nodes will not
be present in the IP-based replica sets, e.g. those returned by
get_natural_endpoints(), only in host_id-based replica
sets. storage_proxy request coordination is not affected.

Nodes in the left state are still not present in the token ring, and are not
considered to be members of the ring (datacenter endpoints exclude them).

In the future we could make the change even more transparent by only
loading locator::node* for those nodes and keeping node* in tablet replica sets.

Currently, left nodes are never removed from topology, so they will
accumulate in memory. The topology coordinator could garbage-collect
a left node once it is no longer present in any replica set. That would
require a new state: left_for_real.
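A minimal sketch of why left nodes still need a location, and of the proposed garbage-collection condition. NodeInfo, its "normal"/"left" state strings, and the three helper functions are hypothetical; only the rules they encode (filter host_id-based replica sets by datacenter, exclude left nodes from the ring, collect a left node only when no replica set references it) come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeInfo:
    dc: str
    rack: str
    state: str  # hypothetical values: "normal" or "left"


Topology = Dict[str, NodeInfo]  # host_id -> node attributes


def replicas_in_dc(replicas: List[str], topology: Topology, dc: str) -> List[str]:
    """Filter a host_id-based replica set by datacenter.

    A left node still listed in the replica set (its replica has not been
    rebuilt yet) must still be present in `topology`, otherwise this lookup
    raises KeyError.
    """
    return [host for host in replicas if topology[host].dc == dc]


def ring_members(topology: Topology) -> List[str]:
    """Left nodes stay in topology but are not members of the token ring."""
    return [host for host, node in topology.items() if node.state != "left"]


def collectable_left_nodes(topology: Topology, replica_sets: List[List[str]]) -> List[str]:
    """Left nodes referenced by no replica set (the proposed GC rule)."""
    referenced = {host for replica_set in replica_sets for host in replica_set}
    return [host for host, node in topology.items()
            if node.state == "left" and host not in referenced]
```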

Closes scylladb/scylladb#17388

* github.com:scylladb/scylladb:
  test: py: Add test for view replica pairing after replace
  raft, api: Add RESTful API to query current leader of a raft group
  test: test_tablets_removenode: Verify replacing when there is no spare node
  doc: topology-on-raft: Document replace behavior with tablets
  tablets, raft topology: Rebuild tablets after replacing node is normal
  tablets: load_balancer: Access node attributes via node struct
  tablets: load_balancer: Extract ensure_node()
  mv: Switch to using host_id-based replica set
  effective_replication_map: Introduce host_id-based get_replicas()
  raft topology: Keep nodes in the left state to topology
  tablets: Introduce read_required_hosts()
2024-03-18 16:16:08 +02:00
Tomasz Grabiec
6d50e93f10 raft, api: Add RESTful API to query current leader of a raft group
Example:

  $ curl -X GET "http://127.0.0.1:10000/raft/leader_host"
  "f7f57588-62de-4cac-9e4b-c62bfc458d91"

Accepts an optional group_id param, which defaults to group0.
2024-03-15 13:20:08 +01:00
Benny Halevy
530d270828 api: /storage_service/tablets/balancing: fix incorrect operation summary
It was probably copy-pasted from /storage_service/tablets/move

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#17811
2024-03-14 22:52:57 +01:00
Patryk Wrobel
51da80da7d storage_service/ownership/{keyspace}: handle requests when tablets are enabled
Before this change, when a user called the
'storage_service/ownership/{keyspace}' API with a
keyspace that uses tablets, an internal
error was thrown. The code was calling
get_vnode_effective_replication_map(), a function intended for vnodes only.

This commit handles this scenario gracefully and
extends the API to accept a 'cf' parameter that denotes
the table name.

Now, when the keyspace uses tablets and the cf parameter is not passed,
a descriptive error message is returned with BAD_REQUEST.
Users cannot query ownership for a keyspace that uses tablets,
but they can query ownership for a table in such a keyspace.

Also, new tests have been added to test/rest_api/test_storage_service.py and
to test/topology_experimental_raft/test_tablets.py in order to verify the behavior
with and without tablets enabled.

Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-03-11 09:52:23 +01:00
Botond Dénes
050c6dcad7 api: storage_service/keyspaces: add replication filter
Allow filtering the returned keyspaces by the replication they
use: tablets or vnodes.
The filter can be disabled by omitting the parameter or passing "all",
which is the default.
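A sketch of the filter semantics, assuming the keyspace metadata can be reduced to a name -> uses-tablets map. The function name and the `replication` argument name are assumptions (the commit message does not name the query parameter); the accepted values and the "all" default are as described.

```python
from typing import Dict, List, Optional

ALLOWED_FILTERS = ("all", "tablets", "vnodes")


def filter_keyspaces(uses_tablets: Dict[str, bool],
                     replication: Optional[str] = None) -> List[str]:
    """Return keyspace names matching the replication filter, sorted by name.

    `uses_tablets` maps keyspace name -> True if the keyspace uses tablets.
    Omitting the filter (None) behaves like "all".
    """
    mode = "all" if replication is None else replication
    if mode not in ALLOWED_FILTERS:
        raise ValueError(f"invalid replication filter {mode!r}, expected one of {ALLOWED_FILTERS}")
    if mode == "all":
        return sorted(uses_tablets)
    want_tablets = mode == "tablets"
    return sorted(ks for ks, tablets in uses_tablets.items() if tablets == want_tablets)
```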

Fixes: #16509

Closes scylladb/scylladb#17319
2024-02-20 09:04:41 +01:00
Patryk Wrobel
3842bf18a7 storage_service/range_to_endpoint_map: allow API to properly handle tablets
This API endpoint was failing when tablets were enabled
because it used get_vnode_effective_replication_map().
Moreover, it returned an error message that was not
user-friendly.

This change extends the handler to properly service the incoming requests.
Furthermore, it introduces two new test cases that verify the behavior of
storage_service/range_to_endpoint_map API. It also adjusts the test case
of this endpoint for vnodes to succeed when tablets are enabled by default.

The new logic is as follows:
 - when tablets are disabled, users may query endpoints
   for a keyspace or for a given table in a keyspace
 - when tablets are enabled, users have to provide the
   table name, because the effective replication map is per-table

If the user does not provide a table name for a keyspace that
uses tablets, BAD_REQUEST is returned with a meaningful error
message.
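The rule above (the same one described for the ownership endpoint) boils down to choosing the scope of the effective replication map. A minimal sketch, where replication_map_scope and BadRequest are hypothetical names and the returned strings only stand in for the real per-keyspace or per-table map:

```python
from http import HTTPStatus
from typing import Optional


class BadRequest(Exception):
    """Stands in for an HTTP 400 reply (hypothetical)."""
    status = HTTPStatus.BAD_REQUEST


def replication_map_scope(keyspace: str, table: Optional[str], uses_tablets: bool) -> str:
    """Decide which effective replication map a request should be served from."""
    if table is not None:
        # A table name works for both vnode and tablet keyspaces.
        return f"table:{keyspace}.{table}"
    if uses_tablets:
        # With tablets the effective replication map is per-table.
        raise BadRequest(f"keyspace {keyspace} uses tablets; a table name must be provided")
    # Vnodes: one effective replication map for the whole keyspace.
    return f"keyspace:{keyspace}"
```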

Fixes: scylladb#17343

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17372
2024-02-18 19:21:53 +02:00
Kamil Braun
e9e24f47ec Merge 'raft topology: implement upgrade and recovery procedure' from Piotr Dulikowski
This PR implements a procedure that upgrades existing clusters to use
raft-based topology operations. The procedure does not start
automatically; it must be triggered manually by the administrator after
making sure that no topology operations are currently running.

Upgrade is triggered by sending a `POST
/storage_service/raft_topology/upgrade` request. This starts the
topology coordinator, which drives the rest of the process: it
builds the `system.topology` state based on information observed in
gossip and tells all nodes to switch to raft mode. Then, the topology
coordinator runs normally.

Upgrade progress is tracked in a new static column `upgrade_state` in
`system.topology`.

The procedure also serves as an extension to the current recovery
procedure on raft. The current recovery procedure requires restarting
nodes in a special mode which disables raft, performing `nodetool
removenode` on the dead nodes, cleaning up some state on the nodes, and
restarting them so that they automatically rebuild group 0. Raft
topology fits into the existing procedure by falling back to legacy topology
operations after raft is disabled. After group 0 is rebuilt, the upgrade
needs to be triggered again.

Because upgrade is manual and it might not be convenient for
administrators to run it right after upgrading the cluster, we allow the
cluster to operate in legacy topology operations mode until the upgrade,
which includes allowing new nodes to join. To allow this, joining nodes
now ask the cluster which mode they should use before proceeding,
via a new `JOIN_NODE_QUERY` RPC.

The procedure is explained in more detail in `topology-over-raft.md`.

Fixes: https://github.com/scylladb/scylladb/issues/15008

Closes scylladb/scylladb#17077

* github.com:scylladb/scylladb:
  test/topology_custom: upgrade/recovery tests for topology on raft
  cdc/generation_service: in legacy mode, fall back to raft tables
  system_keyspace: add read_cdc_generation_opt
  cdc/generation_service: turn off gossip notifications in raft topo mode
  cql_test_env: move raft_topology_change_enabled var earlier
  group0_state_machine: pull snapshot after raft topology feature enabled
  storage_service: disable persistent feature enabler on upgrade
  storage_service: replicate raft features to system.peers
  storage_service: gossip tokens and cdc generation in raft topology mode
  API: add api for triggering and monitoring topology-on-raft upgrade
  storage_service: infer which topology operations to use on startup
  storage_service: set the topology kind value based on group 0 state
  raft_group0: expose link to the upgrade doc in the header
  feature_service: fall back to checking legacy features on startup
  storage_service: add fiber for tracking the topology upgrade progress
  gms: feature_service: add SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
  topology_coordinator: implement core upgrade logic
  topology_coordinator: extract top-level error handling logic
  storage_service: initialize discovery leader's state earlier
  topology_coordinator: allow for custom sharding info in prepare_and_broadcast_cdc_generation_data
  topology_coordinator: allow for custom sharding info in prepare_new_cdc_generation_data
  topology_coordinator: remove outdated fixme in prepare_new_cdc_generation_data
  topology_state_machine: introduce upgrade_state
  storage_service: disallow topology ops when upgrade is in progress
  raft_group0_client: add in_recovery method
  storage_service: introduce join_node_query verb
  raft_group0: make discover_group0 public
  raft_group0: filter current node's IP in discover_group0
  raft_group0: remove my_id arg from discover_group0
  storage_service: make _raft_topology_change_enabled more advanced
  docs: document raft topology upgrade and recovery
2024-02-09 11:54:53 +01:00
Kefu Chai
c1c96bbc16 api/storage_service: drop /storage_service/describe_ring/ API
per its description, "`/storage_service/describe_ring/`" returns the
token ranges of an arbitrary keyspace. actually, it returns the token
ranges of the first keyspace that uses a non-local, vnode-based strategy.
this API is used neither by nodetool nor by dtest.
scylla-manager has a wrapper for this API, but that wrapper
is not used anywhere.

in this change, this API is dropped.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17197
2024-02-09 12:49:21 +02:00
Piotr Dulikowski
a672383c2a API: add api for triggering and monitoring topology-on-raft upgrade
Implements the /storage_service/raft_topology/upgrade route. The route
supports two methods: POST, which triggers the cluster-wide upgrade to
topology-on-raft, and GET, which reports the status of the upgrade.
2024-02-08 19:12:28 +01:00
Botond Dénes
35da9551fb Merge 'storage_service: Add describe_ring support for tablet table' from Asias He
The table query param is added to get the describe_ring result for a
given table.

Both vnode-based and tablet-based tables accept this table param, which
makes the API easier to use.

If the table param is not provided and the keyspace contains a
tablet table, the request is rejected.

E.g.,
curl "http://127.0.0.1:10000/storage_service/describe_ring/system_auth?table=roles"
curl "http://127.0.0.1:10000/storage_service/describe_ring/ks1?table=standard1"

Refs #16509

Closes scylladb/scylladb#17118

* github.com:scylladb/scylladb:
  tablets: Convert to use the new version of for_each_tablet
  storage_service: Add describe_ring support for tablet table
  storage_service: Mark host2ip as const
  tablets: Add for_each_tablet_gently
2024-02-07 10:41:36 +02:00
Asias He
04773bd1df storage_service: Add describe_ring support for tablet table
The table query param is added to get the describe_ring result for a
given table.

Both vnode-based and tablet-based tables accept this table param, which
makes the API easier to use.

If the table param is not provided and the keyspace contains a
tablet table, the request is rejected.

E.g.,
curl "http://127.0.0.1:10000/storage_service/describe_ring/system_auth?table=roles"
curl "http://127.0.0.1:10000/storage_service/describe_ring/ks1?table=standard1"

Refs #16509
2024-02-05 18:11:07 +08:00
Aleksandra Martyniuk
89c683f51a api: service: add force param to move_tablet api
A force flag is added to /storage_service/tablets/move. If force is set
to true, replication strategy constraints regarding racks and DCs may
be broken.
2024-02-02 19:08:01 +01:00
Botond Dénes
1a0300dba6 Merge 'compaction_manager: flush tables before cleanup' from Kefu Chai
according to the documentation of "nodetool cleanup"

> Triggers removal of data that the node no longer owns

currently, scylla performs cleanup by rewriting the sstables. but
commitlog segments may still contain mutations to the tables
whose data is dropped during the sstable rewrite. when the scylla server
restarts, these dirty mutations are replayed into the memtable. if
any of them modify the tables that were cleaned up, the
stale data is reapplied, which leads to data resurrection.

so, in this change we follow the same model as major compaction,
where we

1. force a new active segment,
2. flush the tables being cleaned up,
3. perform cleanup using compaction.
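A toy model of the resurrection scenario and of the three-step fix. ToyNode and its methods are hypothetical and collapse a node to a single table with one commitlog; the assumption it encodes is that a flush alone cannot discard the active commitlog segment, which is why step 1 forces a new active segment before step 2 flushes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ToyNode:
    """Toy single-table node illustrating why cleanup must flush first."""
    owned: Set[str]
    active_segment: List[Tuple[str, str]] = field(default_factory=list)
    closed_segments: List[Tuple[str, str]] = field(default_factory=list)
    memtable: Dict[str, str] = field(default_factory=dict)
    sstables: Dict[str, str] = field(default_factory=dict)

    def write(self, key: str, value: str) -> None:
        self.active_segment.append((key, value))
        self.memtable[key] = value

    def force_new_active_segment(self) -> None:
        self.closed_segments.extend(self.active_segment)
        self.active_segment = []

    def flush(self) -> None:
        # The memtable becomes an sstable; closed segments whose data is now
        # flushed can be deleted, but the active segment is kept.
        self.sstables.update(self.memtable)
        self.memtable.clear()
        self.closed_segments.clear()

    def cleanup(self, flush_first: bool) -> None:
        if flush_first:
            self.force_new_active_segment()  # step 1
            self.flush()                     # step 2
        # Step 3: rewrite sstables, dropping data the node no longer owns.
        self.sstables = {k: v for k, v in self.sstables.items() if k in self.owned}

    def restart(self) -> None:
        # The memtable is lost; surviving commitlog segments are replayed into it.
        self.memtable = dict(self.closed_segments + self.active_segment)

    def visible_keys(self) -> Set[str]:
        return set(self.sstables) | set(self.memtable)
```

Writing key "a", flushing, losing ownership of "a", then running cleanup(flush_first=False) and restarting makes "a" visible again from the replayed active segment; with flush_first=True it stays gone.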

Fixes #4734

Closes scylladb/scylladb#16757

* github.com:scylladb/scylladb:
  storage_service: fall back to local cleanup in cleanup_all
  compaction: format flush_mode without the helper
  compaction_manager: flush all tables before cleanup
  replica: table: pass do_flush to table::perform_cleanup_compaction()
  api, compaction: promote flush_mode
2024-02-01 13:47:45 +02:00
Kefu Chai
4ec104e086 api: storage_service: correct a typo
s/a any keyspace/a given keyspace/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17098
2024-02-01 10:55:58 +02:00
Kefu Chai
5e0b3671d3 storage_service: fall back to local cleanup in cleanup_all
before this change, if no keyspaces were specified,
scylla-nodetool just enumerated all non-local keyspaces and
called "/storage_service/keyspace_cleanup" on them one after another.
this is not efficient, as each such RESTful API call
forces a new active commitlog segment and flushes all tables.
so, if the target node of this command has N non-local keyspaces,
it repeats the steps above N times, which is unnecessary.
and after a topology change, we would like to run a global
"nodetool cleanup" without specifying the keyspace, so this
is a typical use case which we do care about.

to address this performance issue, in this change, we improve
the existing RESTful API call "/storage_service/cleanup_all", so that
if the topology coordinator is not enabled, it falls back to
a local cleanup of all non-local keyspaces.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-02-01 11:25:53 +08:00
Botond Dénes
d202d32f81 Merge 'Add an API to trigger snapshot in Raft servers' from Kamil Braun
This allows the user of `raft::server` to cause it to create a snapshot
and truncate the Raft log (leaving no trailing entries; in the future we
may extend the API to specify the number of trailing entries to leave if
needed). In a later commit we'll add a REST endpoint to Scylla to
trigger group 0 snapshots.

One use case for this API is to create group 0 snapshots in Scylla
deployments which upgraded to Raft in version 5.2 and started with an
empty Raft log with no snapshot at the beginning. This causes problems,
e.g. when a new node bootstraps to the cluster, it will not receive a
snapshot that would contain both schema and group 0 history, which would
then lead to inconsistent schema state and trigger assertion failures as
observed in scylladb/scylladb#16683.

In 5.4 the logic of initial group 0 setup was changed to start the Raft
log with a snapshot at index 1 (ff386e7a44),
but a problem remains with existing deployments coming from 5.2:
we need a way to trigger a snapshot in them (other than performing 1000
arbitrary schema changes).

Another potential use case in the future would be to trigger snapshots
based on external memory pressure in tablet Raft groups (for strongly
consistent tables).

The PR adds the API to `raft::server` and an HTTP endpoint that uses it.

In a follow-up PR, we plan to modify group 0 server startup logic to automatically
call this API if it sees that no snapshot is present yet (to automatically
fix the aforementioned 5.2 deployments once they upgrade).
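To make "truncate the Raft log, leaving no trailing entries" concrete, here is a toy log in Python. ToyRaftLog and snapshot_and_truncate are hypothetical and are not the raft::server API; the max_trailing_entries argument mirrors the parameter name that appears in the commit list below, with 0 matching the behavior described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyRaftLog:
    """Toy Raft log: entries[i] holds the entry with index first_index + i."""
    entries: List[str] = field(default_factory=list)
    first_index: int = 1
    snapshot_index: Optional[int] = None

    @property
    def last_index(self) -> int:
        return self.first_index + len(self.entries) - 1

    def append(self, command: str) -> None:
        self.entries.append(command)

    def snapshot_and_truncate(self, applied_index: int, max_trailing_entries: int = 0) -> None:
        """Snapshot the state up to applied_index and truncate the log.

        At most max_trailing_entries entries at or before applied_index are
        kept; entries after applied_index (not yet applied) are never dropped.
        """
        if not (self.first_index - 1 <= applied_index <= self.last_index):
            raise ValueError("applied_index is outside of the log")
        self.snapshot_index = applied_index
        keep_from = max(self.first_index, applied_index - max_trailing_entries + 1)
        self.entries = self.entries[keep_from - self.first_index:]
        self.first_index = keep_from
```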

Closes scylladb/scylladb#16816

* github.com:scylladb/scylladb:
  raft: remove `empty()` from `fsm_output`
  test: add test for manual triggering of Raft snapshots
  api: add HTTP endpoint to trigger Raft snapshots
  raft: server: add `trigger_snapshot` API
  raft: server: track last persisted snapshot descriptor index
  raft: server: framework for handling server requests
  raft: server: inline `poll_fsm_output`
  raft: server: fix indentation
  raft: server: move `io_fiber`'s processing of `batch` to a separate function
  raft: move `poll_output()` from `fsm` to `server`
  raft: move `_sm_events` from `fsm` to `server`
  raft: fsm: remove constructor used only in tests
  raft: fsm: move trace message from `poll_output` to `has_output`
  raft: fsm: extract `has_output()`
  raft: pass `max_trailing_entries` through `fsm_output` to `store_snapshot_descriptor`
  raft: server: pass `*_aborted` to `set_exception` call
2024-01-29 15:06:04 +02:00
Kamil Braun
617e09137d api: add HTTP endpoint to trigger Raft snapshots
This uses the `trigger_snapshot()` API added in the previous commit on a
server running for the given Raft group.

It can be used for example in tests or in the context of disaster
recovery (ref scylladb/scylladb#16683).
2024-01-23 16:48:28 +01:00
Kefu Chai
0dbb0ed09f api: storage_service: correct a typo
s/trough/through/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16870
2024-01-19 10:21:41 +02:00
Gleb Natapov
97ab3f6622 storage_service: topology_coordinator: introduce cleanup REST API integrated with the topology coordinator
Introduce a new REST API, "/storage_service/cleanup_all",
that, when triggered, instructs the topology coordinator to initiate
a cluster-wide cleanup on all dirty nodes. This is done by introducing a new
global command, "global_topology_request::cleanup".
2024-01-14 15:45:53 +02:00
Kefu Chai
8c4576f55d api: storage_service: correct the descriptions of two APIs
this change is about the documentation of the storage_service
RESTful API. we define the API using the Swagger 2.0 format and
generate the API document from the definitions, so it would be great
if the document matched the API.

in this change, since the keyspace is not queried but mutated, the
description is changed to a more accurate one.

from the code perspective, the change is purely cosmetic, as we don't read
the description fields or verify them in our tests.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16637
2024-01-11 08:28:14 +02:00
Eliran Sinvani
4c60804c4c rest api: Add an api for profile dumping
As part of code coverage support we need to work with dumped profiles
for ScyllaDB executables.
Those profiles are created on two occasions:
1. When an application exits normally (which triggers
   __llvm_dump_profile, registered in the exit hooks).
2. When ScyllaDB receives a SIGTERM signal: commit d7b524cf10 introduced
   a manual call to __llvm_dump_profile in that case.

This commit adds a third option: a REST API to dump the profile.
In addition, the target file is logged and the counters are reset, which
enables incremental dumping of the profile.
If the executable is not instrumented, this API call is a no-op apart
from logging, so keeping it in our releases bears minimal risk.
Specifically for code coverage, the gain is that we will not be
required to change the entire test run to shut down clusters gracefully,
which keeps the effect on the actual test behavior minimal.

The change was tested by manually triggering the API with and
without instrumentation, as well as re-triggering it with write
permissions for the profile file disabled (to test fault tolerance).

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-12-27 07:06:54 +02:00
Aleksandra Martyniuk
b485897704 api: tasks: api for starting async compaction
For all compaction types which can be started with the API, add an asynchronous
version of the API, which returns the task_id of the corresponding task manager
task. With the task_id, a user can check the task status, abort it, or wait for it,
using the task manager API.
2023-12-11 11:39:33 +01:00
Avi Kivity
9c0f05efa1 Merge 'Track tablet streaming under global sessions to prevent side-effects of failed streaming' from Tomasz Grabiec
Tablet streaming involves asynchronous RPCs to other replicas which transfer writes. We want side-effects from streaming only within the migration stage in which the streaming was started. This is currently not guaranteed on failure. When the streaming master fails (e.g. due to an RPC failure), some streaming work may still be alive somewhere (e.g. an RPC on the wire) and have side-effects at some later point.

This PR implements tracking of all operations involved in streaming which may have side-effects, which allows the topology change coordinator to fence them and wait for them to complete if they were already admitted.

The tracking and fencing are implemented using global "sessions", each created for streaming a single tablet. A session is globally identified by a UUID. The identifier is assigned by the topology change coordinator and stored in system.tablets. Sessions are created and closed based on group0 state (tablet metadata) by the barrier command sent to each replica, which we already send on transitions between stages. Also, each barrier waits for closed sessions to be drained.

The barrier blocks only if some session still has work left behind by unsuccessful streaming. In that case, it should not block for long, because the streaming process frequently checks whether its guard was left behind and stops if it was.

This tracking mechanism is fault-tolerant: the session id is stored in group0, so the coordinator can make progress after failover. The barriers guarantee that a session exists on all replicas, and that it will be closed on all replicas.
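A minimal asyncio sketch of the fencing idea described above: a session admits side-effecting work while it is open; closing it refuses new work, and the barrier then waits until work admitted earlier has drained. `ToySession`, `SessionClosed` and `demo` are hypothetical names. The real mechanism spans replicas and is driven by group0 state, which this single-process model does not capture.

```python
from __future__ import annotations

import asyncio


class SessionClosed(Exception):
    """Raised when work tries to enter a session that was already closed."""


class ToySession:
    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self._closed = False
        self._in_flight = 0
        self._drained = asyncio.Event()
        self._drained.set()  # nothing in flight yet

    async def run(self, make_op):
        """Admit one side-effecting operation, or refuse it if the session is closed."""
        if self._closed:
            raise SessionClosed(self.session_id)
        self._in_flight += 1
        self._drained.clear()
        try:
            return await make_op()
        finally:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._drained.set()

    async def close_and_drain(self) -> None:
        """Fence: refuse new work, then wait for already-admitted work to finish."""
        self._closed = True
        await self._drained.wait()


async def demo() -> list[str]:
    session = ToySession("tablet-streaming-session")
    log: list[str] = []

    async def stale_write() -> None:
        await asyncio.sleep(0.01)
        log.append("write applied")

    pending = asyncio.create_task(session.run(stale_write))
    await asyncio.sleep(0)           # let the write get admitted
    await session.close_and_drain()  # the barrier waits for it to finish
    log.append("barrier done")
    try:
        await session.run(stale_write)
    except SessionClosed:
        log.append("late write rejected")
    await pending
    return log
```

Passing a factory (`make_op`) instead of a coroutine object means a refused operation is never created, so it cannot have side-effects later.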

Closes scylladb/scylladb#15847

* github.com:scylladb/scylladb:
  test: tablets: Add test for failed streaming being fenced away
  error_injection: Introduce poll_for_message()
  error_injection: Make is_enabled() public
  api: Add API to kill connection to a particular host
  range_streamer: Do not block topology change barriers around streaming
  range_streamer, tablets: Do not keep token metadata around streaming
  tablets: Fail gracefully when migrating tablet has no pending replica
  storage_service, api: Add API to disable tablet balancing
  storage_service, api: Add API to migrate a tablet
  storage_service, raft topology: Run streaming under session topology guard
  storage_service, tablets: Use session to guard tablet streaming
  tablets: Add per-tablet session id field to tablet metadata
  service: range_streamer: Propagate topology_guard to receivers
  streaming: Always close the rpc::sink
  storage_service: Introduce concept of a topology_guard
  storage_service: Introduce session concept
  tablets: Fix topology_metadata_guard holding on to the old erm
  docs: Document the topology_guard mechanism
2023-12-07 16:29:02 +02:00
Tomasz Grabiec
733eb21601 api: Add API to kill connection to a particular host
For testing failure scenarios.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
d1c1b59236 storage_service, api: Add API to disable tablet balancing
Load balancing needs to be disabled before making a series of manual
migrations so that we don't fight with the load balancer.

It will also be used in tests to ensure tablets stick to expected locations.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
1f57d1ea28 storage_service, api: Add API to migrate a tablet
Will be used in tests, or for hot fixes in production.
2023-12-06 18:36:17 +01:00
Yaniv Kaul
ae2ab6000a Typos: fix typos in code
Fixes more typos found by running codespell on the code.
This commit covers more user-visible errors.

Refs: https://github.com/scylladb/scylladb/issues/16255
2023-12-05 15:18:11 +02:00
Wojciech Mitros
a8c9451fb2 commitlog: add max disk size api
Currently, the maximum size of the commitlog is obtained either from the
config parameter commitlog_total_space_in_mb or, when the parameter
is -1, from the total memory allocated for Scylla.
To facilitate testing of the behavior of the commitlog hard limit,
expose the value of commitlog max_disk_size in a dedicated API.
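The rule for deriving the limit, restated as a small function. This is a sketch: `commitlog_max_disk_size` is a hypothetical name, and it assumes the total memory is used unchanged when the parameter is -1. The text above does not say how the memory figure is turned into a size, so the real derivation may differ.

```python
MiB = 1024 * 1024


def commitlog_max_disk_size(total_space_in_mb: int, total_memory_bytes: int) -> int:
    """Return the commitlog size limit in bytes.

    commitlog_total_space_in_mb == -1 means "not configured"; in that case
    fall back to the memory allocated to Scylla (used unchanged here, which
    is an assumption).
    """
    if total_space_in_mb == -1:
        return total_memory_bytes
    return total_space_in_mb * MiB
```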

Closes scylladb/scylladb#16020
2023-12-03 17:16:58 +02:00
Benny Halevy
b12b142232 api: add /storage_service/compact
For major-compacting all tables in the database.
The advantage of this api is that `commitlog->force_new_active_segment`
happens only once in `database::flush_all_tables` rather than
once per keyspace (when `nodetool compact` translates to
a sequence of `/storage_service/keyspace_compaction` calls).
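The sketch below makes the difference concrete by listing the requests a client would send to major-compact every keyspace, first one keyspace at a time and then through the new endpoint. Per the text above, each request runs `commitlog->force_new_active_segment` once, so the length of the list is also the number of forced segment switches. `compaction_requests` is a hypothetical helper, and the `/storage_service/keyspace_compaction/{keyspace}` path shape is an assumption.

```python
from __future__ import annotations


def compaction_requests(keyspaces: list[str], use_compact_endpoint: bool) -> list[str]:
    """Paths of the requests needed to major-compact all keyspaces."""
    if use_compact_endpoint:
        # One request, so the commitlog segment is switched once.
        return ["/storage_service/compact"]
    # One request per keyspace, each switching the commitlog segment.
    return [f"/storage_service/keyspace_compaction/{ks}" for ks in keyspaces]
```

With three keyspaces, the per-keyspace route needs three requests (three segment switches), while `/storage_service/compact` needs one.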

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Benny Halevy
1b576f358b api: add /storage_service/flush
For flushing all tables in the database.
The advantage of this api is that `commitlog->force_new_active_segment`
happens only once in `database::flush_all_tables` rather than
once per keyspace (when `nodetool flush` translates to
a sequence of `/storage_service/keyspace_flush` calls).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Benny Halevy
1fd85bd37b api: compaction: add flush_memtables option
When flushing is done externally, e.g. by running
`nodetool flush` prior to `nodetool compact`,
flush_memtables=false can be passed to skip flushing
of tables right before they are major-compacted.

This is useful to prevent creation of small sstables
due to excessive memtable flushing.
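A sketch of the intended sequence: flush once up front, then compact without flushing again. It assumes, without confirmation from the text above, that `flush_memtables` is accepted as a query parameter by `/storage_service/compact` and that omitting it keeps the default of flushing. `flush_then_compact_urls` is a hypothetical helper, and port 10000 is the REST API port used elsewhere in this log.

```python
from urllib.parse import urlencode


def flush_then_compact_urls(host: str, port: int = 10000) -> list:
    """URLs for an external flush followed by a compaction that skips flushing."""
    base = f"http://{host}:{port}/storage_service"
    query = urlencode({"flush_memtables": "false"})
    return [
        f"{base}/flush",            # flush all tables once
        f"{base}/compact?{query}",  # then major-compact without re-flushing
    ]
```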

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-11-28 16:37:42 +02:00
Asias He
c605220bb3 repair: Introduce small table optimization
*) Problem:

We have seen in the field that it takes longer than expected to repair system tables
like system_auth, which has a tiny amount of data but is replicated to all nodes
in the cluster. The cluster has multiple DCs. Each DC has multiple nodes. The
main reason for the slowness is that even if the amount of data is small,
repair has to walk through all the token ranges, that is, num_tokens *
number_of_nodes_in_the_cluster ranges. The overhead of the repair protocol for
each token range dominates due to the small amount of data per token range.
Another reason is that the high network latency between DCs makes the RPC calls
used by repair take more time.
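The range count can be checked against the logs in this commit message. `ranges_nr=1537` for 6 nodes and `ranges_nr=2305` for 9 nodes both match num_tokens * number_of_nodes_in_the_cluster + 1, assuming num_tokens=256. The extra range is presumably the token range that wraps around the ring being split in two. That explanation, the 256 figure, and the helper name `repair_range_count` are all assumptions, not taken from the patch.

```python
def repair_range_count(num_tokens: int, nodes: int, small_table_optimization: bool) -> int:
    """Number of token ranges a repair walks (a model fitted to the repair logs
    in this commit message, not code from the patch)."""
    if small_table_optimization:
        return 1  # the whole ring is repaired as a single range
    # One range per vnode token in the cluster, plus one assumed to come
    # from splitting the range that wraps around the token ring.
    return num_tokens * nodes + 1
```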

*) Solution:

To solve this problem, a small table optimization for repair is introduced in
this patch. A new repair option is added to turn on this optimization.

- The user does not need to specify token ranges to repair; all token
ranges are repaired automatically.

- Users only need to send the repair REST API request to one node in the
cluster. It can be any node.

- It does not require the RF to be configured to replicate to all nodes in the
cluster. This means it can work with any table as long as the amount of data
is small, e.g., less than 100MiB per node.

*) Performance:

1)
3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2}

Before:
```
repair - repair[744cd573-2621-45e4-9b27-00634963d0bd]: stats:
repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes,
role_members}, ranges_nr=1537, round_nr=4612,
round_nr_fast_path_already_synced=4611,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1,
rpc_call_nr=115289, tx_hashes_nr=0, rx_hashes_nr=5, duration=1.5648403 seconds,
tx_row_nr=2, rx_row_nr=0, tx_row_bytes=356, rx_row_bytes=0,
row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 0},
{127.0.14.4, 0}, {127.0.14.5, 178}, {127.0.14.6, 178}},
row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 0},
{127.0.14.4, 0}, {127.0.14.5, 1}, {127.0.14.6, 1}},
row_from_disk_bytes_per_sec={{127.0.14.1, 0.00010848}, {127.0.14.2,
0.00010848}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.00010848},
{127.0.14.6, 0.00010848}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1,
0.639043}, {127.0.14.2, 0.639043}, {127.0.14.3, 0}, {127.0.14.4, 0},
{127.0.14.5, 0.639043}, {127.0.14.6, 0.639043}} Rows/s,
tx_row_nr_peer={{127.0.14.3, 1}, {127.0.14.4, 1}}, rx_row_nr_peer={}
```

After:
```
repair - repair[d6e544ba-cb68-4465-ab91-6980bcbb46a9]: stats:
repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes,
role_members}, ranges_nr=1, round_nr=4, round_nr_fast_path_already_synced=4,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0,
rpc_call_nr=80, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.001459798 seconds,
tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0,
row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 178},
{127.0.14.4, 178}, {127.0.14.5, 178}, {127.0.14.6, 178}},
row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 1},
{127.0.14.4, 1}, {127.0.14.5, 1}, {127.0.14.6, 1}},
row_from_disk_bytes_per_sec={{127.0.14.1, 0.116286}, {127.0.14.2, 0.116286},
{127.0.14.3, 0.116286}, {127.0.14.4, 0.116286}, {127.0.14.5, 0.116286},
{127.0.14.6, 0.116286}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1,
685.026}, {127.0.14.2, 685.026}, {127.0.14.3, 685.026}, {127.0.14.4, 685.026},
{127.0.14.5, 685.026}, {127.0.14.6, 685.026}} Rows/s, tx_row_nr_peer={},
rx_row_nr_peer={}
```

The repair time ratio = 1.5648403 seconds / 0.001459798 seconds = 1072X

2)
3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2}

Same test as above, except that a 5 ms delay is added to simulate multi-DC
network latency.

The time to repair is reduced from 333s to 0.2s.

333.26758 s / 0.22625381 s = 1472.98

3)

3 DCs, each DC has 3 nodes, 9 nodes in the cluster. RF = {dc1: 3, dc2: 3, dc3: 3}, 10 ms network latency

Before:

```
repair - repair[86124a4a-fd26-42ea-a078-437ca9e372df]: stats:
repair_reason=repair, keyspace=system_auth, tables={role_attributes,
role_members, roles}, ranges_nr=2305, round_nr=6916,
round_nr_fast_path_already_synced=6915,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1,
rpc_call_nr=276630, tx_hashes_nr=0, rx_hashes_nr=8, duration=986.34015
seconds, tx_row_nr=7, rx_row_nr=0, tx_row_bytes=1246, rx_row_bytes=0,
row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3,
0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0},
{127.0.57.8, 0}, {127.0.57.9, 0}}, row_from_disk_nr={{127.0.57.1, 1},
{127.0.57.2, 1}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0},
{127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}},
row_from_disk_bytes_per_sec={{127.0.57.1, 1.72105e-07}, {127.0.57.2,
1.72105e-07}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0},
{127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}}
MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 0.00101385},
{127.0.57.2, 0.00101385}, {127.0.57.3, 0}, {127.0.57.4, 0},
{127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0},
{127.0.57.9, 0}} Rows/s, tx_row_nr_peer={{127.0.57.3, 1},
{127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1},
{127.0.57.8, 1}, {127.0.57.9, 1}}, rx_row_nr_peer={}
```

After:

```
repair - repair[07ebd571-63cb-4ef6-9465-6e5f1e98f04f]: stats:
repair_reason=repair, keyspace=system_auth, tables={role_attributes,
role_members, roles}, ranges_nr=1, round_nr=4,
round_nr_fast_path_already_synced=4,
round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0,
rpc_call_nr=128, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.6052915
seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0,
row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3,
178}, {127.0.57.4, 178}, {127.0.57.5, 178}, {127.0.57.6, 178},
{127.0.57.7, 178}, {127.0.57.8, 178}, {127.0.57.9, 178}},
row_from_disk_nr={{127.0.57.1, 1}, {127.0.57.2, 1}, {127.0.57.3, 1},
{127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1},
{127.0.57.8, 1}, {127.0.57.9, 1}},
row_from_disk_bytes_per_sec={{127.0.57.1, 0.00037793}, {127.0.57.2,
0.00037793}, {127.0.57.3, 0.00037793}, {127.0.57.4, 0.00037793},
{127.0.57.5, 0.00037793}, {127.0.57.6, 0.00037793}, {127.0.57.7,
0.00037793}, {127.0.57.8, 0.00037793}, {127.0.57.9, 0.00037793}}
MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 2.22634},
{127.0.57.2, 2.22634}, {127.0.57.3, 2.22634}, {127.0.57.4,
2.22634}, {127.0.57.5, 2.22634}, {127.0.57.6, 2.22634},
{127.0.57.7, 2.22634}, {127.0.57.8, 2.22634}, {127.0.57.9,
2.22634}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
```

The time to repair is reduced from 986s (16 minutes) to 1.6s, about 614X.

*) Summary

So, a difference of roughly 600X to 1500X (1072X, 1473X and 614X in the
three tests above) is observed for this common system table repair procedure.
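The speedups can be recomputed from the `duration` values in the logs and the timings quoted in this commit message. This is a plain arithmetic check, and the dictionary keys are only labels. Case 3 comes out at roughly 614X, so not every case exceeds 1000X.

```python
# (before, after) repair durations in seconds, copied from this commit message.
durations_s = {
    "case 1: 6 nodes": (1.5648403, 0.001459798),
    "case 2: 6 nodes, 5 ms delay": (333.26758, 0.22625381),
    "case 3: 9 nodes, 10 ms latency": (986.34015, 1.6052915),
}

speedups = {case: before / after for case, (before, after) in durations_s.items()}

for case, factor in speedups.items():
    print(f"{case}: {factor:.0f}X")
```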

Fixes #16011
Refs  #15159
2023-11-20 15:11:16 +08:00
Kamil Braun
992f1327d3 api: storage_service: add REST API to reload topology state
Some tests may want to modify the system.topology table directly. Add a REST
API to reload the state into memory. An alternative would be restarting
the server, but that's slower and may have other side effects that are
undesirable in the test.

The API can also be called outside tests; it should not have any
observable effects unless the user modifies the `system.topology` table
directly (which they should never do, except perhaps in some disaster
recovery scenarios).
2023-09-28 11:59:16 +02:00
Tomasz Grabiec
c27d212f4b api, storage_service: Recalculate table digests on relocal_schema api call
Currently, the API call recalculates only the per-node schema version. To
work around issues like #4485, we want to recalculate per-table
digests. One way to do that is to restart the node, but that's slow
and impacts availability.

Use like this:

  curl -X POST http://127.0.0.1:10000/storage_service/relocal_schema
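For a test harness written in Python, an equivalent request can be built with the standard library alone. This is only a sketch that mirrors the curl command above. It assumes nothing about the response body, and it does not send the request.

```python
import urllib.request

# Same request as the curl command above; the URL is copied from it.
req = urllib.request.Request(
    "http://127.0.0.1:10000/storage_service/relocal_schema",
    method="POST",
)
# Sending it requires a running node:
#     with urllib.request.urlopen(req) as resp:
#         print(resp.status)
```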

Fixes #15380

Closes #15381
2023-09-13 18:27:57 +03:00
Aleksandra Martyniuk
cf37ab96f4 api: task_manager: fix indentation
Closes #15173
2023-09-02 08:18:59 +03:00
Aleksandra Martyniuk
5e31ca7d20 tasks: api: show tasks' scopes
To make manual analysis of task manager tasks easier, task_status
and task_stats now contain the operation's scope (e.g. shard, table).

Closes #15172
2023-08-29 11:32:16 +03:00
Benny Halevy
6e416b8ff2 api: storage_service: improve description of /storage_service/host_id
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-08-21 09:20:39 +03:00
Asias He
9b3fd9407b repair: Add ranges_parallelism option
This patch adds the ranges_parallelism option to the repair RESTful API.

Users can use this option to limit the number of ranges repaired in
parallel per repair job to a value smaller than
max_repair_ranges_in_parallel, the default calculated by Scylla core.

Scylla Manager can also use this option to provide more than N ranges in
a single repair job while repairing only N ranges in parallel
(ranges_parallelism=N), instead of providing exactly N ranges per repair job.

To make it safer, unlike PR #4848, this patch does not allow users to
exceed max_repair_ranges_in_parallel.
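A sketch of what ranges_parallelism means for a job that holds more ranges than it repairs at once: an asyncio semaphore bounds how many ranges are in flight. The text does not say whether an over-large value is capped or rejected, so `effective_parallelism` assumes it is capped at max_repair_ranges_in_parallel. All names here are hypothetical rather than Scylla's repair code.

```python
from __future__ import annotations

import asyncio


def effective_parallelism(requested: int | None, max_in_parallel: int) -> int:
    """Assumed policy: no value means the default; larger values are capped."""
    if requested is None:
        return max_in_parallel
    return max(1, min(requested, max_in_parallel))


async def repair_in_parallel(ranges, parallelism: int, repair_one) -> None:
    """Repair every range, with at most `parallelism` ranges in flight."""
    sem = asyncio.Semaphore(parallelism)

    async def guarded(r) -> None:
        async with sem:
            await repair_one(r)

    await asyncio.gather(*(guarded(r) for r in ranges))


async def peak_in_flight(n_ranges: int, parallelism: int) -> int:
    """Run a fake repair and report the highest observed concurrency."""
    active = peak = 0

    async def repair_one(_r) -> None:
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.001)  # stand-in for repairing one range
        active -= 1

    await repair_in_parallel(range(n_ranges), parallelism, repair_one)
    return peak
```

For example, a job given 10 ranges with a parallelism of 3 never has more than 3 ranges in flight, which is the "more than N ranges, N at a time" usage described above.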

Fixes #4847
2023-08-01 10:58:14 +08:00
Amnon Heiman
d694a42745 api: Add the metrics.json Swagger file
This patch adds the swagger definition for the metrics API.

Currently, the API defines a get and set of the metric_relabel_config.
2023-07-17 17:09:35 +03:00
Amnon Heiman
9e0ec3afba Preparing for V2 API from files
This patch changes the base path of V2 of the API to '/'. That
means the v2 prefix becomes part of each path definition.
Currently, this only affects the config API, which is created from code.

The motivation for the change is Swagger definitions that are read
from a file. Currently, when using swagger-ui with the doc path set
to http://localhost:10000/v2 and reading the Swagger definition from a
file, swagger-ui concatenates the paths and looks for
http://localhost:10000/v2/v2/{path}

Instead, the base path is now '/' and the /v2 prefix will be added by
each endpoint definition.

From the user perspective, there is no change in current functionality.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2023-07-17 17:09:35 +03:00
Mikołaj Grzebieluch
507f750754 api/error_injection: add passing injection's parameters to enable endpoint
2023-07-13 10:10:52 +02:00
Mikołaj Grzebieluch
8b1f5ba293 api/error_injection: add message_injection endpoint
Add an endpoint for sending empty messages to the injected code.
2023-07-06 12:34:53 +02:00
Gleb Natapov
05aa07835d storage_service: delete code that handled REMOVING_TOKENS state
The state is never advertised so the code is never used.
2023-05-25 14:48:09 +03:00
Kefu Chai
b112a3b78a api: storage_service: use string for generation
In this change, the type of the "generation" field of "sstable" in the
return value of the RESTful API entry point at
"/storage_service/sstable_info" is changed from "long" to "string".

This change depends on the corresponding change in the tools/jmx submodule,
so we have to include the submodule change in this very commit.

This API is used by our JMX exporter, which in turn exposes the
SSTable information via the "StorageService.getSSTableInfo" MBean
operation. That operation returns the retrieved SSTable info as a list of
CompositeData, and "generation" is a field of each element in the
CompositeData. In general, the scylla JMX exporter is consumed
by nodetool, which prints the returned SSTable info list as a
pretty-formatted table, see
tools/java/src/java/org/apache/cassandra/tools/nodetool/SSTableInfo.java.
nodetool's formatter is not aware of the schema or type of the
SSTables to be printed, nor does it enforce the type: it just
tries its best to pretty-print them as a table.

But the fields in CompositeData are typed: when the scylla JMX exporter
translates the returned SSTables from the RESTful API, it sets the
typed fields of every `SSTableInfo` when constructing `PerTableSSTableInfo`.
So we should keep the type of the "generation" field consistent on both
the JMX and the RESTful API sides. Because we package the same version
of scylla-jmx and nodetool in the same precompiled tarball, and enforce
dependencies on exactly the same version when shipping deb and rpm
packages, we should be safe when it comes to interoperability of
scylla-jmx and scylla. Also, as explained above, nodetool does not care
about the typing, so it is not a problem on nodetool's front.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13834
2023-05-15 20:33:48 +03:00